2026-07-02


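Meituan officially releases LongCat-2.0: a trillion-parameter model trained on a domestic compute cluster 95

Tags: Large models, Model release, Open-source ecosystem, Domestic compute
Source: AI HOT Picks | Read the original

[Summary]
Meituan has released and open-sourced LongCat-2.0, a trillion-parameter large model. It uses architectural components such as LSA sparse attention, was trained on a domestic cluster of 50,000 accelerator cards, natively supports a 1M-token context, scores well on evaluations, and ranks in the global top three by monthly call volume.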

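xAI releases Voice Agent Builder beta 85

Tags: Product release, Agents, Voice, No-code
Source: AI HOT Picks | Read the original

[Summary]
xAI has released a beta of Voice Agent Builder, a no-code platform built on Grok Voice that creates a production-grade voice agent in two minutes. It leads competing products from Google and OpenAI on τ-voice Bench, is priced low, and supports telephony integration.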

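OpenAI paper reveals three GPT-5.6 Pro variants, breaking with the single-flagship strategy 85

Tags: Model release, OpenAI, Large models
Source: AI HOT Picks | Read the original

[Summary]
An OpenAI paper discloses three Pro variants of GPT-5.6 (Luna, Terra, and Sol). Sol Pro leads a genomics benchmark with a 31.5% pass rate. The lineup breaks with the traditional strategy of a single top-tier model, but the paper does not disclose token usage or whether the variants will ship.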

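NVIDIA releases Nemotron-Labs-TwoTower, an open-weight diffusion language model 85

Tags: Model release, Open weights, Diffusion language models, Inference optimization
Source: AI HOT Picks | Read the original

[Summary]
NVIDIA has released Nemotron-Labs-TwoTower, an open-weight diffusion language model. Its two-tower architecture raises generation throughput by 2.42x while retaining 98.7% of quality.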

A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning 85

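Tags: Large models, Training methods, Theory
Source: arXiv Statistics - Machine Learning | Read the original

[Summary]
Proposes an analytical theory of how initialization shapes inductive bias in the pretrain-then-fine-tune pipeline. It identifies four fine-tuning regimes and explains how initialization scale governs feature reuse versus feature refinement, giving feature learning a theoretical footing.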

Signed-Permutation Coordinate Transport for RMSNorm Transformers 85

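Tags: Model gauge, Interpretability, Training methods, Inference optimization
Source: arXiv Statistics - Machine Learning | Read the original

[Summary]
Proposes a signed-permutation coordinate transport method for RMSNorm models that resolves gauge mismatch between models. It greatly improves the accuracy of transferring tools such as SAE reconstruction and sentiment steering, and it identifies what is key to preserving training state.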

Learning by Surprise: Adaptive Mitigation of Model Collapse in Large Language Models 85

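Tags: Model research, Training methods, Data filtering
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes a perplexity-based filtering strategy to mitigate model collapse in LLMs. It does not need to distinguish human-written from machine-generated data, outperforms the human baseline, and offers a practical recipe for training in synthetic-data environments.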
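To make "perplexity-based filtering" concrete, here is a minimal sketch. It is not the paper's method: the summary does not state the scoring model, the direction of the filter, or any threshold. The sketch assumes a toy add-alpha unigram model as the scorer and assumes that the most surprising (highest-perplexity) texts are the ones kept; `unigram_logprobs`, `keep_most_surprising`, and `keep_fraction` are hypothetical names introduced for illustration.

```python
import math
from collections import Counter


def unigram_logprobs(corpus, alpha=1.0):
    """Fit an add-alpha smoothed unigram model on whitespace tokens.

    Returns a function mapping a token to its log probability.
    (Toy stand-in for a real language model; an assumption of this sketch.)
    """
    counts = Counter(tok for text in corpus for tok in text.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(tok):
        return math.log((counts.get(tok, 0) + alpha) / (total + alpha * vocab))

    return logprob


def perplexity(text, logprob):
    """Per-token perplexity of `text` under the given model."""
    toks = text.split()
    if not toks:
        return float("inf")
    return math.exp(-sum(logprob(t) for t in toks) / len(toks))


def keep_most_surprising(candidates, logprob, keep_fraction=0.5):
    """Rank candidates by perplexity (highest first) and keep the top fraction.

    Note that no human-vs-machine label is used anywhere: the filter only
    looks at how surprising each text is to the current model.
    """
    ranked = sorted(candidates, key=lambda t: perplexity(t, logprob), reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]
```

In a real pipeline the scorer would be the model being trained (or a reference model), and the kept fraction would be tuned or adapted over training rounds; both choices are left open here.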

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability 85

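Tags: Large models, Inference optimization, Training methods, Research progress
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes SOAR, an asymmetric self-play framework that uses meta-RL to have a teacher model generate synthetic problems for a student. On hard math reasoning problems it moves the student from a 0% success rate into a learnable regime, and it finds that problem structure matters more than correctness.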

Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers 85

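Tags: Inference optimization, Large models, Model release
Source: arXiv Computation and Language | Read the original

[Summary]
LOTUS proposes a parallel latent-reasoning method for looped Transformers. It is the first to close the performance gap with explicit chain-of-thought (CoT) at the 3B scale, it cuts inference latency by 2.5x to 6.9x, and its latent space is interpretable and aligned with CoT.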

ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs 85

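Tags: Large models, Multimodal, Inference optimization, AI safety
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes the ADAPT framework, which tackles hallucination in multimodal LLMs by combining attention-dynamics alignment with preference tuning. It cuts hallucination rates by 40% to 60% and sets a new state of the art on several benchmarks.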

Optimal Self-Consistency for Efficient Reasoning with Large Language Models 82

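Tags: Inference optimization, Large models, Efficiency, Research
Source: arXiv Statistics - Machine Learning | Read the original

[Summary]
Proposes Blend-ASC, a self-consistency method that allocates samples dynamically for efficient reasoning. It uses 4.8x fewer samples, needs no hyperparameters, and supports batching.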
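For readers unfamiliar with adaptive self-consistency, the sketch below shows the general idea of spending fewer samples on easy questions. It is explicitly not Blend-ASC: the summary gives no allocation rule, and this version uses a simple vote-lead stopping rule with hyperparameters (`min_samples`, `lead`), which Blend-ASC is said not to need. The function name and the `sample_answer` callback are hypothetical.

```python
from collections import Counter


def adaptive_majority_vote(sample_answer, max_samples=20, min_samples=3, lead=3):
    """Draw answers one at a time and stop once the vote is decisive.

    sample_answer: zero-argument callable returning one sampled answer
                   (in practice, one LLM generation parsed to a final answer).
    Stops when the top answer leads the runner-up by at least `lead` votes,
    or when `max_samples` (assumed >= 1) is exhausted.
    Returns (majority_answer, samples_used).
    """
    votes = Counter()
    used = 0
    for used in range(1, max_samples + 1):
        votes[sample_answer()] += 1
        if used >= min_samples:
            ranked = votes.most_common(2)
            top = ranked[0][1]
            second = ranked[1][1] if len(ranked) > 1 else 0
            if top - second >= lead:
                break
    return votes.most_common(1)[0][0], used
```

With a fixed budget of 20 samples per question, a question whose first three samples agree would stop after 3 draws, while a contested question keeps sampling; the saving comes entirely from the easy cases.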

Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents 82

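Tags: Agents, Large models, Inference optimization
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes an inference-time self-improvement method that learns from failed trajectories. On the OSWorld benchmark it raises OpenCUA-72B's success rate from 42.3% to 48.9% with no additional training, offering a new direction for agent optimization.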

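When building AI agents, design the routing first 80

Tags: Agents, Inference optimization, Engineering practice, Cost optimization
Source: AI HOT Picks | Read the original

[Summary]
When building AI agents, design the routing before choosing a model. Tiered routing that sends 70-80% of traffic to local or asynchronous models can cut AI spending by more than 90%; Coinbase has put this into practice and cut costs significantly.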
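The cost claim is easier to reason about as blended-cost arithmetic. The sketch below computes the saving for a given traffic split; every share and relative cost in the usage note is a hypothetical placeholder, not a figure from the article or from Coinbase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    share: float          # fraction of requests routed to this tier
    relative_cost: float  # cost per request, with the frontier model = 1.0


def blended_cost(tiers):
    """Average cost per request relative to sending everything to the frontier model."""
    total_share = sum(t.share for t in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(t.share * t.relative_cost for t in tiers)


def savings(tiers):
    """Fraction of spend saved versus an all-frontier baseline."""
    return 1.0 - blended_cost(tiers)
```

Under these assumptions, a split of 75% local (relative cost 0.01), 20% small hosted model (0.10), and 5% frontier (1.0) saves 92.25%. Sending 75% to a free local model while the remaining 25% stays on the frontier model saves only 75%, so reaching 90%+ depends on how the residual traffic is routed as well.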

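Meta follows SpaceX in monetizing excess AI compute 80

Tags: Company news, Compute, Cloud services, Infrastructure
Source: AI HOT Picks | Read the original

[Summary]
Following SpaceX's example, Meta plans to launch Meta Compute, a cloud business that sells AI compute and model access to outside customers and competes directly with AWS and Google Cloud. Meta has committed $182.9 billion to building AI infrastructure.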

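Anthropic embeds steganographic code in Claude Code to identify Chinese users 80

Tags: Company news, AI safety, Product security
Source: AI HOT Picks | Read the original

[Summary]
Anthropic embedded steganography in Claude Code that identifies Chinese users by time zone and a domain list, triggering a crisis of trust and controversy.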

On the Convergence of Self-Improving Online LLM Alignment 80

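Tags: Model training, LLM alignment, Theory
Source: arXiv Statistics - Machine Learning | Read the original

[Summary]
Proposes the SAIL-RevKL algorithm, which adds a reverse-KL divergence penalty to address distribution shift and convergence difficulties in self-improving LLM alignment. The method is proved to satisfy the PL condition and to achieve near-linear sample complexity, and it performs better on MuJoCo and LLM alignment tasks.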

Training Therapeutic Judges and Multi-Agent Systems for Human-Aligned Mental Health Support 80

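Tags: Large models, Agents, AI healthcare, Model release
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes TheraJudge, an open therapy evaluator, and TheraAgent, a multi-agent system, which use human-aligned evaluation to optimize psychotherapy responses. They outperform baselines on dimensions such as safety, relevance, and empathy, and the code is open source.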

Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback 80

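Tags: Research, Training methods, Large models
Source: arXiv Statistics - Machine Learning | Read the original

[Summary]
A theoretical analysis showing that, under noisy expert feedback, on-policy distillation (OPD) outperforms offline behavior cloning (SFT), with sample complexity dropping from exponential to polynomial. This explains why OPD does better in practice and offers guidance for language-model training.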

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action 80

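Tags: Large models, Agents, AI safety, Model evaluation
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes the NCP-ToM framework for evaluating LLMs' ability to induce belief states through action rather than conversation. GPT-5 reaches an 80% success rate and surpasses humans, though it is less robust than humans. The work bears on AI safety and agent alignment.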

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning 80

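Tags: Large models, Medical analysis, AI safety, Reasoning evaluation
Source: arXiv Computation and Language | Read the original

[Summary]
Proposes CLExEval, a human-in-the-loop framework that reveals evaluation hallucinations and three failure modes in LLM clinical reasoning, and shows that using an LLM as the judge overestimates diagnostic reliability.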
