2026-06-21

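Microsoft Resells GPT and DeepSeek in Both Directions, Becoming the World's Largest AI Middleman 85

Tags: Company News, Model Release, Large Models, Industry Shifts
Source: AI HOT 精选 (AI HOT Picks)

[Summary]
Microsoft has become the world's largest relay hub for AI models: it sells ChatGPT to Chinese enterprises and, in the other direction, offers DeepSeek models to Western customers, building a two-way AI trade network between China and the US that affects the industry landscape.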

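NVIDIA Research Releases SpatialClaw: A Training-Free Spatial Reasoning Framework 85

Tags: Agents, Vision-Language Models, Reasoning Frameworks, Model Release
Source: AI HOT 精选 (AI HOT Picks)

[Summary]
NVIDIA Research has released SpatialClaw, a training-free spatial reasoning framework that calls perception tools through a code-action interface. It significantly improves the spatial judgment of VLMs, reaching an average accuracy of 59.9%.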

Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics 80

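Tags: Large Models, Positional Bias, In-Context Learning, Inference Optimization
Source: arXiv Computation and Language

[Summary]
The study shows that query position has a major effect on in-context learning quality in diffusion LLMs. It proposes Auto-ICL, a training-free adaptive routing strategy that mitigates positional bias and improves performance on reasoning and perception tasks.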

Critique of World Model 80

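Tags: World Models, AGI, Research, Architecture
Source: arXiv Computation and Language

[Summary]
The paper systematically surveys the design dimensions of world models (data, representation, architecture, and others) and proposes a Generative Latent Prediction (GLP) architecture aimed at a general-purpose world model and the PAN AGI system. It is a useful reference for AGI research.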

SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning 80

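Tags: Agents, Reasoning, Knowledge Integration, Mathematical Reasoning
Source: arXiv Computation and Language

[Summary]
Proposes SIGMA, a multi-agent framework that strengthens mathematical reasoning through on-demand knowledge integration. It improves results by 7.4% on benchmarks such as MATH500 and AIME, with notable gains in both accuracy and efficiency.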

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models 80

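Tags: Multimodal, Large Models, Zero-Shot Reasoning, Research
Source: arXiv Computation and Language

[Summary]
IdealGPT uses large language models to iteratively decompose vision-and-language reasoning tasks. It outperforms GPT-4-like models by 10% and 15% on zero-shot VCR and SNLI-VE respectively, demonstrating the potential of LLM-driven multi-step reasoning.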

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection 80

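Tags: Training Methods, Fine-Tuning, Knowledge Injection, AI Research
Source: arXiv Computation and Language

[Summary]
MixSD proposes a fine-tuning method for knowledge injection that needs no external teacher. It builds dynamic supervision by mixing the model's own conditional distributions, and across multiple experiments it substantially improves both memorization and retention, effectively mitigating catastrophic forgetting.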

REDACT: A Systematically Controlled Multilingual Benchmark for Personal Information Detection 78

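Tags: Data Security, AI Safety, Model Evaluation, Multilingual
Source: arXiv Computation and Language

[Summary]
REDACT is a systematically controlled multilingual benchmark for PII detection, covering 51 entity types and 25 languages. It reveals the structure of detector failures across sensitivity tiers and improves the evaluation of privacy protection.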

NEST: Narrative Event Structures in Time for Long Video Understanding 78

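Tags: Multimodal, Video Understanding, Datasets, Narrative Structure
Source: arXiv Computation and Language

[Summary]
Introduces the NEST dataset, which provides multimodal narrative event annotations for 1,005 full-length films and establishes a benchmark for understanding narrative structure in long videos. Current methods perform poorly on it, so the benchmark remains challenging.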

AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA 78

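Tags: Multi-Agent, Model Release, Financial AI, AI Safety
Source: arXiv Computation and Language

[Summary]
AgentFinVQA proposes an auditable multi-agent pipeline for financial chart question answering. It supports local deployment of open-source models, delivers notable accuracy gains, and addresses compliance and trust concerns.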

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact 78

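Tags: AI Safety, Model Evaluation, Research
Source: arXiv Computation and Language

[Summary]
The study finds that LLM results on psychological tests stem mainly from response biases rather than genuine traits, and it calls for purpose-built evaluation methods. The finding has important implications for AI safety and model evaluation.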

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents 78

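Tags: AI Safety, Agents, Research Release, Model Security
Source: arXiv Computation and Language

[Summary]
Examines over-privileged tool selection in LLM agents and introduces the ToolPrivBench evaluation framework. It finds the problem is widespread among mainstream agents and proposes a post-training defense, with important implications for AI safety.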

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR 78

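Tags: ASR, Large Models, Inference Optimization, Speech Recognition
Source: arXiv Computation and Language

[Summary]
NIM4-ASR proposes an efficient and robust LLM-based framework for real-time speech recognition. With only 2.3B parameters it reaches SOTA on multiple benchmarks, and it supports real-time streaming inference and hotword customization at the million scale, advancing the practical use of LLMs in ASR.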

How LLMs Fail and Generalize in RTL Coding for Hardware Design? 78

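Tags: Large Models, Research, Hardware Design, Reasoning
Source: arXiv Computation and Language

[Summary]
A new study systematically analyzes the failure modes of LLMs in hardware design (RTL coding) and proposes an error taxonomy. It finds that frontier models top out at 90.8% on the VerilogEval benchmark, that alignment teaches only compilation (producing code that compiles), and that capability is bounded by pretraining knowledge, underscoring the need for more research on reasoning.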

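AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing 75

Tags: Large Models, AI Safety, Model Evaluation
Source: arXiv Statistics - Machine Learning

[Summary]
Proposes AURA, a framework that adaptively refines the auditing of LLM judges in an uncertainty-aware manner. It improves agreement with human judgments and addresses the fragile dependence of existing methods on the initial supervision signal.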

Rigorous uncertainty quantification of probabilistic AI weather forecasts with conformal prediction 75

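Tags: Research Progress, AI Weather Forecasting, Uncertainty Quantification
Source: arXiv Statistics - Machine Learning

[Summary]
The study applies conformal prediction to rigorously calibrate the uncertainty of AI weather models such as GenCast, NeuralGCM, and AIFS-ENS, in particular improving statistical coverage for forecasts of extreme events. The approach is broadly applicable.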

Characterizing Narrative Content in Web-scale LLM Pretraining Data 75

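Tags: Large Models, Datasets, Pretraining Data, Research
Source: arXiv Computation and Language

[Summary]
The study presents the first fine-grained analysis of narrative characteristics in the 3-trillion-token Dolma corpus. It proposes a framework, releases a dataset and models, and shows that current data filtering overlooks the distribution of narrative quality, laying groundwork for optimizing LLM pretraining data.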

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling 75

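Tags: Inference Optimization, Large Models, Test-Time Scaling
Source: arXiv Computation and Language

[Summary]
Proposes GRACE, a theoretical framework that unifies coarse-grained outcome reward models and fine-grained process reward models. It adaptively selects the optimal verification granularity based on problem difficulty and compute budget, improving accuracy by up to 3.1% on mathematical reasoning benchmarks.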

AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts 75

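Tags: Agents, Memory Systems, Model Research
Source: arXiv Computation and Language

[Summary]
AtomMem proposes a long-term memory system for LLM agents built on atomic facts. Using a hierarchical event structure and temporal profiles, it achieves efficient storage and stable evolution, and it reaches SOTA on the LoCoMo benchmark.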

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship 75

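Tags: Model Research, AI Safety, LLM
Source: arXiv Computation and Language

[Summary]
The study tests whether LLMs exhibit self-preference bias when revising verifiable instruction-following outputs. None of the four mid-tier models significantly favored its own output, and refusals to revise mostly stemmed from failures to recognize the flaw rather than from preference. The result is a useful reference for AI alignment.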
