2026-06-12

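Fully Autonomous Drone Kills a Human Soldier for the First Time 88

Tags: AI Safety, Military AI, Autonomous Systems, Ethics
Source: AI HOT 精选 | Read original

[Summary]
In the first documented lethal strike carried out by a fully autonomous drone, a human soldier was killed. The event marks a milestone in the combat use of autonomous weapon systems and has raised AI safety and ethics concerns.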

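Gemini Omni Flash Reaches SOTA on Video Tasks 85

Tags: Model Release, Multimodal, Video Generation, Google
Source: AI HOT 精选 | Read original

[Summary]
Google's Gemini Omni Flash reaches SOTA on image-to-video, text-to-video, and video editing tasks and will soon be available through an API, a notable advance in multimodal video generation.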

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale 85

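Tags: Large Models, Factuality, Evaluation
Source: arXiv Computation and Language | Read original

[Summary]
The LLMpedia framework generates 1.3 million encyclopedia articles from an LLM's parametric memory. An audit finds that gpt-5-mini's factual accuracy on topics covered by Wikipedia is only 68.4%, well below its MMLU score, which suggests that benchmarks overestimate factuality. All data is open-sourced.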

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models 85

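Tags: Datasets, Multimodal, Large Models, Medical AI
Source: arXiv Computation and Language | Read original

[Summary]
OpenMedReason, an open-source medical multimodal reasoning dataset, has been released. It contains 450,000 image-question-answer pairs with reasoning traces and can improve the perception and reasoning of LVLMs on medical visual question answering, bringing them close to top-model performance.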

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context 85

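Tags: AI Safety, Large Models, Model Evaluation
Source: arXiv Computation and Language | Read original

[Summary]
A new study proposes the MedMisBench benchmark, which tests how robust LLM judgment is under misleading medical context. Accuracy drops from 71% to 38%, exposing a safety risk for medical AI.

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement 85

Tags: Autonomous Research, Agents, Paper Release, Research Automation
Source: arXiv Computation and Language | Read original

[Summary]
Proposes Arbor, an autonomous research framework that enables long-horizon independent AI research through iterative hypothesis-tree refinement. On six real-world tasks it achieves 2.5x the results of baseline models, advancing general-purpose autonomous research agents.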

The Language You Ask In: Language-Conditioned Ideological Divergence in LLM Analysis of Contested Political Documents 85

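Tags: AI Safety, Large Models, Research Progress
Source: arXiv Computation and Language | Read original

[Summary]
The study finds that ChatGPT and Claude, when analyzing the same political document, show systematic ideological bias depending on the prompt language (Russian vs. Ukrainian), which threatens the fairness of multilingual AI deployments.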

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework 85

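Tags: Large Models, Personalized Memory, Inference Optimization, Research Release
Source: arXiv Computation and Language | Read original

[Summary]
The paper shows that user-side memory in LLMs exhibits substrate asymmetry along three orthogonal axes: behavioral consistency, fact presence, and fact absence. Per-user LoRA wins on behavioral style, while RAG is clearly better on fact absence. RLHF amplifies this asymmetry and incurs an alignment tax. The diagnostic framework offers practical guidance for building personalized LLMs.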

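Tencent Hunyuan AI Infra Open-Source Update: HPC-Ops Core Inference Operators Fully Upgraded 82

Tags: Open-Source Ecosystem, Inference Optimization, Model Deployment
Source: AI HOT 精选 | Read original

[Summary]
Tencent Hunyuan AI Infra has open-sourced an upgrade to its HPC-Ops inference operator library. All five core operators come from production use: Attention is up to 2.95x faster on long text, Router GEMM is 3.22x faster, and FusedMoE is 1.2x to 1.6x faster than vLLM/SGLang. The full release is open source.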

Beyond representational alignment with brain-guided language models for robust reasoning 82

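Tags: Large Models, Inference Optimization, Multimodal, AI Safety
Source: arXiv Computation and Language | Read original

[Summary]
The study shows that LLM representations partially align with human brain regions involved in reasoning. Guiding model reasoning with fMRI brain signals raises absolute accuracy by up to 13% across 10 LLMs, opening a new path for cognitive alignment.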

Toward Preference-aligned Large Language Models via Residual-based Model Steering 82

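Tags: Large Models, Model Alignment, Inference Optimization, Training-Free
Source: arXiv Computation and Language | Read original

[Summary]
Proposes PaLRS, a training-free preference alignment method that uses the residual stream. It outperforms DPO on mathematical reasoning and code generation while taking far less time, offering a flexible, lightweight alignment option.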

ProHiFlo: Hierarchical Flow Matching with Functional Guidance for De Novo Protein Generation 82

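Tags: Research, Protein Generation, Flow Matching
Source: arXiv Computation and Language | Read original

[Summary]
Proposes ProHiFlo, a hierarchical flow matching framework that combines functional guidance with multi-scale processing. It outperforms existing methods on de novo protein generation and improves sampling efficiency by 4x.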

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework 82

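Tags: Uncertainty Quantification, AI Safety, Research Breakthrough
Source: arXiv Statistics - Machine Learning | Read original

[Summary]
PCS-UQ proposes an uncertainty quantification framework based on predictability, computability, and stability. It outperforms existing methods on regression and classification tasks and includes an efficient deep learning variant, supporting more trustworthy AI.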

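Runway and Lionsgate Expand Strategic Partnership 80

Tags: Company News, AI Video Generation, Industry Partnership, Film and Entertainment
Source: AI HOT 精选 | Read original

[Summary]
Lionsgate and Runway are expanding their strategic partnership. The expanded deal includes an equity stake and new IP development, starting with a short-form series based on existing IP, and signals deeper cooperation between AI and film studios.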

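Xiaomi Releases and Open-Sources Terminal AI Coding Assistant MiMo Code V0.1.0 Under the MIT License 80

Tags: Model Release, Open-Source Ecosystem, AI Coding, Agents
Source: AI HOT 精选 | Read original

[Summary]
Xiaomi has open-sourced MiMo Code, a terminal AI coding assistant. Its performance is described as on par with Claude Sonnet, it supports unlimited context and persistent memory, and it scores 62% on SWE-Bench Pro.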

On the Optimal Reasoning Length for RL-Trained Language Models 80

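Tags: Large Models, Inference Optimization, Reinforcement Learning
Source: arXiv Computation and Language | Read original

[Summary]
The study finds that in RL-trained language models, reasoning accuracy first rises and then falls as output length grows, so an optimal intermediate length exists. The result offers guidance for optimizing reasoning efficiency and cost.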

Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization 80

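Tags: Inference Optimization, Large Models, Research
Source: arXiv Computation and Language | Read original

[Summary]
Proposes CoSMo, a framework that uses consistency-guided split-merge optimization. In large reasoning models it raises accuracy by 3.3% and cuts segment usage by 28.7%, improving reasoning efficiency.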

Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models 80

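Tags: Inference Optimization, Model Release, Open-Source Ecosystem
Source: arXiv Computation and Language | Read original

[Summary]
The Litespark-Inference framework targets very fast CPU inference for ternary (1.58-bit) language models. Custom SIMD kernels replace matrix multiplication with additions and subtractions, giving an 18x throughput gain on Apple Silicon and up to 96x on Intel/AMD. It installs via pip.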
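
To illustrate why ternary weights let a kernel drop multiplications, here is a minimal scalar sketch in pure Python. It is not Litespark's code or API: the function name `ternary_matvec` and the list-of-lists layout are assumptions made for illustration, and a real kernel would vectorize this with SIMD.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector.

    Hypothetical helper for illustration only: every weight is -1, 0, or +1,
    so each product collapses to an add, a subtract, or a skip.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: contributes nothing, so it is skipped
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 5.0]
y = ternary_matvec(W, x)  # [2 - 5, -2 + 3 + 5]
```

The inner loop contains no multiply, which is the property the summary attributes to the custom kernels.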

Steering the Noise: Turning Random Perturbations into Effective Descent for Memory-Efficient LLM Fine-Tuning 80

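Tags: Training Methods, Large Models, Model Fine-Tuning
Source: arXiv Computation and Language | Read original

[Summary]
Proposes a memory-efficient fine-tuning method for large models that selects among candidate perturbations to find a better descent direction. On OPT-13B it beats all zeroth-order baselines, converging faster and reaching higher accuracy.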
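
The general idea of choosing among candidate perturbations in zeroth-order optimization can be sketched as follows. This is an assumed toy version, not the paper's algorithm: the selection rule (keep the candidate with the largest finite-difference slope), the quadratic `loss`, and all names and hyperparameters are hypothetical.

```python
import math
import random


def loss(x):
    # Toy quadratic objective standing in for a fine-tuning loss (assumption).
    return sum(v * v for v in x)


def unit_direction(dim, rng):
    # Random Gaussian direction scaled to unit length.
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


def zo_step(x, rng, num_candidates=4, eps=1e-3, lr=0.1):
    """One zeroth-order step: probe several random directions with forward
    passes only, keep the one with the steepest estimated slope, step along it."""
    best_u, best_g = None, 0.0
    for _ in range(num_candidates):
        u = unit_direction(len(x), rng)
        plus = loss([xi + eps * ui for xi, ui in zip(x, u)])
        minus = loss([xi - eps * ui for xi, ui in zip(x, u)])
        g = (plus - minus) / (2 * eps)  # two-point directional derivative
        if best_u is None or abs(g) > abs(best_g):
            best_u, best_g = u, g
    # Move against the estimated slope along the selected direction.
    return [xi - lr * best_g * ui for xi, ui in zip(x, best_u)]


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
start = loss(x)
for _ in range(200):
    x = zo_step(x, rng)
end = loss(x)
```

No gradients or optimizer state are stored, only the current parameters and one candidate direction at a time, which is where the memory saving of zeroth-order methods comes from.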

Redesign Mixture-of-Experts Routers with Manifold Power Iteration 80

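Tags: Model Release, Training Methods, Large Models
Source: arXiv Computation and Language | Read original

[Summary]
Proposes Manifold Power Iteration to redesign MoE routers so that each router row aligns with the principal singular direction of its expert, improving MoE models from 1B to 11B parameters.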
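
For readers unfamiliar with the underlying primitive, the sketch below runs plain power iteration to find the principal right singular direction of a matrix. It is not the paper's manifold variant: the function name, the toy `expert` matrix, and the idea of copying the result into a router row are assumptions for illustration.

```python
import math


def top_right_singular_vector(W, iters=100):
    """Plain power iteration on W^T W (hypothetical helper).

    Assumes the uniform start vector is not orthogonal to the top direction.
    """
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        # Apply W, then W^T, without forming W^T W explicitly.
        Wv = [sum(w * vi for w, vi in zip(row, v)) for row in W]
        WtWv = [sum(W[r][c] * Wv[r] for r in range(rows)) for c in range(cols)]
        norm = math.sqrt(sum(t * t for t in WtWv))
        v = [t / norm for t in WtWv]
    return v


# Toy expert weight matrix whose dominant direction is the first axis.
expert = [[3.0, 0.0],
          [0.0, 1.0]]
router_row = top_right_singular_vector(expert)
```

Here `router_row` converges toward the first coordinate axis, the direction along which the toy expert amplifies its input the most.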
