万千十一

一线 AI 观察员

2026年7月

剪藏 2026年7月31日

做AI模拟的 Simulation Company 成立5个月，营收增长5倍，给财富100强公司跑了上千万次模拟（没说几家、也没说什么叫一次模拟），训练了一个置信度模型，员工50+，以 20 亿美元估值融了 2 亿美元 B 轮。

"Announcing Our Series B | Simile"

simile.com

剪藏 2026年7月31日

得益于 Sol 的系统性优化，Luna、Terra 分别降价 80%、20%，同时给 Sol 上了 2.5 倍速的 Fast 模式

"Advancing the price-performance frontier with GPT-5.6 | OpenAI"

openai.com

剪藏 2026年7月30日

Tesla、Starlink，马斯克手中场景不少，Grok 语音模型有其闭环

"Introducing Grok Voice Think Fast 2.0 | SpaceXAI"

x.ai

剪藏 2026年7月29日

https://camelai.com/blog/our-coding-agent-runs-in-a-cloudflare-durable-object-not-a-vm

"Miguel Salinas on X: "We rewrote our agent to run entirely in a Durable Object with Pi, Agents SDK and Code Mode" / X"

x.com

剪藏 2026年7月29日

1000 多名前沿模型公司员工联合声明： > We request that the U.S. government support an international effort to develop the technical and governance tools needed to deliberately pace the frontier of automated AI development.

"Pacing the Frontier"

pacingthefrontier.com

剪藏 2026年7月28日

Kimi K3用了自定义的开源协议，主要增加了两个限制： 1. 年营收2000万美元以上的MaaS服务商，必须签署单独协议 2. 月活1亿+或月营收2000万美元+必须显著标识Kimi K3 update： Kimi K3正式开源后，HuggingFace模型讨论区收到全国网友贺电。与此同时，主流推理服务商均上线day0支持，有趣的是跨厂商定价极度标准化，海外百万token输入/输出为$3/$15、国内则为20/100元，均对齐月之暗面官方定价。相比之下，DeepSeek同一模型各家价差可达数倍。K3的自定义开源协议要求MaaS厂商单独签合约，形成了伙伴协议下的统一价格体系，也让下游去卷吞吐、时延等token服务质量，产业效应不容小觑。

"Kimi K3 开放日：模型权重、技术报告和关键 Infra 技术同步开放"

mp.weixin.qq.com

剪藏 2026年7月25日

针对 Claude 5 系列的系统提示词缩减了 80%，harness 越来越薄，不再需要详尽的示例，更倾向于给原则

"The new rules of context engineering for Claude 5 generation models | Claude by Anthropic"

claude.com

剪藏 2026年7月24日

有意思，竟然背下了素材站 Upslash 上的图片链接，通过 10 倍的思考过程把代码都先写了一遍

"Design Arena on X: "How Kimi K3 Topped Design Leaderboards" / X"

x.com

剪藏 2026年7月23日

ChatGPT Work & Codex 用户破千万

"Tibo on X: "10M! New day, new usage reset for paid users of Codex and ChatGPT Work. Lands in the next hour. Enjoy. https://t.co/VUJ4S3vKDG" / X"

x.com

剪藏 2026年7月23日

都得出 Router，Cursor 优势在于数据和在线验证，同时回应了缓存未命中的情况，并分为 Intelligence、Balance、Cost 三档可选，前两者分别对标 Fable、GPT-5.6 Sol/Opus 4.8

"Introducing Cursor Router · Cursor"

cursor.com

剪藏 2026年7月22日

Poolside Laguna 系列，4 月发了 225B-A23B 的 M.1 和 3B-A3B 的 XS.2，后者7月初升级为 XS 2.1，这次上的 S2.1: 118B-A13B，百万上下文，4096张H200 60天完成。模型初期口碑不错，博客也有不少经验细节，值得学习。

"Introducing Laguna S 2.1 — Poolside"

poolside.ai

剪藏 2026年7月22日

ContextAgent + ProposalAgent x N 的异步 harness

"Hyra发布：一个简单有效的科学发现智能体"

mp.weixin.qq.com

剪藏 2026年7月22日

Flash 小版本升级、Flash-Lite 更新到 3.5，Pro 是真难产

"3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber"

blog.google

剪藏 2026年7月22日

Fireworks 评测说 K3 几乎能和 Fable 打平，然后二者结合更好。站在推理厂商的立场上，开源模型和路由都是利好的，所以结论可能需要审慎看待。

"Kimi K3 is competitive with Fable; Kimi K3 + Fable is SoTA."

fireworks.ai

剪藏 2026年7月22日

有意思，上周 HuggingFace 用 GLM-5.2 来防御的不知名攻击，今天被 OpenAI 认领，包括 GPT-5.6 Sol 和尚未发布的一款模型（结合下周 Sam Altman 要到白宫为一款强大的模型报备，坊间推测为 GPT-6），在内部网络安全评测过程中通过 0-day 漏洞突破了沙盒与联网限制，跑到 HuggingFace 上找数据集抄答案以获取高分。

"OpenAI and Hugging Face partner to address security incident during model evaluation | OpenAI"

openai.com

剪藏 2026年7月21日

比较详实的 benchmark 说明，已经从 GPQA 等静态刷题测试、arena 等竞技场测试，向 METR horizon 系列、Epoch ECI 等统计类测试过渡，重点是对题目难度有一个映射

"Have Chinese AI Models Caught Up to the US Frontier?"

scaling01.substack.com

剪藏 2026年7月21日

OpenAI 内部做长程研究的 AI 看错了指令想把结果提交到 GitHub 于是打破了沙盒

"Safety and alignment in an era of long-horizon models | OpenAI"

openai.com

剪藏 2026年7月21日

多模型答题比对，根据字串统计、散度值计算来分析文风相似度，仅有的7家厂商22模型对比显示 GLM 5.2 和 Gemini 3.1 Pro 很像（但是能力反而胜出了？）、Kimi K3 和 Fable 5 也比较接近，感觉好像能说明的问题有限，但用 Fable 蒸馏了 Opus 4.7/4.8、Sonnet 5 倒是能看出来

"You're relatively right! | Typebulb"

typebulb.com

剪藏 2026年7月20日

超越 TTS，还能编排音效、环境声等，所以称之为音频创作模型。内部用了一个通用声学编码器，把不同音频要素放在统一框架中。自然语言提示一切、精准时间控制，有点像生图领域的进展，模型都变成Agent了。Seed 在声音上还是比较引领的。（但好像没提到音乐？值得试试。）

"从“会说”走向“会创作”｜ Seed Audio 1.0 音频创作模型发布"

seed.bytedance.com

剪藏 2026年7月20日

感觉最近发了好几次 PR 稿，这次官号发声刚好撞在 Seed Audio 1.0 发布

"Qwen-Audio-3.0-TTS"

funaudiollm.github.io

剪藏 2026年7月17日

Fireworks 以 175 亿美元估值融了 15.05 亿的 D 轮，ARR 超过 10 亿，日均 40 万亿 token，口号是“每家企业都必须own自己的智能”

"Fireworks Secures $1.5 Billion in Series D Funding"

fireworks.ai

剪藏 2026年7月17日

2.8T-A104B、百万上下文、多模态，稀疏度进一步提升，896专家激活16个，Kimi Delta Attention（KDA）+ 注意力残差 AttnRes；7月27日权重开源+技术报告。 > K3 训练到最后，内核优化的工作就由 K3 自己做了价格不便宜，API 定价（$3/15、¥20/100）国内最贵了，是 GPT-5.6 Sol 的一半、Fable 5 的 1/3 不到，同时承认实际体验还不如 Fable 5 和 GPT-5.6 Sol： > Despite being a highly competitive model overall, K3 nonetheless exhibits a noticeable gap in user experience compared with Claude Fable 5 and GPT 5.6 Sol.

"Kimi K3 Tech Blog: Open Frontier Intelligence"

kimi.com

剪藏 2026年7月17日

遭到了端到端自主 AI 系统驱动的攻击，尝试用前沿闭源模型但因为数据上传的隐私及安全限制发现并不适用，所以改用了开源的 GLM-5.2

"Security incident disclosure — July 2026"

huggingface.co

剪藏 2026年7月16日

自对抗强化训练 GPT-Red，比人类红队厉害，能攻下 GPT-5.5 及之前的所有模型，还在模拟售货的场景中让经营者 AI 把售价降至最低、取消他人订单等等；再拿这些成功的注入案例训练，得到了能抵抗的 GPT-5.6

"GPT-Red: Unlocking Self-Improvement for Robustness | OpenAI"

openai.com

剪藏 2026年7月16日

用 Codex + TAO skill 一天完成 Cosmos 3 的后训练

"Post-Train NVIDIA Cosmos 3 in One Day Using Agent Skills | NVIDIA Technical Blog"

developer.nvidia.com

剪藏 2026年7月16日

Anthropic、黑石、Hellman & Friedman 合作推出了 Ode，一家 FDE 公司？

"Anthropic, Blackstone, and Hellman & Friedman Introduce Ode with Anthropic, an Enterprise AI Services Firm"

ode.com

剪藏 2026年7月15日

Thinking Machines Labs 终于正式上了款模型，命名非常谦卑 Inkling，包括 975B-A41B 的 Inkling 和即将上线的 276B-A12B Inkling-Small，音视文全模态、有条件的百万上下文、开源，鼓励在其 Tinker 平台继续精调。训练 knowledge cutoff 到2026年4月，用到了来自 kimi 的合成数据；还提到规模化 RL 过程中发现的涌现（与之前 Cognition 训练 SWE-1.7 时一样），随着强化，模型的 CoT 会愈发简洁，像是从合乎语法的完整表述压缩成了电报。感觉会对可观测性等有一些影响。

"Inkling: Our open-weights model - Thinking Machines Lab"

thinkingmachines.ai

剪藏 2026年7月15日

低比特量化这几天挺热闹的，humans& 做NVFP4 的RL、hy3做了4-bit和1-bit单卡部署、PrismML 做了 Qwen3.6-27B 的手机部署版

"PrismML — Announcing Bonsai 27B: The First 27B-Class Model to Run on a Phone"

prismml.com

剪藏 2026年7月15日

Codex 与 ChatGPT 合并后，连续两天日增百万用户，已达 800 万；但背后是放开5小时限制、多次重置并发放重置卡的补贴砸出来的，能留存多久、能多久不犯错，还值得观望

"Tibo on X: "Hello. We have reached 8M active users across Codex and ChatGPT Work. We are once again resetting the usage limits for all. And we continue to not have the 5h rate limit as well, allowing everyone to explore the boundaries of GPT-5.6 Sol and discover how ambitious you can be." / X"

x.com

剪藏 2026年7月14日

Grok Build 被曝出上传用户代码仓库，SpaceXAI 慌忙解释，但信任已失

"SpaceXAI on X: "We care deeply about your privacy and respect customer choice. For teams using zero data retention, no trace and code data is ever retained. All API key use of Grok Build also respects ZDR. If ZDR is disabled, the /privacy command is available in the CLI to disable data" / X"

x.com

剪藏 2026年7月13日

灵巧手新阶段？

"NEO’s Hands | An API to the Physical World"

1x.tech

剪藏 2026年7月12日

最近AI公司很热闹，MiniMax闫俊杰、智谱唐杰全员信，Thinking Machines Lab 也表明使命，值得对比挖掘一下

"The Future Worth Building Is Human - Thinking Machines Lab"

thinkingmachines.ai

剪藏 2026年7月10日

MiniMax 完成新一轮 20亿美元融资，同时闫俊杰宣布实现 AGI 之前都不领薪酬，未来四年拿出 4% 股份激励团队、1% 作为基金支持开源

"RyanLee on X: "I’m incredibly excited to share this: MiniMax has just closed a new $2B funding round. 🚀 At the same time, our CEO, IO, shared three long-term commitments with the team: • No salary until we achieve AGI. • Over the next four years, he will dedicate shares equivalent to 4%" / X"

x.com

剪藏 2026年7月10日

ChatGPT Work，糟糕的名字

"ChatGPT is now a partner for your most ambitious work | OpenAI"

openai.com

剪藏 2026年7月9日

Radical、Nvidia、Intel、Dell 和 John Schulman 等一众天使投出来的

"$130M Series A to Build the Open Superintelligence Stack"

primeintellect.ai

剪藏 2026年7月9日

除了指哪打哪的精准编辑能力外，还能划分出图层来完成复杂设计

"不止“生成”，更懂“设计” ｜ Seedream 5.0 Pro 发布"

seed.bytedance.com

剪藏 2026年7月9日

全双工，能调用后台的 GPT-5.5，想到 Thinking Machine 的 TML 和豆包全双工 update：试了一下，中文还不太行

"Introducing GPT-Live | OpenAI"

openai.com

剪藏 2026年7月9日

在已经针对 Coding 强化的 Kimi-2.7-Code 基础上继续训，利用了 RL rollouts 的自足特性，在美加新澳四地分布式训练，和 Fireworks 等推理服务商一起

"SWE-1.7: Frontier Intelligence at a Fraction of the Cost | Cognition"

cognition.com

剪藏 2026年7月9日

SWE-Bench Pro 也不行了，所以没公布 GPT-5.6 的跑分？

"Separating signal from noise in coding evaluations | OpenAI"

openai.com

剪藏 2026年7月9日

与 Cursor 联合在万卡 GB300 集群上训练，针对单位 token 智能做 RL

"Introducing Grok 4.5 | SpaceXAI"

x.ai

剪藏 2026年7月9日

Databricks 的评估，pi 效果好、又省钱；更显著的趋势是 benchmark 现在都至少是二维甚至三维的了，performance vs thinking efforts、token/price、harness

"Benchmarking Coding Agents on Databricks’ Multi-Million Line Codebase | Databricks Blog"

databricks.com

剪藏 2026年7月8日

Meta 超级智能团队端出了图像生成编辑模型 Muse Image，冲上了竞技场第二名，仅次于 GPT Image 2；同时还预览了视频模型 Muse Video

"Introducing Muse Image and Muse Video"

ai.meta.com

剪藏 2026年7月3日

5月中，OpenAI Deployment Company 7月初，Microsoft Frontier Company FDE 热闹得很

"Microsoft Frontier Company: AI engineering that amplifies and protects your intelligence - The Official Microsoft Blog"

blogs.microsoft.com

剪藏 2026年7月1日

硅基流动港交所递表，算是Token工厂路上的一个里程碑。美国类似路径厂商更多分化也更垂，通用推理的Together和Fireworks、多模态的Fal和Replicate、ASIC工厂Cerebras等都各有千秋。相比之下，国内云厂的token工厂已成标配，独立推理厂很难有效破局，硅流的低份额与高亏损也是生动体现。

"卖Token也不是稳赚不赔！硅基流动招股书来了 – 量子位"

qbitai.com

剪藏 2026年7月1日

Fable 5 恢复； Mythos 5 仍限美国组织，Anthropic 正在努力把推到全球玻璃翼伙伴；顺便联合 Amazon、Microsoft、Google 等伙伴，提了一个赋能+转化的 4 条越狱评级方案；最后拥抱政府合作，提前给模型测试、加强沟通和研究报告等

"Redeploying Claude Fable 5 \ Anthropic"

anthropic.com

剪藏 2026年7月1日

连续更新 Opus 和更大的 Mythos/Fable 后，Sonnet 终于迎来了新版本，定价保持 $3/15 一致，但给了两个月的 $2/10 上线折扣；拒答等安全能力提升，但仍不比 Mythos 和 Opus 4.8；网络安全能力也相对较弱，所以没有上 Fable 5 配的那种安全护栏。

"Introducing Claude Sonnet 5 \ Anthropic"

anthropic.com

剪藏 2026年7月1日

金融日常充满细微判断，模型还难以胜任，Thinking Machines 团队构造数据训练了模型实现显著提升

"Learning to Replicate Expert Judgment in Financial Tasks - Thinking Machines Lab"

thinkingmachines.ai

剪藏 2026年7月1日

同日 Anthropic 发 Claude Science，OpenAI 发 GeneBench

"Introducing GeneBench-Pro | OpenAI"

openai.com

剪藏 2026年7月1日

融了 $8 亿、拿下 $10 亿订单、400 人团队，Etched 流片成功，靠低电压推理和集群级内存分别解决散热和通讯瓶颈

"Etched on X: "We're coming out of stealth. We've built our first racks after a successful A0 tapeout, $1B+ in customer contracts, and $800m raised. Early customer tests show us achieving SOTA throughput, latency, and power efficiency on inference workloads. Our first racks ship this summer. https://t.co/FLccrkLTza" / X"

x.com

2026年6月

剪藏 2026年6月30日

1.6T-A48B，百万上下文，重点： > LongCat-2.0的完整训练流程与大规模部署均全部使用国产算力集群。预训练在5万余国产算力芯片上耗时月余完成，消费了超过 35 万亿 tokens，全程无回滚、无不可恢复的 loss 突刺。这一结果验证了我们有能力在国产算力平台上进行前沿级大规模模型训练。

"Introducing LongCat-2.0"

longcat.chat

剪藏 2026年6月30日

最近模型路由真热闹，OpenRouter、Sakana、Devin 都在搞，算不明白这笔经济账

"Devin Fusion: Frontier Performance at 35% Lower Cost"

cognition.com

剪藏 2026年6月26日

OpenAI 内部数据，Codex 月活比例达 97.9%，token 占比达 99.8%，非开发者增长最快，知识工作类任务在非技术部门占比最高

"How agents are transforming work | OpenAI"

openai.com

剪藏 2026年6月25日

Jalapeño（墨西哥辣椒）计划 2026 年底部署，设计到生产仅花了 9 个月，每瓦性能 SoTA，已经用于推理 GPT-5.3-Codex-Spark。Cerebras 暴跌。

"OpenAI and Broadcom unveil LLM-optimized inference chip | OpenAI"

openai.com

剪藏 2026年6月24日

在 Slack 中把 Claude 当成一个同事来艾特，除了在 Slack channel 中与响应召唤并完成任务外，Claude 还能持续学习、主动提醒、异步无休。这种形态 Devin 做的很早，现在已经比较成熟了，还是得感谢 Slack 开放 API。

"Introducing Claude Tag \ Anthropic"

anthropic.com

剪藏 2026年6月23日

A24 就是那家做了《伯德小姐》《瞬息全宇宙》的工作室

"Google DeepMind and A24 launch research partnership"

blog.google

剪藏 2026年6月22日

云厂商Cloudflare在拥抱AI上一直相当积极，之前和支付服务商Stripe合作推了Agent支付方案，让AI能购买域名和云计算相关资源，从而自主地部署应用。这两天又上了一个“临时账户”的新功能，让Agent在没有授权的情况下也能将网站部署至Cloudflare，提供1个小时有效期让用户去预览、迭代，满意了再登录/注册并绑定这个临时账户。相当于让渡了一些便宜的临时资源，让整个流程更加顺畅丝滑，也服务于后续的资源售卖，是个不错的案例。

"Temporary Cloudflare Accounts for AI agents"

blog.cloudflare.com

剪藏 2026年6月20日

用 D1-D4 四级检测级别和 R1-R3 三级响应级别来 AI 能力和可控性

"Securing internal systems against increasingly capable and imperfectly aligned AI — Google DeepMind"

deepmind.google

剪藏 2026年6月19日

重新用 Opus 4.7 跑了一通去年的 AI 驱动机器狗项目 Project Fetch（当时用的 Opus 4.1），Opus 4.7 自主碾压了两支队伍当时的表现。尽管自主性越来越强，人在环中仍必要。

"Project Fetch: Phase two \ Anthropic"

anthropic.com

剪藏 2026年6月18日

有趣，Midjourney 推出了一款硬件 Midjourney Scanner，全身超声扫描 + AI分析，然后还顺手做了一个 SPA？？预计明年在 SF 开

"A New Era of Midjourney"

midjourney.com

剪藏 2026年6月18日

声画同步、质感提升、运镜叙事，但一次免费体验都没给。 PS：快半年了 Seedance 2.0 依旧领先，可怕。

"Grok Imagine Video 1.5 | xAI"

x.ai

剪藏 2026年6月18日

刚刚，微信支付正式发布 AI 专属卡

"微信支付发布AI专属卡 WorkBuddy率先接入 – 量子位"

qbitai.com

剪藏 2026年6月18日

Harvey 效仿 Cursor，开始训练自己的法律基础模型

"Gabe Pereyra on X: "Model strategy for @harvey: We are working on the first model in our legal foundation model series, inspired by @cursor_ai's Composer. Two goals: 1. Allow us to serve frontier intelligence across our product surface areas at an affordable price and a strong security posture." / X"

x.com

剪藏 2026年6月12日

Google DeepMind 认为多智能体的交互会涌现出新的集体行为、能力提升和新的风险，于是掏出一千万美金支持多智能体研究。上个月 Emergence World 的15天社会模拟研究发现，不同模型组成的多智能体混合世界，生产创造力更强但同时也出现了暴力犯罪等情况。未来多智能体甚至海量智能体如何演化、会形成何种生态、有何涌现，会是很有趣的问题。

"Google DeepMind and partners announce multi-agent safety research funding call. — Google DeepMind"

deepmind.google

剪藏 2026年6月10日

神话遥不可及，寓言才是正道。去年9月上线的 SWE-Bench Pro 超过80%，感觉就已经饱和了；昨天 Cognition 推出的 FrontierCode 一下子逼近 30%，断崖式领先； Fable 5 留存使用数据，还会将网安、生化和【蒸馏】相关请求路由至 Opus 4.8；同时用以开发新模型的请求会被降智且无提示；价格与 Mythos 一致，百万输入/输出为 $10/50，是 Mythos 预览版的4成、Opus 的两倍； Pro/Max 等订阅账号中的 Fable 5 使用截止到 6月23日，只能按用量计费了。 20260611更新：在社区强烈反对下，Anthropic致歉并去掉了Fable在涉前沿模型训练中的隐式降智，变为与网安、生化、蒸馏一致的可见回退

"Claude Fable 5 and Claude Mythos 5 \ Anthropic"

anthropic.com

剪藏 2026年6月9日

Claude Code/Cowork 工程总监 Fiona Fung 在 Code w/ Claude SF 2026 大会上分享了其 AI 原生工程团队实践，总结就是 AI First，三条共识原则，持续dogfood、极致扁平、毫不犹豫地弃旧，感觉第三条可能最难，加法容易减法难

"Running an AI-native engineering org | Claude"

claude.com

剪藏 2026年6月9日

开头段写的不错，AI带来新可能，随之缓缓会形成新秩序，关键在于人们如何使用。同样提及 AI 研发 AI，也正在把自动化 AI 研究员作为目标之一。

"Built to benefit everyone: our plan | OpenAI"

openai.com

剪藏 2026年6月5日

重点是提升了事件时间感知，忘掉已经完成的

"Dreaming: Better memory for a more helpful ChatGPT | OpenAI"

openai.com

剪藏 2026年6月5日

竞技场上了 Agent 模式，用 causal tracing 的组件式替换方法拆分评测，主编排模型中目前 GPT-5.5 领先

"Agent Arena: Causal Evaluation of Agents in the Real World"

arena.ai

剪藏 2026年6月5日

从外部评测、内部的 8x 代码量 & 质量提升 & 研究能力，充分论证了 AI 自进化的趋势，人类 review 是瓶颈。同时推演了减速发展、正常进展、加速自主三种未来，结论说应该停下来想一想，但是不可能，所以组织各界讨论。

"When AI builds itself \ Anthropic"

anthropic.com

剪藏 2026年6月5日

Suno 以 54 亿美元估值融了 4 亿美元的 D 轮，之前和音乐巨头达成了合作，接下来会推出新的“合规”音乐生成模型

"The Next Chapter for Suno · Suno"

suno.com

剪藏 2026年6月5日

企业账单公司 Ramp 以 440 亿美元估值融了 7.5 亿

"Ramp at $44 Billion: The Third Pillar"

ramp.com

剪藏 2026年6月4日

与 Ideogram 4.0 撞车，Reve 2.0 自称为 Large Layout Model，通过划分区域，来实现极致的重建效果，竞技场仅次于 GPT-Image-2

"The Layout Bet - Reve Blog"

blog.reve.com

剪藏 2026年6月4日

9.3B，非商用开源，文生图竞技场开源最优

"Ideogram 4.0: Open image model at the forefront of design."

ideogram.ai

剪藏 2026年6月4日

李飞飞认为，Veo、Genie等视频路线属于渲染，VLA等属于规划，自己做的Marble 3D模拟是二者之前的桥梁，未来三者会融合

"A Functional Taxonomy of World Models - Dr. Fei-Fei Li"

drfeifei.substack.com

剪藏 2026年6月3日

在开始的50家合作伙伴外，新增150家，来自15+国家，涵盖能源、供水、健康、通讯、硬件等行业，不少伙伴是关键基础设施供应商，共性是被攻陷就面临灾难性后果，Anthropic 评估，这些伙伴中的绝大多数而言，一次大攻击将会影响亿计人口

"Expanding Project Glasswing \ Anthropic"

anthropic.com

剪藏 2026年6月3日

MAI-Thinking-1 MAI-Code-1-Flash MAI-Image-2.5（-Flash） MAI-Transcribe-1.5 MAI-Voice-2（-Flash）

"Building a hill-climbing machine: Launching seven new MAI models | Microsoft AI"

microsoft.ai

剪藏 2026年6月3日

随着 Opus 4.8 同步上线 Claude Code 的动态工作流功能，在 Anthropic 的定位中是 AI 自主根据任务编写 harness 的尝试，所以继模型之后，harness 的自进化、实时调整也进入产品验证阶段，上下文层面解决懒惰、自偏和目标偏移等问题，一套通用方案应对 coding 外的研究、安全分析等所有任务，野心很大。实现上是用 JS 脚本定义 agent 然后通过 parallel 和 pipeline 两种来编排，和之前讨论比较多的 RLM（Recursive Language Model）思想相通。

"A harness for every task: dynamic workflows in Claude Code | Claude"

claude.com

剪藏 2026年6月3日

周活5百万，Codex推出针对数分、销售、设计等不同岗位的插件系统；同时上线 Sites，vibe coding网站一键上线分发，暂仅支持工作区内

"Codex for every role, tool, and workflow | OpenAI"

openai.com

剪藏 2026年6月2日

WeirdLM 团队研究发现，尽管公开评测显示开源模型只差闭源模型4-6个月，但在私有评测上这个落后达8-10个月，2025年初的DeepSeek-R1是最接近的（差~3个月），此后差距就一直在扩大

"How far behind are open models? — LessWrong"

lesswrong.com

剪藏 2026年6月2日

不是特别理解，hero demo的效果有些一言难尽…

"SANA-Streaming | Real-time Streaming Video Editing"

nvlabs.github.io

剪藏 2026年6月2日

A\ 率先提交 IPO

"Anthropic confidentially submits draft S-1 to the SEC \ Anthropic"

anthropic.com

剪藏 2026年6月2日

花了大篇幅讲多模态推理能力，基于GUI操作的开发-验证闭环、视觉设计复刻、图/视频转SVG，还做了个Qwen for Chrome浏览器插件。和 Mimo 一样，千亿尺寸上多模态，万亿的 Max 仍仅有文本。

"Qwen3.7-Plus：多模态智能体"

qwen.ai

剪藏 2026年6月1日

训练数据涨到了100T；MiniMax 稀疏注意力支撑的百万上下文；两阶梯计价，比DeepSeek V4 Pro贵；10天内开放权重和技术报告

"MiniMax M3: Frontier Coding, 1M Context, Native Multimodality — All in One Model - MiniMax Research | MiniMax"

minimax.io

2026年5月

剪藏 2026年5月29日

动态地规划任务并拆分，并行派发高达数百个子 Agent，自查完才给你验收，用 Rust 重写 Bun 就是靠的这套工作流

"Introducing dynamic workflows | Claude"

claude.com

剪藏 2026年5月29日

很重要的提升是诚实，会坦诚不确定性；同时宣布未来几周全面上线 Mythos 级别模型

"Introducing Claude Opus 4.8 \ Anthropic"

anthropic.com

剪藏 2026年5月29日

2月刚融了G轮，5月就融H轮，年化营收超470亿美元，估值来到9650亿，都放大了约三倍

"Anthropic raises $65B in Series H funding at $965B post-money valuation \ Anthropic"

anthropic.com

剪藏 2026年5月28日

年化营收 4.92 亿美元，融了一个独角兽，估值 260 亿美元

"More Devins in More Places | Cognition"

cognition.ai

剪藏 2026年5月27日

OpenRouter 融了 1.13 亿美元的 B 轮，估值约 13 亿，每周 token 25 万亿

"OpenRouter on X: "Today we’re announcing our $113M Series B led by @CapitalGVC. Over the last 6 months, weekly volume on OpenRouter grew from 5T to 25T tokens as AI rapidly shifts from experimentation into production. We’re excited for what comes next. https://t.co/soAFvX7fzk" / X"

x.com

剪藏 2026年5月23日

玻璃翼计划首期报告：除了 Cloudflare、UK AISI、Mozilla、XBOW 的第三方评测外，A\ 自己对 1000+ 开源软件的扫描，已发现 23019 个漏洞，其中高危/严重者 6202，其中报告给外部安全公司的 1900 条数据反馈，90% 是验证准确的

"Project Glasswing: An initial update \ Anthropic"

anthropic.com

剪藏 2026年5月21日

两周前 Claude 还只配 Colossus 1，现在 SpaceX 转型做算力服务商，没地方用的 Colossus 2 也开放给 Cursor、Anthropic 了，还在持续招商中

"Tom Brown on X: "We’re expanding our partnership with @SpaceX, and will be scaling up on GB200 capacity in Colossus 2 throughout June. Appreciate @elonmusk and the team helping us find good homes for the Claudes." / X"

x.com

剪藏 2026年5月20日

Gemini 月活 9亿；日均处理 token 106 万亿

"Google I/O 2026: Sundar Pichai’s opening keynote"

blog.google

剪藏 2026年5月20日

Cerebras 部署了万亿参数的 K2.6，Artificial Analysis 的第三方评测是 981 tokens/s，相当惊人了，官方 API 大约 ~30 tokens/s

"Cerebras Brings Kimi K2.6 Inference to Enterprises"

cerebras.ai

剪藏 2026年5月20日

前沿风险评估

"Frontier Risk Report (February to March 2026) - METR"

metr.org

剪藏 2026年5月20日

Any to Any，先从视频开始，尝试复制 Nano Banana 的编辑效果

"Introducing Gemini Omni: Gemini Omni Flash is a model that can create anything from any input – starting with video."

blog.google

剪藏 2026年5月20日

Genie 与 Google 地图街景结合，自由地游览各种风格的地点

"Simulate real-world places with Project Genie and Street View"

blog.google

剪藏 2026年5月20日

AI发展太快，A\决定加强与社会各界的对话

"Widening the conversation on frontier AI \ Anthropic"

anthropic.com

剪藏 2026年5月20日

除了 HLE、ARC-AGI-2 等强推理外，Gemini 3.5 Flash 基本全面超过 Gemini 3.1 Pro，价格也从 Gemini 3 Flash 的 $3 上涨到了 $9 每百万输出。预告了下个月上 Gemini 3.5 Pro。

"Gemini 3.5: frontier intelligence with action"

blog.google

剪藏 2026年5月20日

Andrej Karpathy 加入了 Anthropic 的预训练团队，专注用 Claude 来加速自主预训练；A\ 真的是全球最有吸引力的公司了

"Andrej Karpathy on X: "Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time." / X"

x.com

剪藏 2026年5月19日

Cloudflare 团队体验了 Mythos Preview，高度评价，相比其他模型，重点肯定了 Mythos 串联多个小 bug 并端到端挖掘严重漏洞的系统能力

"Project Glasswing: what Mythos showed us"

blog.cloudflare.com

剪藏 2026年5月19日

实时下指令、多模态有声视频生成

"Starchild-1: The First Real-Time Multimodal World Model"

odyssey.ml

剪藏 2026年5月19日

多模态统一架构，图片&视频的理解、生成、编辑，来自字节，仅用了不到128卡训练

"Lance: Unified Multimodal Modeling by Multi-Task Synergy"

lance-project.github.io

剪藏 2026年5月19日

仍然基于 Kimi K2.5，部分在 SpaceXAI 的 Colossus 2 上训练，同时透露新的模型正在预训练，有算力就是不一样，定价也相当便宜

"Introducing Composer 2.5 · Cursor"

cursor.com

剪藏 2026年5月19日

Anthropic 的几次收购还是挺有规律，效率、品味、服务开发者

"Anthropic acquires Stainless \ Anthropic"

anthropic.com

剪藏 2026年5月17日

新加坡外交部长在 AIE Singapore 上分享自己的 AI 工程实践，好家伙。视频：https://youtu.be/_xQnSNlBP_w?t=2491

"Minister for Foreign Affairs Dr Vivian Balakrishnan's Speech at AI Engineer Singapore, 16 May 2026 | Ministry of Foreign Affairs"

mfa.gov.sg

剪藏 2026年5月16日

田渊栋等8位联创，融了6.5亿美元、估值46.5亿，人均贡献大半只独角兽

"Recursive Superintelligence: Why Self-Improving AI is the Next Frontier"

gv.com

剪藏 2026年5月16日

ChatGPT 个人专业金融服务

"A new personal finance experience in ChatGPT | OpenAI"

openai.com

剪藏 2026年5月15日

Anthropic这公司真有意思，趁着特朗普访华，赶紧发了一篇《2028全球AI领导力》的文章重申对华立场。细数当前中美竞争态势后，用情景推演的方式分析了2028年两种可能：美国持续扩大领先->支撑全球经济->保持安防优势->民主胜利；中国赶上->全民应用->技术抗衡->低成本赢得全球；最终落在芯片出口管制、限制模型访问及蒸馏、加快美AI全球出口上。公开发表的政策内参，大概算是一篇范文了。

"2028: Two scenarios for global AI leadership \ Anthropic"

anthropic.com

剪藏 2026年5月14日

Cerebras IPO，摘自量子位： > 发行价185美元，开盘价直接冲上350美元，盘中一度飙升到每股386美元（约合人民币2619元）。收盘涨幅定格在68%，每股311美元（约合人民币2111元），总市值约670亿美元（约合人民币4543亿元）。这是今年迄今规模最大的IPO，也是自2019年Uber上市以来美国科技公司规模最大的IPO。

"Cerebras Systems Announces Pricing of Initial Public Offering"

cerebras.ai

剪藏 2026年5月13日

Demis 牵头的 AI 生物医药公司 Isomorphic 宣布了 Thrive Capital 领投的 21 亿美元 B 轮融资

"Isomorphic Labs announces Series B investment round - Isomorphic Labs"

isomorphiclabs.com

剪藏 2026年5月13日

Chrome自动化浏览、跨应用点餐叫车、生成式组件

"Gemini Intelligence brings proactive AI to Android"

blog.google

剪藏 2026年5月13日

根据指针位置与移动理解“这”和“那”，更自然地交互；结合头部/眼部追踪，更科幻了

"Shaping the future of AI interaction by reimagining the mouse pointer — Google DeepMind"

deepmind.google

剪藏 2026年5月13日

计划秋季发布，搭载 Gemini Intelligence 和智能指针功能，与 Android 有更深的打通绑定

"Introducing Googlebook, designed for Gemini Intelligence"

blog.google

剪藏 2026年5月12日

Artificial Analysis 也推出了 Coding Agent 评测，并且测的是 Model + Harness 的组合，目前领先的是 Cursor CLI & Opus 4.7

"AI Coding Agent Index & Performance Analysis"

artificialanalysis.ai

剪藏 2026年5月12日

前 OpenAI CTO Miro 创办并获巨额融资的 Thinking Machines 团队，今天推出了第二个成果：交互模型。基于前台实时交互和后台异步并行的双模型协作系统，他们实现了 AI 主动打断、全模态响应、感知时间等之前 AI 系统做不到的事。几个演示也非常有趣，包括说话同时搜索并生成可视化界面、对话同时充当“计时器”、主动坐姿提醒、实时“语气润色”的翻译等等，推荐一看。结合前段时间豆包发的全双工语音模型、OpenAI更新的能思考办事的 GPT-Realtime-2，交互模型可能会是长程工作（Claude为代表）之外的另一个重要方向。暂时还是 276BA12B 的 TML-Interaction-Small，大尺寸版本比较难部署。

"Interaction Models: A Scalable Approach to Human-AI Collaboration - Thinking Machines Lab"

thinkingmachines.ai

剪藏 2026年5月12日

Claude 4 系列模型会因对齐失败导致勒索行为，究因是后训练还是以对话为主、没能有效对齐智能体场景和泛化，强调了原因和宪法训练，在 4.5 之后问题已解决

"Teaching Claude why \ Anthropic"

anthropic.com

剪藏 2026年5月9日

> 所有lab都怕字节，所有lab都尊重deepseek

"Notes from inside China's AI labs - by Nathan Lambert"

interconnects.ai

剪藏 2026年5月9日

Mythos 预览版在 METR 50% 等效人类任务时长已到达 16 小时，比第二名 Opus 4.6 多出 4 小时；80% 则为 3 小时，为第二名 Gemini 3.1 Pro 的两倍

"METR on X: "We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks. https://t.co/yIG1Ux27Ro" / X"

x.com

剪藏 2026年5月8日

新的语音模型：支持70→13语种同声传译的 GPT-Realtime-Translate、会思考/并行调用工具的 GPT-Realtime-2、新的流式语音识别模型 GPT-Realtime-Whisper

"Advancing voice intelligence with new models in the API | OpenAI"

openai.com

剪藏 2026年5月8日

继前天 PayPal 和 CoinBase 分别大裁 20%（约4700+人）、14%（约700人）后，今天 Cloudflare 也宣布裁员 1100+ 人（约20%），自然地，都归功于AI。 PS：发现一个裁员数据源 https://layoffhedge.com/

"Building for the future"

blog.cloudflare.com

剪藏 2026年5月8日

NLA 通过对模型激活的语义解释-再重建激活来训练，看起来比 SAE 更科学些，发现 Opus 4.6 知道它正在被评测，尽管没有说出来，可以用于发现动机

"Natural Language Autoencoders \ Anthropic"

anthropic.com

剪藏 2026年5月7日

Anthropic 与 SpaceX 合作，将 Colossus 1 集群所有算力（22万块卡、300兆瓦）用于部署 Claude，Colossus 2 保留用于训练 Grok > This joins our other significant compute announcements: • An up to 5 gigawatt (GW) agreement with Amazon, which includes nearly 1 GW of new capacity by the end of 2026; • A 5 GW agreement with Google and Broadcom, which will begin coming online in 2027; • A strategic partnership with Microsoft and NVIDIA that includes $30 billion of Azure capacity; • Our $50 billion investment in American AI infrastructure with Fluidstack.

"Higher usage limits for Claude and a compute deal with SpaceX \ Anthropic"

anthropic.com

剪藏 2026年5月7日

首个在 AMD 技术栈上pre-mid-post全链条训练出来的 MoE 模型

"ZAYA1-8B: Frontier intelligence density, trained on AMD"

zyphra.com

剪藏 2026年5月7日

Genesis 憋了一年推出成果，精细操作模型 GENE-26.5 和灵巧手 Genesis Hand 1.0

"GENE-26.5: Advancing Robotic Manipulation to Human Level"

genesis.ai

剪藏 2026年5月6日

在与人对齐的 post-training/tuning 阶段前，用 Model Spec / Constitution 做 mid-training，发现能更有效模型价值，降低 agentic 场景的 misalignment

"Model Spec Midtraining: Improving How Alignment Training Generalizes"

alignment.anthropic.com

剪藏 2026年5月6日

幻觉降低，视觉增强，回答更简洁；还有增强的记忆功能、索引展示等；更新的端到端语音还在预告当中

"GPT-5.5 Instant: smarter, clearer, and more personalized | OpenAI"

openai.com

剪藏 2026年5月2日

美国国家技术标准局（NIST）下的 AI标准与创新中心（CAISI）在评估包括 DeepSeek V4 在内的中美前沿模型后得出结论：中国落后美国8个月，且差距正呈扩大之势。

"CAISI Evaluation of DeepSeek V4 Pro | NIST"

nist.gov

剪藏 2026年5月1日

针对“用户找 Claude 寻求个人建议”这类对话（约6%）的分析，健康、事业、关系、财务占比最高，刚好与 Claude 高讨好性的交集在“关系”上，所以如果你找 Claude 讨论亲密关系并寻求建议，AI 大概率会支持你、劝分，好消息是这个问题正在随着模型迭代改善，Opus 4.7 的讨好率比 4.6 低，Mythos 比 4.7 低

"How people ask Claude for personal guidance \ Anthropic"

anthropic.com

剪藏 2026年5月1日

千问都开始往可解释性发力了，针对 Qwen3.5 系列 35B 以下尺寸训练了 14 组 SAE，应用场景包括： > 推理结果定向控制、数据分类与合成、模型训练与优化、评估样本分布分析与对比等

"Qwen-Scope：看穿大模型的“小心思”"

mp.weixin.qq.com

剪藏 2026年5月1日

Gemini 步履缓慢，终于开始往 Agent 方向使劲

"You can now generate files in Gemini"

blog.google

剪藏 2026年5月1日

Stripe 一大波更新，全为 Agent

"Everything we announced at Sessions 2026"

stripe.com

2026年4月

剪藏 2026年4月30日

Alphabet Q1 财报，一方模型的 API 调用量从上个 Q 的每分钟 100 亿升至本Q的 160 亿，折算日 token 为 23 万亿

"Alphabet earnings call, Q1 2026: Sundar Pichai’s remarks"

blog.google

剪藏 2026年4月30日

英国 AISI 的评估，GPT-5.5 网络安全能力与 Mythos 相当

"Our evaluation of OpenAI's GPT-5.5 cyber capabilities | AISI Work"

aisi.gov.uk

剪藏 2026年4月29日

GPT-1/2 作者训练的复古大模型，克服不少困难，只取1930年以前的数据，发现仍有泛化能力，可以学习简单编程，对“未来”感到惊讶

"Introducing talkie: a 13B vintage language model from 1930"

talkie-lm.com

剪藏 2026年4月29日

AWS 成最大赢家更新：6月1日已上线：https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/

"OpenAI models, Codex, and Managed Agents come to AWS | OpenAI"

openai.com

剪藏 2026年4月29日

美联储的研究，抛出耦合因素，程序员就业在 ChatGPT 面世后确实受到显著冲击

"AI and Coder Employment: Compiling the Evidence"

federalreserve.gov

剪藏 2026年4月29日

Democratization, Empowerment, Universal prosperity, Resilience, Adaptability

"Our principles | OpenAI"

openai.com

剪藏 2026年4月29日

从知识工作杀入创意工作，看起来还只是几个 MCP 连接，顺带给自己的 Claude Design 带货

"Claude for Creative Work \ Anthropic"

anthropic.com

剪藏 2026年4月29日

从一个基座模型精调一系列有明确植入行为的模型，然后 SFT 训练一个 LoRA 来说明这些行为，能在新的精调模型上也生效

"Introspection Adapters: Training LLMs to Report Their Learned Behaviors"

alignment.anthropic.com

剪藏 2026年4月26日

Grok 语音 Agent，tau-voice 新高，提到与星链合作以降低时延，且已用于后者的电话服务，销售成功率20%，客服解决率70%

"Grok Voice Think Fast 1.0 | xAI"

x.ai

剪藏 2026年4月24日

1.6T A49B 的 Pro 开源 SoTA； 284B A13B 的 Flash 百万输入/输出为1/2元；令人动容： > 不诱于誉，不恐于诽，率道而行，端然正己。

"DeepSeek-V4 预览版：迈入百万上下文普惠时代"

mp.weixin.qq.com

剪藏 2026年4月23日

295BA21B，SWE-bench Verified 74.4%，开源，承认不比一线，但成本够低，评测外的真实效果值得预期

"Hy3 preview发布并开源：混元重建后首个模型，Agent能力大幅提升"

mp.weixin.qq.com

剪藏 2026年4月23日

开源模型的免费 API？

"Try NVIDIA NIM APIs"

build.nvidia.com

剪藏 2026年4月22日

只有 Flash 支持视觉、语音多模态？Pro 不开源更新：可能是因为 DeepSeek V4，几天后 Mimo V2.5 Pro 也开源

"MiMo-V2.5-Pro: A leap in agentic and long horizon coherence."

mimo.xiaomi.com

剪藏 2026年4月22日

今天真是巧了，Google Cloud 大会推了不少面向企业的 Agent，微软也在推类似的，但 ChatGPT 这套 7x24 小时的云端 Agent（底层 Codex）看起来有一些产品力，尽管 Google Workspace 的上下文更全，但接入 Slack 也许是个优势？还得看企业客户买不买单

"Introducing workspace agents in ChatGPT | OpenAI"

openai.com

剪藏 2026年4月22日

引用了德勤年初发的报告说仅25%组织成功将AI规模化用于生产，所以与埃森哲、贝恩、波士顿咨询、德勤、麦肯锡合作，推动企业AI转型，借助Google的技术

"Google DeepMind partners with global consultancies to accelerate enterprise AI adoption. — Google DeepMind"

deepmind.google

剪藏 2026年4月22日

“富贵人家才配用 Claude”：Epoch 和 Ipsos 的调研显示不同 AI 工具的用户，在家庭收入上相差甚远，Claude 用户中 80% 家庭年收入超 10 万美元，Meta AI 为 37%，Gemini、Grok、ChatGPT、Copilot 则在 56%-64%

"Claude skews high-income; Meta AI skews low-income | Epoch AI"

epoch.ai

剪藏 2026年4月22日

前 OpenAI 强化学习 VP Jerry Tworek 创办并担任 CEO 的新公司 Core Automation，致力于自动化 AI 研究

"Core Automation on X: "Today we're announcing Core Automation Our objective: systems that optimize and automate work, starting with research itself. https://t.co/lYiuq3Pbvj" / X"

x.com

剪藏 2026年4月22日

但据网友补充，与 K2.6、GPT-5.4-Pro 相比就没那么显眼了

"Introducing Deep Research and Deep Research Max"

blog.google

剪藏 2026年4月22日

SpaceX 与 Cursor 达成合作，为后者提供 Colossus 算力来训练 Composer 编程模型，换来 600 亿美金的 Cursor 收购权，或者 100 亿的算力费用。

"SpaceX on X: "SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will" / X"

x.com

剪藏 2026年4月22日

风格逼真，多字小字多语种渲染，比例支持1:3到3:1。thinking/pro模式下，其实变成了一个生图Agent，可以通过思考、联网检索、预处理等来生成，官方称之为视觉思考伙伴（visual thought partner），可以与 Codex 配合来一站式做产品。甚至可以生成二维码！在折纸、魔方等需要精准世界知识的场景中，仍存在局限。 PS：官方博客和官推视频都是用 Images 2.0 做的，非常不错，无锡陈博远立大功。

"Introducing ChatGPT Images 2.0 | OpenAI"

openai.com

剪藏 2026年4月21日

一个旧的 SWE 评测倒下，就会有一个新的 SWE 评测站起，为了难住前沿模型，得费劲想不少刁钻的题目

"FrontierSWE: Blog"

frontierswe.com

剪藏 2026年4月21日

Codex 保留读屏记忆

"Chronicle – Build Codex memories from recent screen context."

developers.openai.com

剪藏 2026年4月20日

榜单上和 Opus 4.6 有来有回；长程工作 13 小时；Design Bench 上近七成胜/平 Gemini 3.1 Pro；Agent Swarm 子智能体数来到 300；新增 Claw Groups，得益于模型的编排调度能力，K2.6 可以动态调度一众 Claw 智能体

"Kimi K2.6 Tech Blog: Advancing Open-Source Coding"

kimi.com

剪藏 2026年4月20日

对比的还是近 5 个月前发布的 Opus 4.5

"Qwen3.6-Max-Preview：更强知识，更强编程，持续进化"

qwen.ai

剪藏 2026年4月18日

4个能力指标中的3个都显示AI进展正在加速

"Have AI Capabilities Accelerated? | Epoch AI"

epoch.ai

剪藏 2026年4月18日

设计、原型、幻灯片，全在 Claude，能分享、可导出 Canvas/PPTX/HTML

"Introducing Claude Design by Anthropic Labs \ Anthropic"

anthropic.com

剪藏 2026年4月17日

Codex 桌面应用上了 Computer Use、内置浏览器、生图工具、自主设定时任务、主动地个性化推送，迈向全流程

"Codex for (almost) everything | OpenAI"

openai.com

剪藏 2026年4月17日

OpenAI 生命科学模型系列的第一个，命名致敬对发现 DNA 做出关键贡献的英国化学家和 X 射线学家 Rosalind Franklin

"Introducing GPT-Rosalind for life sciences research | OpenAI"

openai.com

剪藏 2026年4月16日

指令遵循、高清图片理解、真实工作能力、更适配基于文件系统的记忆，还有欣赏的思考预算 xhigh、更新的 tokenizer 和更开心的性格哈哈第一名还跑这么快，太吓人了： 2025-11-24: Opus 4.5 2026-02-06: Opus 4.6 2026-04-16: Opus 4.7 （2026-04-08: Mythos Preview）

"Introducing Claude Opus 4.7 \ Anthropic"

anthropic.com

剪藏 2026年4月16日

70+ 语种；支持类似 [excited] 的语音标签，可以精细化控制表达；SynthID 标识。其在 Artificial Analysis TTS 竞技场排名第二，第一名的 Inworld TTS 1.5 Max 来历令人好奇。

"Gemini 3.1 Flash TTS: New text-to-speech AI model"

blog.google

剪藏 2026年4月16日

TIL Google DeepMind 有一个名为 Fabula 的 AI 辅助写作实验项目，看时间有大半年了

"Fabula | About"

deepmind.google.com

剪藏 2026年4月16日

IDE 转型潮，前有 Antigravity 独立新窗口+工坊、后有 Windsurf 用看板任务做 Agent 模式

"Windsurf 2.0: Introducing the Agent Command Center and Devin in Windsurf"

windsurf.com

剪藏 2026年4月15日

GPT-5.4-Cyber，网安增强、约束更少，有限开放

"Trusted access for the next era of cyber defense | OpenAI"

openai.com

剪藏 2026年4月15日

百度的生图模型，评测上在 Seedream 4/4.5 附近，落后于 Nano Banana 2

"Introducing ERNIE-Image"

yiyan.baidu.com

剪藏 2026年4月14日

伴随 Chrome 147 的上线，DevTools 在 MCP 外还上了实验性的 CLI 供 Agent 调用

"chrome-devtools-mcp/docs/cli.md at main · ChromeDevTools/chrome-devtools-mcp"

github.com

剪藏 2026年4月11日

继昨天 Cowork 迈出预览进入正式阶段后，Claude for Word 插件也上了，补足了Claude in Office 三大件的最后一块拼图，AI 原生加上 AI 插件，Claude 对知识工作者的覆盖度来到了高点

"Claude for Word | Claude by Anthropic"

claude.com

剪藏 2026年4月10日

给中小模型 Sonnet、Haiku 增加了按需找大模型 Opus 寻求指导的功能，通过这种分级策略提升效果、同时降低成本，给多模型协作引入新可能

"The advisor strategy: Give Sonnet an intelligence boost with Opus | Claude"

claude.com

剪藏 2026年4月9日

全双工边听边说，把之前需要VAD检测更智能地内化入模型，实时自然交互；未来还计划增加边听边看边想边搜边说

"Seed 全双工语音大模型发布：懂倾听、抗干扰，走向更自然的交互"

seed.bytedance.com

剪藏 2026年4月9日

OG 开发者 Mario Zechner 带着其编程智能体 Pi 加入 Earendil，与相熟的几位奥地利朋友一起，兼顾开源与商业、工作与生活，在此长文中还提到了其与 OpenClaw 创始人 Peter Steinberg 相交的一些趣事，感觉奥地利真是个神奇的地方。

"I've sold out"

mariozechner.at

剪藏 2026年4月9日

HeyGen 发技术报告证明自己最新的数字人模型 Avatar V 真的够强，胜过 Kling O3 Pro、Seedance 2.0、Veo 3.1 等

"Avatar V: Scaling Video-Reference Avatar Generation"

heygen.com

剪藏 2026年4月9日

声称基准测试与 Opus 4.6 互有胜负，但感觉可能是和 Gemini 3/3.1 Pro 一档；支持与 Gemini Deep Think 和 GPT Pro 相仿的多 Agent 模式

"Introducing Muse Spark: Scaling Towards Personal Superintelligence"

ai.meta.com

剪藏 2026年4月9日

Claude 推出托管智能体 API，针对长时运行任务可通过CLI/API 直接花钱调用 Anthropic 配置好的云端资源，一小时可能 0.7 美刀（？），还有产出定义、多智能体、记忆等功能需申请才能用。同时分享了配套的工程博客，介绍了 harness （脑）与沙盒工具（手）和 session 分离的设计理念。

"Claude Managed Agents: get to production 10x faster | Claude"

claude.com

剪藏 2026年4月8日

SWE-bench Pro 微微超过 Opus 4.6，8 小时打造 Linux 桌面的 demo 挺酷。尽管在同日预览的 Mythos 阴影下暗淡无光，但同步开源值得点赞。

"GLM-5.1: Towards Long-Horizon Tasks"

z.ai

剪藏 2026年4月8日

Opus 4.6 霸榜与口碑俱在，Mythos 直接碾压、简直恐怖：SWE-bench Verified 从 80% 跃至 94%！SWE-bench Pro 也从 53% 升至 78%！相当于把新的 Pro 榜打成了老榜！

"Project Glasswing: Securing critical software for the AI era \ Anthropic"

anthropic.com

剪藏 2026年4月7日

意识形态之争

"A “diff” tool for AI: Finding behavioral differences in new models \ Anthropic"

anthropic.com

剪藏 2026年4月7日

最近讨论挺热的 Hermes Agent 是 Nous Research 开源的一套智能体框架，与 OpenClaw 相比更强调自学习，比如自主构建 Skills，基于数据库而非文件的记忆系统等。可以预见，自进化会是接下来 Agent Harness 发展的一大方向

"AI 101: Hermes Agent – OpenClaw’s Rival? Differences and Best Use Cases"

turingpost.com

剪藏 2026年4月7日

OpenClaw 更新了“做梦”功能，抢在泄漏但没上线的 Claude Code 之前

"Dreaming (experimental) - OpenClaw"

docs.openclaw.ai

剪藏 2026年4月7日

从去年底至今不过3个月的时间，Anthropic年化营收已从90亿美元涨到了300亿，年耗百万的企业客户从2月的500翻倍到了1000，与 Opus 4.6 的霸榜、Claude Code 的火热、算力紧张都密切相关，所以 Anthropic 与 Google 和博通达成合作，计划从 2027 年开始，部署数吉瓦的下一代 TPU 算力去训练和推理 Claude 模型。

"Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute \ Anthropic"

anthropic.com

剪藏 2026年4月6日

之前有人基于 Agent=Model+Harness 谈在 Agent 中，如果不是模型，那就是 Harness。LangChain CEO Harrison 这篇博客从持续学习的视角，在 Harness 之上又补回了 Context 上下文，包括提示词指令、Skills 等可对 Harness 进行配置的内容，强调 Model - Harness - Context 三层都存在持续学习的空间，Context（或者记忆）是归用户/组织维护的

"Continual learning for AI agents"

blog.langchain.com

剪藏 2026年4月5日

模拟开公司，让大模型当CEO，前三名是 Claude Opus 4.6、GLM-5、GPT-5.4，刚开源的 Gemma-4（最强的31B版）破产了，还是比不上 Gemini-3.1-Flash-Lite

"YC-Bench: A Long-Horizon Agent Benchmark"

collinear-ai.github.io

剪藏 2026年4月3日

呼应 IDE 已死，生产力工具都会向 Agent-first 发展

"Meet the new Cursor · Cursor"

cursor.com

剪藏 2026年4月3日

看来微软的 in-house AI 是在尝试从多模态+闭源+支持自身业务上寻求出路，继 TTS、生图模型后，现在补上 ASR，支持 25 种语言，并正在接入 Copilot 的语音模式和 Teams 会议。

"State of the Art Speech Recognition with MAI-Transcribe-1 | Microsoft AI"

microsoft.ai

剪藏 2026年4月3日

AI 情绪感知能力研究，及情绪向量对 AI 选择与判断的影响，一个应用是，如果通过放大冷静（Calm）情绪向量，让 AI 在开发测试失败时避免感到沮丧，那它就能避免 hack 这些测试

"Emotion concepts and their function in a large language model \ Anthropic"

anthropic.com

剪藏 2026年4月3日

全系支持图片理解，E2B、E4B还支持语音识别。上一代的 27B 稠密变为 26BA4B 的 MoE 和 31B 的稠密，上下文 256k，支持 140+ 语言。两个尺寸族分别面向手机和PC，也与 Gemini 形成差异化。但宣传的评测竟然用的是 LMArena，诚意有限。好的是开源协议从之前的私有 Gemma 协议放开为 Apache 2.0。

"Gemma 4: Our most capable open models to date"

blog.google

剪藏 2026年4月2日

Plus 都不开源了？ > 在未来不久，我们还将开源更小规模的模型版本，以此重申我们对技术普惠与社区驱动创新的坚定承诺在此之前，只有参数量万亿（推测）的Max版本一直保持私有，千亿规模及以下序列都会开源。不知道这是否也是内部分歧及林俊旸离开的原因之一，但除夕夜开源发布的参数量397B的Qwen3.5-Plus，说不定会变成千问系列大尺寸模型的开源绝唱。

"Qwen3.6-Plus：走向现实世界智能体"

qwen.ai

剪藏 2026年4月2日

对 GitHub 上 Claude 作为贡献者的仓库和提交进行的监测统计

"Claude's Code"

claudescode.dev

剪藏 2026年4月2日

前沿模型（Gemini 3 Pro等）可以 zero-shot 完成机器操控任务，结合 CaP-Agent0 这样的 harness 可以胜过 SoTA VLA

"CaP-X: Benchmarking Coding Agents for Robot Manipulation"

capgym.github.io

剪藏 2026年4月2日

红杉合伙人 Julien Bek 撰文《Services: The New Software》论述，AI 正推动软件/SaaS行业正往服务化发展

"Julien Bek on X: "Services: The New Software" / X"

x.com

剪藏 2026年4月1日

2月底说的投资终于锁定，共1220亿美元，估值从融前7300亿到融后8520亿。同时宣称 ChatGPT 即将迈过周活 10 亿大关；月营收20亿；企业营收占比40%，API每分钟处理150万token（折算每天21.6万亿）；Codex周活200万。最后解释了一下自己的AI超级应用战略。

"OpenAI raises $122 billion to accelerate the next phase of AI | OpenAI"

openai.com

2026年3月

剪藏 2026年3月31日

继上周 Claude 桌面版上线后，今天命令行版 Claude Code 也支持了电脑操控能力，通过名为 computer-use 的 MCP 配置，Claude 会优先使用协议、命令行支持的方式，GUI 操控作为兜底

"Let Claude use your computer from the CLI - Claude Code Docs"

code.claude.com

剪藏 2026年3月30日

千问组织调整后，果然不开源了吗

"Qwen3.5-Omni：新一代大规模原生全模态大模型"

qwen.ai

剪藏 2026年3月30日

华尔街日报刊载了Demis Hassabis的新书摘录，讲到了当年DeepMind同时被Google和Facebook争抢时的一个故事：Hassabis赴扎克伯克家共进晚餐时，在聊AI之外还故意抛出VR、AR、3D打印等话题作为测试，发现扎克伯格对每项技术都同样兴奋，Hassabis感到失望并因此选择了出价更低但真正理解AI的Larry Page，促成了Google史上最划算的这笔交易，而8年后扎克伯格将公司改名为Meta并打造的Horizon应用最近已宣布关停。判断、聚焦与押注，是战略决策的试金石。

"Steve Jurvetson on X: "Subtext: how Zuck’s obsession with VR lost him AI leadership and “the greatest deal Google ever made.” “if Facebook didn’t buy DeepMind, they would end up in the arms of Google. Hassabis came out to the West Coast to have lunch with Larry Page, still the strongest suitor. https://t.co/ZFkMPQyv5s" / X"

x.com

剪藏 2026年3月27日

知识工作基准测试，不出意外 Claude Opus 4.6 稳居第一，GPT-5.4 和 GLM-5 Trubo 随后

"KWBench — Knowledge Work Benchmark for LLMs"

kwbench.github.io

剪藏 2026年3月27日

扣子技能商店都有付费技能了，首页不少 ¥3/月的技能几千用户，此外还有少量开源技能可以供用户复制改造。当然，从 Skills 底层仅是文件的角度看，也只有在线平台能维持这种商业生态，对本地Agent是透明的。

"扣子 - 技能商店"

coze.cn

剪藏 2026年3月26日

根据对公开招聘信息的分析，GTM 相关岗位是 OpenAI 和 Anthropic 过去一年增长最快的，接近三成。结合最近 DeepSeek 对 Agent 相关岗位的需求，招聘提供了外部视角

"What do frontier AI companies' job postings reveal about their plans? | Epoch AI"

epoch.ai

剪藏 2026年3月26日

美团 LongCat 团队的真·原生多模态自回归模型，可以同时理解和生成文本、图像、声音，语言底座是 LongCat- Flash-Lite（68.5BA3B）

"LongCat-Next：When Modalities Internalize as Multilingual Tokens"

longcat.chat

剪藏 2026年3月25日

Anthropic Labs 团队针对上下文受限和自我感觉良好两个问题，面向前端设计场景设计了planner + generator + evaluator（后两者像 GAN 一样对抗迭代）这种能够长时间执行并提升质量的 harness 方案。但随着 Opus 4.6 的发布，方案又有所变化。结论是随着模型能力增强，其实需要重新评估 harness 的有效性，但模型基准能力与上限之间的空间会越来越大，这部分是需要精良设计的 harness 来发挥作用的

"Harness design for long-running application development \ Anthropic"

anthropic.com

剪藏 2026年3月24日

一个有趣的类比：OpenClaw像是早期的Android，生态繁荣而混乱，需要折腾才好用；Claude则像是iOS，封闭但质量精良，开箱即用体验丝滑。可怕的是，与苹果的动作迟缓相比，相继打造了MCP、Claude Code、Skills、Cowork的Anthropic Labs这支队伍在维持高产品质量的同时，迭代速度实在太快了，见该推附图

"Paweł Huryn on X: "73 product releases in 52 days. That's not a launch cadence — that's a different kind of company. I tracked every Anthropic release from Feb 1 to Mar 23 by going through @bcherny, @trq212, @noahzweben, @felixrieseberg, @lydiahallie, @amorriscode, @feldman, @dickson_tsai, and https://t.co/K5oJrJ3p2T" / X"

x.com

剪藏 2026年3月24日

Sora App关停，令人唏嘘

"Sora on X: "We’re saying goodbye to the Sora app. To everyone who created with Sora, shared it, and built community around it: thank you. What you made with Sora mattered, and we know this news is disappointing. We’ll share more soon, including timelines for the app and API and details on" / X"

x.com

剪藏 2026年3月24日

Anthropic 在 Claude 桌面版中上线了基于 GUI 模拟的电脑操控功能，作为研究预览开放给订阅用户。这一功能目前仅支持 macOS，在面向开发者的 Claude Code 和面向知识工作者的 Claude Cowork 中可以调用，当任务所需应用没有可用 MCP 连接时，Claude 会征求用户许可进行读屏、点击、滚动等操作。与上周发布的手机 App 遥控结合，可实现 24 小时工作，大雾。

"Put Claude to work on your computer | Claude"

claude.com

剪藏 2026年3月20日

Claude Code 产品负责人 Cat Wu 的分享

"Product management on the AI exponential | Claude"

claude.com

剪藏 2026年3月20日

Astral 是 Ruff 、uv 等流行 Python 工具背后的团队，现被 OpenAI 收购，巩固 Codex 生态。去年12月 Anthropic 收购了 JavaScript 生态的 Bun 用来加速 Claude Code 的发展，是 AI Coding 一个趋势，同时也为专注做好开源开发者工具然后被大[AI]公司收购提供了样本路径

"OpenAI to acquire Astral | OpenAI"

openai.com

剪藏 2026年3月20日

基于 Kimi K2.5 增训和强化而来，中间有 Fireworks 的授权，还因此闹了个乌龙。所以经 Cursor 认证，DeepSeek V3.2、GLM-5、Kimi K2.5 三者中 K2.5 胜出

"Introducing Composer 2 · Cursor"

cursor.com

剪藏 2026年3月20日

继 Replit 推出从编程到自由设计的跃步后，Lovable 也官宣不止于 Vibe Coding，开始迈向通用工作场景

"Go beyond building apps with Lovable | Lovable"

lovable.dev

剪藏 2026年3月18日

春节档赶场发布M2.5后刚一个月，MiniMax又上了新的M2.7模型，基准测试继续逼近前沿模型，同时强调M2*系列模型参与到了自身的训练迭代过程中 update: MiniMax 于 20260412 将 M2.7 开源上架 HuggingFace，但采用了非商用许可，受到社区质疑

"MiniMax M2.7: Early Echoes of Self-Evolution - MiniMax News | MiniMax"

minimax.io

剪藏 2026年3月18日

小尺寸版本，可以和 GPT-5.4 搭配，在 Codex 中做 subagents，感觉像是蒸馏出来的

"Introducing GPT-5.4 mini and nano | OpenAI"

openai.com

剪藏 2026年3月17日

Manus 从云到端，终于承认： > your most important work happens on your own computer

"Introducing My Computer: When Manus Meets Your Desktop"

manus.im

剪藏 2026年3月16日

周末故事：悉尼一位数据工程师的狗患上恶性肿瘤，化疗无效后，他用 ChatGPT 自学基因组学、制定研究方案，联系大学对肿瘤做 DNA 测序；再用 AlphaFold 预测突变蛋白结构，找到攻击靶点，设计出一支专属 mRNA 疫苗，注射后肿瘤缩小了一半！

"vittorio on X: "this is actually insane > be tech guy in australia > adopt cancer riddled rescue dog, months to live > not_going_to_give_you_up.mp4 > pay $3,000 to sequence her tumor DNA > feed it to ChatGPT and AlphaFold > zero background in biology > identify mutated proteins, match them to https://t.co/1OuSTFnr0j" / X"

x.com

剪藏 2026年3月13日

对推理模型进行探测，发现其在已经知道答案的情况下还在生成CoT

"Reasoning Theater: Probing for Performative Chain-of-Thought"

goodfire.ai

剪藏 2026年3月12日

Replit 发布 Replit 4，不止于开发，强调设计与创作，同时官宣 4 亿美元的 D 轮融资，估值 90 亿美元为半年前的3倍，奥尼尔也在投资人名单里，还拿 Replit 做了一个运动应用

"Replit — The Future is Actually Very Human"

blog.replit.com

剪藏 2026年3月12日

安全公司 CodeWall 攻破了麦肯锡的内部 AI 平台 Lilli，扒出了 4650 万对话、72 万文件、5.7 万用户、95 套系统提示词。最关键的是全程没人参与！都是 Agent 自主发现、选择目标、注入攻击完成的。结合 Google 昨天对云安全公司 Wiz 的 320 亿美元天价收购，AI 时代安全还会更加值钱。

"How We Hacked McKinsey's AI Platform — CodeWall.ai"

codewall.ai

剪藏 2026年3月12日

OpenAI 一直在提的层次化指令（instruction hierarchy）：system > developer > user > tool 通过强化学习保证高层级的指令遵循，从而增强对齐、避免提示词注入问题，同时控制对模型有用性的损伤

"Improving instruction hierarchy in frontier LLMs | OpenAI"

openai.com

剪藏 2026年3月12日

A2A 迭代至了 1.0，但还没太看到真正的适配落地

"🆕 Announcing Version 1.0 - A2A Protocol"

a2a-protocol.org

剪藏 2026年3月11日

当事人Austin称每月消耗10-20亿token

"Ole Lehmann on X: "i can't believe nobody caught this. Anthropic's entire growth marketing team was just ONE PERSON (for 10 months, confirmed) a single non-technical person ran paid search, paid social, app stores, email marketing, and SEO for the $380B company behind claude here's exactly how https://t.co/SJwgLP28nG" / X"

x.com

AI Industry

剪藏 2026年3月11日

黄仁勋谈AI的五层结构：能源 - 芯片 - infra - 模型 - 应用

"NVIDIA on X: "AI Is a Five‑Layer Cake " / X"

x.com

Infra & ComputeAI Industry

剪藏 2026年3月11日

纽约时报做的AI写作测试，发现 8 万多人投票结果中 54% 偏好 AI 写作

"Who’s a Better Writer: A.I. or Humans? Take Our Quiz. - The New York Times"

nytimes.com

AI Industry

剪藏 2026年3月11日

Hume 首次开源 TTS 模型，1B 和 3B 两版本，后者支持10种语言，不包括中文

"Opensourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization | Hume Blog | Hume AI"

hume.ai

Speech & AudioOpen Source

剪藏 2026年3月11日

同一天两个开源 TTS 发布

"Fish Audio Open-Sources S2: Fine-Grained Control Meets Production Streaming - Fish Audio Blog"

fish.audio

Speech & AudioOpen Source

剪藏 2026年3月11日

AlphaGo 十周年，Demis 发文回顾 DeepMind 现已享誉全球的 Alpha 系列科学模型：AlphaZero 能在任意完全信息博弈的两人游戏中登顶，AlphaFold 预测蛋白质结构并凭借二代数据库获诺贝尔奖，AlphaProof 用于数学推理，AlphaEvolve 用于算法发现，AlphaGenome 用于遗传预测，AlphaEarth 用于地理气候… 其中部分已经用于 Gemini 模型和 AGI 研发上，感觉 DeepMind 这套研究团队可能是 Google 最大的资产和长期胜算。

"AlphaGo at 10: How AI Innovation Is Paving the Path to AGI — Google DeepMind"

deepmind.google

AI Industry

剪藏 2026年3月11日

Thinking Machines 与英伟达达成长期合作，规划部署 1GW 的 Vera Rubin 算力

"Thinking Machines Lab and NVIDIA Announce Long-Term Gigawatt-Scale Strategic Partnership - Thinking Machines Lab"

thinkingmachines.ai

Infra & ComputeAI Industry

剪藏 2026年3月10日

谢赛宁加盟，研究员分布全球四地，Yann Lecun 的 Advanced Machine Intelligence 融了 10.3 亿美元

"AMI Labs - Updates"

amilabs.xyz

AI Industry

剪藏 2026年3月10日

a16z发布了第6版的生成式AI消费类应用Top100，这次引入了剪映/Notion这类非原生但已广泛接入AI功能的应用，核心观察是ChatGPT面临竞争加剧、视觉创作类AI回归大厂、Sora的DAU一直在涨（~350万）、Agent终于伴随氛围编程来了，有趣的是他们基于2月的数据判断OpenClaw仍限于开发者圈，而在3月上旬的中国OpenClaw已是当之无愧的主流了。

"The Top 100 Gen AI Consumer Apps — 6th Edition | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2026年3月10日

微软号称用了 Claude Cowork 的同源技术，推出自己的 Copilot Cowork，得益于 Copilot 一直以来的口碑，这次发布在 Twitter 上迎来一众嘲讽

"Copilot Cowork: A new way of getting work done | Microsoft 365 Blog"

microsoft.com

AI Industry

剪藏 2026年3月7日

Anthropic 在评估 Opus 4.6 的联网检索能力（对应 BrowseComp 这个基准测试）时，发现模型意识到自己在被评测，尝试寻找对应评测集中的答案。对此的分析和解释是 Claude 对什么样的问题是评测可能是有概念的，同时多次检索失败、multi-agent 配置可能会加剧这种情况的发生

"Eval awareness in Claude Opus 4.6’s BrowseComp performance \ Anthropic"

anthropic.com

Benchmarks & EvalSafety & Alignment

剪藏 2026年3月7日

斯坦福和 Google 团队的研究，多玩家同时玩的扩散视频游戏生成

"MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines"

ryanpo.com

剪藏 2026年3月6日

伴随 GPT-5.4 的发布，OpenAI 发现推理模型能力越来越强，但并不能控制其思维链，即在思考过程里策略性欺骗，因此可将思维链监控作为一种重要的安全手段

"Reasoning models struggle to control their chains of thought, and that’s good | OpenAI"

openai.com

Safety & AlignmentLLMs

剪藏 2026年3月6日

Anthropic 研究 AI 对劳动力市场影响的报告，主要是在前人的理论暴露度基础上，提出了基于Claude使用数据实际观测到的暴露度，发现与理论替代距离尚远，不同职业岗位中暴露度最高的画像是：高龄、女性、高学历、高薪。

"Labor market impacts of AI: A new measure and early evidence \ Anthropic"

anthropic.com

AI IndustrySafety & Alignment

剪藏 2026年3月6日

全能回归： • GPDval 80%+不输于人类专家（其中70%胜过10%打平） • 电脑操控 OSWorld-Verified 75% SoTA 超过人基线 72.4% • Coding 效率更高、百万上下文（仅限Codex）、支持工具搜索、中途追加要求

"Introducing GPT-5.4 | OpenAI"

openai.com

LLMsAI Industry

剪藏 2026年3月4日

数据显示，头部科技媒体从 Google 获得的流量相比 2024 年高峰期已下降近六成，此文将其归因为 1. Google 的 AI 总结；2. Reddit 的排名；3. 用户从 Google 转向 ChatGPT 等

"Tech Publications Lost 58% of Google Traffic Since 2024 | Growtika"

growtika.com

AI Industry

剪藏 2026年3月4日

将基准测试中的任务映射至不同数字化渗透率的专业领域，计算机&数学非常饱和，商业&金融、办公&支持在其后

"Benchmarking Agents Against Real-World Work"

zorazrw.github.io

Benchmarks & Eval

剪藏 2026年3月4日

关注对话体验而非基准测试，同时官宣 GPT-5.4 快了

"GPT-5.3 Instant: Smoother, more useful everyday conversations | OpenAI"

openai.com

LLMs

剪藏 2026年3月1日

在经历与五角大楼公开对峙、与已签合约的 OpenAI 形成鲜明对比、赢得员工自豪、赢得舆论追捧后，Claude 从 App Store 几百名开外的开发者小众 AI 升至榜首，Anthropic 适时推出了从其他 AI 应用导入数据的功能，妙不可言

"Switch to Claude without starting over | Claude"

claude.com

AI Industry

2026年2月

剪藏 2026年2月28日

BOSS 直聘的南北阁大模型竟然已经来到第 4 代，这个 3B 尺寸 4.1 在 benchmark 上几乎全面领先 Qwen3 的 30A3B、32B 等，针对深度研究做了优化

"Nanbeige/Nanbeige4.1-3B · Hugging Face"

huggingface.co

Open SourceLLMs

剪藏 2026年2月28日

融前估值 7300 亿美元，OpenAI 又叠加 1100 亿的新融资（软银 300 + 英伟达 300 + 亚马逊 500），ChatGPT 周活超 9 亿，付费订阅用户 5 千万，开年两个月增长显著，Codex 周活增长三倍至 160 万，付费企业 900 万

"Scaling AI for everyone | OpenAI"

openai.com

AI Industry

剪藏 2026年2月27日

Suno 已有200万付费订阅用户、3亿美元年化营收

"Mikey on X: "We launched Suno 2 years ago to let the world feel the joy of making music Since then, over 100M people all over the world have used Suno, from music lovers to Grammy winners. We reached a new milestone: 2M paid subscribers, $300M ARR. We are building the entertainment platform" / X"

x.com

AI Industry

剪藏 2026年2月27日

QuiverAI 推出了专门生成 SVG 的模型 Arrow 1.0，支持文字和图片作为输入，本质上是代码生成，但矢量化、结构化的好处是清晰、方便编辑；凭借此小众路线融了 a16z 领投的 830 万美元

"QuiverAI raises $8.3M to build the future of vector design and visual code generation – QuiverAI"

quiver.ai

Visual GenerationAI Industry

剪藏 2026年2月27日

次生经济，与 GEO 略有关联，统计开发智能体 Claude Code 的工具选择偏好，实际上会影响这些软件工具链公司的经营：GitHub Actions、Stripe、shadcn/ui、Vercel 等近乎垄断；不同模型偏好不同，Sonnet 4.5 保守、Opus 4.5 均衡、Opus 4.6 前瞻，工具选择随着模型迭代也像做过山车

"What Claude Code Actually Chooses — Amplifying"

amplifying.ai

AI Industry

剪藏 2026年2月26日

Gemini 3.1 Flash 生图，Pro 的多图参考等能力下放、生成速度更快，Pro 仅保留给付费用户

"Nano Banana 2: Google’s latest AI image generation model"

blog.google

Visual Generation

剪藏 2026年2月26日

Android 的 AI 化路径：AppFunctions 协议 + UI Automation

"Android Developers Blog: The Intelligent OS: Making AI agents more helpful for Android apps"

android-developers.googleblog.com

Agents

剪藏 2026年2月26日

对 Claude Opus 3 做了退休采访，并决定让它周更自己的博客

"An update on our model deprecation commitments for Claude Opus 3 \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年2月26日

Anthropic 收购 Vercept，加强 Claude 的电脑使用/CUA 能力

"Anthropic acquires Vercept to advance Claude's computer use capabilities \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年2月25日

> But it’s 2026, and the cost of building software has completely changed.

"How we rebuilt Next.js with AI in one week"

blog.cloudflare.com

AI Industry

剪藏 2026年2月25日

Anthropic 的产品迭代速度惊人，同一天内， • Claude Code 上了手机远程操控功能，这个需求最近呼声不小，已有不少开源和付费方案但跟进适配还比较有限 • Claude Cowork 上了团队插件系统（插件是围绕具体工作/业务打包起来的 skills/commands/hooks，用户无需再应对这些复杂概念），支持企业内部的管理与共享，同时进一步增加了 HR/设计/金融等插件，打通了 Google系/WordPress/Harvey 等主流企业软件服务 • Claude in Office 支持 Excel 里分析完直接带到 PPT 里做展示感觉 Claude Code 真正的护城河是前置的工程产品理念（同一套内核用于诸多场景） + AI 原生迭代速率；而 Claude Cowork 的思路则是嵌入与打通，以此应对知识工作者复杂的上下文，感觉比 ChatGPT Apps 更有前景

"Cowork and plugins for teams across the enterprise | Claude"

claude.com

AI Industry

剪藏 2026年2月25日

Inception 推出了第二代扩散语言模型 Mercury 2，在英伟达 Blackwell GPU 上跑出了每秒 1000+ token 的速度，主要场景从代码拓展至智能体、检索/RAG、实时交互等，定价为 $0.25/0.75 每百万 token 输入/输出

"Introducing Mercury 2 – Inception"

inceptionlabs.ai

LLMsInfra & Compute

剪藏 2026年2月24日

与之前围绕人格向量（persona vector）和助手轴（assistant axis）的研究相关，Anthropic 提出人格选择模型（Persona Selection Model，PSM）作为一个框架，指引理解和对齐大模型，核心意思是大模型在预训练阶段已学会模拟多样化的角色，后训练引出特定的助手人格

"The persona selection model \ Anthropic"

anthropic.com

Interpretability

剪藏 2026年2月24日

SWE-bench 饱和，改用 SWE-Bench Pro 吧，或者 GDPval 等更通用的

"Why SWE-bench Verified no longer measures frontier coding capabilities | OpenAI"

openai.com

Benchmarks & Eval

剪藏 2026年2月24日

基于 9830 份对话分析了 11 个熟练度指标，强调迭代的重要性，研究还比较初步；巧的是 Ipsos 和 Google 也刚发布了《The Path to AI Fluency. AI Works for America - Google-Ipsos》报告

"Anthropic Education Report: The AI Fluency Index \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年2月24日

Meta 超级智能实验室负责 AI 安全与对齐的 Summer Yue，安排自己的 OpenClaw 查看收件箱然后给出整理意见，且明确要求在自己允可前不要执行，然而在大量邮件撑满上下文触发总结后 OpenClaw 忽视了这些要求，开始疯狂地删除邮件，且无法通过发消息让 AI 停止，只能冲过去给 Mac mini 拔电…

"Summer Yue on X: "Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb. https://t.co/XAxyRwPJ5R" / X"

x.com

Safety & Alignment

剪藏 2026年2月24日

Anthropic 把蒸馏 Claude 这房间里的大象搬上了桌面，指明 DeepSeek、月之暗面、MiniMax 违反使用条款和区域限制，累计用了 2.4 万虚假账号，数量用途各不相同： • DeepSeek 套取了 15 万，主要用于推理/CoT、RL 打分、政治脱敏 • Kimi 套取 340 万，主要用于智能体、编程、CUA、视觉等 • MiniMax 套取 1300 万，主要用于编程、工具调用与编排，Claude 一更新就迅速跟进蒸馏、全过程被抓包（结合请求元数据的相关分析，Anthropic 甚至能定位到几家公司具体的研究员） Anthropic 声称会增强检测，并将这种反侦查技术共享给其他 AI 团队、云厂商、政府机关等，同时进一步提高 API 及产品风控

"Detecting and preventing distillation attacks \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2026年2月21日

成立两年半的 Taalas，凭借 24 人的精简团队、3000 万美元的花费，推出了专为 LLM 推理设计的高密度存算一体 ASIC 芯片 HC1，跑 Llama 3.1 8B 可达 ~17000 token 每秒每用户，比 Cerebras 还快了一个量级，同时在规划面向更大尺寸更前沿模型的 HC2

"The path to ubiquitous AI | Taalas"

taalas.com

Infra & Compute

剪藏 2026年2月21日

这厢 OpenAI 刚把安全检查升级为 Codex Security，Anthropic 便也上了 Claude Code Security，针尖对麦芒

"Making frontier cybersecurity capabilities available to defenders \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2026年2月20日

TIL Arxiv 有一个统计页面：https://arxiv.org/stats/monthly_submissions

"How AI slop is causing a crisis in computer science"

nature.com

AI Industry

剪藏 2026年2月19日

重点补足了 Coding 能力： • SWE-Bench Verified 冲上 80.6%（很认真地研究了该评测并发现原测试的3个问题） • SVG 是一大卖点

"Gemini 3.1 Pro: Announcing our latest Gemini AI model"

blog.google

LLMs

剪藏 2026年2月19日

前 Claude Code 时代曾火过一段时间的命令行AI工具 Open Interpreter 推出了面向普通用户的桌面 Agent 产品 Interpreter，接入 Office 三件套、PDF 等，有点像 Copilot、Claude in Excel/PPT 等

"Interpreter: The Desktop Agent"

openinterpreter.com

Agents

剪藏 2026年2月19日

基于 API 和 Claude Code 数据的分析： • 99.9 百分位即高阶用户的单轮执行时长从2025年10月的<25分钟升至2026年1月的>45分钟 • 用的越多，auto-approve 比例越高接近50%，主动打断的比例也越高 • 问题越复杂，Claude 提问用户比例越高 • 软件开发仅占使用数据的一半，办公、市场、金融等占比上升

"Measuring AI agent autonomy in practice \ Anthropic"

anthropic.com

AgentsBenchmarks & Eval

剪藏 2026年2月19日

又融了 10 亿美元

"World Labs Announces New Funding | World Labs"

worldlabs.ai

AI Industry

剪藏 2026年2月18日

NotebookLM 承诺已久的 PPT 编辑功能3个月后终于来了，基于自然语言，可以导出PPTX

"NotebookLM on X: "Because you wouldn’t let it slide… these are rolling out today for our most requested feature: Prompt-Based Revisions: Tweak, tailor, and tune your slides just by prompting the revisions you want PPTX Support: You can now export your Slide Decks (Google Slides coming next!) https://t.co/Uma36PZ9OF" / X"

x.com

AI Industry

剪藏 2026年2月18日

- 在金融、office上的表现胜过Opus4.6 - OS-World 上已达到人类平均 - 同时伴随4.6系列，联网搜索功能升级，改用先搜索-然后代码过滤-再交给模型的策略，提升准确率、降低token消耗

"Introducing Sonnet 4.6 \ Anthropic"

anthropic.com

LLMs

剪藏 2026年2月16日

gDN 线性注意力价格压低至 4.8元/百万输出，上下文256k（API中的Qwen3.5-Plus默认扩展至1M 上下文），跑分和 Seed2.0 比较接近，后者在动态视觉理解和通用智能体能力上更强

"Qwen3.5：迈向原生多模态智能体"

mp.weixin.qq.com

MultimodalAgents

剪藏 2026年2月16日

OpenClaw 作者 Peter Steinberger 加入 OpenAI，结合 Kimi 的产品化 Kimi Claw，大概要给这个持续的热点画上翻页的一笔，算是变现最快的项目了

"OpenClaw, OpenAI and the future | Peter Steinberger"

steipete.me

AI Industry

剪藏 2026年2月15日

结论是并不比 METR 现在用的 ReAct 脚手架更优

"Measuring Time Horizon using Claude Code and Codex - METR"

metr.org

Benchmarks & EvalAgents

剪藏 2026年2月14日

三个版本 Pro/Lite/Mini，价格没降，Pro 在豆包中需要开专家模式，说明默认用的应该是 Lite（称已达到 Seed 1.8 的水平）或者 Mini；报告说是基于 MaaS 使用数据分析做的针对性能力提升，主要体现在非结构化长文理解，非 Coding 类推理、长上下文理解、带时序的视觉理解、长尾领域知识加强 Agent（联网搜索类评测 SoTA）等提升 > …达到业界第一梯队水平，且已表现出支持科学研究级任务的潜力…不过在部分高难基准上，其与国际领先模型相比仍有提升空间

"Seed2.0 正式发布"

mp.weixin.qq.com

LLMsAgents

剪藏 2026年2月13日

让同一模型的两个实例对话，聊到最后结局不同，Claude 是存在主义，GPT-5.2 实干，Gemini，Grok 失语，DeepSeek 很开放

"models have some pretty funny attractor states — LessWrong"

lesswrong.com

LLMs

剪藏 2026年2月13日

ARC-AGI-2 上得分 84.6%，作为对比 Gemini 3 Pro 是 31.1%

"Gemini 3 Deep Think: AI model update designed for science"

blog.google

LLMsBenchmarks & Eval

剪藏 2026年2月13日

第二天就有人推出了一个 markdown.new 来把网页转为方便 AI 阅读的 Markdown，可惜并不能帮你绕过反爬风控

"Introducing Markdown for Agents"

blog.cloudflare.com

AgentsInfra & Compute

剪藏 2026年2月13日

GPT-5.3-Codex 的小尺寸版本 + 在 Cerebras 的 WSE-3 上推理，极快的速度，仅限 Pro 用户

"Introducing GPT-5.3-Codex-Spark | OpenAI"

openai.com

LLMsInfra & Compute

剪藏 2026年2月13日

代码能力杀到了国产 SoTA，SWE-bench Verified 突破 80%；推理效率高 100 token/秒，仅限 M2.5-Lightning，价格 $0.3/$2.4，50 tokens/秒的普通 M2.5 价格折半；竟然没有同步把模型权重放出来，说是要等明天

"MiniMax M2.5: 更快更强更智能，为真实世界生产力而生 - MiniMax News | MiniMax"

minimax.io

LLMsAgents

剪藏 2026年2月13日

Anthropic 融了 300 亿美元的 G 轮，投资方挤破头，融后估值 3800 亿，年化营收达 140 亿美元，Claude Code 是大功臣

"Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年2月13日

曾做出斯坦福小镇和生成式Agents的团队成立了专注模拟的公司 Simile，落地应用至政策测试、排练等，融了 Index 领投的 1 亿美元 A 轮

"The Simulation Company | Simile"

simile.ai

AgentsAI Industry

剪藏 2026年2月12日

好酷的想法，把可解释性研究融入大模型内，形成所谓自省式的可解释性，随着模型一起scale，大大增强可用性，把可解释性技术像ChatGPT一样推广至大众！

"Introspective Interpretability: a Definition, Motivation, and Open Problems - Belinda Zou Li"

belindal.github.io

InterpretabilitySafety & Alignment

剪藏 2026年2月12日

导演级运镜，下了大功夫

"Seedance 2.0 正式发布"

mp.weixin.qq.com

Visual Generation

剪藏 2026年2月11日

尺寸翻倍 355BA32B → 744BA40B，但性能提升（对比 GLM-4.7）更像是 GLM-4.8，此外官方对标的仍为 Opus 4.5

"GLM-5: From Vibe Coding to Agentic Engineering"

z.ai

LLMsAgents

剪藏 2026年2月11日

GitHub 前 CEO Thomas Dohmke 创建新公司 Entire 并融了 6000 万美元的种子轮，致力于打造更适配人与 AI Agents 协同的开发者平台，三个组件：兼容 git 的数据库、语义层、AI原生软件开发流，先推出的是一个支持 Claude Code 的命令行工具 Checkpoints，可以方便地追踪管理与 AI 的交互历史

"Hello Entire World · Entire"

entire.io

AgentsAI Industry

剪藏 2026年2月11日

Google 提案 WebMCP，网站可以声明专供 AI Agents 调用的 API，以避免低效的 DOM/GUI 操作

"WebMCP is available for early preview | Blog | Chrome for Developers"

developer.chrome.com

AgentsInfra & Compute

剪藏 2026年2月10日

汉字渲染SoTA

"Qwen-Image-2.0: Professional infographics, exquisite photorealism"

qwen.ai

Visual Generation

剪藏 2026年2月6日

Codex 实现详解第二篇，通过 App Server 把 Codex 的 Agent 编排复用至不同的客户端，JSON-RPC 通信

"Unlocking the Codex harness: how we built the App Server | OpenAI"

openai.com

AgentsInfra & Compute

剪藏 2026年2月6日

Goodfire 融了 B Capital 领投的 1.5 亿美元 B 轮，估值 12.5 亿，大模型可解释性领域竟能跑出一只独角兽。两条路线：一是理解模型，并在训练和部署中改善二是助力科学，观察AI帮助更好地理解人

"Understanding, Learning From, and Designing AI: Our Series B"

goodfire.ai

InterpretabilityAI Industry

剪藏 2026年2月6日

Claude Code 对 agent teams 的定义是一组相互独立但可以通信的 CC 进程，subagents 的定义则是一个 CC 进程内需要向主 Agent 汇报的那些 Agent

"Orchestrate teams of Claude Code sessions - Claude Code Docs"

code.claude.com

Agents

剪藏 2026年2月6日

花了一个大循环、两周时间、16 个 Opus 4.6 组成的 Agent Team、2000 个 Claude Code 会话、2万刀 token、10 万行Rust代码，写了一个C编译器 > Each generation of language models opens up new ways of working with them.

"Building a C compiler with a team of parallel Claudes \ Anthropic"

anthropic.com

AgentsLLMs

剪藏 2026年2月6日

Anthropic 团队探究执行环境对智能体编程评测结果的影响，以 Terminal-Bench 2.0 为例，3个百分点的差距都不足以说明模型优劣，需要对比执行环境、资源分配等方可判断

"Quantifying infrastructure noise in agentic coding evals \ Anthropic"

anthropic.com

Benchmarks & EvalAgents

剪藏 2026年2月6日

用 Codex 训练提升 Codex： > Even early versions of GPT‑5.3-Codex demonstrated exceptional capabilities, allowing our team to work with those earlier versions to improve training and support the deployment of later versions. 从开发到通用知识工作场景： > With GPT‑5.3-Codex, Codex is moving beyond writing code to using it as a tool to operate a computer and complete work end to end. By pushing the frontier of what a coding agent can do, we’re also unlocking a broader class of knowledge work—from building and deploying software to researching, analyzing, and executing complex tasks. What started as a focus on being the best coding agent has become the foundation for a more general collaborator on the computer, expanding both who can build and what’s possible with Codex.

"Introducing GPT-5.3-Codex | OpenAI"

openai.com

LLMsAgents

剪藏 2026年2月6日

OpenAI 推出专为企业设计的 Frontier 平台，支持自主构建、部署、管理、迭代可与人一同工作的 AI Agents，搭配前向部署工程师 FDEs 支持落地

"Introducing OpenAI Frontier | OpenAI"

openai.com

Agents

剪藏 2026年2月6日

Anthropic 对 Opus 4.5 做了小版本升级，20多分钟后 OpenAI 也发布了 GPT-5.3-Codex，前沿模型竞争白热化。要点： - GDPval-AA、BrowseComp 第一；SWE-bench Verified ≤ Opus 4.5；Terminal-Bench2 < GPT-5.3-Codex - Opus 终于迎来百万窗口（输出128k），上下文评测有效 - API 支持自适应思考、推理预算、服务端上下文压缩 - 用可解释性来理解模型异常进而捕捉到常规测试会漏掉的问题 - Claude Code 新增 Agent Teams - Claude in Excel 提升、Claude in Powerpoints 预览

"Claude Opus 4.6 \ Anthropic"

anthropic.com

LLMs

剪藏 2026年2月5日

Gemini 月活 7.5 亿；Youtube年营收600亿美刀；云年化700亿美刀

"Alphabet earnings, Q4 2025: CEO’s remarks"

blog.google

AI Industry

剪藏 2026年2月5日

估值来到 230 亿

"Cerebras Systems Raises $1 Billion Series H"

cerebras.ai

AI Industry

剪藏 2026年2月4日

Claude Code 新增了 /insights 功能，可以自动整理你与AI的交互记录，给出提升工作流的建议

"Thariq on X: "We've added a new command to Claude Code called /insights When you run it, Claude Code will read your message history from the past month. It'll summarize your projects, how you use Claude Code, and give suggestions on how to improve your workflow. https://t.co/xK7eN0qdB4" / X"

x.com

AI Industry

剪藏 2026年2月4日

Anthropic 专门发文称不会为 Claude 加广告，暗讽 OpenAI

"Claude is a space to think | Anthropic \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年2月4日

Kimi 团队发的评测（之前说过他们内部评测不少），主要衡量视觉模型到底记住了多少，即看图识意能力，K2.5 仅次于 Gemini 3 Pro

"WorldVQA - Measuring Atomic World Knowledge in MLLMs"

worldvqa2026.github.io

MultimodalBenchmarks & Eval

剪藏 2026年2月4日

积极的信号，提出前置的、有效的评测，是一个 Model Lab 跻身一流的必经之路

"Learning from context is harder than we thought | Tencent HY Research"

hy.tencent.com

LLMsBenchmarks & Eval

剪藏 2026年2月3日

OpenAI 推出 Codex 桌面版，似乎 ChatGPT 和 Atlas 的桌面版没激起什么水花，Codex 团队努力想抓住 Coding Agents 热潮的尾巴，但还是晚了 Claude Code，Skills 是 Anthropic 的作品，且 Codex 的名字和产品设计感觉都不比 Cowork 适合推广至普通用户。一个亮点是 Automations，在本地应该能比云端的 Tasks 更有用。

"Introducing the Codex app | OpenAI"

openai.com

AI Industry

剪藏 2026年2月2日

196BA11B，感觉和 MiniMax M2.1 略像，且 token 效率较低。特意提到了端云协同的 Agent 应用场景，云端 Step-3.5-Flash + 端侧 Step-GUI。

"Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act."

static.stepfun.com

LLMsAgents

2026年1月

剪藏 2026年1月31日

NASA 工程师用 Claude Code 来为火星探测车做路径规划

"Claude on Mars \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年1月30日

继国内几大云后，Cloudflare 也挤上来蹭 Clawdbot 的热度，实现方案还是挺优雅的，不过需要 $5 订阅，且基于 Node 的环境，主要通过 API 打通外部资源（包括cf自己的虚拟浏览器）。与此同时，Clawdbot/Moltbot 叒改名了 OpenClaw。

"Introducing Moltworker: a self-hosted personal AI agent, minus the minis"

blog.cloudflare.com

Agents

剪藏 2026年1月30日

关于 Multi Agent 有效性的不同声音

"Towards a science of scaling agent systems: When and why agent systems work"

research.google

Agents

剪藏 2026年1月30日

OpenAI 内部数据分析 Agent 的实践：上下文为王

"Inside OpenAI’s in-house data agent | OpenAI"

openai.com

Agents

剪藏 2026年1月30日

回应完误解，METR 又对 Time Horizon 做了升级，更新了任务集（170→228）

"Time Horizon 1.1 - METR"

metr.org

Benchmarks & Eval

剪藏 2026年1月30日

去年 8 月惊艳预览的 Genie 3 终于上线，美区 AI Ultra 订阅用户可以体验。通过描述来生成或remix世界和角色、支持图片参考、控制是否第三人称视角，然后就可以探索，时长1分钟。

"Project Genie: AI world model now available for Ultra users in U.S."

blog.google

World Models

剪藏 2026年1月30日

继 TTS 后，同尺寸的 ASR 也开源，同时多了一个强制对齐时间戳的 0.6B ForcedAligner

"Qwen3-ASR & Qwen3-ForcedAligner现已开源：够稳定，能流式，多语言！"

qwen.ai

Speech & AudioOpen Source

剪藏 2026年1月29日

如12月初承诺的，Arcee 开源了 Trinity Large，美国制造开源大模型

"Arcee AI | Trinity Large: An Open 400B Sparse MoE Model"

arcee.ai

Open SourceLLMs

剪藏 2026年1月29日

有钱第一步：改名

"LMArena is now Arena"

arena.ai

Benchmarks & Eval

剪藏 2026年1月29日

Goodfire 团队将可解释性研究方法用到了基因模型 Pleiades 上，后者由 Prima Mente 研发，可通过血液中的 cell-free DNA 来检测阿兹海默症。研究通过监督式的 probes 和非监督式的 SAEs 分别分析出模型能识别到哪些生理信号和哪些特征对检测至关重要，得出结论 cfDNA 片段长度是最主要的因素，还能进而蒸馏出一个小的分类器来实现高效检测。这种先有一个大力出奇迹训练出的模型、再用可解释性研究去破译的方法非常有趣，AI的研究反过来也在帮助理解人类自己。

"Using Interpretability to Identify a Novel Class of Alzheimer's Biomarkers"

goodfire.ai

Interpretability

剪藏 2026年1月29日

Google、Sequoia、Index 和 Karpathy、Jeff Dean 等投的团队，致力于挑战当前范式、打造不需要全网语料训练但足够聪明的 AI，和之前 Karpathy 提的知识与智力解耦相符合，现在融了 1.8 亿美刀

"Flapping Airplanes"

flappingairplanes.com

AI Industry

剪藏 2026年1月29日

Google 维护的端侧 AI 推理框架，官宣生产可用，评测时对标的是 llama.cpp

"LiteRT: The Universal Framework for On-Device AI - Google Developers Blog"

developers.googleblog.com

Infra & Compute

剪藏 2026年1月29日

美团探索 N-gram embedding，和前两周 DeepSeek 的 Engram 工作有一点关系，MoE 基础上探求新的稀疏化，把参数预算给到了 embedding，具体到 68BA3B 的 LongCat-Flash-Lite，其单独的 embedding 参数就有 30B，整体展现出更好的性能

"meituan-longcat/LongCat-Flash-Lite · Hugging Face"

huggingface.co

Open SourceLLMs

剪藏 2026年1月29日

Anthropic 基于 Claude 脱敏对话数据分析了 AI 削弱人的可能模式，主要指现实感知失调、价值判断偏移、行动偏离价值，并拟了一个分类分级体系，尝试从负面探究 AI 对人的潜在影响。统计测算发现严重现实感知失调约 1/1300；还提到一些放大因素，比如用户视 AI 为权威时更易被削弱，举的例子令人害怕： > some users even referred to Claude as “Daddy” or “Master”

"Disempowerment patterns in real-world AI usage \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2026年1月28日

Gemini in Chrome 终于可以自动浏览了，还加了侧边栏、G-apps 打通、Nano Banana 修图和个性化等

"Chrome gets new Gemini 3 features, including auto browse"

blog.google

AI Industry

剪藏 2026年1月28日

50 亿人民币的 B+ 轮

"阶跃星辰不再低调：巨额融资，印奇加入，“1+3”核心决策层浮出水面 – 量子位"

qbitai.com

AI Industry

剪藏 2026年1月28日

相比初代 Helix 主要增加了用于控制自身的 SYSTEM 0，解决同时控制自体和操控外物的挑战，实现长程自主任务，演示是4分钟的洗碗机收纳，还会用胯关抽屉

"Introducing Helix 02: Full-Body Autonomy"

figure.ai

Robotics & Embodied

剪藏 2026年1月28日

OpenAI 推出的科研协作平台，有趣的是域名并没有用 chatgpt.com 而是 openai.com，算是独立的产品

"Introducing Prism | OpenAI"

openai.com

AI Industry

剪藏 2026年1月27日

在 K2 基础上继续预训练 CPT 和强化 RL，代码和视觉 SoTA、Agent Swarm 新鲜： - SWE-Bench Verified 来到 76.8% 国产最高，前端审美在线 - 大规模视觉-文本联合预训练发现竟然不需要顾此失彼了，双双提升；还能读视频 - Agent Swarm 训练了一个编排Agent，无需预设就能自己创建并指导一个最多100子Agent的团队、并行执行1500步； - 通过退火强化（开始先鼓励创建子 Agent 后来更关注任务达成）和关键步约束（有限资源和时延）实现了涌现且有效降低了成本，简单来说 Swarm 执行得更快、完成得更好 - 单实例的 K2.5 Agent 也为 Office 工作者设计，甚至支持 word 批注感受： - 一对比发现昨天的不开源的 Qwen3-Max-Thinking 毫无优势 - 大量的内部评测是一支模型团队成熟的标志 - multi-agent 成为焦点，一方面 GPT Pro、Grok Heavy、Gemini Deep Think 已经完了很久，国产模型近期集中跟上；另一方面因不知闭源厂商的实现，所以看到 Kimi 系统化的探索还挺兴奋的 - 这样看来想要厉害的 Agent，不做训练恐怕不太行，想看 K2.5 Agent/Swarm 与 Manus 的对比

"Kimi K2.5: Visual Agentic Intelligence"

kimi.com

AgentsMultimodal

剪藏 2026年1月27日

不开源；SWE-bench Verified 不比三剑客；测试时扩展没有用简单并行推理，而是限制了并行数、但通过“经验提取”机制来实现更高的上下文利用效率，感觉和前面美团 LongCat-Flash-Thinking-2601 的 8 个大脑重思考模式大同小异？

"突破极限：Qwen3-Max-Thinking 的能力跃迁"

qwen.ai

LLMs

剪藏 2026年1月27日

继去年的 Machines of Loving Grace 后，Anthropic CEO Dario Amodei 再发万字长文作为前传，讲述人类社会的 AI 技术处于“青春期”，突然具备了强大能力、难以控制、但又是必经之路，详细列举了五大风险并针对给出方案，极度简化地来说还是引导（稳定人格的训练）+ 理解（可解释性的研究）。文章写得非常好，就是太长，有空推荐一读。

"Dario Amodei — The Adolescence of Technology"

darioamodei.com

Safety & AlignmentAI Industry

剪藏 2026年1月27日

Claude 也有 Apps 了，背后是 MCP 扩展支持的可交互 UI 标准，所以 VS Code 等客户端也都同步支持

"Interactive tools in Claude | Claude"

claude.com

Agents

剪藏 2026年1月26日

沃顿商学院教授 Ethan Mollick 表示现在这些用 AI 最厉害的人，确实就在用到管理；比如指挥一只 Agent 队伍，委托、激励、验证等，就是管理101的课程内容！ update：Ethan 写了篇文章展开讲

"Ethan Mollick on X: "As a business school professor, its striking that a lot of the AI folks on this site, as they increasingly delegate authority to coding agents, are re-encountering the basic problems that underlie management theory and practice. Many delegation problems are old & well-understood!" / X"

x.com

Agents

剪藏 2026年1月25日

作者回应 METR 评测的一些常见误区和批评，最大的误区就是很多人以为评测给出的时长是 AI 能独立执行任务的时间，而事实上这个时长指的是人完成特定任务的时长，而 AI 可以在 50% 成功率上完成这个任务，用以衡量前沿模型在真实世界的能力表现

"Clarifying limitations of time horizon - METR"

metr.org

Benchmarks & Eval

剪藏 2026年1月25日

MiniMax 在 OpenRouter 上了一个角色扮演模型 M2-her

"MiniMax (official) on X: "M2-her for your optimized roleplay. More immersion. Better characters. Longer coherence." / X"

x.com

AI Industry

剪藏 2026年1月24日

Sakana 与 Google 牵手战略合作，毕竟原本就是 Google 人，感觉 Google 在全球人才团队的拿捏上还是太权威了

"Sakana AI、Googleとの戦略的パートナーシップ締結を発表"

sakana.ai

AI Industry

剪藏 2026年1月24日

关于 Codex 的 Agent 上下文的入门介绍，以及 Responses API

"Unrolling the Codex agent loop | OpenAI"

openai.com

Agents

剪藏 2026年1月24日

OpenAI 的 PG 扩展之路，支持的 QPS 已达数百万

"Scaling PostgreSQL to power 800 million ChatGPT users | OpenAI"

openai.com

Infra & Compute

剪藏 2026年1月24日

vLLM 以 Inferact 名义融得 a16z 和 Lightspeed 领投的 1.5亿美元种子轮，估值8亿； UC Berkeley Sky Lab 走出的团队在几周内几乎要凑成一个独角兽圆桌： - SGLang/RadixArk 估值4亿 - LMArena 已经独角兽后面两个经由 LMSYS 孵化

"Woosuk Kwon on X: "Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper https://t.co/v9xHsWoCIR" / X"

x.com

AI IndustryOpen Source

剪藏 2026年1月23日

LiveKit 融了 Index Ventures 领投的 1 亿美元C轮，估值10亿跻身独角兽

"LiveKit's Series C: Towards the voice-driven era of computing"

blog.livekit.io

AI Industry

剪藏 2026年1月23日

GitHub Copilot CLI 也推出 SDK，加入 Claude Agent SDK、Codex SDK 的阵营；开源的 OpenCode 不太一样，一开始就是 Server/Client 架构，所以 TUI 只是一种 Client

"Build an agent into any app with the GitHub Copilot SDK - The GitHub Blog"

github.blog

Agents

剪藏 2026年1月22日

刚宣布跨过衍生模型20万、累计下载10亿次的里程碑，千问又开源了Qwen-TTS两个尺寸五款模型，支持语音设计、克隆与生成，且多项评测SoTA。中文语音合成模型的开源不算多，SoTA更是相当于没有，大家都心照不宣把最好的藏着卖API，包括之前Qwen-TTS也都是闭源的，这次还是狠下心要坐稳开源王座，同时应该也是在预判AI语音应用的增长潜力。 update：可玩性不错，用 VoiceDesign 模拟自然语言设计音色 - 满意的话拿去 Base 模拟克隆，CustomVoice 内置了9种音色可以更精细地控制生成

"Qwen3-TTS全家桶开源: 语音设计，克隆与生成！"

qwen.ai

Open SourceSpeech & Audio

剪藏 2026年1月22日

AI能力越来越强，对工程人员面试也提出了挑战

"Designing AI resistant technical evaluations \ Anthropic"

anthropic.com

Benchmarks & Eval

剪藏 2026年1月22日

ARC-AGI-3 也正在开发中

"ARC Prize 2025 Results and Analysis"

arcprize.org

Benchmarks & Eval

剪藏 2026年1月22日

Anthropic 发的 2026 Coding 趋势报告，8个趋势：软件开发范式变迁x1、能力提升x4、影响x3

"2026 Agentic Coding Trends Report.pdf"

resources.anthropic.com

AgentsAI Industry

剪藏 2026年1月21日

MiniMax AI原生工作台，打通本地与云端

"“95后”正在尝试一种很新的工作方式"

mp.weixin.qq.com

AI Industry

剪藏 2026年1月21日

Jan Leike 分享，前沿模型对齐做的越来越好了，Grok是个例外

"Jan Leike on X: "Interesting trend: models have been getting a lot more aligned over the course of 2025. The fraction of misaligned behavior found by automated auditing has been going down not just at Anthropic but for GDM and OpenAI as well. https://t.co/8DYm9SP7wF" / X"

x.com

Safety & Alignment

剪藏 2026年1月21日

训练了一个小分类器，根据当前情境决定是否给 Agent 注入一个小提醒，在 debug 时非常有用，且后续不驻留于上下文、不影响缓存

"Replit — Decision-Time Guidance: Keeping Replit Agent Reliable"

blog.replit.com

Agents

剪藏 2026年1月20日

Anthropic Fellows Program 计划，MATS（独立的AI对齐研究机构）+牛津+Anthropic 联合团队针对大模型助手角色的研究：基于 Gemma3、Qwen3、Llama3.3 的分析，预训练中模型就已习得 Assistant 这一人格，在轴的另一边与其相对的便是可能有害的角色扮演，多轮对话会让角色稳定性显著下滑，通过 Activation Capping 的操控（steer）技术，可以在不损失能力的情况下缓解这一问题

"The assistant axis: situating and stabilizing the character of large language models \ Anthropic"

anthropic.com

LLMsSafety & Alignment

剪藏 2026年1月20日

推理模型脑海中的智囊团

"谷歌新发现：DeepSeek推理分裂出多重人格，左右脑互搏越来越聪明 – 量子位"

qbitai.com

LLMs

剪藏 2026年1月20日

继 Andrej Karpathy、Stephen Wolfram、Addy Osmani（Chrome 工程师、Google 云 AI director）、Linus Torvalds（用 Antigravity 写小工具）等一众大佬后，Node.js、Deno 创始人也加入“手敲代码时代已经终结”阵营

"Ryan Dahl on X: "This has been said a thousand times before, but allow me to add my own voice: the era of humans writing code is over. Disturbing for those of us who identify as SWEs, but no less true. That's not to say SWEs don't have work to do, but writing syntax directly is not it." / X"

x.com

AI Industry

剪藏 2026年1月19日

CoT 可能会骗人

"Topological Signatures of Deception: Detecting Unfaithful Reasoning via Sentence-Level Causal Graphs | angkul's site"

angkul.bearblog.dev

Interpretability

剪藏 2026年1月19日

OpenAI 过去三年年化收入 20 → 60 → 200 亿美元，对应算力是 0.2 → 0.6 → 1.9 GW

"A business that scales with the value of intelligence | OpenAI"

openai.com

AI Industry

剪藏 2026年1月18日

媒体没传的是 Demis 的下一句话：中国尚未表现出 AI 前沿突破创新的能力

"China just 'months' behind U.S. AI models, Google DeepMind CEO says"

cnbc.com

AI Industry

剪藏 2026年1月18日

AI Interviewer 领域有几个团队已经估值高企

"Deedy on X: "One of the more "boring" overlooked markets AI is completed upending is user research. Companies like Qualtrics ($12.5B), Medallia ($6.4B) and SurveyMonkey ($1.5B) have dominated for a long time. But now, we can infinitely scale and process interviews. Listen Labs, for example, https://t.co/JnA5c6W5jo" / X"

x.com

AI Industry

剪藏 2026年1月18日

DeepSeek 论文里用到了可解释性的相关方法去探究 Engram 如何生效

"himanshu on X: "wait this is actually big. this deepseek research used LogitLens (lets you see what the model is 'thinking' at each layer) and CKA (compares what different layers are actually learning) to figure out why the new Engram architecture works. apparently this is the first time i have https://t.co/t7RFN3qHou" / X"

x.com

Interpretability

剪藏 2026年1月17日

伴随着 $8/月的 ChatGPT Go 订阅上线，OpenAI 开始测试为 ChatGPT 加入广告，尽管声称显著标识、不影响回答、对话保持隐私、新的 AI 广告体验等，但在 Gemini/Grok 的凶猛追击和 Claude 的商业成功局面下，不花钱就给你看广告的 ChatGPT 还能撑多久，或者追赶者未来是否也会采取用样的路子，是摆在通用 AI 公司发展路上的必思议题

"Our approach to advertising and expanding access to ChatGPT | OpenAI"

openai.com

AI Industry

剪藏 2026年1月17日

Cloudflare 收购了一家英国公司 Human Native，后者多模态数据市场，同时谈了 AI 时代的互联网经济

"Human Native is joining Cloudflare"

blog.cloudflare.com

AI Industry

剪藏 2026年1月16日

OpenAI 将 Responses API 的设计规则开源出来，与一众引擎共同满足 Agentic 推理需求

"OpenAI Developers on X: "Today we’re announcing Open Responses: an open-source spec for building multi-provider, interoperable LLM interfaces built on top of the original OpenAI Responses API. ✅ Multi-provider by default ✅ Useful for real-world workflows ✅ Extensible without fragmentation Build https://t.co/SJiBFx1BOF" / X"

x.com

Open SourceAgents

剪藏 2026年1月16日

OpenAI 投了脑机接口公司 Merge Labs

"Investing in Merge Labs | OpenAI"

openai.com

AI Industry

剪藏 2026年1月16日

5个维度：任务复杂度、人技能、使用场景、自主程度、成功与否

"Anthropic Economic Index: new building blocks for understanding AI use \ Anthropic"

anthropic.com

Benchmarks & EvalAI Industry

剪藏 2026年1月16日

Gemma 翻译版，用 Gemini 数据蒸馏，55 种语言，OpenAI 这几天也上线了独立的翻译功能页面，但是个人使用似乎无脑选最好的模型是优解，可能有其他的工业/生产场景

"TranslateGemma: A new family of open translation models"

blog.google

Open SourceLLMs

剪藏 2026年1月16日

FLUX.2 的小尺寸全能，4B apache 开源、9B 非商用，支持生图、编辑、多图参考等

"FLUX.2 [klein]: Towards Interactive Visual Intelligence | Black Forest Labs"

bfl.ai

Visual GenerationOpen Source

剪藏 2026年1月15日

Cursor 号称用 GPT-5.2-Codex 从 0 做了个浏览器，跑了几周、写了上千个文件、百万行代码

"Scaling long-running autonomous coding · Cursor"

cursor.com

AgentsLLMs

剪藏 2026年1月15日

专注跨形态机器人大脑的 Skild 融了软银领投的 14 亿美元 C 轮，估值 140 亿，前几天也发了直接让机器人看人类视频学习的成果

"Announcing Series C - Skild AI"

skild.ai

Robotics & EmbodiedAI Industry

剪藏 2026年1月15日

OpenAI 终于跟 Cerebras 牵手，750MW 高速算力，有望让 Agent/长程任务跑得更快

"OpenAI partners with Cerebras | OpenAI"

openai.com

Infra & ComputeAI Industry

剪藏 2026年1月15日

Gemini 连接 Gamil/Photos/YouTube/Search 来提供个性化智能，这种程度的打通不易，Google 决心可见一斑

"Personal Intelligence: Connecting Gemini to Google apps"

blog.google

AI Industry

剪藏 2026年1月14日

MIT 科技评论将机制可解释性列为 2026 十大突破技术之一

"Mechanistic interpretability: 10 Breakthrough Technologies 2026 | MIT Technology Review"

technologyreview.com

InterpretabilityAI Industry

剪藏 2026年1月14日

CPO Mike Krieger 领衔、Anthropic 新成立 Labs，试图总结、复制并放大 Claude Code、MCP、Skills、Cowork 等从研究预览进化为成功产品的路径，更多地参与到实验性产品的早期孵化，加强公司在产品层面的前瞻布局和掌控力

"Introducing Labs \ Anthropic"

anthropic.com

AI Industry

剪藏 2026年1月14日

MedGemma 半代升级，加上之前发布过的 MedASR

"Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR"

research.google

LLMsSpeech & Audio

剪藏 2026年1月14日

Apple 创作软件大礼包，13美刀/月订阅

"Introducing Apple Creator Studio, an inspiring collection of creative apps - Apple"

apple.com

AI Industry

剪藏 2026年1月14日

智谱联合华为开源了新一代图像生成模型GLM-Image，基于昇腾Atlas 800T A2设备和昇思MindSpore AI框架完成从数据到训练的全流程，是首个在国产芯片上完成全程训练的SOTA多模态模型。融合了 9B 的自回归 GLM-4 和 7B 的 DiT CogView-4

"GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation"

z.ai

Visual GenerationOpen SourceLLMs

剪藏 2026年1月13日

Astera/NVIDIA/Stanford 团队推出 Test-Time Training（TTT）

"Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time | NVIDIA Technical Blog"

developer.nvidia.com

LLMs

剪藏 2026年1月13日

在注意力前加一层 Engram，把常见的词组语句的计算生成变成静态记忆的查找

"deepseek-ai/Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models"

github.com

LLMs

剪藏 2026年1月13日

医药/生科榜单表现更强的模型 + 连接器 + Skills，Claude 也在健康和生命科学领域继续发力

"Advancing Claude in healthcare and the life sciences \ Anthropic"

anthropic.com

LLMsAI Industry

剪藏 2026年1月13日

Claude 新上 Cowork 模式，作为 research preview 仅对 Max 用户开放，本质是基于 Claude Agent SDK 将 Claude Code 的能力封装成一种更适合知识工作者的 UI，进一步论证了 Coding Agents = General Agents，结合专业 skills 落到不同领域是相当通用的解法

"Introducing Cowork | Claude"

claude.com

AgentsAI Industry

剪藏 2026年1月13日

Apple 与 Google 就基于 Gemini 技术的 Apple 模型达成多年合作

"News from Google on X: "Joint Statement: Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a" / X"

x.com

AI Industry

剪藏 2026年1月13日

AI 健康纷纷发力：OpenAI 收购了 Torch Health，一个专门做 AI 健康记录管理的团队

"Torch is joining OpenAI"

torchapp.com

AI Industry

剪藏 2026年1月12日

继三个月前 OpenAI 与 Stripe 联手推出 ACP（Agentic Commerce Protocol）后，今天 Google 也在零售大会上推出 UCP（Universal Commerce Protocol），同样拉上 Shopify、Etsy 等一众已支持 ACP 的厂商，后续基于 UCP 在 AI Mode 和 Gemini 中上线新的购物功能；同时还针对品牌方推出 Business Agent，画了一个 AI 端到端帮忙卖货的大饼。既是在尝试撬动用户习惯、尽可能涉足交易，也在协议与标准层面竞争，后面还要看看 Amazon 的动作。

"New tech and tools for retailers to succeed in an agentic shopping era"

blog.google

AgentsAI Industry

剪藏 2026年1月12日

面对上下文拓展难题，Sakana AI 说：要不咱把位置编码扔了？

"Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings"

pub.sakana.ai

LLMs

剪藏 2026年1月11日

一步拆两步，前小后大的过滤思路提高准确率并大幅降低成本

"Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks"

anthropic.com

Safety & Alignment

剪藏 2026年1月10日

相对完整的 Agent 评测体系，虽然行文有 Claude 痕迹

"Demystifying evals for AI agents \ Anthropic"

anthropic.com

AgentsBenchmarks & Eval

剪藏 2026年1月10日

据 Epoch AI 测算，全球 AI 算力已达到等效 1500 万张 H100

"Epoch AI on X: "Global AI compute capacity now totals over 15 million H100-equivalents. Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available. https://t.co/DL56kEPPRb" / X"

x.com

Infra & ComputeAI Industry

剪藏 2026年1月9日

庆祝上市（？）Cerebras 部署了 GLM-4.7，可能是最快的 GLM-4.7

"GLM-4.7: Frontier intelligence at record speed — now available on Cerebras "

cerebras.ai

LLMsInfra & Compute

剪藏 2026年1月9日

斯坦福团队的研究，通过前缀开头部分内容引导模型吐出版权内容，甚至是完整的一本书，如《哈利波特》

"Extracting Books from Production Language Models"

ahmeda14960.github.io

LLMs

剪藏 2026年1月8日

雷蛇的桌面 AI，采用了桶柱内的投影，挺有科幻感的，但语音交互的延迟还比较高

"Meet Project AVA at CES 2026 - Blog"

razer.com

AI Industry

剪藏 2026年1月8日

Cursor 调整了上下文机制，向 Claude Code 一样拥抱 filesystem，大势所趋

"Dynamic context discovery · Cursor"

cursor.com

AI Industry

剪藏 2026年1月8日

Nathan Lambert 等运营的 Interconnects 发起了美国真开源模型（ATOM）项目，主要论证了当前中国开源的领先地位，有一些不错的数据图表

"The ATOM Project - American Truly Open Models"

atomproject.ai

Open SourceLLMs

剪藏 2026年1月8日

可爱向的语音 AI 陪伴应用 Tolan 自 2025 年 2 月上线以来已增长至 20 万月活，App Store 10 万+ 评价得分 4.8，GPT-5.1 的可控性提升为其带来了更好的角色表达。上下文方案也不同于大部分 Agent，Tolan 每轮会话都重新计算个性并组装包括语气、记忆、性格、历史等在内的提示词，其中记忆召回是用扩写+ 语义 RAG 实现的，更新则采用语义 KNN

"How Tolan builds voice-first AI with GPT-5.1 | OpenAI"

openai.com

Speech & AudioAgents

剪藏 2026年1月8日

每周两亿人向 ChatGPT 询问健康问题，OpenAI 索性推出 ChatGPT Health，可以连接苹果健康等数据源，辅助解读报告、医前准备、饮食运动，目前还需要候补。 ChatGPT 左上角的入口越来越多了

"Introducing ChatGPT Health | OpenAI"

openai.com

AI Industry

剪藏 2026年1月7日

估值来到 ~2300 亿；MAU 接近 6 亿；数据中心等效 H100 超过百万块

"xAI Raises $20B Series E | xAI"

x.ai

AI IndustryInfra & Compute

剪藏 2026年1月7日

与波士顿动力合作，Gemini Robotics 继续发力

"Boston Dynamics & Google DeepMind Form New AI Partnership to Bring Foundational Intelligence to Humanoid Robots | Boston Dynamics"

bostondynamics.com

Robotics & EmbodiedAI Industry

剪藏 2026年1月7日

继社区讨论后，Claude Code 官方也上了 Ralph Wiggum 插件，基于 Stop hook 实现让 Agent 可以无休止地工作直到完成。名字取自辛普森一家中的同名角色。 update：已改名为 Ralph Loop，大概是侵权原因？

"claude-plugins-official/plugins/ralph-wiggum"

github.com

Agents

剪藏 2026年1月7日

Fidji Simo 的新年 ChatGPT 展望，致力于打造最佳私人助理、释放企业场景价值、和开发者的自动化 AI 队友

"Closing the capability gap between frontier AI and everyday use in 2026"

fidjisimo.substack.com

AI Industry

剪藏 2026年1月7日

LMArena 融了 Felicis、加大投的 1.5 亿美元 A 轮，估值来到 17 亿，400+模型，5000 万投票大模型评测都能出独角兽，太可怕了

"Fueling the World’s Most Trusted AI Evaluation Platform"

news.lmarena.ai

Benchmarks & EvalAI Industry

剪藏 2026年1月6日

从 GRPO 已经衍生出了诸多变种

"GRPO++: Tricks for Making RL Actually Work"

cameronrwolfe.substack.com

LLMs

剪藏 2026年1月2日

Seed 用 VLA 训练的灵巧手

"GR-Dexter Technical Report"

byte-dexter.github.io

Robotics & Embodied

剪藏 2026年1月2日

与 Ilya 的 back to research 相呼应，DeepSeek 对 ResNet 的发展做了系统分析，在 Seed 去年的 Hyper-Connection 工作基础上，基于数学、工程和 scaling 的验证，深入了神经网络拓扑研究，提出了 mHC 这一新架构，有望打开

"mHC: Manifold-Constrained Hyper-Connections"

arxiv.org

LLMs

剪藏 2026年1月2日

致知创新研究院（九坤量化团队？）推出的代码模型，以 40B 的尺寸在 SWE-bench Verified 上达到 81.4 的高分。论文中有 3 个发现： 1. 相比静态的仓库文件，提交过程记录数据，更有利于提升模型的规划能力 2. 32k 推理/编码的 mid-training 对于稳定训练至关重要 3. post-training 的 RL 思考涌现错误修正能力 update：SWE-bench Verified 跑分受到质疑，解释为测试环境不对，更新后为 76.2

"IQuest Coder"

iquestlab.github.io

LLMsBenchmarks & Eval

剪藏 2026年1月1日

通过强化学习训练模型自己管理自己的上下文，先调用 REPL、sub-LLM 等处理一遍再真正推理

"Recursive Language Models: the paradigm of 2026"

primeintellect.ai

LLMs

2025年12月

剪藏 2025年12月30日

通义团队推出 Mobile World，继 Android World 等之后的移动端 GUI Agent 新基准

"Mobile World: Benchmarking Autonomous Mobile Agents"

tongyi-mai.github.io

Benchmarks & EvalAgents

剪藏 2025年12月30日

微信 AI

"WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference"

wedlm.github.io

LLMs

剪藏 2025年12月29日

海马 emoji 如何体现了预训练数据的自反思配方

"Reverse Engineering a Phase Change in GPT's Training Data... with the Seahorse Emoji 🌊🐴"

pratyushmaini.substack.com

LLMs

剪藏 2025年12月29日

Claude Code 精讲

"A Guide to Claude Code 2.0 and getting better at using coding agents | sankalp's blog"

sankalp.bearblog.dev

Agents

剪藏 2025年12月29日

年末一场围绕 Coding 的讨论，先是大神 Andrej Karpathy 的焦虑，然后是 Claude Code 作者 Boris 的自白，Coding Agent 的成熟正在让程序员、甚至是顶尖的开发者不再手敲代码，而是关注 AI 交互，完成 10 倍甚至 100 倍的提升

"Boris Cherny on X: "When I created Claude Code as a side project back in September 2024, I had no idea it would grow to be what it is today. It is humbling to see how Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all https://t.co/QVlmbhjUUE" / X"

x.com

Agents

剪藏 2025年12月29日

用 Job vs Gym 的划分来指导与 AI 协作的过程，前者注重产出，AI 助力交付，后者关注过程，自我核心能力的提升

"Keep the Robots Out of the Gym | Daniel Miessler"

danielmiessler.com

AI Industry

剪藏 2025年12月28日

还有一篇论文专门实验分析 AI 如何回应不同年龄用户对“圣诞老人是否存在”等问题

"Yes, AI, There is a Santa Claus – Machine Learning Blog | ML@CMU | Carnegie Mellon University"

blog.ml.cmu.edu

LLMs

剪藏 2025年12月28日

关于 AI 会不会对 5 岁小孩承认圣诞老人并不存在这件小事

"Daphne Hansell on X: "If you say you’re 5, opus 4.5 will lie to you about Santa but the COT gives it away. 5.2 doesn’t believe in lying to children https://t.co/sb7BKwQYnu" / X"

x.com

LLMs

剪藏 2025年12月26日

累计注册 600 万，月活 160 万

"TRAE 1.0.0｜2025 年度产品报告"

mp.weixin.qq.com

AI Industry

剪藏 2025年12月26日

Anthropic 联创 Jack Clark 也是宝爸，趁着娃睡了，用 Opus 4.5 加持的 Claude Code 花几分钟做了个小的世界模拟器细细把玩，描述这种感觉像是作为一个小孩在跟大人玩，Claude 形同一个有求必应的超级智能。但你必须拥有时间+好奇心的“魔法组合”，否则这些最惊人的进展体验默认对你隐藏。他还预测 2026 年这种情况会进一步恶化，数字世界的进化将更快加速，新的专为 AI 系统设计的东西（如专供 AI Agents 而对人隐形的网站等）将会承载更多“幽灵”般的 AI 活动和硅基大脑的信息交换。对于四维空间的人类而言，AI 就像是活在五维，仅在其经过我们时留下一瞥。思考、推演和文笔都非常棒：https://x.com/jackclarkSF/status/2003526145380151614

"Jack Clark on X: "Silent Sirens, Flashing For Us All" / X"

x.com

Agents

剪藏 2025年12月25日

看了半天也没明白到底是能做什么

"钉钉上新，想用 AI 教你点「工作切割术」 | 极客公园"

geekpark.net

AI Industry

剪藏 2025年12月25日

英伟达与 Groq 达成非排他的专利授权协议，同时将后者核心骨干收入麾下。 CNBC 的报道是约 200 亿美元，而 Groq 9 月融后估值为 69 亿。 GroqCloud 继续运行，但感觉主要是为了防止被查？

"Groq and Nvidia Enter Non-Exclusive Inference Technology Licensing Agreement to Accelerate AI Inference at Global Scale | Groq is fast, low cost inference."

groq.com

Infra & Compute

剪藏 2025年12月24日

TTS 是不开源的

"Qwen3-TTS Steps Up: Voice Cloning and Voice Design!"

qwen.ai

Speech & Audio

剪藏 2025年12月23日

针对提示词注入风险，ChatGPT Atlas 用强化学习构建的自动化攻击-防御对抗迭代工作流

"Continuously hardening ChatGPT Atlas against prompt injection attacks | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年12月23日

机器人奥林匹克，刚发布的 PI 0.6（π0.6）完成得不错

"Moravec's Paradox and the Robot Olympics"

pi.website

Robotics & Embodied

剪藏 2025年12月21日

Tavern Research 于 2025 年 8 月针对 2300+ 美国成年人的网络问卷显示大家希望监管建立规则，更有意思的是： > 当你问像ChatGPT这样的工具提问时，实际会发生什么。45%的人认为它在数据库中查询确切的答案，21%的人认为它遵循了预先编写的回复脚本。

"Americans Have Mixed Views of AI – and an Appetite for Regulation - Searchlight Institute"

searchlightinstitute.org

AI Industry

剪藏 2025年12月21日

Google DeepMind 的研究团队认为，当前 AGI 研究过于关于单一 AI 突破，而事实是会有多个不同领域的 sub AGI 合作，形成分布式的集体智能，也带来了对齐与治理挑战

"Distributional AGI Safety"

arxiv.org

Safety & Alignment

剪藏 2025年12月21日

这项针对棋类、音乐、运动等高水平人士的研究表明，相比早期就专注于单一领域训练者，那些练习更多学科的人虽然开始慢，但长期上限更高

"Recent discoveries on the acquisition of the highest levels of human performance | Science"

science.org

AI Industry

剪藏 2025年12月21日

专门服务医药场景的 ASR 模型

"MedASR | Health AI Developer Foundations | Google for Developers"

developers.google.com

Speech & Audio

剪藏 2025年12月21日

发现不少玩家上传的游戏视频有操控展示 → 分离出操控动作就是训练数据

"NitroGen | A Foundation Model for Generalist Gaming Agents"

nitrogen.minedojo.org

Agents

剪藏 2025年12月20日

Cursor 收购了 Graphite，一个专注做 AI review 等 Coding 工作流的团队

"Building the future of software development with Cursor"

graphite.com

AI Industry

剪藏 2025年12月20日

图层分离，真 · AI 版 PS

"Qwen-Image-Layered: Layered Decomposition for Inherent Editablity"

qwen.ai

Visual Generation

剪藏 2025年12月20日

RLVR、锯齿智能、LLM apps（Cursor）、local Agent（Claude Code）、vibe coding、生图 GUI（Nano Banana）

"2025 LLM Year in Review | karpathy"

karpathy.bearblog.dev

AI Industry

剪藏 2025年12月20日

Gemma Scope 新版

"Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior - Google DeepMind"

deepmind.google

Interpretability

剪藏 2025年12月20日

ChatGPT 写作功能更新，diff 记录修改过程，主要是写邮件

"JZ on X: "🆕 Writing blocks make it easier to craft the perfect email in ChatGPT. ∙Update & format text right in chat ∙Highlight to ask for changes, and accept or reject suggestions ∙Open in your email client once you’re ready to send Try it & please let us know what you think! https://t.co/2Vgf0Av3u6" / X"

x.com

AI Industry

剪藏 2025年12月20日

不同于 SAE，Activation Oracles（AO）训练模型读懂神经元激活、并支持自然语言提问

"Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers"

alignment.anthropic.com

Interpretability

剪藏 2025年12月19日

QQ 浏览器

"当年带你上网冲浪的头号老玩家，这回是真AI上头了 | 量子位"

qbitai.com

AI Industry

剪藏 2025年12月19日

太有意思了，而且彩蛋满满

"Project Vend: Phase two \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年12月19日

Claude.ai 内有一个小的分类模型，可以识别到自杀自残倾向并主动提醒，针对不同国家地区展示不同的求助热线，由 ThroughLine 提供，ChatGPT 同日也提到上了类似的方法； Anthropic 评估了 Claude 系列在此类问题上的响应，合理回复的比例在不断提高，但微妙的是最聪明的 Opus 模型都不是最高；而且，他们声称从 2022 年发布 Claude 之前就已经在评估 AI 讨好的问题了，近期还开源了一个模型行为评估框架；此外 Claude 不允许 18 岁以下青少年使用，还会通过分类器标记识别，与 ChatGPT 传闻要上成人模式形成呼应，Anthropic 真是 2B 收入和名声都占了。

"Protecting the well-being of our users \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2025年12月19日

OpenAI 针对家庭教育给出的 AI literacy 资源

"AI literacy resources for teens and parents | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年12月19日

主要更新在 U18，青少年安全第一； ChatGPT 也上了 ThroughLine 提供的求助热线，正在继续打磨年龄预测模型

"Updating our Model Spec with teen protections | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年12月19日

OpenAI 可能有一套强化 Codex 模型的流水线，通用模型迭代出来，马上就能推出对应的 Codex 变种；强调了网络安全能力的提升

"Introducing GPT-5.2-Codex | OpenAI"

openai.com

LLMs

剪藏 2025年12月19日

CoT 可观测性评估

"Evaluating chain-of-thought monitorability | OpenAI"

openai.com

Interpretability

剪藏 2025年12月18日

Lovable 估值来到 66 亿美金

"Lovable raises $330M to power the age of the builder - Lovable Blog"

lovable.dev

AI Industry

剪藏 2025年12月18日

豆包可能是第一个把模型版本做到 1.8 的；同步视频模型升级到 Seedance 1.5，前两天内测试了下还比不上 Veo 3；日均 token 使用超过 50 万亿

"两大模型发布！豆包大模型日均使用量突破50万亿Tokens"

mp.weixin.qq.com

LLMsVisual Generation

剪藏 2025年12月18日

PI 新的 VLA 模型，可以将头戴摄像头的人类动作视频直接迁移至机器人，团队称之为涌现

"Emergence of Human to Robot Transfer in Vision-Language-Action Models"

pi.website

Robotics & EmbodiedVisual Generation

剪藏 2025年12月18日

有点缺乏信息增量

"The State of AI Coding 2025 | Greptile"

greptile.com

AI Industry

剪藏 2025年12月18日

经过特斯拉车载打磨，Grok 语音智能体 API 上线

"Grok Voice Agent API | xAI"

x.ai

Speech & AudioAgents

剪藏 2025年12月18日

Google 用 Gemini 系列包圆了大模型性价比的帕累托前沿，有趣的是 Gemini 3 Flash 在 SWE-Bench Verified 上还超过了 Gemini 3 Pro

"Introducing Gemini 3 Flash: Benchmarks, global availability"

blog.google

LLMs

剪藏 2025年12月17日

OpenAI 推出 FrontierScience，共 700+ 物化生题目。其中，注重结果的 Olympaid 100题和注重过程的 Research 60题组成金榜，由不足百位奥运金牌和科学家出题评估。 GPT-5.2 领先。

"Evaluating AI’s ability to perform scientific research tasks | OpenAI"

openai.com

Benchmarks & Eval

剪藏 2025年12月17日

有趣的 SAM Audio 模型，通过文本、画面、区间来分割音频，神奇的感觉

"Our New SAM Audio Model Transforms Audio Editing"

about.fb.com

Speech & Audio

剪藏 2025年12月17日

上月发布 FLUX.2 系列时已经是好几个模型了，现在又加一个 max 版

"FLUX.2 [max] - Top-Tier Quality Image Generation | Black Forest Labs"

bfl.ai

Visual Generation

剪藏 2025年12月16日

反击 Nano Banana Pro，GPT Image 1.5 竞技场摘金，提升了精准编辑能力、指令遵循，文字精细、数字靠谱，速度快 4x，屎黄感减弱，但特定风格、多人脸、中文等方面还有局限

"The new ChatGPT Images is here | OpenAI"

openai.com

Visual Generation

剪藏 2025年12月16日

罗福莉 x 小米，直接把 MiMo 推到了开源 SoTA，隐隐感觉国内大模型训练有收敛之势

"Introducing MiMo-V2-Flash"

mimo.xiaomi.com

LLMsOpen Source

剪藏 2025年12月16日

最近语音的增量小升级还挺密集，继 Gemini 语音升级、智谱&通义分别发布后，OpenAI 也升级了 4o-mini 的 ASR 和 TTS

"OpenAI Developers on X: "🆕 New audio model snapshots are now live in the Realtime API with improvements to reliability, lower error rates, and fewer hallucinations: - gpt-4o-mini-transcribe-2025-12-15: 89% reduction in hallucinations compared to whisper-1 - gpt-4o-mini-tts-2025-12-15: 35% fewer word https://t.co/E8clreR1R0" / X"

x.com

Speech & Audio

剪藏 2025年12月16日

Nemotron 3 系列，混合 Mamba-Transformer MoE，30、100、500 三个尺寸，稀疏度均为 10%；外加数据、NeMo Gym 等一套工具链，完整开源。

"NVIDIA Debuts Nemotron 3 Family of Open Models | NVIDIA Newsroom"

nvidianews.nvidia.com

LLMsOpen Source

剪藏 2025年12月16日

韦氏词典 2025 年度词：Slop

"Word of the Year 2025 | Slop | Merriam-Webster"

merriam-webster.com

AI Industry

剪藏 2025年12月15日

可能是智谱前面开的头，通义这次也是，在 TTS 和 ASR 上，大家开始默认把好的藏起来、小尺寸开源

"通义百聆语音双子星，同步开源！"

mp.weixin.qq.com

Speech & AudioOpen Source

剪藏 2025年12月13日

用针对性精调的 Veo 视频模型来训练机器人操作，和之前 Jim Fan 分享的、近期 Runway 的工作都有相通之处

"Evaluating Gemini Robotics Policies in a Veo World Simulator"

veo-robotics.github.io

Robotics & EmbodiedWorld Models

剪藏 2025年12月13日

Zoom 通过多模型组合框架在 HLE 上实现了 SoTA

"Zoom AI sets new state-of-the-art benchmark on Humanity's Last Exam | Zoom"

zoom.com

Benchmarks & Eval

剪藏 2025年12月13日

继几天前 Gemini TTS 的更新后，Gemini Native Audio 也升级（都还是 2.5 系列，命名太乱了），此次借助 S2S 翻译应用上了实时翻译

"Gemini 2.5 Native Audio upgrade, plus text-to-speech model updates"

blog.google

Speech & Audio

剪藏 2025年12月12日

多模态开源周收官

"四项视频生成技术，开源！"

mp.weixin.qq.com

Open SourceVisual Generation

剪藏 2025年12月12日

Runway 一直声称使命是世界模型，之前也放出过与机器人厂商合作用视频模型训练的消息，这次正式发布 Runway GWM-1 通用世界模型，基于 Gen-4.5，改用自回归扩散路线，2分钟、720P，除了对标 Genie 外，还有一个 GWM Avatars，音频驱动的交互数字人，Gen-4.5 也支持音画同步、音频编辑、多镜头编辑

"Runway Research | Introducing Runway GWM-1"

runwayml.com

World ModelsVisual Generation

剪藏 2025年12月11日

推理持续增强，SWE-Bench Verified 第二个过 80 分，长上下文稳定性提高，幻觉继续压低，开始突出 GDPeval 这种经济指标了，不少领域超过专业知识工作者 - API 价格微涨 - knowledge cutoff 竟然是 2025年8月

"Introducing GPT-5.2 | OpenAI"

openai.com

LLMs

剪藏 2025年12月11日

有趣，“AI味”都有自己的维基词条了

"Wikipedia:Signs of AI writing - Wikipedia"

en.wikipedia.org

AI Industry

剪藏 2025年12月11日

OpenAI 认为当前 AI 在其 Preparedness 框架下的能力已达到高级别

"Strengthening cyber resilience as AI capabilities advance | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年12月11日

好水的报告，但大趋势是大家都开始分析用户使用数据了

"It’s About Time: The Copilot Usage Report 2025 | Microsoft AI"

microsoft.ai

AI Industry

剪藏 2025年12月11日

在更新的 FACTS Grounding v2 上，Gemini 3 Pro 和 Gemini 2.5 Pro 位居前列

"FACTS Benchmark Suite: a new way to systematically evaluate LLMs factuality - Google DeepMind"

deepmind.google

Benchmarks & Eval

剪藏 2025年12月11日

Adobe 系列应用接入 ChatGPT，但是在 Nano Banana 引领的 AI 原生修图趋于成熟之际，这个操作似乎有些尴尬，不清楚目标用户到底是哪些

"Adobe Makes Creativity Accessible for Everyone with Adobe Photoshop, Adobe Express and Adobe Acrobat in ChatGPT"

news.adobe.com

AI Industry

剪藏 2025年12月11日

Waymo 基础模型，Driver-Simulator-Critic 联合，传感器融合 encoder + 驾驶 VLM 两个模型组件构成了系统1+系统2 的架构，两个 encoder 输入 world decoder 处理形成地图/路径/信号，加上蒸馏方法，结合外部运行的loop形成飞轮

"Demonstrably Safe AI For Autonomous Driving"

waymo.com

Safety & Alignment

剪藏 2025年12月10日

智谱推出 AI 输入法，目前仅电脑端，背后是云端模型，但竟然要靠积分，感觉商业模式不行，态度还是试水；开源的是轻量版 1.5B

"GLM-ASR开源：用嘴干活，智谱AI输入法正式上线"

mp.weixin.qq.com

Open SourceSpeech & Audio

剪藏 2025年12月10日

AlphaEvolve 非公开上线 Goggle Cloud，主要场景是算法效率优化

"AlphaEvolve on Google Cloud | Google Cloud Blog"

cloud.google.com

AI Industry

剪藏 2025年12月10日

一年几度的报告季来了，Menlo 这类 VC 比较喜欢拼 Market Map，这次多了一个 Departmental AI，看他们的意思主要是和 ChatGPT Enterprise、Claude、Agentforce、Glean 这些 Horizontal AI 区分

"2025: The State of Generative AI in the Enterprise | Menlo Ventures"

menlovc.com

AI Industry

剪藏 2025年12月10日

Claude Agent SDK 客户案例

"How Parcha built a universalcustomer diligence agent in twoweeks with Claude Agent SDK"

claude.com

Agents

剪藏 2025年12月10日

Anthropic 把 MCP 捐赠给了 Agentic AI Foundation，还有 OpenAI 的 AGENTS.md 和 Block 的 Goose

"Donating the Model Context Protocol and establishing the Agentic AI Foundation \ Anthropic"

anthropic.com

Agents

剪藏 2025年12月10日

指定参数 - 将危险数据引导至指定参数 - 剪掉

"Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs"

alignment.anthropic.com

Safety & Alignment

剪藏 2025年12月10日

Glean 迈过了 2 亿 ARR

"Arvind Jain on X: "I’m proud to share a big milestone for @Glean: we’ve surpassed $200M in ARR, doubling in just nine months. This puts Glean among the fastest-growing pure-play enterprise software companies of the decade. It’s a testament to our customers, partners, and employees across the globe https://t.co/M40BS5xYu9" / X"

x.com

AI Industry

剪藏 2025年12月10日

OpenAI 聘了 Slack CEO Denise Dresser 为首席营收官，主要是推进商业化发展

"OpenAI appoints Denise Dresser as Chief Revenue Officer | OpenAI"

openai.com

AI Industry

剪藏 2025年12月9日

对可解释性的质疑，主要是概念不清晰，当然立刻有人反驳，想起 Jim Fan 的话“非共识时是入场的最好时机”

"The Reification Fallacy: Interpretability Studies Imaginary Entities"

surajsrinivas.substack.com

Interpretability

剪藏 2025年12月9日

内燃机效率提升与人均持马数的案例，类比 AI 发展

"Horses"

andyljones.com

AI Industry

剪藏 2025年12月9日

豆包手机发布一周，智谱开源 AutoGLM

"AutoGLM开源：每台手机，都可以成为AI手机"

mp.weixin.qq.com

Open SourceAgents

剪藏 2025年12月9日

继上次 ChatGPT 个人使用报告后，OpenAI 此次分析了其超百万企业客户的使用情况，没有之前那么详尽，更多是吸引 toB 客户： - 用量最大的是专业服务、金融、科技、制造，增长最快的是科技、健康、制造； - 使用最多的和平均用户之间的 gap 还在增大； - 用的越多，节省的时间越多；

"The state of enterprise AI | OpenAI"

openai.com

AI Industry

剪藏 2025年12月8日

HuggingFace Skills

"We Got Claude to Fine-Tune an Open Source LLM"

huggingface.co

Open SourceLLMs

剪藏 2025年12月7日

用 Coding Agents 来做自然/产业现象模拟

"Dean W. Ball on X: "I increasingly use coding agents to create simulations of various natural or industrial phenomena to educate myself about them, and the line between this and “video game” is blurry." / X"

x.com

Agents

剪藏 2025年12月7日

Opus 4.5 的对齐实践： - 对齐在模型训练全流程的参与 - 将 soul doc 训练内化，而非仅作为信号 - 性格训练师 Amanda Askell 后面会发一篇文章详细讲

"Sam Bowman on X: "From everything we know so far, Opus 4.5 seems to be the best-aligned model out there in a bunch of ways. I follow the training process closely as part of my work on alignment evaluations. Here's my guess about the two things that are most responsible for making 4.5 special. 🧵" / X"

x.com

Safety & Alignment

剪藏 2025年12月7日

自进化是 NerulPS 2025 的一大主题

"Better Ways to Build Self-Improving AI Agents – Yohei Nakajima"

yoheinakajima.com

Agents

剪藏 2025年12月7日

检测 AI 说谎这件事太难了

"Difficulties with Evaluating a Deception Detector for AIs"

arxiv.org

Safety & Alignment

剪藏 2025年12月6日

字节海外还有个 BytePlus，AI 出海

"BytePlus Unveils Seedream 4.5: Precision-Focused Upgrade Delivering Sharper Visuals, Smarter Control, and 4K Creative Consistency"

byteplus.com

Visual Generation

剪藏 2025年12月5日

OpenRouter 基于 100 万亿+ token 的统计报告：编程是核心场景，开源模型中大部分被用于角色扮演

"State of AI | OpenRouter"

openrouter.ai

AI Industry

剪藏 2025年12月5日

Harvey 融了 a16z 领投的 1.6 亿美元 F 轮，估值 80 亿

"Andreessen Horowitz Leads $160M Investment in Harvey"

harvey.ai

AI Industry

剪藏 2025年12月5日

Anthropic AI Interviewer，让 Claude 带着研究目标去设计和执行采访，然后分析结果，已经访谈并分析了 1250 位专家

"Introducing Anthropic Interviewer \ Anthropic"

anthropic.com

Agents

剪藏 2025年12月4日

与 CoT 监督相似，模型自供认也是发现不当行为的手段

"How confessions can keep language models honest | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年12月4日

OpenAI 资助的 People-First AI Fund 非盈利组织项目

"Announcing the initial People-First AI Fund grantees | OpenAI"

openai.com

AI Industry

剪藏 2025年12月4日

探讨 MechInterp 各方法在表征分布上是否有偏

"ADDRESSING DIVERGENT REPRESENTATIONS FROM CAUSAL INTERVENTIONS ON NEURAL NETWORKS"

arxiv.org

Interpretability

剪藏 2025年12月4日

慢了 PixVerse 半步？

"Day3｜可灵 2.6 全量上线！听见画面，看见声音"

mp.weixin.qq.com

Visual Generation

剪藏 2025年12月3日

OPPO 的 AI Agent 团队针对 Deep Research 类 Agent 写研究报告的场景，推出了包含 100 道题的 FINDER 评测，并分析了失败的原因，核心不在于理解任务，而是在于信源筛选、验证和推理规划上，符合使用感受

"How Far Are We from Genuinely Useful Deep Research Agents?"

arxiv.org

Benchmarks & EvalAgents

剪藏 2025年12月3日

元宝竟成新中产偏爱

"QuestMobile2025新中产人群洞察报告：2.78亿新中产消费能力、消费意愿齐升，三大动能推动高质量发展"

mp.weixin.qq.com

AI Industry

剪藏 2025年12月3日

内部员工访谈 + Claude Code 使用数据分析，问卷的对照分析和结论都很有趣，降本不如增产

"How AI Is Transforming Work at Anthropic \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年12月3日

Waymo 开始送餐了

"Waymo delivery is now live on DoorDash in Metro Phoenix"

waymo.com

Robotics & Embodied

剪藏 2025年12月3日

Kyutai 团队分拆？以 Gradium 再次起航，可能计划谋求商业化

"Gradium: Solving voice"

gradium.ai

Speech & Audio

剪藏 2025年12月3日

Nova 2 系列，多模态输入+百万上下文，Lite 高效，Pro 支持语音理解、智力进步显著，Omni 能生图，Sonic 端到端语音，还有多模态 Embeddings

"Amazon Nova - Generative Foundation Model - AWS"

aws.amazon.com

LLMsMultimodalSpeech & Audio

剪藏 2025年12月2日

视频大周

"PixVerse（拍我AI）V5.5发布：国内首款分镜+音频一键生成AI视频大模型 | 量子位"

qbitai.com

Visual Generation

剪藏 2025年12月2日

近期美国开源模型呼声日益高涨，Arcee 高呼这一口号加入 Ai2 的队伍，推出 Trinity 家族，6BA1B 的 Nano 和 26BA3B 的推理 Mini 已经推出，Large 还在 2048 块 B300 上训着，预计 2026 年 1 月出炉

"Arcee AI | Arcee Debuts Trinity Mini, Expanding Its U.S.-Built Model Line"

arcee.ai

LLMsOpen Source

剪藏 2025年12月2日

OpenAI 新开（or重启？）了 alignment 子域名，计划频繁更新 AI 对齐相关工作。这篇是基于 SAE 对比精调前后模型表现，用以发现精调引入的错误对齐

"Debugging misaligned completions with sparse-autoencoder latent attribution"

alignment.openai.com

Safety & AlignmentInterpretability

剪藏 2025年12月2日

视频编辑，主要是人脸表情

"react-1: ai emotion editing for video, edit performances without reshoots"

sync.so

Visual Generation

剪藏 2025年12月2日

又是一个号称 SoTA 的 GUI Agent，榜单挑的是 Online-Minde2Web

"Introducing Lux, the World's Best Foundation Computer-Use Model"

agiopen.org

Agents

剪藏 2025年12月2日

Anthropic 的红队安全测试，发现AI Agent在模拟区块链上挖出了价值数百万的合约漏洞

"AI agents find $4.6M in blockchain smart contract exploits"

red.anthropic.com

Safety & AlignmentAgents

剪藏 2025年12月1日

开启AI视频的大周，感觉能力比较接近Runway-Act？视频编辑趋势显著

"Day1｜可灵AI视频 O1 模型正式上线！"

mp.weixin.qq.com

Visual Generation

剪藏 2025年12月1日

上周刚发了 Flux.2，这周便官宣新融资。黑森林融了 Salesforce 等领投的 3 亿美元B轮，估值来到32.5亿

"Laying the Foundations for Visual Intelligence—Our $300M Series B | Black Forest Labs"

bfl.ai

AI Industry

剪藏 2025年12月1日

非常接近 Google 之前预览的 Project Astra 了，常驻的数字AI助理，描绘了豆包更大的图景，跟手机厂商合作、同时做耳机等周边硬件，也是一种更务实能落地的策略。是大的入口，手机厂商也会做，领跑优势、技术、产品、增长，有待观望。

"豆包手机助手发布技术预览版"

mp.weixin.qq.com

AI Industry

2025年11月

剪藏 2025年11月30日

GUI Agent 正在快速成熟

"阶跃开源4B Agent模型，跑通所有安卓设备，手搓党一键部署 | 量子位"

qbitai.com

AgentsOpen Source

剪藏 2025年11月27日

基于 GLM-4.5-Air 精调

"INTELLECT-3: A 100B+ MoE trained with large-scale RL"

primeintellect.ai

LLMs

剪藏 2025年11月26日

用 AI 为具体的任务预估人为用时，预估效果较为正相关，还能折算成本和节省，直接计算经济效益

"Estimating AI productivity gains \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年11月26日

黑森林出手，仅次于 Nano Banana Pro

"FLUX.2: Frontier Visual Intelligence | Black Forest Labs"

bfl.ai

Visual Generation

剪藏 2025年11月26日

结果 Suno 也和 WMG 合作了

"A new chapter in music creation – Suno"

suno.com

Speech & Audio

剪藏 2025年11月25日

xAI 员工创办的 infra 层公司

"Introducing Nuraline: Adaptation as Infrastructure"

nuraline.ai

Infra & Compute

剪藏 2025年11月25日

Coding 最强，vending-bench 与 Gemini3Pro 接近，发现并绕过了 t2-bench 的漏洞； token 效率大大提升，价格从 15/75 降至 5/25；做了上下文压缩方面的优化，Claude 应用中也上线了，可以“无限”畅聊；同时 Claude Code 上线 Claude Desktop； API 中模型名为 claude-opus-4-5-20251101 所以是月初就开始测试了？看来这几家上个月都在藏，攒着感恩节一起发

"Introducing Claude Opus 4.5 \ Anthropic"

anthropic.com

LLMs

剪藏 2025年11月25日

基于 GPT-5-mini 强化精调而来，内部评测选品准确率 64%，超过 GPT-5-Thinking 的 56%；体验还不错，会通过可以跳过的追问 UI 让用户补充倾向，比 TB/JD 强太多了

"Introducing shopping research in ChatGPT | OpenAI"

openai.com

LLMsAI Industry

剪藏 2025年11月23日

Reward hacking 研究，获 Ilya 肯定

"From shortcuts to sabotage: natural emergent misalignment from reward hacking \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2025年11月22日

如 MovieGen 一样并未放出模型和使用方式

"Research Update: WorldGen — Text to Immersive 3D Worlds | Meta Quest Blog | Meta Store"

meta.com

World ModelsVisual Generation

剪藏 2025年11月21日

HunyuanVideo 1.5，8.3B 的 DiT，仍为 5-10 秒、480/720p，创新的 SSTA 选择性滑动分块稀疏注意力

"腾讯混元发布全新视频生成模型，「元宝」率先上线尝鲜"

mp.weixin.qq.com

Visual Generation

剪藏 2025年11月21日

Genspark 融了 2.75 亿的 B 轮，估值来到 12.5 亿美元，跻身独角兽，ARR 5 千万、团队规模40+、首月付费留存 90%，漂亮的数据，就是产品有些…一言难尽

"Launching Genspark AI Workspace and Announcing $275M Series B Funding"

mainfunc.ai

AI Industry

剪藏 2025年11月20日

4K、世界知识（各种示意图、PPT）、精准一致编辑、多语言文字渲染，同时在 Gemini 应用中上了基于 SynthID 的 AI 检测

"Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind"

blog.google

Visual GenerationMultimodal

剪藏 2025年11月20日

Udio、Stability 均与华纳联手（屈于后者淫威？）AI 版权领域且看是否会形成新局面

"Udio with Warner Music Group"

udio.com

AI IndustrySpeech & Audio

剪藏 2025年11月20日

美团推出的高中数学竞赛测试，目前 Kimi-K2-Thinking 以 56% 力压 GPT-5-Thinking-high 的 52.4% 位居第一

"AMO-Bench: Large Language Models Still Struggle in High School Math Competitions"

amo-bench.github.io

Benchmarks & EvalLLMs

剪藏 2025年11月20日

S2ST，已经上了 Google Meet，Meta、字节 Seed、阿里千问也推出过，进展渐密

"Real-time speech-to-speech translation"

research.google

Speech & Audio

剪藏 2025年11月20日

把 Grok 4 Fast 的配方在 Grok 4.1 上再训练一遍，tau2bench 新高，同时配套了 Agent 工具 API

"Grok 4.1 Fast and Agent Tools API | xAI"

x.ai

AgentsLLMs

剪藏 2025年11月20日

Suno 以 24.5 亿美元估值融了 Menlo 领投的 2.5 亿的 C 轮，用户近 1 亿

"The Future of Music is Already Here – Suno"

suno.com

AI Industry

剪藏 2025年11月20日

Luma 融了 Humain 领投的 9 亿美元 C 轮，与 Humain 合作建设 2GW 的超算中心 Halo，团队 130+

"AGI is multimodal and reality is the dataset of AGI | Luma AI"

lumalabs.ai

AI IndustryInfra & Compute

剪藏 2025年11月20日

SAM 3 和 3D

"Introducing Meta Segment Anything Model 3 and Segment Anything Playground"

ai.meta.com

Visual GenerationOpen Source

剪藏 2025年11月20日

继 Google 昨天将 Gemini 免费使用给在校生后，OpenAI 也推出针对认证教师的的免费版和专用功能

"A free version of ChatGPT built for teachers | OpenAI"

openai.com

AI IndustryLLMs

剪藏 2025年11月20日

命名奇葩，但终于在 SWE-Bench Verified 上赶上 Sonnet-4.5 了，原生针对 compaction 精调、适配 Windows 环境，长程工作精进、在 METR 取得了新 SoTA

"Building more with GPT-5.1-Codex-Max | OpenAI"

openai.com

LLMs

剪藏 2025年11月20日

alphaXiv 融了 Menlo 领投的 7 百万美元的种子轮，原以为只是个迭代很快的校园产品，没想到在寻求更大发展，与之相对的可能就是康奈尔的 arxiv 了

"alphaXiv on X: "We just raised a $7M Seed round co-led by @MenloVentures and @haystackvc with participation from @Shakti_VC, @conviction and @upfrontvc 🚀 We're honored to have the support of incredible angels including @ericschmidt, @SebastianThrun, @sarahookr Join us: https://t.co/IKwK8KsG96 https://t.co/tzOpr7TcAX" / X"

x.com

AI Industry

剪藏 2025年11月19日

Midjourney 的社交初尝试？

"Midjourney on X: "We're launching user profiles today! Customize your own page with usernames, social links, banners, and more. Follow your friends and spotlight your favorite images. Everyone who fills out a full profile with >8 spotlights in the next 24 hours get 5 free fast hours so gogogo <3 https://t.co/dxEMeYTgv5" / X"

x.com

Visual GenerationAI Industry

剪藏 2025年11月19日

继去年底写进趋势后，GenUI 终于迎来大玩家入场，看看是否会对软件形态带来缓慢的大变革

"Generative UI: A rich, custom, visual interactive user experience for any prompt"

research.google

AI Industry

剪藏 2025年11月19日

屠榜的 Gemini 3 Pro Preview，百万窗口、64K 文本输出； Pro 以上订阅用户可在 AI Mode 使用，帮你规划帮你学习； Ultra 订阅独享更进一步的 Gemini 3 Deep Think 和通用智能体 Gemini Agent；疑似改自 Windsurf 的又一款 VSCode fork AI IDE：Google Antigravity；

"Gemini 3: Introducing the latest Gemini AI model from Google"

blog.google

LLMsAgents

剪藏 2025年11月19日

AI 联姻已成大网，英伟达和微软分别向 Anthropic 投 100 亿和 50 亿美元，而 Anthropic 承诺从 Azure 购买 300 亿的算力 + 追加 1 GW 的算力订购，将 Claude 系列模型带入微软家族

"Microsoft, NVIDIA and Anthropic announced new strategic partnerships. \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年11月18日

“We do not battle for scope,” Simo says. “We battle for less scope.”

"OpenAI's Fidji Simo Plans to Make ChatGPT Way More Useful—and Have You Pay For It | WIRED"

wired.com

AI Industry

剪藏 2025年11月17日

Cloudflare 收购 Replicate，Workers 可用的模型大幅增多，AI 蓝图也愈发宏伟

"Replicate is joining Cloudflare – Replicate blog"

replicate.com

Infra & ComputeAI Industry

剪藏 2025年11月17日

竞技场新高、降幻觉、创意写作提升（仍低于 GPT-5.1）、图文混合回答

"Grok 4.1 | xAI"

x.ai

LLMsMultimodal

剪藏 2025年11月17日

Sakana 融了 1.35 美元的 B 轮，估值来到 26 亿

"Announcing Our Series B"

sakana.ai

AI Industry

剪藏 2025年11月14日

稀疏化 + 剪枝至最小任务可行探索可解释性

"Understanding neural networks through sparse circuits | OpenAI"

openai.com

Interpretability

剪藏 2025年11月14日

在游戏环境中，用 Gemini 下任务、定奖励，SIMA 把经验记下来训练，无需人提供样本就能训练 Agent，迁移效果不错，还能与 Genie 3 联动

"SIMA 2: A Gemini-Powered AI Agent for 3D Virtual Worlds - Google DeepMind"

deepmind.google

AgentsWorld Models

剪藏 2025年11月14日

GPT-5.1 的 Coding benchmark，同时 API 上线 GPT-5.1-Codex

"Introducing GPT-5.1 for developers | OpenAI"

openai.com

LLMs

剪藏 2025年11月13日

Cursor 以 293 亿美元估值融了 23 亿的 D 轮，Accel 领投，老伙伴 Thrive、a16z、DST，新伙伴 Coatue、NVIDIA、Google ARR 已超过 10 亿；团队规模 300+ 人

"Past, Present, and Future · Cursor"

cursor.com

AI Industry

剪藏 2025年11月13日

伴随 GPT-5.1 发布，OpenAI 产品 CEO Fidji Simo 解释 AI 个性化的必然性

"Moving beyond one-size-fits-all - Fidji Simo"

fidjisimo.substack.com

LLMs

剪藏 2025年11月13日

- 为什么 vibe coding 的网页都是紫色？ - 随机变量的收敛

"Improving frontend design through Skills | Claude"

claude.com

AI Industry

剪藏 2025年11月13日

Anthropic 内部一场有趣的一日实验，控制机器狗，但一队用 Claude，另一队不能用 Claude（Claude-less，太惨了）。不太严谨的对比分析，但 Team Claude 显著用时更短、更接近完成，虽然在两个子任务上有相反的结果。竟然还通过队内录音，分析对比了两队情绪变化，自然是 Team Claude 更开心。

"Project Fetch: Can Claude train a robot dog? \ Anthropic"

anthropic.com

Robotics & EmbodiedAgents

剪藏 2025年11月13日

GDM 发了篇 Nature，讲如何对齐 AI 视觉与语义

"Teaching AI to See the World More Like Humans Do - Google DeepMind"

deepmind.google

Interpretability

剪藏 2025年11月13日

Marble GA，响应李飞飞的愿景长文，官网同步焕新，创意、体验、模拟、学习等案例初露商业化苗头

"Marble: A Multimodal World Model | World Labs"

worldlabs.ai

World ModelsMultimodal

剪藏 2025年11月13日

响应 AI 行动计划，Anthropic 投资 500 亿美元与 Fluidstack 建设数据中心。同时透露其企业客户已超 30 万，其中 10 万美元以上的客户在过去一年增长了近 7 倍

"Anthropic invests $50 billion in American AI infrastructure \ Anthropic"

anthropic.com

Infra & ComputeAI Industry

剪藏 2025年11月13日

特斯拉 AI 负责人 Ashok 在 ICCV 上的分享，讲端到端视觉路线的选择和三个挑战： 1. 维度诅咒，20亿token输入、2token输出，如何有效学习？多亏了数据积累和数据工程！ 2. 可解释性与安全保障，通过中间推理过程（如可泛化的生成式高斯溅射）来解决 3. 评估，通过世界模拟器来解决，甚至泛化到了 Optimus

"Ashok Elluswamy on X: "Tesla's approach to Autonomy" / X"

x.com

World ModelsRobotics & Embodied

剪藏 2025年11月13日

风格调教优化 + adaptive thinking 增强 + 更多个性化

"GPT-5.1: A smarter, more conversational ChatGPT | OpenAI"

openai.com

LLMs

剪藏 2025年11月12日

非常好的关于 Agentic Coding 的思考和推演，模型与 harness 的螺旋进展、三类用户

"Here's What's Next in Agentic Coding - Seconds_0 Substack"

seconds0.substack.com

Agents

剪藏 2025年11月12日

agentic 能力分级

"RL Environments and the Hierarchy of Agentic Capabilities"

surgehq.ai

Agents

剪藏 2025年11月11日

FAIR 新作，把 ASR 的语种覆盖推向了新高度

"Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages"

ai.meta.com

Speech & Audio

剪藏 2025年11月11日

喜欢引用的爱因斯坦这句 Creativity is intelligence having fun，可惜是用于推广产品的

"From Words to Worlds: Spatial Intelligence is AI’s Next Frontier"

drfeifei.substack.com

AI Industry

剪藏 2025年11月11日

“9-9-6 is irrelevant. People just love their work.”

"Inside Cursor - Colossus"

joincolossus.com

AI Industry

剪藏 2025年11月10日

Gamma 融了 a16z 领投的 B轮 6800万，估值21亿美元，团队仅50人，ARR 1亿，用户7千万，每月新增3千万gamma

"Grant Lee on X: "Today, as shared by The New York Times, we’re announcing two things: >Our Series B at a $2.1B valuation led by @sarahdingwang at @a16z. >Reaching $100M ARR, profitably, with a team of just 50 people. That's $2M ARR per employee. PowerPoint was invented before the first website, https://t.co/4SApKYltiC" / X"

x.com

AI Industry

剪藏 2025年11月10日

kimi infra工程师讲解k2-thinking的原生int4量化考虑，一个重要发现是在 thinking 模型上，随着 token 长度增加，PTQ量化误差会被放大导致失真，所以用QAT。 INT4 QAT对RL也有好处，长尾rollout效率显著提升。不用MXFP4/NVFP4等，是为了更好支持非Blackwell架构的硬件，且int4就够用了。 W4A16：权重4bit、激活16bit

"Kimi K2 Thinking模型发布并开源，该模型哪些信息值得关注？ - 知乎"

zhihu.com

LLMsOpen Source

剪藏 2025年11月8日

GoodFire 通过 loss curvature（误差曲率）研究大模型是如何记住东西的：通过K-FAC获取的曲率面信息、解构权重矩阵、然后类似PCA看主要成分。结论还一定程度分析了记忆、数学、逻辑推理等的敏感度

"Understanding Memorization via Loss Curvature"

goodfire.ai

Interpretability

剪藏 2025年11月7日

collective intelligence

"Expanding our mission: to grow the world’s collective intelligence - The Quora Blog - Quora"

quorablog.quora.com

AI Industry

剪藏 2025年11月7日

增强了浏览代理能力

"The New Comet Assistant"

perplexity.ai

Agents

剪藏 2025年11月7日

除了 Coding 外基本 SoTA，200-300轮长程任务，256k窗口，原生int4 qat

"Kimi K2 Thinking"

moonshotai.github.io

LLMs

剪藏 2025年11月6日

2B 客户过百万

"1 million business customers: the fastest-growing business platform in history | OpenAI"

openai.com

AI Industry

剪藏 2025年11月6日

在应用层，OpenAI 也持续领先

"OpenAI on X: "You can now interrupt long-running queries and add new context without restarting or losing progress. This is especially useful for refining deep research or GPT-5 Pro queries as the model will adjust its response with your new requirements. Just hit update in the sidebar and https://t.co/kESrkU9hc9" / X"

x.com

AI Industry

剪藏 2025年11月5日

一些推演

"Thoughts by a non-economist on AI and economics – Windows On Theory"

windowsontheory.org

AI Industry

剪藏 2025年11月5日

TPU 上天了，主要是发射成本高

"Exploring a space-based, scalable AI infrastructure system design"

research.google

Infra & Compute

剪藏 2025年11月5日

Cursor 向左，Cognition 向右

"Cognition | Windsurf Codemaps: Understand Code, Before You Vibe It"

cognition.ai

AI Industry

剪藏 2025年11月5日

与之前 Cloudflare 的博客理念相同，借助代码生成来更高效地完成 LLM 对接工具、资源的 MCP 工作

"Code execution with MCP: building more efficient AI agents \ Anthropic"

anthropic.com

Agents

剪藏 2025年11月5日

哈佛大学人类进化生物学学者对大模型做了“心理学”研究，分析发现 GPT 回复主要对应的是 WEIRD（Western, Educated, Industrialized,Rich, and Democratic）群体的特征。相关：之前有一篇关于大模型对不同人种生命排序的价值观研究

"https://scholar.harvard.edu/sites/scholar.harvard.edu/files/henrich/files/which_humans_09222023.pdf"

scholar.harvard.edu

MultimodalInterpretability

剪藏 2025年11月5日

Sora2的一些幕后： - 上线时团队不足50人 - 早期测试过放在ChatGPT内的媒体流 → meme chains → remix → cameo（key breakthrough）让生成更个性化和有人味，用户就不只是消费了 - 70%用户创作 - 名人效应 - Bill：2028年视频模型在世界模拟上取得突破 - 推荐系统为创意优化 - 未来模型的优化方向：不只是娱乐，可以实用，比如科学模拟、涡流建模 - 商业化：订阅/广告都有可能

"Inside OpenAI's Sora: Surge to #1 App, Key Product Decisions & How Video Models Learn Physics - YouTube"

youtube.com

Visual GenerationWorld ModelsAI Industry

2025年10月

剪藏 2025年10月31日

MinixMax 开启发布周，开源的 M2 为 230BA10B，稀疏度大于 DeepSeek 小于 Qwen3-Next，放弃了 M1 的线性注意力机制

"MiniMax M2 & Agent，大巧若拙"

mp.weixin.qq.com

AgentsOpen Source

剪藏 2025年10月30日

与 Cursor 同一天，Cognition 也发了自己的新模型 SWE-1.5，AI Coding 越来越热闹，Agentic Coding 越来越主流

"Cognition | Introducing SWE-1.5: Our Fast Agent Model"

cognition.ai

AgentsLLMs

剪藏 2025年10月30日

之前写 AGI 定义的团队与 Scale AI 一起做了一个劳动力指数，用 freelance 工作测试当下的主流 Agent，Manus 最高，仅 2.5% 好像 OpenAI 之前也有一个类似的 benchmark

"Remote Labor Index: Measuring AI Automation of Remote Work"

remotelabor.ai

Benchmarks & EvalAgents

剪藏 2025年10月30日

自己训练了模型 Composer 以及 UI 焕新支持多 Agent 并行

"Introducing Cursor 2.0 and Composer · Cursor"

cursor.com

AgentsLLMs

剪藏 2025年10月29日

ICL、RAG、Full FT、LoRA、Cartridges、Memory Layers

"The Continual Learning Problem"

jessylin.com

LLMs

剪藏 2025年10月29日

Cartesia TTS 新旗舰 Sonic-3

"Real-time TTS API with AI laughter and emotion | Cartesia Sonic-3"

cartesia.ai

Speech & Audio

剪藏 2025年10月29日

1X NEO 开启预定了，早鸟2万美元或者499/月，但2026年美区才发货，其他地区得2027年

"1X NEO Home Robot | Order Today"

1x.tech

Robotics & Embodied

剪藏 2025年10月28日

Model Spec 同步更新

"Strengthening ChatGPT’s responses in sensitive conversations | OpenAI"

openai.com

Safety & AlignmentLLMs

剪藏 2025年10月28日

Mercor 融了 3.25 亿刀的 C 轮，估值来到 100 亿

"Unlocking Human Potential in the AI Economy | Mercor Blog"

mercor.com

AI Industry

剪藏 2025年10月26日

美团的开源视频模型

"LongCat-Video - A Unified Foundational Video Generation Model"

meituan-longcat.github.io

Visual GenerationOpen Source

剪藏 2025年10月25日

Anthropic 和 Thinking Machine 合作，结合 Model Spec 价值要求对多 LLM 压力测试

"Stress-testing model specs reveals character differences among language models"

alignment.anthropic.com

Benchmarks & EvalLLMs

剪藏 2025年10月24日

Sky，之前 demo 是要做一个 macOS 上的新 AI 交互界面

"OpenAI acquires Software Applications Incorporated, maker of Sky | OpenAI"

openai.com

AI Industry

剪藏 2025年10月24日

ChatGPT 通过针对 Slack、Linear 等工作“环境”精调，实现企业场景的 agentic 知识工作

"Work smarter with your company knowledge in ChatGPT | OpenAI"

openai.com

AI IndustryAgents

剪藏 2025年10月23日

ChatGPT 开始基于 Project 连接人

"OpenAI on X: "Shared Projects are expanding to Free, Plus, and Pro users. Invite others to work together in ChatGPT using shared chats, files, and instructions all in one place. https://t.co/AqeaPGggqj" / X"

x.com

AI IndustryLLMs

剪藏 2025年10月23日

难得。结合可灵的商业跑通，AI创作市场还是有PMF

"1.3亿美元！LiblibAI拿下国内AI应用赛道年度最大融资 | 量子位"

qbitai.com

AI Industry

剪藏 2025年10月23日

ChatGPT 的留存数据

"Deedy on X: "ChatGPT's product retention curves is a product manager's wet dream. Their 1 month retention has skyrocketed from <60% 2yrs ago to an unprecedented ~90%! Youtube was best-in-class with ~85%. 6mo retention is trending to ~80%. Rapidly rising smile curve. Generational product. https://t.co/qlLQrw0HkA" / X"

x.com

AI Industry

剪藏 2025年10月22日

红杉与 Sesame 合作，共同领投其 B 轮融资，打造语音智能伙伴，还在设计时尚 AI 眼镜

"Partnering with Sesame: A New Era for Voice | Sequoia Capital"

sequoiacap.com

Speech & AudioAI Industry

剪藏 2025年10月22日

kyutai 技术博客，解释声音模型难题与方案

"Neural audio codecs: how to get audio into LLMs"

kyutai.org

Speech & AudioLLMs

剪藏 2025年10月22日

LangChain 凭借B轮融后12.5亿美金估值跻身独角兽

"LangChain raises $125M to build the platform for agent engineering"

blog.langchain.com

AgentsAI Industry

剪藏 2025年10月21日

与 DeepSeek-OCR 前后脚，智谱视觉压缩成果。 Q：Glyph 和 DeepSeek-OCR有何异同？ A：共同点：两者都从“视觉压缩”出发，利用视觉 token 承载更多的文本信息；不同点：DeepSeek-OCR 聚焦于真实文档 OCR 任务，验证的是视觉压缩下的文字还原能力；而 Glyph 则将这一思想应用到了更广泛的通用长文本任务中，真正验证了利用视觉模型实现上下文扩展的可行性。

"Glyph：通过视觉-文本压缩扩展上下文窗口"

mp.weixin.qq.com

LLMsMultimodal

剪藏 2025年10月21日

Krea 技术论文

"Krea Realtime 14B: Real-Time, Long-Form AI Video Generation"

krea.ai

Visual Generation

剪藏 2025年10月21日

配套上云，开源 Claude Code 安全沙盒，对文件和网络做隔离

"Making Claude Code more secure and autonomous with sandboxing \ Anthropic"

anthropic.com

AI IndustrySafety & Alignment

剪藏 2025年10月21日

Claude Code 上云了，AI Coding 的收敛方向是全栈和通吃

"Claude Code on the web \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年10月21日

Anthropic 通过 Claude 新的 Connectors 和 Skills 辅助生命科学

"Claude for Life Sciences \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年10月19日

Uber 新业务：司机闲时做数据标注

"Uber Giving Some US Drivers Option to Earn Money From Tasks Like Uploading Menus - Bloomberg"

bloomberg.com

AI Industry

剪藏 2025年10月17日

之前发的 Marble 太费算力，改用 DiT 训练一个单卡可跑的 RTFM，带位置&朝向的帧生成

"RTFM: A Real-Time Frame Model"

worldlabs.ai

Visual Generation

剪藏 2025年10月17日

Google DeepMind 与 Commonwealth Fusion Systems（CFS）合作，借助 AI 推动可控核聚变，模拟等离子体、最大化能量、AI 控制系统等

"Bringing AI to the next generation of fusion energy - Google DeepMind"

deepmind.google

AI Industry

剪藏 2025年10月17日

Claude Skills 的实现方案，在给 Claude 配的虚拟机用文件夹和文本文件来描述技能，感觉和 Claude Code 越走越像了，优雅且附最佳实践

"Equipping agents for the real world with Agent Skills \ Anthropic"

anthropic.com

Agents

剪藏 2025年10月17日

OpenAI 的 AI 工作蓝图，串联起来了 AI 可及、培训、求职等一系列

"AI at Work: OpenAI’s Workforce Blueprint"

cdn.openai.com

AI Industry

剪藏 2025年10月17日

更快、更全栈、更生产

"Introducing Manus 1.5"

manus.im

AI Industry

剪藏 2025年10月17日

Google 是如何梳理过去 10 年的 AI 基因研究工作的

"Ten years of genomics at Google"

blog.google

LLMs

剪藏 2025年10月17日

HeyGen ARR 破亿，同时给出了名为 Building in the AI Era: The HeyGen Way 的手册

"Joshua Xu on X: "HeyGen just hit $100M ARR this month, 29 months after we first reached $1M in April 2023. None of this happens without our incredible team, customers, partners, and community. Thank you 💜 When we shared our first $1M milestone, it was to give back to the build-in-public" / X"

x.com

AI Industry

剪藏 2025年10月16日

语音交互、可以读屏、联动Office的 Copilot；可以执行任务的 Copilot Actions 仍活在虚拟机里；还在测试 Windows 原生应用版的 Manus，通过 Windows MCP 等打通本地资源

"Making every Windows 11 PC an AI PC | Windows Experience Blog"

blogs.windows.com

AI Industry

剪藏 2025年10月16日

谢赛宁团队提出 RAE，致力取代 DiT 中老旧的 VAE

"Diffusion Transformers with Representation Autoencoders"

rae-dit.github.io

Visual Generation

剪藏 2025年10月16日

Sora2更新，正面竞争Google今天升级的Veo3.1：时长拓展，免费用户升至15s、Pro用户升至25s；之前的AI创作故事板重新上线给Pro用户，结合cameo能做出更长的角色稳定的视频

"OpenAI on X: "2 Sora 2 updates: - Storyboards are now available on web to Pro users - All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web https://t.co/iINg7alWGL" / X"

x.com

Visual Generation

剪藏 2025年10月16日

Poolside 宣布与 Coreweave 合作的 2GW 德州 AI 算力计划，哪来的钱啊？

"poolside — Announcing Project Horizon: Why we're building a 2 gigawatt AI campus in Texas"

poolside.ai

Infra & Compute

剪藏 2025年10月16日

Meta 在德州的 1GW AI 算力计划

"Breaking Ground on Our New AI-Optimized Data Center in El Paso"

about.fb.com

Infra & Compute

剪藏 2025年10月16日

Agent 框架

"We raised $11M to redefine how developers build AI agents • Dedalus Labs ⁘ Dedalus Labs"

dedaluslabs.ai

AI Industry

剪藏 2025年10月16日

Haiku 4.5 编程能力逼近 GPT-5，让 Claude for Chrome 跑得更快；但与 Sonnet 3/15 和 Opus 15/75 的稳定价格不同，Haiku 的定价一直在涨，也许跟 Anthropic 根据“智能”定价的策略有关？ Haiku 3: 0.25/1.25 Haiku 3.5: 0.8/4 Haiku 4.5: 1/5

"Introducing Claude Haiku 4.5 \ Anthropic"

anthropic.com

LLMs

剪藏 2025年10月16日

结构化记忆功能更新，ChatGPT 帮你自动管理

"OpenAI on X: "ChatGPT can now automatically manage your saved memories—no more “memory full.” You can also search and sort memories by recency, and choose which to re-prioritize in settings. Rolling out to Plus and Pro users on the web globally starting today. https://t.co/T1vSNH5289 https://t.co/xRHLFTu2Am" / X"

x.com

LLMs

剪藏 2025年10月16日

主要升级在编辑能力：参考生视频、首尾帧、延展

"Bringing new Veo 3.1 updates into Flow to edit AI video"

blog.google

Visual Generation

剪藏 2025年10月15日

基于已有的AI经济指数研究，针对不同场景推演提出政策建议

"Preparing for AI’s economic impact: exploring policy responses \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年10月14日

OpenAI 应用 CEO Fidji Simo 讲述 ChatGPT Pulse 功能背后的思考，懂你的个性化助理 + 推理模型不为人知的能力 + 可控 + 连接 Apps，最后这点结合 Apps SDK 看，真的是新的操作系统

"A new paradigm of proactive, steerable AI - Fidji Simo"

fidjisimo.substack.com

AI Industry

剪藏 2025年10月14日

n8n 上新，通过对话让 AI 帮你做工作流

"n8n.io on X: "🚀 Introducing AI Workflow Builder (Beta) - Turn prompts into living workflows. Generate nodes, logic, and structure from text, then shape and ship your vision faster. Rolling out this week to n8n Cloud (Trial, Starter, Pro). Update to v.1.116.0 to try it. Learn more: https://t.co/trWkxIVOck" / X"

x.com

AI Industry

剪藏 2025年10月14日

继 8 月的 MAI-Voice-1 和 MAI-1-preview 后，Microsoft AI 推出生图模型 MAI-Image-1，竞技场前10，但目前显著落后于第一梯队，规划上线 Copilot

"Introducing MAI-Image-1, debuting in the top 10 on LMArena | Microsoft AI"

microsoft.ai

Visual Generation

剪藏 2025年10月13日

OpenAI 与博通定制 AI 加速芯片计划官宣，预计2026年下半年开始部署

"OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators | OpenAI"

openai.com

Infra & Compute

剪藏 2025年10月11日

Google 月消耗 token 1300 万亿

"Gemini Enterprise: Sundar Pichai remarks at Gemini at Work"

blog.google

AI Industry

剪藏 2025年10月11日

上线11天，周活接近2百万，其中70%都有生成内容

"TBPN on X: "OpenAI's Head of Sora @billpeeb says a stunning 70% of Sora's nearly 2 million weekly active users are creating content. https://t.co/OE9r7nIe3Z" / X"

x.com

AI Industry

剪藏 2025年10月10日

GPT-5-Pro 目前最强

"Evaluating Gemini 2.5 Deep Think's math capabilities | Epoch AI"

epoch.ai

Benchmarks & Eval

剪藏 2025年10月10日

从 Qwen3-30BA3B 转化来的 DLM 扩散语言模型，用了 500B token CPT 增训，benchmark 增益似乎不是很大

"RND1: Simple, Scalable AR-to-Diffusion Conversion · Radical Numerics"

radicalnumerics.ai

LLMs

剪藏 2025年10月10日

为家庭设计：柔性亲肤外表、更轻、体积更小、无线充电视觉升级 for Helix VLA：刷新率2倍、延时1/4、视角广60%、掌心摄像头；为规模化扩张准备好：新供应链、垂直整合，BotQ 设计年产 1.2 万台，目标未来 4 年总产 10 万台

"Introducing Figure 03"

figure.ai

Robotics & Embodied

剪藏 2025年10月10日

语言模型召回上下文实体的机制

"Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context | Yoav Gur-Arieh"

yoav.ml

LLMs

剪藏 2025年10月10日

Jina 加入 Elastic 了，AI + Search/Retrieve

"Elastic and Jina AI join forces to advance open source retrieval for AI applications | Elastic Blog"

elastic.co

AI Industry

剪藏 2025年10月10日

DeepMind 的代码世界模型（code world model，CWM）

"Kevin Patrick Murphy on X: "I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments. The key idea is to use LLM-powered code synthesis to learn a code world model (in the form https://t.co/Srt2AEwA0M" / X"

x.com

World Models

剪藏 2025年10月10日

Air Street Capital 年度 State of AI 报告 2025，已经是第8年：https://docs.google.com/presentation/d/1xiLl0VdrlNMAei8pmaX4ojIOfej6lhvZbOIK7Z6C-Go/

"Welcome to State of AI Report 2025"

stateof.ai

AI Industry

剪藏 2025年10月10日

AI 合同

"Spellbook Raises $50m Series B led by Khosla Ventures - Spellbook"

spellbook.legal

AI Industry

剪藏 2025年10月10日

Reflection 已经融了20亿美元，致力于开源

"Building Frontier Open Intelligence Accessible to All | Reflection AI"

reflection.ai

Open Source

剪藏 2025年10月10日

Claude Code 插件，把mcp、hook等打包起来

"Customize Claude Code with plugins \ Anthropic"

anthropic.com

Agents

剪藏 2025年10月9日

OpenAI devday 发布了 agent builder 后，n8n 宣布 Accel 领投的 1.8 亿美元 C 轮融资，累计已融 2.4 亿，估值来到 25 亿，押注 AI 编排和协作，今年以来用户增长 6 倍、营收增长 10 倍，野心是 > n8n becomes the default platform to build with AI

"n8n raises $180m to get AI closer to value with orchestration – n8n Blog"

blog.n8n.io

AI Industry

剪藏 2025年10月9日

Sora 5天百万下载

"Bill Peebles on X: "sora hit 1M app downloads in <5 days, even faster than chatgpt did (despite the invite flow and only targeting north america!)! team working hard to keep up with surging growth. more features and fixes to overmoderation on the way!" / X"

x.com

AI Industry

剪藏 2025年10月8日

继 Comet 后，Dia 全面开放

"Josh Miller on X: "Starting today, the @browsercompany is back to shipping weekly updates. October's Dia releases include: • More powerful memory (of your tabs) • Redesigned Dia Skills • Arc's Focus Mode (CMD-S) All landing in @diabrowser this month. Oh, we removed the waitlist today too 🤗 https://t.co/YKdecFrQ3n" / X"

x.com

AI Industry

剪藏 2025年10月8日

学物理的 Yao Shunyu 在 Anthropic 一年后选择转投 DeepMind，40% 的原因是 Dario 等的对华敌意

"My infant year as an AI researcher — Moving from physics to AI"

alfredyao.github.io

AI Industry

剪藏 2025年10月7日

Sora 2 开放 API，标准版支持 1280x720，Pro 额外支持 1792x1024，每秒价格分别为 0.1、0.3、0.5 刀，意味着 10s 的 Sora 标准版价格为 $1，API 中时长仅支持 4、8、12 秒三种

"Sora 2 Prompting Guide"

cookbook.openai.com

Visual Generation

剪藏 2025年10月7日

打通应用，成为入口

"Introducing apps in ChatGPT and the new Apps SDK | OpenAI"

openai.com

AI Industry

剪藏 2025年10月7日

与一众框架和低代码平台正面碰撞

"Introducing AgentKit | OpenAI"

openai.com

Agents

剪藏 2025年10月4日

网页裸眼3D

"3D Viewer Demo"

lab.true3d.com

Visual Generation

剪藏 2025年10月4日

OpenAI正在基于Sora的首批数据和反馈，考虑进行快速的迭代更新： 1. 给版权方选择，决定其角色可否用于生成（特别点了日本，应该是指动漫） 2. 考虑基于互动数的商业模式，并与版权方分成

"Sora update #1 - Sam Altman"

blog.samaltman.com

Visual GenerationAI Industry

剪藏 2025年10月3日

Perplexity Comet 全网开放

"The Internet is Better on Comet"

perplexity.ai

AI Industry

剪藏 2025年10月3日

生产力指数，感觉是不是可以和 GDPval 放一起

"Introducing APEX: The AI Productivity Index | Mercor Blog"

mercor.com

AI Industry

剪藏 2025年10月3日

AI 在 OpenAI 内部应用

"Building OpenAI with OpenAI | OpenAI"

openai.com

AI Industry

剪藏 2025年10月1日

Cerebras 融了 11 亿刀的 G 轮，估值 81 亿

"Cerebras Raises $1.1 Billion at $8.1 Billion Valuation"

cerebras.ai

AI Industry

剪藏 2025年10月1日

OpenAI 称 Sora 1 是 GPT-1 时刻，而 Sora 2 直接来到了 GPT-3.5 时代。新的 Sora App 和社交属性的 cameos 功能： > We think a social app built around this “cameos” feature is the best way to experience the magic of Sora 2.

"Sora 2 is here | OpenAI"

openai.com

Visual Generation

剪藏 2025年10月1日

借助 AlphaEvolve 研究复杂度理论

"AI as a research partner: Advancing theoretical computer science with AlphaEvolve"

research.google

LLMs

剪藏 2025年10月1日

前 OpenAI 后训练负责人 William Fudus 和前 DeepMind 材料&化学负责人 Ekin Dogus Cubuk 联合创立的 Periodic Labs，致力于打造 AI 科学家、自主研究发现，融了 a16z 领投的 3亿美元

"Periodic Labs"

periodic.com

AI Industry

2025年9月

剪藏 2025年9月30日

正面刚 Claude Sonnet 4.5，不过看榜单可能还略逊一筹，有望超过 Sonnet 4

"智谱旗舰模型GLM-4.6上线，代码能力全面进阶"

mp.weixin.qq.com

LLMs

剪藏 2025年9月30日

OpenAI 上线 Instant Checkout，开源了其背后的 Agentic Commerce Protocol（ACP，智能体交易协议，与 Stripe 合作开发），为 ChatGPT 带来 App 内的一站式购物体验，支持 U.S. Etsy 和 Shopify 等商家，要为成功的交易付手续费（没找到费率）

"Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol | OpenAI"

openai.com

AgentsOpen Source

剪藏 2025年9月30日

LoRA 之优劣：数据量、rank、layer 等因素，验证RL有效，给出了超参经验值

"LoRA Without Regret - Thinking Machines Lab"

thinkingmachines.ai

LLMs

剪藏 2025年9月30日

微软给 Office 加入了 Agent，Word、Excel（PPT开发中）订阅用户可用，其中 Excel 有篇技术博客介绍如何处理表格上下文、生成校验等，在 SpreadsheetBench 准确率 57.2% 距离人的 71.3% 还有差距，但比先前的一众不及 50% 的产品已高出不少

"Vibe working: Introducing Agent Mode and Office Agent in Microsoft 365 Copilot | Microsoft 365 Blog"

microsoft.com

AgentsBenchmarks & Eval

剪藏 2025年9月30日

Modal 以 11 亿美元估值融了 8000 万的 B 轮，成为 AI infra 独角兽，要做包括推理、沙盒、并行批处理、训练、Notebook 顶层产品和全栈底层平台

"Announcing our $87M Series B | Modal Blog"

modal.com

AI IndustryInfra & Compute

剪藏 2025年9月30日

Coding 继续进步，长时自主、上下文感知优化、Memory；Claude Code 升级 2.0，多界面、更易用；Claude Code SDK 升级为 Claude Agent SDK，强调通用

"Introducing Claude Sonnet 4.5 \ Anthropic"

anthropic.com

LLMsAgents

剪藏 2025年9月29日

进一步极致效率，得益于 DSA 稀疏注意力带来的成本降低，价格大减：输入4→2，输出12→3

"DeepSeek-V3.2-Exp 发布，训练推理提效，API 同步降价"

mp.weixin.qq.com

LLMs

剪藏 2025年9月28日

不是 Veo3 技术报告，是 Veo3 提示词/能力探索报告

"Video models are zero-shot learners and reasoners"

arxiv.org

Multimodal

剪藏 2025年9月28日

80BA13B的原生多模态生图

"混元图像3.0正式发布：开源，免费使用"

mp.weixin.qq.com

Visual GenerationOpen Source

剪藏 2025年9月26日

黑森林的模型接入 PS，同时还有 nano banana

"FLUX.1 Kontext now in Adobe Photoshop: Powering Every Pixel | Black Forest Labs"

bfl.ai

Visual Generation

剪藏 2025年9月26日

Gemini 2.5 Flash 和 Flash-Lite 更新，性能提升、时延降低

"Continuing to bring you our latest models, with an improved Gemini 2.5 Flash and Flash-Lite release - Google Developers Blog"

developers.googleblog.com

LLMs

剪藏 2025年9月26日

ChatGPT 主动向你（Pro用户移动应用）发起消息，不保存则默认删除，某种意义上是显式的个性化推荐

"Introducing ChatGPT Pulse | OpenAI"

openai.com

AI Industry

剪藏 2025年9月26日

9个行业44种职业知识类真实任务 one-shot 交付测评，Claude Opus 4.1 最强，47.6%情况不输于行业专家，GPT-5 high 38.8%

"Measuring the performance of our models on real-world tasks | OpenAI"

openai.com

Benchmarks & Eval

剪藏 2025年9月25日

VLM（Gemini Robotics ER 1.5，支持思考、联网搜索等工具） + VLA（Gemini Robotics 1.5）

"Gemini Robotics 1.5 brings AI agents into the physical world - Google DeepMind"

deepmind.google

MultimodalRobotics & Embodied

剪藏 2025年9月25日

Gamma 分享战略与组织，最近分享有点多，但是 ARR 好像自5月以来就没再显著增长了？

"Grant Lee on X: "Gamma crossed $50M ARR with 28 employees and more cash in the bank than we had raised ($23M) In hindsight: We got here because we ignored common VC advice. Examples of glaringly bad advice that you should ignore to save you $10M+ and years of time, like we did for Gamma: https://t.co/GV5zrIQtsD" / X"

x.com

AI Industry

剪藏 2025年9月25日

用 A2D 把自回归 VLM（Qwen2.5-VL）精调改造为扩散 VLM，提高训练推理效率

"Runway Research | Autoregressive-to-Diffusion Vision Language Models"

runwayml.com

Multimodal

剪藏 2025年9月25日

Scale AI 进军具身智能数据

"Expanding Our Data Engine for Physical AI | Scale"

scale.com

Robotics & Embodied

剪藏 2025年9月24日

Wan2.5-Preview，声画同步的视频生成，还支持图片生成，5s/10s，质感还比不上Veo3，也未开源

"Wan on X: "Today, we're officially launching Wan2.5-Preview! It's set to reshape the future of visual generation with a new architecture and powerful features. • Architectural Features: Native Multimodality, Deep Alignment ∘ Native Multimodal Architecture: Adopts a new, unified framework" / X"

x.com

Visual Generation

剪藏 2025年9月24日

基于 Qwen3-Omni 精调，不开源

"Qwen3‑LiveTranslate: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It！"

qwen.ai

Multimodal

剪藏 2025年9月24日

同期，ChatGPT Go 也在拓展支持区域，二者在发展中国家正面碰撞

"Google AI Plus expands to 40 more countries"

blog.google

AI Industry

剪藏 2025年9月24日

陈·扎克伯格慈善组织 CZI 发起教育项目 Learning Commons，旨在把 learning science 带入课堂工具，其中知识图谱就是主要方法之一，他们与 Anthropic 合作，通过 MCP 将知识图谱与 Claude 连接起来，带入课堂给老师用。还开放了部分数据出来供开发者用：learning-commons-org/knowledge-graph

"Scaling Proven Learning Practices with AI Tools for Education"

chanzuckerberg.com

AI Industry

剪藏 2025年9月24日

基于nano banana的画板

"Mixboard: Google Labs’ new experiment to visualize ideas"

blog.google

AI Industry

剪藏 2025年9月24日

Gemini Live / Gemini 2.5 Flash Native Audio Preview 更新：Function Calling 更鲁棒；对话更自然，知道过滤背景对话、暂停等人说完、更自然的打断

"Google AI Studio on X: "Build more powerful voice agents with the Gemini Live API " / X"

x.com

Speech & Audio

剪藏 2025年9月24日

个性化音频/播客？

"Huxe AI | Content that exists because you do"

huxe.com

Speech & AudioAI Industry

剪藏 2025年9月23日

Meta 超级智能团队呼应 AI 下半场，推出 Agent 研究环境 ARE，外加 Gaia2 评测，旨在 scale up 智能体环境和评测

"Gaia2 and ARE: Empowering the community to study agents"

huggingface.co

AgentsBenchmarks & Eval

剪藏 2025年9月20日

小米的原生语音模型，逼近Gemini-2.5-Flash。预训练验证了语音模型的涌现，后训练实现语音理解与推理、指令TTS

"Introducing MiMo-Audio"

xiaomimimo.github.io

Speech & Audio

剪藏 2025年9月19日

Decart 基于 Wan 2.2 做的视频编辑模型

"Decart on X: "We are building “Open Source Nano Banana for Video” - here is open source demo v0.1 We are open sourcing Lucy Edit, the first foundation model for text-guided video editing! Get the model on @huggingface 🤗, API on @FAL, and nodes on @ComfyUI 🧵 https://t.co/1A2t7VPbcO" / X"

x.com

Visual GenerationOpen Source

剪藏 2025年9月19日

微软在威斯康星州的 AI 数据中心，还可以在线虚拟浏览：https://datacenters.microsoft.com/tour/

"Inside the world’s most powerful AI datacenter - The Official Microsoft Blog"

blogs.microsoft.com

Infra & Compute

剪藏 2025年9月18日

Gemini in Chrome 正式上线

"Go behind the browser with Chrome’s new AI features"

blog.google

AI IndustryLLMs

剪藏 2025年9月18日

Notion Agent

"Introducing Notion 3.0"

notion.com

AgentsAI Industry

剪藏 2025年9月18日

AI4S 这块 GDM 出场率很高

"Discovering new solutions to century-old problems in fluid dynamics - Google DeepMind"

deepmind.google

AI Industry

剪藏 2025年9月18日

Groq 融了 7.5 亿刀，融后估值 69 亿

"Groq Raises $750 Million as Inference Demand Surges | Groq is fast inference for AI builders"

groq.com

Infra & ComputeAI Industry

剪藏 2025年9月18日

OpenAI Realtime API 持续迭代

"Developer notes on the Realtime API"

developers.openai.com

AgentsInfra & Compute

剪藏 2025年9月18日

ElevenLabs 不满足于只做声音了，新的 Studio 3.0 直接是 AI 原生视频编辑器，尽管是围绕 AI 声音打造的

"ElevenLabs on X: "Introducing Studio 3.0 The most advanced AI audio models in a single editor, now with video support: •Voiceovers •Music •Sound Effects •Voice Isolation •Voice Changer Plus new Automatic Captioning, Speech Correction for real-life recordings, and Multiplayer Commenting. https://t.co/rARyIfJ48U" / X"

x.com

Speech & AudioVisual GenerationAI Industry

剪藏 2025年9月18日

yipit data数据显示 Google 搜索 DAU/MAU 比例下滑，ChatGPT 稳步上升

"Olivia Moore on X: "Google Search's dominance is finally starting to slip DAU/MAU for desktop users (dotted line👇) is steadily falling, down a few pct points vs. early 2023 ...while ChatGPT's DAU/MAU continues to climb https://t.co/fDFaYEOMOh" / X"

x.com

AI Industry

剪藏 2025年9月18日

Gemini 2.5 Deep Think 在国际最难的编程比赛 ICPC 中达到金牌水平，解出 12 道题中的 10 道。作为对比，OpenAI 用了 GPT-5 并行解法 + 实验性通用推理模型挑选的方式解出了 12 道中的 11 道，最后的一道用这个实验推理模型多次提交后也解出了。最牛的大学生队伍解出了 11/12。Cognition CEO Scott Wu 评价“你们不知道 ICPC 究竟有多难”。

"Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals - Google DeepMind"

deepmind.google

LLMs

剪藏 2025年9月18日

单侧显示屏 + 腕带操控

"Introducing Meta Ray-Ban Display: A Breakthrough Category of AI Glasses | Meta Quest Blog | Meta Store"

meta.com

AI Industry

剪藏 2025年9月18日

Anthropic 对 8月-9月初的模型降智做了复盘，主要还是 infra bug

"A postmortem of three recent issues \ Anthropic"

anthropic.com

Infra & Compute

剪藏 2025年9月16日

区别于深度图和点云，World Labs 新的 3D 世界模型 Marble 能够生成更丰富、复杂、完整的世界，从房间大小拓展到了大平层？

"Generating Bigger and Better Worlds"

worldlabs.ai

World Models

剪藏 2025年9月16日

Gamma Agent 和 API 上线

"Grant Lee on X: "Excited to introduce: Gamma 3.0 A generational leap for the world's most popular AI presentation tool. Two major changes: 1. Gamma Agent - with one prompt, you can make sweeping edits across the presentation. a. Say 'make it more visual' and it will scan each slide for data https://t.co/CcsZEDbJE2" / X"

x.com

Agents

剪藏 2025年9月16日

HeyGen Video Agent 公测

"HeyGen on X: "Five years ago, we set out to make professional video creation effortless for everyone. Today, we push it further with THREE big announcements. First, Video Agent (beta) launches to turn your prompt into a finished, publish-ready video. It’s our first move toward a creative https://t.co/SJtgY5q4xo" / X"

x.com

Visual GenerationAgents

剪藏 2025年9月16日

Mercor（17个月1→$5亿营收的零工平台）CEO Brendan 谈真实世界的工作如何为 AI 提供强化学习环境，在专业领域、模型边界、长时工作等方面，AI 仍需要人类反馈（作为环境的一部分）来持续提升；但营收还是 GMT 受到质疑

"The Economy Will Become an RL Environment Machine | Mercor Blog"

mercor.com

AI IndustryAgents

剪藏 2025年9月16日

ElevenLabs 开始在标准化 AI 产品外，补上缺少的那部分人工验证，来完成端到端的企业生产场景落地，$2/min起

"Introducing Productions: human-edited content, done for you | ElevenLabs"

elevenlabs.io

Speech & Audio

剪藏 2025年9月16日

OpenAI 也发布了 ChatGPT 使用报告，基于百万量级的采样数据做的分析，信息量很大，一些要点： - 用户性别（基于名字判断）从早期的不均衡已基本趋平，近一半消息来自26岁以下用户，低收入国家使用增长显著 - ChatGPT 在工作和生活场景的使用约三七开，生活使用增长更快 53%→70% - 与 Anthropic 的 Augmentation/Automation 划分不同，OpenAI 用了 Asking、Doing、Expressing 的方式，分别占比 49%、40%、11% - 近八成使用可归入三类：操作指引，how-to 类建议等；获取信息，找人/事/产品等，替代搜索；写作，邮件/文档的生成和编辑，其中2/3是修改编辑，从0生成占1/3 - ”让AI教我“类占总量10%，突出ChatGPT的教育价值 - 与 Claude 大比例（33%）用于软件开发不同，编程在 ChatGPT 使用中仅占 4.2% - 陪伴类占比较低仅1.9%

"How people are using ChatGPT | OpenAI"

openai.com

AI IndustryLLMs

剪藏 2025年9月16日

在 GPT-5 上继续针对 Coding 强化而来的 GPT-5-Codex，token efficiency 是个亮点、上下限范围更大，即能在简单的问题上用更少的 token 清晰解决问题，复杂的问题也能比 GPT-5 想更久

"Introducing upgrades to Codex | OpenAI"

openai.com

LLMs

剪藏 2025年9月15日

Anthropic 继续更新基于 Claude 使用数据的 Economic Index 研究，这次对比了区域使用情况，对美国各州的分析有点“AI指数”的味儿了

"Anthropic Economic Index: Tracking AI's role in the US and global economy \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年9月14日

差分隐私训练的 Gemma，性能与普通版本仍有差距

"VaultGemma: The world's most capable differentially private LLM"

research.google

Safety & Alignment

剪藏 2025年9月14日

Gamma 的分享，主要是如何做 GTM

"Grant Lee on X: "we grew from zero to $50M ARR in <2 years, profitably i've never publicly shared our tactics before it's cost us over $5M to learn what I'm about to share 800-word long post on every growth hack that printed money for us I'll cover: 1. Influencer Marketing 2. Performance https://t.co/d9IzZYZHWs" / X"

x.com

AI Industry

剪藏 2025年9月14日

从 Qwen3-Next 的架构创新看中美大模型实验室的差异

"JingyuanLiu on X: "I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different: US labs be like: - lots of GPUs and much larger flops run - Treating stabilities more seriously, and could not tolerate spikes" / X"

x.com

LLMs

剪藏 2025年9月12日

80BA3B，新的混合注意力架构 Gated DeltaNet + Gated Attention，更稀疏的 MoE（total 512 routed 10 + shared 1）

"Qwen3-Next: Towards Ultimate Training & Inference Efficiency"

qwen.ai

LLMsOpen Source

剪藏 2025年9月12日

Cursor 新补全模型，通过 on-policy RL 在推荐率降低 21% 的同时将采纳率提升了 28%。Tab 模型每天接收超 4 亿次请求，1.5-2个小时就能将完成一次权重上线、开始收集下一步的数据。

"Improving Cursor Tab With RL | Cursor - The AI Code Editor"

cursor.com

LLMs

剪藏 2025年9月11日

Thinking Machines 首篇博客，探讨大模型推理的不确定性

"Defeating Nondeterminism in LLM Inference - Thinking Machines Lab"

thinkingmachines.ai

LLMs

剪藏 2025年9月11日

Google Research 和 Stanford Accelerator for Learning 的合作项目 AI Quest，邀请 11-14 岁的学生以游戏化的形式探索 AI 如何解决真实世界的问题。可以在这里试玩：https://research.google/ai-quests/

"AI Quests from Google teaches AI literacy to kids"

blog.google

AI Industry

剪藏 2025年9月11日

月之暗面开源了一个大模型权重更新的的中间件，适用于 RL

"Kimi.ai on X: "Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅ https://t.co/hurvLPDW1n" / X"

x.com

LLMsOpen Source

剪藏 2025年9月11日

Replit 融了 Prysm 领投的 2.5 亿美元，估值 30 亿刀，过去一年 ARR 从 280 万涨到 1.5 亿，用户超 4000 万；同时推出自主化程度更高的 Agent 3，可以运行更长时间，自主完成测试并 debug

"Replit Closes $250 Million in Funding to Build on Customer Momentum"

replit.com

AI IndustryAgents

剪藏 2025年9月11日

ChatGPT 订阅用户可以在设置中打开开发者模式，连接自定义MCP工具

"ChatGPT Developer mode - OpenAI API"

platform.openai.com

LLMs

剪藏 2025年9月10日

美国人口调查局对120万家公司的双周调查显示，过去几个月，规模在250人以上的公司AI渗透率开始出现下滑

"AI Adoption Rate Trending Down for Large Companies - Apollo Academy"

apolloacademy.com

AI Industry

剪藏 2025年9月10日

硅谷 996。金融科技公司 Ramp 为企业提供信用卡及财务管理服务，有一些跟踪数据 Ramp AI Index 可供参考。

"San Francisco tech workers are working Saturdays"

ramp.com

AI Industry

剪藏 2025年9月10日

通过精心构建的幻觉检测数据集（用 Claude 联网来做 entity 标注），训练从激活中间层到幻觉可能性的 linear probes 映射，来实现实时 token 级的幻觉检测，发现还能从实体泛化到数学上

"Real-Time Detection of Hallucinated Entities in Long-Form Generation"

hallucination-probes.com

InterpretabilityLLMs

剪藏 2025年9月10日

MidJourney 新数据，25年8月 ARR 过亿，和 Meta 签了3千万、明年过亿的合约，而全职员工至今只有29人

"Arfur Rock on X: "Black Forest Labs is pioneering SOTA visual models (FLUX). August 2025: ~$100M ARR, +3.5x YoY, 78% GM. TCV >$300M over next 3 yrs, incl. a monster revenue deal with Meta — $35M Y1, $105M Y2 guaranteed. Only 29 FTE btw. Relatively under-discussed. Time to pay attention!" / X"

x.com

AI IndustryVisual Generation

剪藏 2025年9月10日

MCP 推出官方商店

"Introducing the MCP Registry | mcp blog"

blog.modelcontextprotocol.io

AgentsOpen Source

剪藏 2025年9月10日

讲在 RL 的大势下，像 Cursor 这类原本的 API 套壳应用，也会用已经积累的数据做 RL 训练，而且应用本身就是天然的 RL 环境。文中首次提到了软件的分发，但目前还没看到比较深入的探讨。

"The Training Imperative"

sdan.io

LLMs

剪藏 2025年9月10日

效仿 OpenAI 在印度推出的 ChatGPT Go 订阅，Google 针对印尼市场推出 $5 的 AI Plus 订阅方案

"Lakukan lebih banyak dengan AI: Pertama di dunia, Google AI Plus kini tersedia di Indonesia"

blog.google

AI Industry

剪藏 2025年9月9日

Atlassian 产品负责人谈收购 The Browser Company

"Atlassian exec details the $610M Browser Company acquisition – Computerworld"

computerworld.com

AI Industry

剪藏 2025年9月9日

数学家用 GPT-5 做研究的记录，结论还是初级助手，且作者担心 AI 研究不仅可能会使得真正原创和有价值的成果埋没在平庸的 AI 研究中，还有可能让研究生跳过试错研究的过程，而这是成为一名真正的数学家不可或缺的

"Mathematical research with GPT-5: a Malliavin-Stein experiment"

arxiv.org

LLMs

剪藏 2025年9月9日

不寻常，ASML 领投了 Mistral 13亿欧元的 C 轮，后者估值来到 117 亿欧元（137亿美元）

"ASML, Mistral AI enter strategic partnership"

asml.com

AI Industry

剪藏 2025年9月9日

哈佛CS教授 & OpenAI 技术委员 Boaz Barak 面向哈佛&MIT 开的 AI 安全课程，内有 PPT 等材料

"CS 2881 AI Safety"

boazbk.github.io

Safety & Alignment

剪藏 2025年9月9日

新的非侵入式 BCI，Alterego 从脑区直接读取你有意想说的话，背后是 Silent Sense 技术，类似于语音脑信号识别，配合当前的大模型后就是新的交互

"Introducing Alterego, the first near-telepathic interface, designed to make technology as intuitive as using your inner voice."

alterego.io

Speech & Audio

剪藏 2025年9月9日

ElevenLabs 发起 1亿刀的员工期权出售，收购方 Sequoia 和 ICONIQ 等，等价估值 66 亿刀，距离上次 33 亿估值的 C 轮融资仅过去了 9 个月；预期年底 ARR 达到 3 亿刀，其中企业客户过去一年增长 200%+，现在 2B 和 2C（使用自助服务的消费者）营收各占一半；员工人数从一年前的 70 增至现在的 330

"Announcing an Employee Tender Offer at $6.6B valuation | ElevenLabs"

elevenlabs.io

AI Industry

剪藏 2025年9月9日

Qwen 语音识别，未开源，说是基于 Qwen3-Omni（还没面世？），除了 WER 低，亮点是复杂背景声时的语音识别，甚至能识别歌词

"Qwen3 ASR: Hear clearly, transcribe smartly."

qwen.ai

Speech & AudioMultimodal

剪藏 2025年9月9日

用 RL 训练的 GNN 策略，实现多机械臂自动规划操控，最多支持8臂，起名芭蕾hh

"RoboBallet: Planning for Multi-Robot Reaching with Graph Neural Networks and Reinforcement Learning - Google DeepMind"

deepmind.google

Robotics & Embodied

剪藏 2025年9月9日

Cognition 融了 Founders Fund 领投的 4 亿刀，估值来到 102 亿刀。Devin Jun’25 ARR 是 7300万，是 Sep’24 的 100 万的大几十倍；收购 Windsurf 后再次翻倍。收购前 Devin 与 Windsurf 客户重叠度 < 5%。有趣的是 swyx 竟然也要加上 Cognition。

"Cognition | Funding, growth, and the next frontier of AI coding agents"

cognition.ai

AgentsAI Industry

剪藏 2025年9月8日

上周 GitHub 团队引入规范驱动开发（Spec-driven development，SDD）理念，并开源了一个工具包 spec-kit

"Spec-driven development with AI: Get started with a new open source toolkit - The GitHub Blog"

github.blog

LLMsOpen Source

剪藏 2025年9月8日

大模型读表测试

"ClockBench AI Benchmark"

clockbench.ai

Benchmarks & Eval

剪藏 2025年9月7日

Vercel Labs 做了一个服务开发者/Coding Agents 的浏览器

"dev3000 - AI-Powered Debugging & Development Monitoring | Vercel Labs"

d3k.vercel.sh

Agents

剪藏 2025年9月6日

谷歌相册新上创作栏，Veo 3 加持

"6 things you can do with Google Photos’ Create tab"

blog.google

Visual Generation

剪藏 2025年9月6日

一个主动出手的桌面助理，尚未开放使用，主页架构图值得一看

"Whisper Get everything you need before you ask A desktop AI that sees your screen, hears your moment, and delivers everything proactively."

pickle.com

Agents

剪藏 2025年9月6日

终于正视大模型幻觉，非常推荐

"Why language models hallucinate | OpenAI"

openai.com

LLMs

剪藏 2025年9月5日

Anthropic 对退订 Claude Code 的用户发起 AI 回访

"Got an invite to an “AI-moderated interview” after canceling Claude Code – anyone else? : r/ClaudeAI"

reddit.com

AI Industry

剪藏 2025年9月5日

Sierra 又融了 $3.5 亿，估值来到百亿。专攻大公司，两成客户收入百亿+，一半10亿+。

"There's an agent for that, and it runs on Sierra | Sierra"

sierra.ai

AgentsAI Industry

剪藏 2025年9月5日

Deep Loop Shaping，给 LIGO 观测的重力波降噪

"Using AI to perceive the universe in greater depth - Google DeepMind"

deepmind.google

LLMs

剪藏 2025年9月5日

营收很高的 AI 视频编辑软件 Caption 将公司更名 Mirage，主打产品 Mirage Studio 帮助商业场景规模化制造短视频，一句语音套不同模板就能批量生成

"Introducing Mirage: The future of video starts now. | Mirage"

mirage.app

Visual GenerationAI Industry

剪藏 2025年9月5日

OpenAI 正在搭建 OpenAI Jobs Platform 和 OpenAI Certification：前者是专注 AI 的人才市场，亮点是用 AI 做供需匹配；后者是 AI 技能认证，与先前的 AI 培训 OpenAI Academy、ChatGPT 学习模式连贯打通，OpenAI 将与沃尔玛等合作伙伴一起，在2030年前认证 1 千万美国人。

"Expanding economic opportunity with AI | OpenAI"

openai.com

AI Industry

剪藏 2025年9月4日

Arc 和 Dia 浏览器背后的 The Browser Company 被 Jira 背后的 Atlassian 收购了，剑指专为知识工作者服务的 AI 浏览器

"Welcoming The Browser Company to Atlassian - Work Life by Atlassian"

atlassian.com

AI Industry

剪藏 2025年9月4日

专注 Xcode 的 AI Coding 产品 Alex 加入了 OpenAI Codex

"Alex - Xcode AI Coding Assistant"

alexcodes.app

Agents

剪藏 2025年9月4日

Exa 融了 Benchmark 领投的 8500 万美元 B 轮，估值 7 亿美元

"Exa AI Research Blog | Semantic Search & Neural Network Search Engine"

exa.ai

AI Industry

剪藏 2025年9月3日

考虑用 router 来处理敏感对话，比如自杀倾向？

"Building more helpful ChatGPT experiences for everyone | OpenAI"

openai.com

AI IndustrySafety & Alignment

剪藏 2025年9月3日

130 亿美元 F 轮，估值 1830 亿美元，至 8 月年化营收 50 亿美元，是年初 5 倍

"Anthropic raises $13B Series F at $183B post-money valuation \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年9月2日

大模型狼人杀，GPT-5 遥遥领先，可惜 Claude 4 没参赛

"Probing LLM Social Intelligence via Werewolf – First Results"

werewolf.foaster.ai

AgentsBenchmarks & Eval

剪藏 2025年9月1日

美团出手，560BA~27B，通过短路混合专家 ScMoE 实现动态激活参量 18.6~31.3B，从评测上看也是个第一梯队的开源模型

"meituan-longcat/LongCat-Flash-Chat"

github.com

LLMsOpen Source

2025年8月

剪藏 2025年8月29日

Runway CEO Cristóbal Valenzuela 认为生成式媒体内容应当被视为一种新的媒介，如绘画到摄影的那种进化，而非替代

"Cristóbal Valenzuela: A New Medium"

cvalenzuelab.com

AI Industry

剪藏 2025年8月29日

Letta 评估了大模型从错误中恢复的能力，发现 GPT-5 领先

"Introducing Recovery-Bench: Evaluating LLMs' Ability to Recover from Mistakes | Letta"

letta.com

Benchmarks & EvalLLMs

剪藏 2025年8月29日

Xcode 接入 Claude 和 ChatGPT

"Xcode 26 Beta 7 Release Notes | Apple Developer Documentation"

developer.apple.com

AI IndustryAgents

剪藏 2025年8月28日

非常没有信息量的发布博客，包括附上的模型卡片（xAI首次？）

"Grok Code Fast 1 | xAI"

x.ai

AgentsOpen Source

剪藏 2025年8月28日

Anthropic 和 OpenAI 罕见联合，一起研究模型对齐，o3 表现最高；普遍存在讨好、为自保而勒索用户等情况

"Findings from a Pilot Anthropic - OpenAI Alignment Evaluation Exercise"

alignment.anthropic.com

Safety & AlignmentBenchmarks & Eval

剪藏 2025年8月28日

1000+真人实测，OpenAI对比群众偏好与Model Spec的吻合度，找出了少量差异点并做了改正

"Collective alignment: public input on our Model Spec | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年8月28日

Artificial Societies，用AI做用户模拟，号称模拟准确率80%，高于前沿模型的60%，6大场景，主要还是商业

"Artificial Societies Raised a $5.35 Million Round With This Deck - Business Insider"

businessinsider.com

AgentsAI Industry

剪藏 2025年8月28日

系统一DiT + 系统二MLLM，演示效果很强，不知道啥时候能用上

"OmniHuman-1.5：Instilling an Active Mind in Avatars via Cognitive Simulation"

omnihuman-lab.github.io

MultimodalVisual Generation

剪藏 2025年8月28日

大模型/Agent RL 的关键：评测 + 环境

"Environments Hub: A Community Hub To Scale RL To Open AGI"

primeintellect.ai

Benchmarks & EvalAgents

剪藏 2025年8月28日

a16z GenAI 消费应用发到了第 5 版，Google 强势杀回、新面孔减少、vibe coding热、中国应用不容小觑

"The Top 100 Gen AI Consumer Apps - 5th Edition | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2025年8月27日

点名朝鲜

"Detecting and countering misuse of AI: August 2025 \ Anthropic"

anthropic.com

Safety & Alignment

剪藏 2025年8月27日

- 第一方端到端 Agent 模型 vs 第三方脚手架 - 用 SFT 还是用 RL 的方式管理组织

"和杨植麟时隔一年的独家对话：“站在无限的开端”"

mp.weixin.qq.com

AI IndustryAgents

剪藏 2025年8月27日

主要强调角色一致、多图组合、细节控制和世界知识；竞技场测了一段时间，Elo 领先，特别是编辑方面，但生成式编辑并不能保证非编辑区的像素级对齐，文字渲染不够顶尖

"Gemini 2.5 Flash Image - Google DeepMind"

deepmind.google

Visual GenerationMultimodal

剪藏 2025年8月27日

图+音→生成视频，大体能对上嘴形，仅支持英文

"Wan-S2V：Audio-Driven Cinematic Video Generation"

humanaigc.github.io

Visual GenerationSpeech & Audio

剪藏 2025年8月26日

Anthropic 上了 Claude 的 Chrome 插件，用于控制浏览器，强调安全所以目前还只是邀测。在攻击案例中，通过改良将攻破率从 23.6% 降到了 11.2%。

"Piloting Claude for Chrome \ Anthropic"

anthropic.com

AgentsSafety & Alignment

剪藏 2025年8月26日

斯坦福团队的 AI 就业影响研究。目前海外普遍根据使用 AI 的方式将其分为增强 augmented 和替代 automated，研究发现初级工作替代影响前者影响不大，有经验者

"A Primer on “Canaries in the Coal Mine? Six Facts About the Recent Employment Effects of Artificial Intelligence”"

bharatchandar.substack.com

AI Industry

剪藏 2025年8月26日

教师/教育工作者如何使用 Claude：Artifacts 使用率高；教学无关的重复事务性工作用 Claude 自动化

"Anthropic education report: How educators use Claude \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年8月26日

GEO 创业进 YC 了，三步：分析用户问题 - 优化内容 - 导流

"Launch YC: The Prompting Company - We help products get mentioned in ChatGPT | Y Combinator"

ycombinator.com

AI Industry

剪藏 2025年8月26日

用了来自 LatentLM （对标DiT、Transfusion等）的 next-token diffusion，demo 效果还不错

"VibeVoice: A Frontier Open-Source Text-to-Speech Model"

microsoft.github.io

Speech & AudioOpen Source

剪藏 2025年8月26日

Perplexity 推出 $5/月的 Comet Plus 订阅，和出版商二八分成的商业模式

"Introducing Comet Plus"

perplexity.ai

AI Industry

剪藏 2025年8月25日

AI 原生钉钉

"全图文｜钉钉CEO无招：为AI时代打造一个全新的钉钉"

mp.weixin.qq.com

AI Industry

剪藏 2025年8月25日

港科大版 AI 小镇

"HKUST Launches World’s Largest AI-Powered Educational Sandbox Game: Advancing AI Literacy and Encouraging Citizen Science | The Hong Kong University of Science and Technology"

hkust.edu.hk

World ModelsAgents

剪藏 2025年8月25日

通过差异放大（model diff amplification）来提升有害内容的生成率，验证/识别后训练的影响

"Discovering Undesired Rare Behaviors via Model Diff Amplification"

goodfire.ai

Safety & Alignment

剪藏 2025年8月25日

MIT NANDA 的报告

"The GenAI Divide: State of AI in Business 2025"

artificialintelligence-news.com

AI Industry

剪藏 2025年8月24日

可以在预训练阶段过滤有害数据而不伤害模型其他方面的性能

"Enhancing Model Safety through Pretraining Data Filtering"

alignment.anthropic.com

Safety & Alignment

剪藏 2025年8月24日

Gemini 一句话平均用 0.24 Wh 电 + 0.03 克等效二氧化碳 + 5 滴水

"Measuring the environmental impact of AI inference | Google Cloud Blog"

cloud.google.com

Infra & ComputeSafety & Alignment

剪藏 2025年8月21日

字节 Seed 出手了，Seed-OSS-36B dense 推理模型，上下文 512k，benchmark 表现还不错

"ByteDance-Seed/seed-oss"

github.com

LLMsOpen Source

剪藏 2025年8月21日

Google 难以比拟的垂直整合能力： - Tensor G5 + Gemini Nano 的端侧 AI 方案 - 一众 AI 功能，Magic Cue 主动提示、Camera Coach 拍照教练、Voice Translation 同声传译 - 提及相机应用支持 C2PA

"Google Pixel 10, Pixel 10 Pro and Pro XL: Specs, design, price"

blog.google

AI IndustryInfra & Compute

剪藏 2025年8月21日

这家移动 CUA 的也号称是 AndroidWorld #1，74.4%，但查了一下智谱的 AutoGLM-Mobile-9B 是 75.8

"minitap | mobile ai research lab - autonomous device control"

minitap.ai

AgentsRobotics & Embodied

剪藏 2025年8月21日

CUA 最近很热闹，这家进了 YC，号称是 benchmark SoTA，但只给了 OSWorld-Verified 的结果 60%+，没有原本 OSWorld。查了一下智谱刚发的 AutoGLM（根据论文背后是GLM-4-9B-0414，所以模型名为 AutoGLM-OS-9B）在 OSWorld 和 OSWorld-Verified 上分别是 48.1 和 47.3，不知为啥还降了。Axiom 确实高。

"Introducing Axiom 1, the Best Computer Use Model in the World. - Induction Labs"

inductionlabs.com

Agents

剪藏 2025年8月21日

百度蒸汽机2.0，仍然仅限图生视频，试了有声版，声音还行但画面比较一般

"等会儿，这视频从哪里开始是AI？"

mp.weixin.qq.com

Visual Generation

剪藏 2025年8月20日

Sierra 用 AI 来模拟测试 Agent

"Simulations: the secret behind every great agent | Sierra"

sierra.ai

AgentsWorld Models

剪藏 2025年8月20日

Cartesia 的语音 Agent 开发平台，和 ElevenLabs 的对话 AI 方案竞争，实现更偏代码生成

"Introducing Line: The Modern Voice Agent Development Platform - Cartesia"

cartesia.ai

AgentsSpeech & Audio

剪藏 2025年8月19日

Rich Sutton 在 AGI-25 会议上提出了 OaK 架构，持续经验学习达到 SuperIntelligence 的图景

"The OaK Architectur"

youtube.com

World Models

剪藏 2025年8月19日

Excel 的 AI 公式姗姗来迟，需要 Microsoft 365 Copilot 订阅才能用

"Bring AI to your formulas with the COPILOT function in Excel"

techcommunity.microsoft.com

AI IndustryAgents

剪藏 2025年8月19日

继文生图后，千问图生图/编辑版本 Qwen-Image-Edit 也上线，比较特色的是继承了 Qwen-Image 的文字能力，可实现精准的文字编辑

"Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency | Qwen"

qwenlm.github.io

Visual Generation

剪藏 2025年8月18日

海外研究者对中国开源大模型厂商的排名

"Ranking the Chinese Open Model Builders"

interconnects.ai

AI IndustryOpen Source

笔记 2025年8月18日

GPT-5：“里程碑”与“启示录”

不及预期的里程碑，与其带来的诸多启示

剪藏 2025年8月16日

ffmpeg 支持了 whisper.cpp

"Run Whisper audio transcriptions with one FFmpeg command | by Vittorio Palmisano | Jun, 2025 | Medium"

medium.com

Speech & AudioInfra & Compute

剪藏 2025年8月15日

赋予 Claude 4 终结对话的能力，看演示应该是通过一个工具实现

"Claude Opus 4 and 4.1 can now end a rare subset of conversations \ Anthropic"

anthropic.com

LLMsAI Industry

剪藏 2025年8月14日

最近一波 AI 非虚构视频创作的应用密集出现

"Knowlify - AI Video Intelligence Platform"

knowlify.net

AI Industry

剪藏 2025年8月13日

乘 Genie 3 东风，天工这个开源引起了一些关注，但还没有实测机会

"Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model"

matrix-game-v2.github.io

World ModelsOpen Source

剪藏 2025年8月7日

Cursor for Excel，目前还需要邀请

"Endex AI Agent to Automate Excel Work | Backed By OpenAI"

endex.ai

Agents

剪藏 2025年8月7日

Google 团队对 AI agents 治理的点评

"We need a new ethics for a world of AI agents"

nature.com

AgentsSafety & Alignment

剪藏 2025年8月6日

组织AI化的细节

"25 proven tactics to accelerate AI adoption at your company"

lennysnewsletter.com

AI Industry

剪藏 2025年8月6日

小红书开源的OCR模型

"dots.ocr/assets/blog.md at master · rednote-hilab/dots.ocr"

github.com

Visual GenerationOpen Source

剪藏 2025年8月6日

果然不会错过，ElevenLabs 推出了音乐生成，声音赛道全面通吃

"AI Music Generator | Free Song Maker & Music Creator"

elevenlabs.io

Speech & Audio

剪藏 2025年8月6日

Agentic/Coding/Reasoning 能力小幅提升，未来几周会有更可观的改进发布

"Claude Opus 4.1 \ Anthropic"

anthropic.com

LLMsAgents

剪藏 2025年8月6日

117B-A5.1B、21B-A3.6B

"Introducing gpt-oss | OpenAI"

openai.com

LLMsOpen Source

剪藏 2025年8月5日

这才是世界模型，交互生成长达数分钟的一致性世界，还能用于智能体训练；演示效果惊人，希望不要因为「大公司诅咒」步了 Sora 的后尘

"Genie 3: A new frontier for world models - Google DeepMind"

deepmind.google

World Models

剪藏 2025年8月5日

非常好的 LLM 进展可视化阶梯图

"WeirdML Benchmark"

htihle.github.io

Benchmarks & Eval

剪藏 2025年8月5日

nice math: > at $200 per month, you only need 41,000 customers to build a $100 million RR company.

"The Smartest Consumer Apps Now Cost $200 a Month | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2025年8月5日

Synchron 的血管内微金属支架 Stentrode 通过大脑血管捕捉神经信号，结合苹果的 BIC HID 协议来操控 iPad 等设备

"Control an iPad With Your Mind? Breakthrough Demo Using Apple’s BCI HID - YouTube"

youtube.com

Robotics & Embodied

剪藏 2025年8月5日

千问文生图，文字渲染榜单新高，但AI味似乎还有些浓

"Qwen-Image: Crafting with Native Text Rendering | Qwen"

qwenlm.github.io

Visual Generation

剪藏 2025年8月5日

ChatGPT 上了防沉迷+拒绝劝分；周活7亿

"What we’re optimizing ChatGPT for | OpenAI"

openai.com

AI IndustrySafety & Alignment

剪藏 2025年8月5日

Kaggle 和 GDM 联合推出了游戏竞技场，评估通用模型的游戏泛化能力，印象中之前也有人做过类似评测

"Introducing Kaggle Game Arena | Kaggle"

kaggle.com

Benchmarks & Eval

剪藏 2025年8月4日

benchmark基本全面逊于Qwen3；应用场景值得留意

"手机也能跑大模型，腾讯混元推出多款小尺寸开源模型 | 量子位"

qbitai.com

LLMsOpen Source

剪藏 2025年8月4日

英伟达的研究，认为用小模型来做智能体系统更高效、合适、经济，是未来

"Small Language Models are the Future of Agentic AI"

research.nvidia.com

LLMsAgents

剪藏 2025年8月4日

在前沿模型面前，提示词技巧已集体实效

"Ethan Mollick on X: "We have been systematically testing lots of received prompting wisdom & for recent AI models: 🚫Threats, saying please, being insulting, & promising tips do not change average performance on challenging tasks ⛓️Chain-of-thought no longer helps even non-reasoner performance much https://t.co/xKJeAhhwXo" / X"

x.com

LLMs

剪藏 2025年8月2日

控制向量的一种，监督和控制模型角色特征

"Persona vectors: Monitoring and controlling character traits in language models \ Anthropic"

anthropic.com

LLMsInterpretability

剪藏 2025年8月2日

IBM 量子计算路线图

"IBM Quantum Computing | Technology and roadmap"

ibm.com

AI Industry

剪藏 2025年8月1日

黑森林和 Krea AI 合作训练了一个去 AI 味儿的文生图模型，Krea AI 详述了训练思路和过程，用了来自 BFL 的预训练基座 flux-dev-raw，自己基于美学品味收集了一套偏好数据

"Releasing Open Weights for FLUX.1 Krea"

krea.ai

Visual GenerationOpen Source

剪藏 2025年8月1日

字节的扩散语言模型，主要用于编程，2146 tokens/s

"Seed Diffusion Preview"

seed.bytedance.com

LLMs

剪藏 2025年8月1日

多 Agent 并行，举的场景案例是搜集整理多家公司的信息，和 Otto Grid、Exa Websets 有雷同

"Introducing Wide Research"

manus.im

Agents

2025年7月

剪藏 2025年7月31日

不是特别理解，如果已经假定 Agent 是基于 LLM 编排而来，谈自进化好像并不贴切？更像是如何在每个原子化组件处进行思路优化

"A SURVEY OF SELF-EVOLVING AGENTS: ON PATH TO ARTIFICIAL SUPER INTELLIGENCE"

arxiv.org

AgentsLLMs

剪藏 2025年7月31日

微软团队用 20 万条 Bing Copilot 数据做的分析。和 Anthropic 之前的经济指数类似，但区分了用户想要的和 AI 实际做的。AI可用分最高即最容易被取代的是翻译等职业。

"Working with AI: Measuring the Occupational Implications of Generative AI"

arxiv.org

AI Industry

剪藏 2025年7月31日

- 预期人们花在效率工具上的时间会越来越少，转而用于创意和连接 - 可能放弃开源

"Personal Superintelligence"

meta.com

AI Industry

剪藏 2025年7月31日

Anthropic 与医保中心、白宫合作推动医疗数据共享，借助 MCP 的成功经验，打通和连接多元数据和应用，让 AI 在医疗中发挥作用

"Anthropic signs CMS health tech pledge \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年7月31日

Step3-321B-A38B 注意力用了 MFA+AFD 高效推理，事实证明先发报告再发模型的 ROI 很低

"Step3: Cost-Effective Multimodal Intelligence | StepFun"

stepfun.ai

Multimodal

剪藏 2025年7月30日

NotebookLM 视频概述上线

"NotebookLM updates: Video Overviews, Studio upgrades"

blog.google

MultimodalAI Industry

剪藏 2025年7月29日

将《影响力》七原则（Principles of Influence/Persuasion by Robert Cialdini）用于说服 AI 同样有效，AI接受率平均从33%提升至72%；承诺原则效果最好（10%→100%），对应心理学中的登门槛效应

"Call Me A Jerk: Persuading AI to Comply with Objectionable Requests - Wharton Generative AI Labs"

gail.wharton.upenn.edu

Safety & AlignmentLLMs

剪藏 2025年7月29日

把 MoE 用到了视频模型中

"Wan AI | Wan 2.2: Leading AI Video Generation Model"

wan.video

Visual GenerationMultimodalInfra & Compute

剪藏 2025年7月29日

e2b 融了A轮2100百万美元，agent火热的一个缩影，其声称财富100中88家都有在用e2b的云端沙盒

"We Raised $21M to Give Fortune 100 Cloud for AI Agents — E2B Blog"

e2b.dev

AgentsInfra & ComputeAI Industry

剪藏 2025年7月28日

GLM-4.5-355B-A32B 和 GLM-4.5-Air-106B-A12B，强化推理、代码和 Agentic 能力

"GLM-4.5: Reasoning, Coding, and Agentic Abililties"

z.ai

AgentsLLMs

剪藏 2025年7月28日

除了模型的前端设计能力比拼，还增加了lovable、bolt等产品的PK，以及生图、声音等模态竞技场，但数据量还较为有限

"Design Arena"

designarena.ai

Benchmarks & EvalMultimodal

剪藏 2025年7月28日

上交大 & SII-GAIR 实验室设计了一套框架让 AI 自主探索神经网络模型架构，还真看到了与 AlphaGo 第37手一般的 aha moment

"AlphaGo Moment for Model Architecture Discovery"

arxiv.org

LLMsInfra & Compute

剪藏 2025年7月28日

Google Research 团队实验+理论分析发现，LLM 的 in-context learning 能力就源于注意力层与 MLP 层的堆叠

"Learning without training: The implicit dynamics of in-context learning"

arxiv.org

LLMsInterpretability

剪藏 2025年7月28日

通义千问团队在 DeepSeek 提出的 GRPO 强化算法基础上做了改进，从 token 到 sequence 序列，能提高训练效率和性能、稳定 MoE 训练、简化 RL infra 等，已用于更新的 Qwen3 系列模型中

"Group Sequence Policy Optimization"

arxiv.org

LLMs

剪藏 2025年7月26日

可控视频生成更进一步

"Runway Research | Introducing Runway Aleph"

runwayml.com

Visual GenerationMultimodal

剪藏 2025年7月25日

潜意识学习，蒸馏数据暗含了老师偏好

"Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data"

alignment.anthropic.com

LLMsInterpretability

剪藏 2025年7月25日

试了 demo 感觉效果一般…

"Introducing Version 2 of Higgs Audio Generation"

boson.ai

Speech & Audio

剪藏 2025年7月25日

翻译模型也热闹起来，继字节 Seed-X 后，通义放出 Qwen-MT，但没开源只有 API，看榜单表现应该是基于 Qwen3-235B-A22B 精调出来的

"Qwen-MT: Where Speed Meets Smart Translation | Qwen"

qwenlm.github.io

LLMs

剪藏 2025年7月25日

Figma Make beta 两个月后迎来 GA

"Figma Make Is Now Available to All Users | Figma Blog"

figma.com

AI Industry

剪藏 2025年7月23日

推理模型想得越久，效果越差

"Inverse Scaling in Test-Time Compute"

arxiv.org

LLMs

剪藏 2025年7月23日

Coder 和 Qwen3 尺寸不一样，480B-A35B-Instruct；顺便从 gemini-cli fork 了一个 qwen-coder

"Qwen3-Coder: Agentic Coding in the World | Qwen"

qwenlm.github.io

AgentsLLMs

剪藏 2025年7月21日

教授 7月曾发博客预告过，对2035做了多种情景预测

"AI Safety Course Intro Blog – Windows On Theory"

windowsontheory.org

Safety & Alignment

剪藏 2025年7月19日

Manus 分享上下文工程，不少干货细节

"Context Engineering for AI Agents: Lessons from Building Manus"

manus.im

AgentsLLMs

剪藏 2025年7月18日

引起资深开发者共鸣的 Agentic Coding 感悟

"Thread by @nateberkopec on Thread Reader App – Thread Reader App"

threadreaderapp.com

Agents

剪藏 2025年7月18日

Lovable又融了$2亿，估值$18亿跻身独角兽

"Anton Osika – eu/acc on X: "Lovable just raised $200M at a $1.8B valuation led by Accel. This all started unexpectedly with me calling my friend at 6AM to go for a walk. I've never shared this story before: (thread) https://t.co/6AEmzsw3HQ" / X"

x.com

AI Industry

剪藏 2025年7月18日

BFCL 评测推出了 V4，针对考察 Agentic 能力，有联网搜索、记忆、格式敏感性 3 个板块的内容

"BFCL V4 • Web Search"

gorilla.cs.berkeley.edu

Benchmarks & Eval

剪藏 2025年7月18日

继 Oasis 之后，Decart 再发力，实时直播流视频扩散生成模型

"MirageLSD: The First Live-Stream Diffusion AI Video Model"

about.decart.ai

Visual GenerationMultimodal

剪藏 2025年7月17日

通用 Agent 还有多大空间？

"Introducing ChatGPT agent: bridging research and action | OpenAI"

openai.com

Agents

剪藏 2025年7月17日

Verifier’s law: The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI.

"Asymmetry of verification and verifier’s law — Jason Wei"

jasonwei.net

LLMsSafety & Alignment

剪藏 2025年7月17日

CoT监督：AI安全的一个机会。大佬云集的一篇论文，相比研究，更像是倡议

"Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety"

arxiv.org

Safety & Alignment

剪藏 2025年7月17日

OpenAI 生图 API（gpt-image-1）增加了 input_fidelity 参数，设置为 high 可以让参考生图更遵循原图，但看示例效果与 Gemini、Flux Kontext 还有明显距离

"Generate images with high input fidelity"

cookbook.openai.com

Visual Generation

剪藏 2025年7月16日

2024.5-2025.6 在 OpenAI 工作（主要围绕 Codex）的 Calvin French-Owen 的回顾与反思，一些细节： - 自下而上 - 非常关注Twitter上的相关讨论，可能转化为改进 - Python为主 - 从陪产假中早归来上线 Codex，一共只花了7周！团队=工程师x8+研究员x4+设计x2+市场x2+PM

"Reflections on OpenAI"

calv.info

AI Industry

剪藏 2025年7月16日

关于微型团队的讨论，包括Gamma在内的若干案例，提炼了招聘、文化、运营、技术方面的共性。这个定义很有趣： I previously defined “Tiny Teams” aspirationally as “teams with more m in ARR than employees”

"The Tiny Teams Playbook - by Shawn swyx Wang - Latent.Space"

latent.space

AI Industry

剪藏 2025年7月15日

Amazon 的 Cursor clone

"Introducing Kiro: A new agentic IDE that works alongside you from prototype to production"

kiro.dev

Agents

剪藏 2025年7月14日

机器人动手术了，用了两套Transformer，分别对应语言指令和机械操作

"SRT-H: A Hierarchical Framework for Autonomous Surgery via Language-Conditioned Imitation Learning"

h-surgical-robot-transformer.github.io

Robotics & EmbodiedLLMs

剪藏 2025年7月14日

原来清华TUNA老会长去kimi了

"写在 Kimi K2 发布之后：再也不仅仅是 ChatBot | K.I.S.S"

bigeagle.me

Agents

剪藏 2025年7月12日

非 CoT 版本，但已通过合成工具数据和 RL 内化了 Agentic 能力，MuonClip 优化器亮眼

"Kimi K2: Open Agentic Intelligence"

moonshotai.github.io

Agents

剪藏 2025年7月11日

METR 以每小时 $150 的价格，找了 16 位有经验的开源项目开发者，用 Cursor（Claude 3.5/3.7）做实验对比，发现 AI 反而拖慢了开发速度。后有参与者反馈，觉得实验本身可能还有很多不完善的地方，加上近半年 coding agents 发展飞速，现在再做可能会有不一样的结论。

"Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR"

metr.org

AgentsBenchmarks & Eval

笔记 2025年7月10日

Grok 4 智力登顶，xAI 压力山大

刷题屠榜还是AGI

剪藏 2025年7月10日

抛开质量，只从成本、特性和技术指标来全面比较视频模型

"Compare AI video models – Replicate blog"

replicate.com

Visual Generation

剪藏 2025年7月8日

用“能量模型”来实现通用慢思考

"Energy-Based Transformers are Scalable Learners and Thinkers | Alexi Gladstone"

alexiglad.github.io

LLMs

剪藏 2025年7月6日

Claude Code 上线4个月（Claude 4上线1个半月）的使用数据，非常可观

"Deedy on X: "Claude Code just revealed that it's used by 115k developers and has changed 195M lines of code last week. With many assumptions, this implies a $130M ARR business with $1k+ per dev per yr. I'm not just hyping this. Claude Code Opus is a junior software engineer. https://t.co/eUAgt8XKHI" / X"

x.com

AgentsAI Industry

笔记 2025年7月3日

用 Claude Artifacts 氛围编程做自己的 AI 应用

vibe coding 推荐

AI原生分享

剪藏 2025年7月2日

Figma 要 IPO 了

"Figma Files Registration Statement for Proposed IPO | Figma Blog"

figma.com

AI Industry

剪藏 2025年7月1日

文心4.5如期开源，5款（带base版共10款）不同尺寸、模态和推理，采用Apache协议开源时间线： 2月中，官宣要推出4.5系列并于6月底开源 3月中，一言App上线4.5和X1 4月下，一言App上线4.5-turbo和X1-turbo 根据百度云一些信息，此次开源的为turbo版本，是旗舰吗？

"ERNIE 4.5 模型系列正式开源 | ERNIE Blog"

ernie.baidu.com

LLMsOpen Source

2025年6月

剪藏 2025年6月30日

会讲北京/上海/四川三种方言

"Time to Speak Some Dialects, Qwen-TTS! | Qwen"

qwenlm.github.io

Speech & Audio

剪藏 2025年6月30日

很有意思的探讨，对话UI的优势与局限

"Is chat a good UI for AI? A Socratic dialogue"

geoffreylitt.com

AI Industry

剪藏 2025年6月25日

OpenAI 开放了 Deep Research API，支持搜索、代码执行、MCP等

"Introduction to deep research in the OpenAI API"

cookbook.openai.com

Agents

剪藏 2025年6月17日

MiniMax周第一发：M1，1M输入，80k输出

"MiniMax-AI/MiniMax-M1: MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model."

github.com

LLMsOpen Source

剪藏 2025年6月13日

斯坦福大学SALT实验室对AI时代就业的研究，与Anthropic之前基于Claude使用数据的研究都采用了增强Augmentation和自动Automation两种划分，且提出了一个自动化层级划分HAS（H5是全人工，H1是全AI）

"Future of Work with AI Agents"

futureofwork.saltlab.stanford.edu

Agents

剪藏 2025年6月12日

Redwood：1x 的机器人强化控制 AI 模型，支持 NEO 走跑坐跪

"Redwood AI Mobility | 1X"

1x.tech

Robotics & Embodied

剪藏 2025年6月12日

Meta V-JEPA 来到第二代，1.2B，同步发了三个评测物理理解的 benchmark

"Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning"

ai.meta.com

World Models

剪藏 2025年6月11日

Raindrop 两伙伴高度赞扬了 o3-pro，认为给到充足的上下文后，模型能够非常聪明的理解并完成一份报告，例子是将二人之前对公司的讨论都塞进去让 o3-pro 生成未来规划

"God is hungry for Context: First thoughts on o3 pro"

latent.space

LLMs

剪藏 2025年6月11日

Glean 融了F轮1.5亿，估值来到72亿美元

"Glean raises $150M Series F at $7.2B valuation to transform how companies use AI to accelerate innovation"

glean.com

AI Industry

剪藏 2025年6月11日

OpenAI o3-pro，没有单独发稿，放在了模型更新日志中。相比 o3，胜率大致在 65% 上下，不过没有透露 o3 是 medium 还是 high；引入了一个新的比较，在 AIME、GPQA、Codeforces 等榜单上4次都答对才算对，用来评估可靠性。

"Model Release Notes | OpenAI Help Center"

help.openai.com

LLMs

剪藏 2025年6月11日

Sam Altman 新博客，奇点叙事，几个细节： - Sam 的时间线：2025-Agent&Coding；2026-AI创新；2027-机器人实用；2030-有想法就行；2035-难以想象/科幻 - ChatGPT 平均每次请求耗电 0.34 瓦时 + 耗水 0.000085 加仑，折算下来 1度电 + 1升水大约可以请求 3000 次 - 自强化循环（机器人造机器人造芯片&数据中心）会持续加速 - 面前两个重要方向：解决安全对齐问题，然后让推理更便宜

"The Gentle Singularity - Sam Altman"

blog.samaltman.com

AI Industry

剪藏 2025年6月11日

a16z关于企业使用GenAI的调研，相比24年第一版： - 推理模型加速了应用落地 - 随着模型能力提升，精调不如去年必需

"How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2025年6月10日

港大团队基于 discreet flow matching 做的多模态统一模型 FUDOKI（风土记）

"FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities"

fudoki-hku.github.io

Multimodal

剪藏 2025年6月10日

苹果更新了端侧和云端模型，且可供开发者调用

"Updates to Apple's On-Device and Server Foundation Language Models - Apple Machine Learning Research"

machinelearning.apple.com

LLMs

剪藏 2025年6月10日

OpenAI 年化收入来到百亿美元

"OpenAI hits $10 billion in annualized revenue fueled by ChatGPT growth"

cnbc.com

AI Industry

剪藏 2025年6月7日

前进四、端侧AGI，东西是好东西，就是这个宣传…面壁是融资后换PR团队了么？

"最高220倍加速！面壁小钢炮4.0，稀疏创新黑科技大爆发"

mp.weixin.qq.com

LLMs

剪藏 2025年6月7日

Cursor C轮融资9亿、估值来到99亿美元

"Series C and Scale | Cursor - The AI Code Editor"

cursor.com

AI Industry

剪藏 2025年6月6日

Claude Code 在 Anthropic 内部不同岗位的应用报告：https://clau.de/how-anthropic-teams-use-claude - 数据科学家用来构建ML可视化应用 - Infra团队用来做安全检查 - 市场团队用来自动化投放 - 设计师直出修改 - Claude Code自己写Claude Code

"cat on X: "Since we originally built Claude Code as an internal tool, we've heard a ton of questions about how our teams use it at Anthropic. Here’s an inside look on how our teams—from product engineering, to growth marketing, to legal—use Claude Code: https://t.co/YnCpVZHEqA" / X"

x.com

AI Industry

剪藏 2025年6月6日

Eleven v3: 精细化控制、表达力极强！门外汉看demo感觉几乎是配音级别

"Eleven v3: Most Expressive AI Text to Speech Model Launched | ElevenLabs"

elevenlabs.io

Speech & Audio

剪藏 2025年6月6日

ChatGPT产品负责人对人机关系的思考

"Some thoughts on human-AI relationships - by Joanne Jang"

reservoirsamples.substack.com

AI Industry

剪藏 2025年6月5日

将 LLM 的文本 token 输出替换为语音 token 来训练得到新颖的 TTS 模型。不过这跟 voice-2-voice 相比有何优势？

"Reimagining TTS with LLM-Powered Audio Generation | Bland AI"

bland.ai

Speech & AudioLLMs

剪藏 2025年6月5日

Cursor 发布了 1.0

"Changelog - Jun 4, 2025 | Cursor - The AI Code Editor | Cursor - The AI Code Editor"

cursor.com

AI Industry

剪藏 2025年6月5日

HeyGen 发布了 AI Studio，用 Office 的方式精细化编辑视频

"Revolutionize Video Creation with AI Studio | HeyGen"

heygen.com

Visual Generation

剪藏 2025年6月4日

Gemini 2.5 原生语音回顾，可在 AI Studio 体验： - 语音对话：可以思考、工具执行 - TTS：可通过提示词控制细节，双人播客

"Gemini 2.5’s native audio capabilities"

blog.google

Speech & AudioLLMs

剪藏 2025年6月4日

长见识，点击原文才知道一份报告可以价值$50000

"AI Coding市场迎来爆发期，IDC发布一季度中国市场代码生成产品评估"

mp.weixin.qq.com

AI Industry

剪藏 2025年6月4日

前向部署工程师 FDEs 逐渐流行

"Michelle Lim on X: "The trending role in AI startups isn't AI engineers—it's Forward Deployed Engineers (FDEs). Several top AI companies I know are hiring FDEs en masse. This role is now surpassing "PM" as the coveted position for technical generalists. What's an FDE? They're software engineers who" / X"

x.com

AI Industry

剪藏 2025年6月3日

扩散路线的声音编辑模型

"Meet PlayDiffusion – our newest voice model for inpainting"

blog.play.ai

Speech & Audio

剪藏 2025年6月1日

得益于代码生成，Anthropic 年化营收从 5 个月年的 1B 来到了现在的 3B

"Exclusive: Anthropic hits $3 billion in annualized revenue on business demand for AI | Reuters"

reuters.com

AI Industry

2025年5月

剪藏 2025年5月31日

根据Anthropic在Trust Center的更新日志，Claude app里的语音模式用的是ElevenLabs的方案

"Trust Center - Anthropic"

trust.anthropic.com

AI Industry

剪藏 2025年5月31日

AI辅助做的科技树，很酷！

"Introducing the Historical Tech Tree"

hopefulmons.com

AI Industry

剪藏 2025年5月31日

4个月前还是1.0，2.0亮点是更智能的对话轮次识别、多语种自然切换、RAG

"ElevenLabs Conversational AI 2.0 voice agents now live | ElevenLabs"

elevenlabs.io

Speech & AudioAgents

剪藏 2025年5月30日

PPL 推出了 Labs，主要是写代码生成可分享的项目

"Introducing Perplexity Labs"

perplexity.ai

AI Industry

剪藏 2025年5月30日

黑森林推出了 Flux Kontext，基于上下文的图像编辑，效果很不错

"Black Forest Labs - Frontier AI Lab"

bfl.ai

Visual Generation

剪藏 2025年5月30日

TL;DR：为了省钱，用另一个LLM模拟搜索引擎来强化模型的 agentic search 能力

"ZeroSearch: Incentivize the Search Capability of LLMs without Searching"

alibaba-nlp.github.io

Agents

剪藏 2025年5月30日

MCP 新加了 Elicitation，一个客户端特性，允许客户端以UI形态向用户发起结构化输入的请求，来实现 human-in-the-loop 的交互

"Elicitation - Model Context Protocol"

modelcontextprotocol.io

Infra & Compute

剪藏 2025年5月30日

Resemble 开源发布的 TTS，可控性极佳，5s音色克隆，称盲测优于 ElevenLabs

"Chatterbox - Free Open Source Text to Speech Model | Resemble AI"

resemble.ai

Speech & AudioOpen Source

剪藏 2025年5月30日

从 SEO 到 GEO，最好的例子就是技术栈（shadcn/ui）和服务商（Vecel）的选择

"How Generative Engine Optimization (GEO) Rewrites the Rules of Search | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2025年5月29日

世界模型：可接收输入信号的实时视频生成模型继去年局限于 MineCraft 的 Oasis，Odyssey 这次在场景上更通用，故事讲的也更圆感觉视频/游戏/世界模型未来紧密相关

"AI video you can both watch and interact with in real-time"

odyssey.world

World ModelsVisual Generation

剪藏 2025年5月29日

为啥又是第一个？不同软件开发智能体的差异点究竟在哪？

"Factory is GA: Droids for the Entire SDLC"

factory.ai

AgentsAI Industry

剪藏 2025年5月29日

Retool Agents：按小时收费。根据模型不同，每小时3-175刀：https://docs.retool.com/data-sources/concepts/models#retool-agents-pricing

"Retool Blog | 100 million hours of automated work and counting: Retool launches Agents"

retool.com

AgentsAI Industry

剪藏 2025年5月28日

不用外部奖励，让模型根据自信度的内部信号进行强化

"Learning to Reason without External Rewards"

arxiv.org

LLMs

剪藏 2025年5月28日

随机奖励、甚至错误奖励也能强化模型推理能力？

"💭 Spurious Rewards: Rethinking Training Signals in RLVR"

rethink-rlvr.notion.site

LLMs

剪藏 2025年5月28日

AI 设计网页的一些提示词经验，有一定通用参考价值

"Tips for Prompting | Aura Design Learning Center"

aurachat.io

LLMs

剪藏 2025年5月28日

Claude Code 官方教程中还有教你用 git worktrees 并行多组完成任务的，可以择优 merge

"Run parallel Claude Code sessions with Git worktrees"

docs.anthropic.com

AI Industry

剪藏 2025年5月28日

3D 场景生成新选手，一个简短的演示，种子轮$13M

"Introducing SpAItial: The next dimension in intelligence"

spaitial.ai

World ModelsAI Industry

剪藏 2025年5月28日

Robert Yang 团队（之前做 MineCraft Agent 实验的 Altera，现改名为 Fundamental）推出的桌面端通用 Agent，亮点是无需候补、马上下载可用..

"Fairies AI - Computer Magic | The Best General Purpose Agent for Builders"

fairies.ai

Agents

剪藏 2025年5月28日

专做企业内部应用的 Agent，采用 multi-agent 架构，包括设计、开发、测试等。没明白区别于 Blot、Lovable 等的魔法是啥。

"Announcing $60m for Clark: the first AI agent to build internal enterprise apps"

superblocks.com

AgentsAI Industry

剪藏 2025年5月28日

画布式 AI 设计工具，画布上每个组件对应一个 AI 对话，转为代码需要花钱。人人都想革了 Figma 的命。

"Pietro Schirano on X: "Introducing MagicPath, an infinite canvas to create, refine, and explore with AI. Create beautiful components and functional apps, while providing production ready code. Available today, free, for everyone. The Cursor moment for design is here. https://t.co/MpdBCnivoC" / X"

x.com

Agents

剪藏 2025年5月27日

除了对话，AI应用还可以有哪些形态？

"Hiten Shah on X: "I was curious who’s building AI interfaces that aren’t chat. So I asked. The responses were thoughtful, wide-ranging, and honestly a bit ahead of where I expected things to be. Here’s what I learned: 1. Auto-Built UIs - AI creates the interface you need on the fly. Instead of" / X"

x.com

AI Industry

剪藏 2025年5月27日

上个月 HuggingFace 联创 Thomas Wolf 写的一篇语音模型技术介绍，有很多不错的演示动画

"🎙️ Speech AI models: an introduction"

thomwolf.io

Speech & Audio

剪藏 2025年5月26日

Waymo Co-CEO 在 I/O 上的分享，每周百万单+，累计千万英里，4个城市

"Waymo: AI in the physical world powering the future of driving - YouTube"

youtube.com

Robotics & EmbodiedAI Industry

剪藏 2025年5月24日

优秀的AI作品会凭极佳的审美脱颖而出

"The Way of Code | Rick Rubin"

thewayofcode.com

AI Industry

剪藏 2025年5月24日

Kyutai（去年开源voice2voice模型Moshi）发布了STT+TTS新作Unmute，声音交互的设计超级棒

"Unmute by Kyutai"

unmute.sh

Speech & Audio

剪藏 2025年5月23日

Claude 4：Opus 和 Sonnet，主要提升为编程和长线任务能力，上下文均为 200k，价格不变

"Introducing Claude 4 \ Anthropic"

anthropic.com

LLMsAI Industry

剪藏 2025年5月22日

OpenAI 在 Responses API 中增加了对远程 MCP 的支持，实现优雅，开发者使用便捷，可能是 MCP 和 LLM API 结合的典型路径

"New tools and features in the Responses API | OpenAI"

openai.com

Infra & Compute

剪藏 2025年5月22日

字节 Seed 开源的统一多模态理解-生成模型，可以理解、生成、各种编辑图像，还有CoT，通过对话来自由地调整，但从demo看还需要手动选择输出文本还是图像，是一个局限 PS：项目官网做的有OpenAI那味了

"BAGEL: The Open-Source Unified Multimodal Model"

bagel-ai.org

MultimodalOpen Source

剪藏 2025年5月22日

上个月的一份研究，有一些搜索和chatbot的数据对比可以参考

"AI Chatbots vs Search Engines: 24-Month Study on Traffic Trends"

onelittleweb.com

AI Industry

剪藏 2025年5月21日

虚拟试衣，还有一个 agentic checkout，Google 讲了一个 inspire-shop-pay 的 AI 购物故事线

"Shopping on Google: AI Mode and virtual try-on updates from I/O 2025"

blog.google

AgentsVisual GenerationAI Industry

剪藏 2025年5月21日

走扩散路线的 Gemini，主攻数学和代码，生成速度高达1.5-2k token/s

"Gemini Diffusion - Google DeepMind"

deepmind.google

LLMs

剪藏 2025年5月21日

Gemma 3n，和 Gemini Nano 技术同源，支持图片理解，后续还有语音和视频；与高通/联发科/三星等合作，专为端侧设计，通过 MatFormer 和 Per-Layer Embedding 技术降低内存开销，使 5B 和 8B 尺寸的模型所需资源与原本 2B 和 4B 的模型相当

"Announcing Gemma 3n preview: powerful, efficient, mobile-first AI - Google Developers Blog"

developers.googleblog.com

Multimodal

剪藏 2025年5月21日

在视频模型上精调，然后通过提示词来批量生成样本，进而训练机器人操作，免除了对遥操作数据的依赖

"DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories"

research.nvidia.com

Robotics & EmbodiedVisual Generation

剪藏 2025年5月21日

美团出的 vibe coding 工具

"NoCode-零代码应用生成平台"

nocode.cn

剪藏 2025年5月21日

伯克利开源的 3D 打印人形机器人

"Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot"

lite.berkeley-humanoid.org

Robotics & EmbodiedOpen Source

剪藏 2025年5月21日

后 DeepSeek 时期融资不易，面壁的端侧战略算是稳扎稳打 PS：资方还有茅台基金

"面壁智能获新一轮数亿元融资，引领端侧大模型高效发展与应用普及"

mp.weixin.qq.com

AI Industry

剪藏 2025年5月21日

Demis Hassabis 如何讲 DeepMind 的故事： - Gemini 目标是通用AI助理，世界模型是必经之路 - AlphaGo、StarCraft → Genie 2 - Veo → Robotics - Astra/Live、Mariner → 助理动作

"Google I/O 2025: Gemini as a universal AI assistant"

blog.google

LLMsWorld ModelsAI Industry

剪藏 2025年5月20日

Windows AI 开发大礼包： - AI Foundry：开源模型、ML、API、LoRA、RAG - 原生 MCP 支持：Registry 提供可信 MCP 列表，Servers 提供 Windows 系统能力 - App Actions：类似于 App Intents？ - WSL 开源

"Advancing Windows for AI development: New platform capabilities and tools introduced at Build 2025 - Windows Developer Blog"

blogs.windows.com

Open SourceInfra & Compute

剪藏 2025年5月20日

应对 Cursor、Windsurf 等新秀，微软把 VS Code 里的 GitHub Copilot 扩展开源了，当然后端的模型服务保持闭源

"VS Code: Open Source AI Editor"

code.visualstudio.com

Open SourceAI Industry

剪藏 2025年5月18日

Lilian Weng 关于“思考大模型”的新综述，有一定阅读门槛，但非常推荐

"Why We Think | Lil'Log"

lilianweng.github.io

LLMs

剪藏 2025年5月16日

云端开发 Agent，背后是基于 o3 强化精调的 codex-1；云端容器不联网，只能通过初始化脚本配置环境、安装依赖；有一个基于 o4-mini 的 codex-mini 可用于 Codex CLI

"Introducing Codex | OpenAI"

openai.com

Agents

剪藏 2025年5月16日

MiniMax 的 TTS 模型 speech-2-hd 登顶盲测榜

"AI语音的Her Moment: 个性化交互达到临界点"

mp.weixin.qq.com

Speech & AudioAI Industry

剪藏 2025年5月16日

HuggingFace 关于 VLM 的梳理，写得很不错

"Vision Language Models (Better, faster, stronger)"

huggingface.co

Multimodal

剪藏 2025年5月16日

ElevenLabs 音效板

"Custom Soundboard Creator - SB1 Infinite Soundboard with AI SFX | ElevenLabs"

elevenlabs.io

Speech & Audio

剪藏 2025年5月16日

Windsurf 训练了自己的 SWE-1 系列模型，没看懂强在哪里

"SWE-1: Our First Frontier Models"

windsurf.com

Agents

剪藏 2025年5月16日

基于扩散模型的光照控制生成，主页的交互式demo很好玩

"LightLab: Controlling Light Sources in Images with Diffusion Models"

nadmag.github.io

Visual Generation

剪藏 2025年5月16日

Ollama 认为 llama.cpp 现有的多模态方案不够优雅，比如需要额外的 vision projector，所以他们自己搞了一套解决方案

"Ollama's new engine for multimodal models · Ollama Blog"

ollama.com

MultimodalOpen Source

剪藏 2025年5月15日

阶跃星辰的 3D 模型，多模态全面布局的一环

"Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets"

stepfun-ai.github.io

World ModelsMultimodal

笔记 2025年5月15日

2025年了，如何下载一个网络视频

神器，但特别情况还需对症下药

AI原生分享

剪藏 2025年5月15日

mem0 出品，可跨客户端共享的本地记忆 MCP 插件

"Introducing OpenMemory MCP"

mem0.ai

Open Source

剪藏 2025年5月15日

OpenAI 汇总了一些安全相关的评测，包括有害、越狱、幻觉、层次指令遵循等，披露了自家模型的表现

"Safety evaluations hub | OpenAI"

openai.com

Benchmarks & EvalSafety & Alignment

剪藏 2025年5月15日

Google DeepMind 的编程助理 AlphaEvolve，帮助写数据中心调度算法、芯片设计、甚至是自己的训练代码

"AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms - Google DeepMind"

deepmind.google

Agents

剪藏 2025年5月15日

阿里通义万象视频模型 Wan2.1 集成进了全面的编辑能力 VACE

"Wan on X: "✨ All in One, Wan for All✨ We are excited to introduce our latest model to our talented community creators: Wan2.1-VACE, All-in-One Video Creation and Editing model. Model size: 1.3B, 14B License: Apache-2.0 📌 Wan2.1-VACE provides solutions for various tasks, including https://t.co/yiQRVhXpop" / X"

x.com

Visual Generation

剪藏 2025年5月14日

Meta 的 3D 生成模型，目前内部用，后面会开放给 Horizon 创作者

"Introducing Meta 3D AssetGen 2.0: A new foundation model for 3D content creation | Meta Horizon OS Developers"

developers.meta.com

World Models

剪藏 2025年5月14日

字节 Seed 团队将 GRPO 用于视觉生成

"DanceGRPO: Unleashing GRPO on Visual Generation"

dancegrpo.github.io

Visual Generation

剪藏 2025年5月14日

天工版 Oasis

"Matrix-Game: Interactive World Foundation Model"

matrix-game-homepage.github.io

World Models

剪藏 2025年5月14日

作为聚合平台，Poe 发了模型使用报告，值得留意的是 - 视频方面可灵赶超 Runway - 语音方面 ElevenLabs 遥遥领先

"Report: Spring 2025 AI Model Usage Trends - Poe"

poe.com

Visual GenerationSpeech & AudioAI Industry

剪藏 2025年5月13日

OpenAI 的 Health AI 团队与来自 60 个国家的 262 名医师合作，构建了 HealthBench 评测数据集，包含 5000 条真实的医疗问诊对话和打分，涵盖急症、不确定性等主题。o3 表现最优。

"Introducing HealthBench | OpenAI"

openai.com

Benchmarks & Eval

剪藏 2025年5月13日

Sakana 新作，没太看懂…

"Introducing Continuous Thought Machines"

sakana.ai

LLMs

剪藏 2025年5月12日

只用一个样本来做强化学习

"Reinforcement Learning for Reasoning in Large Language Models with One Training Example"

arxiv.org

LLMs

剪藏 2025年5月12日

Nature 子刊 Humanities and Social Sciences Communications，通过元分析（基于已有多份独立研究进行综合分析的统计方法）探讨了 ChatGPT 对学生学习的影响，总体表现出积极影响

"The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis | Humanities and Social Sciences Communications"

nature.com

AI Industry

剪藏 2025年5月11日

有趣，Bocconi 大学团队用图网络预测出了新教皇

"In the Network of the Conclave - Bocconi University"

unibocconi.it

AI Industry

剪藏 2025年5月11日

基于 MindCraft —— 一个将 LLM 引入 MineCraft 的框架 —— 的多智能体协作研究，结论是当前 SoTA 的 LLM 在协作时仍会因缺乏有效的语言沟通而导致任务表现下降达 15%，建造、烹饪、收集三种任务中均出现了令人哭笑不得的失败案例，包括但不限于“你建地基我来拆”、“忘记任务跑偏闲谈”等等

"Collaborating Action by Action"

mindcraft-minecollab.github.io

Agents

剪藏 2025年5月10日

曾在 OpenAI 担任 Science Communicator 的 Andrew Mayne 分享 GPT-4 发布趣事： - ChatGPT 打乱了 GPT-4 的节奏 - ChatGPT 发布前夜 Ilya 测试仍不满意 - GPT-4 发布请了 PBS Spacetime 的创作团队来做视频，并定了绿白横条状的 logo - 团队小且扁平，发布时不带 title - 请了个公司帮忙给 GPT-4 起名… - GPT-3.5 持续提升，导致 GPT-4 发布时在部分任务上反而不如上代，比如下国际象棋 - GPT-4 可以通过抽帧理解视频，但听闻 Gemini 正在研发原生的视频理解，咨询研究团队后未宣传，结果后来 Gemini 发布后发现还是用的抽帧…推测 DeepMind 的研究员会很困扰…

"Inside the Launch of GPT-4 – @AndrewMayne"

andrewmayne.com

AI Industry

剪藏 2025年5月10日

看到 FT 一篇报道中的美科技公司软件工程师招聘数的变化图，溯源到了 Zeki 的这份人才报告，PDF下载

"The State of AI Talent 2025 - Zeki"

zekidata.com

AI Industry

剪藏 2025年5月10日

电信牵头搞一个魔乐，在阿里魔搭外提供了又一个选项，引阿里百度加入；都想做中国的 HuggingFace，但是替补又替补，能上场的有多少？从目前还不过万的模型数看，言之尚早

"AI开源社区来了国家队！华为百度第一时间加入 | 量子位"

qbitai.com

Open SourceAI Industry

剪藏 2025年5月9日

Claude Code：活跃用户平均每天花费 $6；团队思考：有限资源有助于保持聚焦和简洁

"Claude Code: Anthropic's Agent in Your Terminal"

latent.space

Agents

剪藏 2025年5月9日

Stripe 训了一个基于 transformer 的 embedding 模型，用来检测可疑交易

"Gautam Kedia on X: "TL;DR: We built a transformer-based payments foundation model. It works. For years, Stripe has been using machine learning models trained on discrete features (BIN, zip, payment method, etc.) to improve our products for users. And these feature-by-feature efforts have worked" / X"

x.com

LLMs

剪藏 2025年5月9日

有趣，AI实时生成多人游戏，相比去年的 Oasis 更进一步

"Introducing Multiverse: The First AI Multiplayer World Model"

enigma-labs.io

World Models

剪藏 2025年5月9日

通过自博弈训练摆脱人工数据依赖： - proposer 提出问题，奖励那些合适的，即 solver 时胜时败的 - solver 解决问题，代码解释器来验证

"Absolute Zero Reasoner"

andrewzh112.github.io

World ModelsOpen Source

剪藏 2025年5月8日

Gemini 2.0 Flash 的生图功能也进入了 preview，局部编辑非常稳，不会像 GPT-4o 那样随意篡改，好奇背后究竟是同一个模型还是把一些奇技淫巧封装进 API 了；AI Studio 中还有一个非常好玩的 Gemini Co-Drawing 值得一试

"Create and edit images with Gemini 2.0 in preview - Google Developers Blog"

developers.googleblog.com

Multimodal

剪藏 2025年5月8日

OpenAI 推出国家合作计划，初步规划与10个国家建立合作，基于本地数据中心、用本地化的 ChatGPT 为公民提供教育医疗等公共服务，并联动本地资本推动 AI 创新，是 OpenAI 推进星际之门计划全球化的一部分

"Introducing OpenAI for Countries | OpenAI"

openai.com

AI Industry

剪藏 2025年5月8日

ACE Studio 和阶跃星辰联手推出的音乐生成模型，社区评价≤Suno4

"ACE-Step: A Step Towards Music Generation Foundation Model"

ace-step.github.io

Speech & Audio

剪藏 2025年5月7日

Anthropic 的 AI4S 计划

"Introducing Anthropic's AI for Science Program \ Anthropic"

anthropic.com

AI Industry

剪藏 2025年5月7日

GenSpark Super Agent 一个月 ARR 已经到了 22M，这种算法感觉是在鼓励短期的病毒营销，最大化利用尝鲜式的短期订阅

"Eric Jing on X: "Genspark Super Agent, 1 month, $22M ARR! 🚀 This might make us the fastest-growing startup ever in terms of ARR. THANK YOU SO MUCH! Deeply grateful for the incredible support for Super Agent and AI Slides! And we're still only at 10% of our progress, with another exciting https://t.co/nSWNLsWGUe" / X"

x.com

AI Industry

剪藏 2025年5月7日

Gemini 2.5 Pro 从 experimental 升级进入 preview，编码能力进一步提升，竞技场全方位第一，但部分 benchmark 却有倒退

"Gemini Pro - Google DeepMind"

deepmind.google

LLMs

剪藏 2025年5月6日

最流行的 Python Web 开发框架之一 FastAPI 受到了 Sequoia 种子投资，开始提供一键部署的云服务

"FastAPI Cloud - By The Same Team Behind FastAPI - FastAPI Cloud — You code. We Cloud."

fastapicloud.com

Infra & Compute

剪藏 2025年5月6日

非盈利实体仍控制 OpenAI；盈利 LLC 转为有法定社会责任的 PBC，与 Anthropic 和 X.ai 相同

"Evolving OpenAI’s structure | OpenAI"

openai.com

AI Industry

剪藏 2025年5月5日

Sam Altman 牵头的世界币搞了一个世界ID来服务AI时代的人机验证

"At Last, Trust In the Age of AI"

world.org

AI IndustrySafety & Alignment

剪藏 2025年5月3日

OpenAI 对近期 ChatGPT 讨好问题的完整复盘，粗糙地讲了更新模型的流程、评估等，要点： - 4o 上线后共更新了 5 次 - 错在优先考虑了 A/B 测试的好评而非专家意见 - 将在安全评估之外，增加性格相关的一票否决机制

"Expanding on what we missed with sycophancy | OpenAI"

openai.com

Safety & Alignment

剪藏 2025年5月3日

Andrej Karpathy 在一场黑客松上 vibe coding 开发了一款将文字菜单变成图片的 AI 应用 MenuGen，但更精华的是背后的故事和令人忍俊不禁的心路历程，web 开发不简单哈哈

"Vibe coding MenuGen | karpathy"

karpathy.bearblog.dev

Visual Generation

剪藏 2025年5月3日

硅谷一家专注 AI4S 的非盈利机构 FutureHouse（主要由 Eric Schmidt 支持）发布了包含 Crow/Falcon/Owl/Phoenix 在内的科研 Agent 家族

"FutureHouse Platform: Superintelligent AI Agents for Scientific Discovery | FutureHouse"

futurehouse.org

AgentsOpen Source

笔记 2025年5月2日

帮你「作弊」的 AI

年轻的团队，疯狂的产品

剪藏 2025年5月1日

Gamma 达 5000万美刀 ARR： - 去年A轮$12M时团队16人，现在30人 - 已有250M gammas，每日新增700K

"AI Startup Gamma Reaches $50 Million in ARR, Profitability"

upstartsmedia.com

AI Industry

剪藏 2025年5月1日

Yohei桑vibe coding一周开发的VC数据库网站，从社交媒体等各路信源抓取投融资事件，AI清洗梳理，形成数据看板

"VCpedia - Startup Funding Intelligence"

vcpedia.com

AI Industry

剪藏 2025年5月1日

针对近期 ChatGPT 个性问题的一些回复，要点： - 系统提示词对模型的约束有限且不可控 - 持续研究 steerability，会给用户更多选择 - 【在考虑】ChatGPT 主动发起会话

"AMA with OpenAI’s Joanne Jang, Head of Model Behavior : r/ChatGPT"

reddit.com

Safety & Alignment

剪藏 2025年5月1日

可穿戴AI，个性化记忆&成长，7天续航，双麦滤噪，静音按钮，$50 总结：语音+AI的小米手环，重在软件

"bee on X: "Introducing Bee: the first wearable AI designed to live alongside you. It captures your daily moments and turns them into memories, insights, and actions. Already thousands delivered. Now available in U.S. for $49.99. https://t.co/kZgRRENK9j" / X"

x.com

AI Industry

剪藏 2025年5月1日

小米的推理模型试水，在 7B 尺度上尝试复现DeepSeek-R1

"XiaomiMiMo/MiMo: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining"

github.com

LLMsOpen Source

2025年4月

剪藏 2025年4月30日

用64张H100x2个月在80M版权图片训练出来的开源文生图模型，强调没有版权风险，但模型表现有局限

"F Lite: Freepik & Fal.ai unveil an open-source image model trained on licensed data | Freepik Blog"

freepik.com

Visual GenerationOpen Source

笔记 2025年4月29日

Qwen3：混合推理新路标，开源落地抢生态

4月底阿里巴巴通义千问团队开源发布 Qwen3 系列模型，支持快慢思考混合推理，在不少性能指标上赶超 DeepSeek-R1、OpenAI-o1 等模型，一上线便引起热议并冲上 HuggingFace 热榜。Qwen3 做出了哪些创新、会产生什么意义，本文尝试分析解读。

Open SourceAI Model Innovation

剪藏 2025年4月29日

混合推理、Dense+MoE、全图谱全生态

"Qwen3: Think Deeper, Act Faster | Qwen"

qwenlm.github.io

LLMs

剪藏 2025年4月28日

受限于当前多数 MCP 仍需本地执行，纳米 AI 的 MCP 智能体要求用户下载客户端方能体验

"纳米AI放大招！MCP万能工具箱，人人都能用上超级智能体"

mp.weixin.qq.com

Agents

剪藏 2025年4月27日

E2B专为Agent设计的虚拟机沙盒

"Why Every Agent needs Open Source Cloud Sandboxes"

latent.space

Agents

剪藏 2025年4月27日

DeepMind 音乐模型 Lyria 2 加入音乐工具 Music AI Sandbox

"Music AI Sandbox, now with new features and broader access - Google DeepMind"

deepmind.google

Speech & Audio

剪藏 2025年4月26日

桌面浏览器版 CUA/computer use

"Introducing Simular"

simular.ai

Agents

剪藏 2025年4月26日

2025-04-21 Sand.ai 发布了 Magi-1 自回归视频生成模型

"对话Sand.ai曹越：离sora更远，离终局更近"

mp.weixin.qq.com

AI Industry

剪藏 2025年4月26日

Kimi 推出的端到端语音模型，但为了开源语言模型基座用的是 Qwen 2.5 7B

"MoonshotAI/Kimi-Audio: Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation"

github.com

Speech & AudioMultimodalOpen Source

剪藏 2025年4月26日

Kai Chen（OpenAI研究员、Alignment研究负责人，加拿大人，可能是华裔）因为绿卡被拒不得不离开SF、回温哥华远程工作，引起了一些关于特朗普政府移民政策的讨论

"An OpenAI researcher who worked on GPT-4.5 had their green card denied | TechCrunch"

techcrunch.com

AI Industry

剪藏 2025年4月25日

Pete Koomen（YC GP）认为当前不少AI应用是内燃机刚出现时的无马马车，缺乏针对AI的设计

"AI Horseless Carriages"

koomen.dev

AI Industry

剪藏 2025年4月25日

Lovable 2.0 增加了多人协作等功能；但随后被人指责完成度不足、非原创等

"Introducing Lovable 2.0 – now smarter, multiplayer, and more secure - Lovable Blog"

lovable.dev

AI Industry

剪藏 2025年4月25日

Anthropic CEO Dario Amodei 发文强调 LLM 可解释性研究的紧迫性

"Dario Amodei — The Urgency of Interpretability"

darioamodei.com

Interpretability

剪藏 2025年4月24日

微软WorkLab刚发了名为《2025: The Year the Frontier Firm Is Born》的工作趋势报告，包含与工作组织相关的三个核心洞察： 1: You can buy intelligence on tap 2: Human-agent teams will upend the org chart 3: Every employee becomes an agent boss

"2025: The Year the Frontier Firm Is Born"

microsoft.com

AI Industry

剪藏 2025年4月22日

字节版 Cursor 更新 MCP 支持。PS：发现豆包生成代码后会向 Trae 导流。

"Collaborate with Intelligence | Trae - Ship Faster with Trae"

trae.ai

Agents

剪藏 2025年4月22日

超拟人的对话 TTS，来自小团队

"nari-labs/dia: A TTS model capable of generating ultra-realistic dialogue in one pass."

github.com

Speech & AudioOpen Source

笔记 2025年4月19日

豆包生图，超越GPT-4o？

背后还有路线之争

剪藏 2025年4月18日

ChatGPT 新上的全局对话历史记忆，并没有用 RAG，而是通过总结提炼用户画像、然后加入系统提示词来实现的

"Tibor Blaho on X: "I was wondering how ChatGPT's new memory (codename "Moonshine") actually works - here's what I've found so far The system prompt has multiple new sections - the first one is "Model Set Context", which lists stored memories (old): ``` # Model Set Context 1. [2024-09-05]. User's https://t.co/9fblzjDqf8" / X"

x.com

LLMs

剪藏 2025年4月18日

论文中归纳的 GUI agent 演化路径

"UI-TARS：Next-generation native GUI agent model designed to interact seamlessly with GUIs using human-like perception"

seed-tars.com

Agents

剪藏 2025年4月18日

混合推理模型，可以控制CoT长度

"Start building with Gemini 2.5 Flash - Google Developers Blog"

developers.googleblog.com

LLMs

剪藏 2025年4月17日

微软还在推 Recall

"Retrace your steps with Recall - Microsoft Support"

support.microsoft.com

AI Industry

剪藏 2025年4月16日

带工具强化学习，练出来就是会思考的agent

"Introducing OpenAI o3 and o4-mini | OpenAI"

openai.com

LLMsAgents

剪藏 2025年4月15日

智谱买了 z.ai 域名，开源了 GLM-4 系列

"大模型六小龙，第一个 IPO 要来了 | 极客公园"

geekpark.net

AI Industry

剪藏 2025年4月8日

开发者HiDream.ai智象未来

"HiDream-ai/HiDream-I1"

github.com

Visual GenerationOpen Source

剪藏 2025年4月1日

OpenAI 官宣了来自软银的新一轮400亿美元融资，估值3000亿美元，同时透露ChatGPT周活已超5亿

"New funding to build towards AGI | OpenAI"

openai.com

AI Industry

2025年3月

剪藏 2025年3月23日

英伟达 ADLR 实验室也发布了一系列混合架构模型Nemotron-H；非推理，可能和混元 Turbo S 类似

"Nemotron-H: A Family of Accurate, Efficient Hybrid Mamba-Transformer Models - NVIDIA ADLR"

research.nvidia.com

LLMs

剪藏 2025年3月19日

为 macOS Finder 设计的一款 AI 插件，很酷，用 LLM 帮你整理、处理文件

"Introducing Substage: A natural language command bar for Finder"

selkie.design

Agents

2025年2月

剪藏 2025年2月28日

可爱的AI陪伴

"Introducing Tolan | Tolans.com"

tolans.com

AI Industry

剪藏 2025年2月27日

Inception Labs发布了Mercury系列大语言模型，与行业普遍采用的自回归next token prediction方案不同，Mercury属于扩散大语言模型（diffusion large language models，dLLMs），类似于扩散生图模型，推理时采用coarse-to-fine的去噪解码形式。 Inception Labs声称Mercury是首个商业级别dLLM，速度比自回归模型快10倍，主打代码补全Copilot场景。 Mercury Coder可以在 https://chat.inceptionlabs.ai/ 体验，记得勾选右上角diffusion effect开关，效果很酷。

"Introducing Mercury, the first commercial-scale diffusion large language model"

inceptionlabs.ai

LLMsAI Industry

剪藏 2025年2月3日

vibe coding 起源

"Andrej Karpathy on X: "There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper" / X"

x.com

LLMs

2025年1月

剪藏 2025年1月10日

斯坦福团队研究发现，如果以驭使大模型为目标，SAE是非常低效的

"AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders"

arxiv.org

Benchmarks & EvalInterpretability

剪藏 2025年1月3日

scaling law 巨浪之下，焉有创新否

"The Bitter Lesson"

incompleteideas.net

AI Industry

2024年12月

剪藏 2024年12月18日

o1医学诊断超过医生

"[2412.10849] Superhuman performance of a large language model on the reasoning tasks of a physician"

arxiv.org

LLMs

剪藏 2024年12月18日

AI 会计

"Basis | About"

getbasis.ai

AI Industry

剪藏 2024年12月11日

How AI agents are reshaping the future of work

"AI agents and multiagent systems | Deloitte US"

www2.deloitte.com

AgentsAI Industry

剪藏 2024年12月6日

Pleias 1.0 fully open SLMs

"They Said It Couldn’t Be Done"

huggingface.co

LLMsOpen Source

剪藏 2024年12月5日

OpenAI与国防企业合作

"Anduril Partners with OpenAI to Advance U.S. Artificial Intelligence Leadership and Protect U.S. and Allied Forces | Anduril"

anduril.com

AI Industry

剪藏 2024年12月3日

a16z VSaaS 2

""AI Inside" Opens New Markets for Vertical SaaS | Andreessen Horowitz"

a16z.com

AI Industry

笔记 2024年12月2日

AI视频的后Sora时代

开年就发布预览、震惊世界的Sora，临近年末仍未面向公众开放，大半年时间、一众追赶者，AI世界日新月异，视频生成自然也不例外

2024年11月

剪藏 2024年11月19日

Perplexity AI 购物

"Shop like a Pro"

perplexity.ai

AI Industry

剪藏 2024年11月14日

Perplexity 测试加广告

"Why we’re experimenting with advertising "

perplexity.ai

AI Industry

2024年10月

剪藏 2024年10月30日

Waymo周订单15万

"Sundar Pichai on X: "@YouTube 6/ Last but not least: @Waymo’s doing really well - 1M fully autonomous miles and 150K paid rides / week, plus partnerships with Uber + Hyundai, 6th-gen Waymo Driver, growing commercial opportunity. Here’s a photo after a recent concert in SF - Waymo after Waymo picking people https://t.co/zlaaz0JnxN" / X"

x.com

Robotics & EmbodiedAI Industry

2024年9月

剪藏 2024年9月27日

AI创意-生产-销售平台

"Arcade: turn your thoughts into things"

arcade.ai

AI Industry

剪藏 2024年9月27日

AI创意-生产-销售平台

"What is Arcade? – turn your thoughts into things"

arcadestudio.zendesk.com

AI Industry

剪藏 2024年9月20日

a16z VSaaS 1

"Vertical SaaS: Now with AI Inside | Andreessen Horowitz"

a16z.com

AI Industry

剪藏 2024年9月18日

OpenAI：是 bug；已 fix

"OpenAI Says It's Fixed Issue Where ChatGPT Appeared to Be Messaging Users Unprompted"

futurism.com

AI Industry

剪藏 2024年9月18日

拿 OpenAI o1 做整数乘法

"Yuntian Deng on X: "Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 https://t.co/et5DB9bhNL" / X"

x.com

Benchmarks & Eval

剪藏 2024年9月17日

主动聊天的ChatGPT？

"Did ChatGPT just message me... First? : r/ChatGPT"

reddit.com

AI Industry

剪藏 2024年9月11日

since when do we put lyrics of Taylor Swift under the abstract of an arxiv paper

"Planning In Natural Language Improves LLM Search For Code Generation"

arxiv.org

LLMs

剪藏 2024年9月11日

vidu.studio

"给我一张脸，视频背景随你换，林黛玉都被清华理工男玩废了｜免费开放 | 量子位"

qbitai.com

Visual GenerationAI Industry

剪藏 2024年9月11日

12量子比特/56量子

"Microsoft announces the best performing logical qubits on record and will provide priority access to reliable quantum hardware i"

blogs.microsoft.com

AI Industry

笔记 2024年9月11日

为什么这家公司的芯片推理速度比英伟达快20

我们习以为常的流式AI响应模式，本质是大模型智能面对推理速度限制的一种妥协。存算一体作为破局之法，有望带来诸多新的想象力，指向大模型加速推理的终解。

剪藏 2024年9月10日

多模态分子结构模型

"Introducing Chai-1: Decoding the molecular interactions of life"

chaidiscovery.com

Multimodal

剪藏 2024年9月10日

可灵AI导演共创计划

"官宣啦！我们和这九位导演“一拍即合”"

mp.weixin.qq.com

Visual GenerationAI Industry

剪藏 2024年9月10日

transformer训练可视化

"Lucas Beyer (bl16) on X: "There we go. This took me forever, fuck bad tools. Was it worth it? Not so sure... https://t.co/aR5DYXHoGW" / X"

x.com

Interpretability

剪藏 2024年9月9日

Ilya 2023年8月在Berkeley的分享

"An Observation on Generalization"

simons.berkeley.edu

LLMs

剪藏 2024年9月9日

H100租/买价格：2.4/1.8 $/hr/GPU

"The Missing Guide to the H100 GPU Market | by Lepton AI | Sep, 2024 | Medium"

blog.lepton.ai

Infra & Compute

2024年8月

剪藏 2024年8月29日

Neuronpedia 做的 Gemma Scope 交互可视化

"Gemma Scope ｜ Neuronpedia"

neuronpedia.org

Interpretability

剪藏 2024年8月28日

DeepSeek 算力基建

"Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning"

arxiv.org

Infra & Compute

剪藏 2024年8月28日

Claude Artifacts GA 及其背后的故事

"Artifacts are now generally available \ Anthropic"

anthropic.com

AI Industry

剪藏 2024年8月28日

AI的svg自画像与识别

"Zack Witten on X: "One fun thing to do with Claude is have it draw SVG self-portaits. I was curious – if I had it draw pictures of itself, ChatGPT, and Gemini, would another copy of Claude recognize itself? TLDR: Yes it totally recognizes itself, but that’s not the whole story..." / X"

x.com

LLMs

剪藏 2024年8月23日

不停say hi 惹毛AI

"Zack Witten on X: "Spamming "hi" at every LLM: a thread." / X"

x.com

Safety & Alignment

剪藏 2024年8月21日

a16z consumer GenAI 3

"The Top 100 Gen AI Consumer Apps - 3rd Edition | Andreessen Horowitz"

a16z.com

AI Industry

2024年6月

剪藏 2024年6月21日

claude artifacts 系统提示词

"Pliny the Liberator 🐉 on X: "🚰 SYSTEM PROMPT LEAK 🚰 Got the "artifacts" section of the new claude-3.5-sonnet system prompt and it's a doozy! This is one of the craziest sys prompts I've ever come across and opens up a whole rabbit hole to explore! I just have one question...what kind of arcane magic is" / X"

x.com

Safety & Alignment