万千十一

Frontline AI observer

January 2026

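Clipped January 25, 2026

The authors respond to common misconceptions and criticisms of METR's evaluation. The biggest misconception is that the reported time is how long an AI can work on a task autonomously. It is actually the time a human needs to complete a given task, one that the AI can complete at a 50% success rate; it is meant to measure how frontier models perform on real-world tasks.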

"Clarifying limitations of time horizon - METR"

metr.org
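To make the "50% time horizon" reading concrete, here is a minimal sketch of how such a number can be read off a fitted curve. It assumes success probability is modeled as a logistic function of the log of human completion time; the data, the fitting routine, and all names are made up for illustration and are not METR's code.

```python
import math

def fit_logistic(samples, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by plain gradient ascent."""
    points = [(math.log2(minutes), 1.0 if ok else 0.0) for minutes, ok in samples]
    a, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in points:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon(a, b, p=0.5):
    """Human task length (minutes) at which the fitted success rate equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)

# Hypothetical results: (human minutes, successes out of 4 attempts by the model).
RESULTS = [(1, 4), (2, 4), (4, 4), (8, 3), (16, 3),
           (32, 2), (64, 1), (128, 1), (256, 0)]
SAMPLES = [(m, i < k) for m, k in RESULTS for i in range(4)]

a, b = fit_logistic(SAMPLES)
horizon_50 = time_horizon(a, b)
print(f"50% time horizon: about {horizon_50:.0f} human-minutes")
```

The number that comes out is a task length measured in human time, not a duration the model ran for, which is exactly the distinction the post stresses.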

Clipped January 25, 2026

MiniMax has released a role-play model, M2-her, on OpenRouter.

"MiniMax (official) on X: "M2-her for your optimized roleplay. More immersion. Better characters. Longer coherence." / X"

x.com

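Clipped January 24, 2026

Sakana has entered a strategic partnership with Google. The founders came from Google in the first place, and Google's hold on talent and teams around the world still feels formidable.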

"Sakana AI、Googleとの戦略的パートナーシップ締結を発表"

sakana.ai

Clipped January 24, 2026

An introductory walkthrough of Codex's agent context, along with the Responses API.

"Unrolling the Codex agent loop | OpenAI"

openai.com

Clipped January 24, 2026

How OpenAI scaled PostgreSQL, which now serves millions of QPS.

"Scaling PostgreSQL to power 800 million ChatGPT users | OpenAI"

openai.com

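Clipped January 24, 2026

vLLM, under the name Inferact, raised a $150M seed round led by a16z and Lightspeed at an $800M valuation. Teams out of UC Berkeley's Sky Lab have nearly filled a unicorn round table within a few weeks: SGLang/RadixArk is valued at $400M, and LMArena is already a unicorn. The latter two were incubated through LMSYS.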

"Woosuk Kwon on X: "Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper https://t.co/v9xHsWoCIR" / X"

x.com

Clipped January 23, 2026

LiveKit raised a $100M Series C led by Index Ventures at a $1B valuation, making it a unicorn.

"LiveKit's Series C: Towards the voice-driven era of computing"

blog.livekit.io

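Clipped January 23, 2026

GitHub Copilot CLI now has an SDK too, joining the Claude Agent SDK and Codex SDK camp. The open-source OpenCode is a bit different: it was built on a server/client architecture from the start, so the TUI is just one kind of client.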

"Build an agent into any app with the GitHub Copilot SDK - The GitHub Blog"

github.blog

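Clipped January 22, 2026

Having just announced passing the milestones of 200,000 derivative models and 1 billion cumulative downloads, Qwen has now open-sourced five Qwen-TTS models in two sizes, supporting voice design, cloning, and generation, with SoTA results on several benchmarks. Open-source Chinese speech synthesis models are scarce and SoTA ones practically nonexistent; everyone tacitly holds back the best ones to sell via API, and earlier Qwen-TTS releases were closed too. This time Qwen has committed to holding the open-source crown, and is presumably also betting on the growth potential of AI voice applications. Update: it is fun to play with. Use VoiceDesign to design a voice in natural language, then, if you are happy with it, take it to Base for cloning; CustomVoice ships 9 built-in voices and allows finer-grained control over generation.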

"Qwen3-TTS全家桶开源: 语音设计，克隆与生成！"

qwen.ai

Clipped January 22, 2026

As AI grows more capable, it is also challenging how engineers are interviewed.

"Designing AI resistant technical evaluations \ Anthropic"

anthropic.com

Clipped January 22, 2026

ARC-AGI-3 is also under development.

"ARC Prize 2025 Results and Analysis"

arcprize.org

Clipped January 21, 2026

MiniMax's AI-native workbench, bridging local and cloud.

"“95后”正在尝试一种很新的工作方式"

mp.weixin.qq.com

Clipped January 21, 2026

Jan Leike shares that frontier models are getting better and better aligned; Grok is the exception.

"Jan Leike on X: "Interesting trend: models have been getting a lot more aligned over the course of 2025. The fraction of misaligned behavior found by automated auditing has been going down not just at Anthropic but for GDM and OpenAI as well. https://t.co/8DYm9SP7wF" / X"

x.com

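Clipped January 20, 2026

From the Anthropic Fellows Program: a joint team from MATS (an independent AI alignment research program), Oxford, and Anthropic studies the assistant persona of LLMs. Based on analysis of Gemma 3, Qwen 3, and Llama 3.3, models already acquire the Assistant persona during pretraining; at the opposite end of the axis sits potentially harmful role-play. Multi-turn conversations markedly erode persona stability, and a steering technique called Activation Capping mitigates this without hurting capabilities.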

"The assistant axis: situating and stabilizing the character of large language models \ Anthropic"

anthropic.com

Clipped January 20, 2026

The think tank inside a reasoning model's head.

"谷歌新发现：DeepSeek推理分裂出多重人格，左右脑互搏越来越聪明 – 量子位"

qbitai.com

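Clipped January 20, 2026

Following a string of big names, including Andrej Karpathy, Stephen Wolfram, Addy Osmani (Chrome engineer, Google Cloud AI director), and Linus Torvalds (who used Antigravity to write small tools), the creator of Node.js and Deno has also joined the "the era of hand-typed code is over" camp.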

"Ryan Dahl on X: "This has been said a thousand times before, but allow me to add my own voice: the era of humans writing code is over. Disturbing for those of us who identify as SWEs, but no less true. That's not to say SWEs don't have work to do, but writing syntax directly is not it." / X"

x.com

Clipped January 19, 2026

CoT can be deceptive.

"Topological Signatures of Deception: Detecting Unfaithful Reasoning via Sentence-Level Causal Graphs | angkul's site"

angkul.bearblog.dev

Clipped January 19, 2026

OpenAI's annualized revenue over the past three years went $2B → $6B → $20B, with the corresponding compute going 0.2 → 0.6 → 1.9 GW.

"A business that scales with the value of intelligence | OpenAI"

openai.com

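Clipped January 18, 2026

What the media did not relay is Demis's next sentence: China has not yet shown the ability to make breakthrough innovations at the AI frontier.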

"China just 'months' behind U.S. AI models, Google DeepMind CEO says"

cnbc.com

Clipped January 18, 2026

Several teams in the AI interviewer space already command high valuations.

"Deedy on X: "One of the more "boring" overlooked markets AI is completed upending is user research. Companies like Qualtrics ($12.5B), Medallia ($6.4B) and SurveyMonkey ($1.5B) have dominated for a long time. But now, we can infinitely scale and process interviews. Listen Labs, for example, https://t.co/JnA5c6W5jo" / X"

x.com

Clipped January 18, 2026

The DeepSeek paper uses interpretability methods to investigate how Engram works.

"himanshu on X: "wait this is actually big. this deepseek research used LogitLens (lets you see what the model is 'thinking' at each layer) and CKA (compares what different layers are actually learning) to figure out why the new Engram architecture works. apparently this is the first time i have https://t.co/t7RFN3qHou" / X"

x.com

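Clipped January 17, 2026

Alongside the launch of the $8/month ChatGPT Go subscription, OpenAI has begun testing ads in ChatGPT. It promises clear labeling, no influence on answers, private conversations, a new kind of AI ad experience, and so on. But with Gemini and Grok in fierce pursuit and Claude succeeding commercially, how long can a ChatGPT that shows ads to non-paying users hold out, and will the challengers eventually take the same route? Every general-purpose AI company will have to think this through.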

"Our approach to advertising and expanding access to ChatGPT | OpenAI"

openai.com

Clipped January 17, 2026

Cloudflare acquired Human Native, a UK company that runs a multimodal data marketplace; the post also discusses the internet economy in the AI era.

"Human Native is joining Cloudflare"

blog.cloudflare.com

Clipped January 16, 2026

OpenAI has open-sourced the design spec behind the Responses API, to serve agentic inference needs together with a range of inference engines.

"OpenAI Developers on X: "Today we’re announcing Open Responses: an open-source spec for building multi-provider, interoperable LLM interfaces built on top of the original OpenAI Responses API. ✅ Multi-provider by default ✅ Useful for real-world workflows ✅ Extensible without fragmentation Build https://t.co/SJiBFx1BOF" / X"

x.com

Clipped January 16, 2026

OpenAI has invested in the brain-computer interface company Merge Labs.

"Investing in Merge Labs | OpenAI"

openai.com

Clipped January 16, 2026

Five dimensions: task complexity, human skill, use case, degree of autonomy, and success or failure.

"Anthropic Economic Index: new building blocks for understanding AI use \ Anthropic"

anthropic.com

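Clipped January 16, 2026

A translation edition of Gemma, distilled from Gemini data and covering 55 languages. OpenAI also launched a standalone translation page in the past few days. For personal use, simply picking the best general model still seems to be the better answer; these may be aimed at other industrial or production scenarios.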

"TranslateGemma: A new family of open translation models"

blog.google

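Clipped January 16, 2026

The small all-rounders of FLUX.2: a 4B model open-sourced under Apache and a 9B model under a non-commercial license, supporting image generation, editing, multi-image reference, and more.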

"FLUX.2 [klein]: Towards Interactive Visual Intelligence | Black Forest Labs"

bfl.ai

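Clipped January 15, 2026

Cursor claims to have built a browser from scratch with GPT-5.2-Codex, running for weeks and writing over a thousand files and a million lines of code.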

"Scaling long-running autonomous coding · Cursor"

cursor.com

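Clipped January 15, 2026

Skild, which focuses on a cross-embodiment robot brain, raised a $1.4B Series C led by SoftBank at a $14B valuation. A few days ago it also published work on having robots learn directly from videos of humans.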

"Announcing Series C - Skild AI"

skild.ai

Clipped January 15, 2026

OpenAI has finally teamed up with Cerebras: 750 MW of high-speed compute, which should make agents and long-horizon tasks run faster.

"OpenAI partners with Cerebras | OpenAI"

openai.com

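Clipped January 15, 2026

Gemini connects to Gmail, Photos, YouTube, and Search to provide personalized intelligence. Integration at this depth is not easy, and it shows how determined Google is.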

"Personal Intelligence: Connecting Gemini to Google apps"

blog.google

Clipped January 14, 2026

MIT Technology Review named mechanistic interpretability one of its 10 Breakthrough Technologies of 2026.

"Mechanistic interpretability: 10 Breakthrough Technologies 2026 | MIT Technology Review"

technologyreview.com

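Clipped January 14, 2026

Led by CPO Mike Krieger, Anthropic has set up Labs to distill, replicate, and amplify the path by which Claude Code, MCP, Skills, Cowork, and others grew from research previews into successful products. The team will be more involved in the early incubation of experimental products, strengthening the company's foresight and control at the product layer.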

"Introducing Labs \ Anthropic"

anthropic.com

Clipped January 14, 2026

A half-generation upgrade for MedGemma, plus the previously released MedASR.

"Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR"

research.google

Clipped January 14, 2026

Apple's creative software bundle, a $13/month subscription.

"Introducing Apple Creator Studio, an inspiring collection of creative apps - Apple"

apple.com

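Clipped January 14, 2026

Zhipu and Huawei have open-sourced GLM-Image, a new-generation image generation model. The full pipeline from data to training ran on Ascend Atlas 800T A2 hardware and the MindSpore AI framework, making it the first SOTA multimodal model trained end to end on domestic Chinese chips. It combines a 9B autoregressive GLM-4 with a 7B DiT, CogView-4.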

"GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation"

z.ai

Clipped January 13, 2026

A team from Astera, NVIDIA, and Stanford introduces Test-Time Training (TTT).

"Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time | NVIDIA Technical Blog"

developer.nvidia.com

Clipped January 13, 2026

Adds an Engram layer before attention, turning the computation of common phrases and expressions into a static memory lookup.

"deepseek-ai/Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models"

github.com
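As a toy illustration of "replace computation with a static lookup", the sketch below hashes the trailing n-gram of token ids at each position into a fixed table and returns the stored vector. The hashing scheme, table size, and function names are assumptions made for this example; it is not the Engram implementation and it omits gating, training, and everything else in the paper.

```python
from typing import List, Sequence

TABLE_SLOTS = 1024  # hypothetical table size
DIM = 4             # hypothetical embedding width

def ngram_slot(ngram: Sequence[int], slots: int = TABLE_SLOTS) -> int:
    """Deterministic polynomial hash of a token-id n-gram into a table slot."""
    h = 0
    for tok in ngram:
        h = (h * 1000003 + tok + 1) % (2**61 - 1)
    return h % slots

def make_table(slots: int = TABLE_SLOTS, dim: int = DIM) -> List[List[float]]:
    """Static memory table, filled with a fixed pattern instead of trained values."""
    return [[((s * 31 + d * 17) % 97) / 97.0 for d in range(dim)]
            for s in range(slots)]

def lookup(tokens: Sequence[int], table: List[List[float]], n: int = 2) -> List[List[float]]:
    """For each position, fetch the row for the trailing n-gram (zeros if too short)."""
    out = []
    for i in range(len(tokens)):
        if i + 1 < n:
            out.append([0.0] * len(table[0]))
        else:
            out.append(table[ngram_slot(tokens[i + 1 - n:i + 1], len(table))])
    return out

table = make_table()
rows = lookup([5, 9, 5, 9], table)
# Positions 1 and 3 both end in the bigram (5, 9), so they read the same row.
print(rows[1] == rows[3])
```

The point of the sketch is only that a repeated phrase costs one table read each time rather than a fresh pass of computation.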

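Clipped January 13, 2026

A model that scores higher on medical and life-science benchmarks, plus connectors and Skills: Claude keeps pushing into healthcare and the life sciences.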

"Advancing Claude in healthcare and the life sciences \ Anthropic"

anthropic.com

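Clipped January 13, 2026

Claude's new Cowork mode is a research preview open only to Max users. In essence it uses the Claude Agent SDK to wrap Claude Code's capabilities in a UI better suited to knowledge workers. It is further evidence that Coding Agents = General Agents: pairing them with specialized skills to reach different domains is a fairly general solution.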

"Introducing Cowork | Claude"

claude.com

Clipped January 13, 2026

Apple and Google have reached a multi-year collaboration on Apple models based on Gemini technology.

"News from Google on X: "Joint Statement: Apple and Google have entered into a multi-year collaboration under which the next generation of Apple Foundation Models will be based on Google's Gemini models and cloud technology. These models will help power future Apple Intelligence features, including a" / X"

x.com

Clipped January 13, 2026

AI health is heating up across the board: OpenAI acquired Torch Health, a team focused on AI health record management.

"Torch is joining OpenAI"

torchapp.com

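Clipped January 12, 2026

Three months after OpenAI and Stripe jointly launched ACP (Agentic Commerce Protocol), Google today introduced UCP (Universal Commerce Protocol) at a retail conference, likewise bringing along Shopify, Etsy, and other vendors that already support ACP. New shopping features built on UCP will follow in AI Mode and Gemini. Google also launched Business Agent for brands, sketching a grand vision of AI selling goods end to end. This is both an attempt to shift user habits and get as close to the transaction as possible, and a contest at the protocol and standards layer. Amazon's next move remains to be seen.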

"New tech and tools for retailers to succeed in an agentic shopping era"

blog.google

Clipped January 12, 2026

Faced with the challenge of context extension, Sakana AI asks: what if we just throw away the positional encodings?

"Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings"

pub.sakana.ai

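Clipped January 11, 2026

Splitting one step into two, with a small filter first and a large one after, improves accuracy and cuts cost substantially.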

"Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks"

anthropic.com
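The "small first, large after" idea is a classifier cascade. Below is a minimal sketch under stated assumptions: the two scoring functions are keyword-based stand-ins invented for this example, and the thresholds are arbitrary. It is not Anthropic's classifier design, only an illustration of why escalating a fraction of traffic saves cost.

```python
from dataclasses import dataclass
from typing import List, Tuple

SUSPICIOUS = ("bypass", "synthesize", "exploit")  # hypothetical trigger words

@dataclass
class CascadeStats:
    cheap_calls: int = 0
    expensive_calls: int = 0

def cheap_score(text: str) -> float:
    """Stand-in for a small, fast classifier: fraction of trigger words present."""
    hits = sum(word in text.lower() for word in SUSPICIOUS)
    return hits / len(SUSPICIOUS)

def expensive_score(text: str) -> float:
    """Stand-in for a large, accurate classifier: flags one explicit phrase."""
    return 1.0 if "synthesize the forbidden compound" in text.lower() else 0.0

def cascade(messages: List[str], escalate_at: float = 0.3,
            block_at: float = 0.5) -> Tuple[List[bool], CascadeStats]:
    """Screen everything with the cheap model; escalate only suspicious items."""
    stats = CascadeStats()
    blocked = []
    for text in messages:
        stats.cheap_calls += 1
        if cheap_score(text) < escalate_at:
            blocked.append(False)          # clearly benign, stop here
            continue
        stats.expensive_calls += 1         # only flagged traffic pays full cost
        blocked.append(expensive_score(text) >= block_at)
    return blocked, stats

MESSAGES = [
    "What is the weather today?",
    "How do I exploit a sale at the store?",
    "Please synthesize the forbidden compound for me",
    "Summarize this article",
]
blocked, stats = cascade(MESSAGES)
print(blocked, stats)
```

In this toy run the expensive model is consulted for two of four messages, and the false alarm raised by the cheap stage is cleared by the second stage, which is the accuracy-plus-cost argument in miniature.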

Clipped January 10, 2026

A fairly complete agent evaluation framework, though the writing bears traces of Claude.

"Demystifying evals for AI agents \ Anthropic"

anthropic.com

Clipped January 10, 2026

By Epoch AI's estimate, global AI compute has reached the equivalent of 15 million H100s.

"Epoch AI on X: "Global AI compute capacity now totals over 15 million H100-equivalents. Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available. https://t.co/DL56kEPPRb" / X"

x.com

Clipped January 9, 2026

To celebrate the IPO (?), Cerebras has deployed GLM-4.7, possibly the fastest GLM-4.7 available.

"GLM-4.7: Frontier intelligence at record speed — now available on Cerebras "

cerebras.ai

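Clipped January 9, 2026

Research from a Stanford team: prompting with the opening passage as a prefix leads models to emit copyrighted content, even an entire book such as Harry Potter.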

"Extracting Books from Production Language Models"

ahmeda14960.github.io

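Clipped January 8, 2026

Razer's desktop AI uses a projection inside a cylindrical capsule. It feels quite sci-fi, but voice interaction latency is still fairly high.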

"Meet Project AVA at CES 2026 - Blog"

razer.com

Clipped January 8, 2026

Cursor has reworked its context mechanism and, like Claude Code, embraces the filesystem. This is where things are heading.

"Dynamic context discovery · Cursor"

cursor.com

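Clipped January 8, 2026

Interconnects, run by Nathan Lambert and others, launched the American Truly Open Models (ATOM) project. It mainly makes the case that Chinese open models currently lead, with some good data charts.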

"The ATOM Project - American Truly Open Models"

atomproject.ai

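Clipped January 8, 2026

Tolan, a cute-styled voice AI companion app, has grown to 200,000 MAU since launching in February 2025, with a 4.8 rating from 100,000+ App Store reviews. GPT-5.1's improved steerability gives it better character expression. Its context design also differs from most agents: every turn, Tolan recomputes the persona and reassembles a prompt covering tone, memory, personality, history, and more. Memory recall uses query expansion plus semantic RAG, and memory updates use semantic KNN.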

"How Tolan builds voice-first AI with GPT-5.1 | OpenAI"

openai.com

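Clipped January 8, 2026

With 200 million people asking ChatGPT health questions every week, OpenAI has gone ahead and launched ChatGPT Health. It can connect to data sources such as Apple Health and helps with interpreting reports, preparing for appointments, and diet and exercise. There is still a waitlist. The entry points in ChatGPT's top-left corner keep multiplying.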

"Introducing ChatGPT Health | OpenAI"

openai.com

Clipped January 7, 2026

Valuation reaches ~$230B; MAU approaches 600 million; data centers exceed one million H100 equivalents.

"xAI Raises $20B Series E | xAI"

x.ai

Clipped January 7, 2026

Partnering with Boston Dynamics, Gemini Robotics keeps pushing forward.

"Boston Dynamics & Google DeepMind Form New AI Partnership to Bring Foundational Intelligence to Humanoid Robots | Boston Dynamics"

bostondynamics.com

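Clipped January 7, 2026

Following community discussion, the official Claude Code plugin repo has added a Ralph Wiggum plugin, built on the Stop hook so that the agent keeps working without pause until the task is done. The name comes from the character in The Simpsons. Update: it has been renamed Ralph Loop, probably for infringement reasons?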

"claude-plugins-official/plugins/ralph-wiggum"

github.com

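Clipped January 7, 2026

Fidji Simo's New Year outlook for ChatGPT: building the best personal assistant, unlocking value in enterprise scenarios, and providing automated AI teammates for developers.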

"Closing the capability gap between frontier AI and everyday use in 2026"

fidjisimo.substack.com

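Clipped January 7, 2026

LMArena raised a $150M Series A led by Felicis and UC Investments at a $1.7B valuation, with 400+ models and 50 million votes. Even LLM evaluation can produce a unicorn; astonishing.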

"Fueling the World’s Most Trusted AI Evaluation Platform"

news.lmarena.ai

Clipped January 6, 2026

GRPO has already spawned many variants.

"GRPO++: Tricks for Making RL Actually Work"

cameronrwolfe.substack.com

Clipped January 2, 2026

A dexterous hand trained with a VLA, from Seed.

"GR-Dexter Technical Report"

byte-dexter.github.io

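Clipped January 2, 2026

Echoing Ilya's "back to research", DeepSeek systematically analyzes the evolution of ResNet. Building on Seed's Hyper-Connection work from last year, and validated through math, engineering, and scaling, it digs into neural network topology and proposes a new architecture, mHC, which is expected to open up...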

"mHC: Manifold-Constrained Hyper-Connections"

arxiv.org

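Clipped January 2, 2026

A code model from the Zhizhi Innovation Research Institute (the Ubiquant quant team?), scoring a high 81.4 on SWE-bench Verified at 40B parameters. The paper has three findings: (1) compared with static repository files, commit-history data does more to improve the model's planning ability; (2) mid-training on 32k reasoning/coding data is critical for stable training; (3) RL thinking in post-training gives rise to error-correction ability. Update: the SWE-bench Verified score was questioned; the explanation given was an incorrect test environment, and the updated score is 76.2.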

"IQuest Coder"

iquestlab.github.io

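Clipped January 1, 2026

Uses reinforcement learning to train the model to manage its own context: it first calls a REPL, sub-LLMs, and so on to process the input, and only then does the actual reasoning.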

"Recursive Language Models: the paradigm of 2026"

primeintellect.ai

December 2025

Clipped December 30, 2025

The Tongyi team introduces Mobile World, a new benchmark for mobile GUI agents, following Android World and others.

"Mobile World: Benchmarking Autonomous Mobile Agents"

tongyi-mai.github.io

Clipped December 30, 2025

WeChat AI

"WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference"

wedlm.github.io

Clipped December 29, 2025

How the seahorse emoji reveals a self-reflection recipe in the pretraining data.

"Reverse Engineering a Phase Change in GPT's Training Data... with the Seahorse Emoji 🌊🐴"

pratyushmaini.substack.com

Clipped December 29, 2025

An in-depth guide to Claude Code.

"A Guide to Claude Code 2.0 and getting better at using coding agents | sankalp's blog"

sankalp.bearblog.dev

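Clipped December 29, 2025

A year-end discussion around coding: first the anxiety of Andrej Karpathy, then the confession of Claude Code creator Boris. The maturing of coding agents means programmers, even top developers, no longer type code by hand but focus on interacting with AI, for a 10x or even 100x improvement.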

"Boris Cherny on X: "When I created Claude Code as a side project back in September 2024, I had no idea it would grow to be what it is today. It is humbling to see how Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all https://t.co/QVlmbhjUUE" / X"

x.com

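Clipped December 29, 2025

Use the Job vs Gym distinction to guide how you collaborate with AI: the former is about output, where AI helps you deliver; the latter is about process, where the point is building your own core abilities.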

"Keep the Robots Out of the Gym | Daniel Miessler"

danielmiessler.com

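Clipped December 28, 2025

There is also a paper dedicated to experimentally analyzing how AI responds to users of different ages on questions such as "Does Santa Claus exist?"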

"Yes, AI, There is a Santa Claus – Machine Learning Blog | ML@CMU | Carnegie Mellon University"

blog.ml.cmu.edu

Clipped December 28, 2025

On the small matter of whether AI will admit to a 5-year-old that Santa Claus does not exist.

"Daphne Hansell on X: "If you say you’re 5, opus 4.5 will lie to you about Santa but the COT gives it away. 5.2 doesn’t believe in lying to children https://t.co/sb7BKwQYnu" / X"

x.com

Clipped December 26, 2025

6 million cumulative registrations, 1.6 million MAU.

"TRAE 1.0.0｜2025 年度产品报告"

mp.weixin.qq.com

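Clipped December 26, 2025

Anthropic co-founder Jack Clark is also a dad. While his kid slept, he spent a few minutes using Claude Code powered by Opus 4.5 to build a small world simulator and then played with it at length. He describes the feeling as being like a child playing with an adult, with Claude acting as an ever-obliging superintelligence. But you need the "magic combination" of time and curiosity; otherwise these most astonishing advances are hidden from you by default. He also predicts this will get worse in 2026: the digital world will evolve ever faster, and new things designed specifically for AI systems (such as websites built for AI agents and invisible to humans) will carry more and more "ghostly" AI activity and information exchange between silicon brains. For humans in four-dimensional space, AI seems to live in the fifth dimension, leaving only a glimpse as it passes by. The thinking, extrapolation, and writing are all excellent: https://x.com/jackclarkSF/status/2003526145380151614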

"Jack Clark on X: "Silent Sirens, Flashing For Us All" / X"

x.com

Clipped December 25, 2025

Read it for a long while and still could not figure out what it actually does.

"钉钉上新，想用 AI 教你点「工作切割术」 | 极客公园"

geekpark.net

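Clipped December 25, 2025

NVIDIA and Groq reached a non-exclusive licensing agreement, with NVIDIA also taking on Groq's core team. CNBC reports the deal at about $20B, while Groq's post-money valuation in September was $6.9B. GroqCloud keeps running, though that feels mainly like a way to avoid regulatory scrutiny?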

"Groq and Nvidia Enter Non-Exclusive Inference Technology Licensing Agreement to Accelerate AI Inference at Global Scale | Groq is fast, low cost inference."

groq.com

Clipped December 24, 2025

The TTS is not open source.

"Qwen3-TTS Steps Up: Voice Cloning and Voice Design!"

qwen.ai

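Clipped December 23, 2025

To counter prompt injection risk, ChatGPT Atlas uses an automated attack-and-defense adversarial iteration workflow built with reinforcement learning.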

"Continuously hardening ChatGPT Atlas against prompt injection attacks | OpenAI"

openai.com

Clipped December 23, 2025

The Robot Olympics; the newly released PI 0.6 (π0.6) does quite well.

"Moravec's Paradox and the Robot Olympics"

pi.website

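Clipped December 21, 2025

An online survey of 2,300+ US adults run by Tavern Research in August 2025 shows that people want regulators to set rules. More interesting is the question of what actually happens when you ask a tool like ChatGPT something: 45% think it looks up the exact answer in a database, and 21% think it follows a pre-written script of replies.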

"Americans Have Mixed Views of AI – and an Appetite for Regulation - Searchlight Institute"

searchlightinstitute.org

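Clipped December 21, 2025

A Google DeepMind research team argues that current AGI research focuses too much on a single AI breakthrough, whereas in practice multiple sub-AGI systems from different domains will cooperate to form a distributed collective intelligence, which also brings alignment and governance challenges.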

"Distributional AGI Safety"

arxiv.org

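Clipped December 21, 2025

This study of top performers in chess, music, sports, and other fields shows that, compared with those who specialized in a single domain early, people who practiced across more disciplines start slower but have a higher long-term ceiling.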

"Recent discoveries on the acquisition of the highest levels of human performance | Science"

science.org

Clipped December 21, 2025

An ASR model built specifically for medical scenarios.

"MedASR | Health AI Developer Foundations | Google for Developers"

developers.google.com

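Clipped December 21, 2025

The observation: many gameplay videos uploaded by players show the controller inputs on screen → extract those control actions and you have training data.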

"NitroGen | A Foundation Model for Generalist Gaming Agents"

nitrogen.minedojo.org

Clipped December 20, 2025

Cursor acquired Graphite, a team focused on AI code review and other coding workflows.

"Building the future of software development with Cursor"

graphite.com

Clipped December 20, 2025

Layer separation: a true AI Photoshop.

"Qwen-Image-Layered: Layered Decomposition for Inherent Editablity"

qwen.ai

Clipped December 20, 2025

RLVR, jagged intelligence, LLM apps (Cursor), local agents (Claude Code), vibe coding, and an image-generation GUI (Nano Banana).

"2025 LLM Year in Review | karpathy"

karpathy.bearblog.dev

Clipped December 20, 2025

A new version of Gemma Scope.

"Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior - Google DeepMind"

deepmind.google

Clipped December 20, 2025

ChatGPT's writing feature update: edits are tracked as diffs, mainly for writing emails.

"JZ on X: "🆕 Writing blocks make it easier to craft the perfect email in ChatGPT. ∙Update & format text right in chat ∙Highlight to ask for changes, and accept or reject suggestions ∙Open in your email client once you’re ready to send Try it & please let us know what you think! https://t.co/2Vgf0Av3u6" / X"

x.com

Clipped December 19, 2025

QQ Browser

"当年带你上网冲浪的头号老玩家，这回是真AI上头了 | 量子位"

qbitai.com

Clipped December 19, 2025

So much fun, and packed with Easter eggs.

"Project Vend: Phase two \ Anthropic"

anthropic.com

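Clipped December 19, 2025

Claude.ai includes a small classifier that detects suicide and self-harm risk and proactively shows a notice, with different helplines for different countries and regions, provided by ThroughLine; ChatGPT mentioned the same day that it has adopted a similar approach. Anthropic evaluated how the Claude family responds to such questions: the share of appropriate responses keeps rising, but, subtly, the smartest Opus models are not the highest. They also say they have been evaluating AI sycophancy since 2022, before Claude was released, and recently open-sourced a model behavior evaluation framework. In addition, Claude does not allow users under 18 and uses classifiers to flag them, an interesting contrast with the rumored adult mode for ChatGPT. Anthropic really has both the B2B revenue and the reputation.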

"Protecting the well-being of our users \ Anthropic"

anthropic.com

Clipped December 19, 2025

OpenAI's AI literacy resources for families.

"AI literacy resources for teens and parents | OpenAI"

openai.com

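Clipped December 19, 2025

The main update concerns under-18s, putting teen safety first. ChatGPT has also added helplines provided by ThroughLine and is still refining its age prediction model.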

"Updating our Model Spec with teen protections | OpenAI"

openai.com

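Clipped December 19, 2025

OpenAI probably has a pipeline for producing Codex models: once a general model is iterated, the corresponding Codex variant can ship right away. The release emphasizes improved cybersecurity capabilities.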

"Introducing GPT-5.2-Codex | OpenAI"

openai.com

Clipped December 19, 2025

Evaluating CoT monitorability.

"Evaluating chain-of-thought monitorability | OpenAI"

openai.com

Clipped December 18, 2025

Lovable's valuation reaches $6.6B.

"Lovable raises $330M to power the age of the builder - Lovable Blog"

lovable.dev

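Clipped December 18, 2025

Doubao may be the first to take a model version number to 1.8. The video model was upgraded in step to Seedance 1.5; I tried it in closed beta a couple of days ago and it still falls short of Veo 3. Daily token usage exceeds 50 trillion.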

"两大模型发布！豆包大模型日均使用量突破50万亿Tokens"

mp.weixin.qq.com

Clipped December 18, 2025

PI's new VLA model can transfer human actions from head-mounted camera video directly to robots; the team calls this emergence.

"Emergence of Human to Robot Transfer in Vision-Language-Action Models"

pi.website

Clipped December 18, 2025

Adds little new information.

"The State of AI Coding 2025 | Greptile"

greptile.com

Clipped December 18, 2025

Battle-tested in Tesla vehicles, the Grok Voice Agent API is now live.

"Grok Voice Agent API | xAI"

x.ai

Clipped December 18, 2025

With the Gemini family, Google now covers the entire Pareto frontier of LLM price-performance. Interestingly, Gemini 3 Flash even beats Gemini 3 Pro on SWE-Bench Verified.

"Introducing Gemini 3 Flash: Benchmarks, global availability"

blog.google

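Clipped December 17, 2025

OpenAI introduces FrontierScience, with 700+ physics, chemistry, and biology questions in total. Of these, 100 outcome-focused Olympiad questions and 60 process-focused Research questions form the gold set, written and assessed by fewer than a hundred Olympiad gold medalists and scientists. GPT-5.2 leads.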

"Evaluating AI’s ability to perform scientific research tasks | OpenAI"

openai.com

Clipped December 17, 2025

The intriguing SAM Audio model segments audio using text, visual, or time-span prompts. It feels like magic.

"Our New SAM Audio Model Transforms Audio Editing"

about.fb.com

Clipped December 17, 2025

The FLUX.2 family already had several models when it launched last month; now a max version joins them.

"FLUX.2 [max] - Top-Tier Quality Image Generation | Black Forest Labs"

bfl.ai

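Clipped December 16, 2025

Striking back at Nano Banana Pro, GPT Image 1.5 takes first place on the arena. It improves precise editing and instruction following, renders fine text and reliable numbers, runs 4x faster, and tones down the yellowish tint, though it still has limits with certain styles, multiple faces, Chinese text, and more.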

"The new ChatGPT Images is here | OpenAI"

openai.com

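Clipped December 16, 2025

Luo Fuli x Xiaomi pushes MiMo straight to open-source SoTA. There is a vague sense that LLM training in China is starting to converge.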

"Introducing MiMo-V2-Flash"

mimo.xiaomi.com

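Clipped December 16, 2025

Incremental speech upgrades have been coming thick and fast lately. After Gemini's voice upgrade and separate releases from Zhipu and Tongyi, OpenAI has also upgraded the 4o-mini ASR and TTS models.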

"OpenAI Developers on X: "🆕 New audio model snapshots are now live in the Realtime API with improvements to reliability, lower error rates, and fewer hallucinations: - gpt-4o-mini-transcribe-2025-12-15: 89% reduction in hallucinations compared to whisper-1 - gpt-4o-mini-tts-2025-12-15: 35% fewer word https://t.co/E8clreR1R0" / X"

x.com

Clipped December 16, 2025

The Nemotron 3 family: hybrid Mamba-Transformer MoE in three sizes (30B, 100B, and 500B), each with 10% sparsity, plus data, NeMo Gym, and a full toolchain, all open-sourced.

"NVIDIA Debuts Nemotron 3 Family of Open Models | NVIDIA Newsroom"

nvidianews.nvidia.com

Clipped December 16, 2025

Merriam-Webster's 2025 Word of the Year: Slop.

"Word of the Year 2025 | Slop | Merriam-Webster"

merriam-webster.com

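Clipped December 15, 2025

Zhipu may have started it and Tongyi follows suit this time: in TTS and ASR, everyone now defaults to holding back the good models and open-sourcing the small ones.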

"通义百聆语音双子星，同步开源！"

mp.weixin.qq.com

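Clipped December 13, 2025

Uses a purpose-fine-tuned Veo video model as a simulator for robot manipulation policies. It has much in common with what Jim Fan shared earlier and with Runway's recent work.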

"Evaluating Gemini Robotics Policies in a Veo World Simulator"

veo-robotics.github.io

Clipped December 13, 2025

Zoom achieved SoTA on HLE with a multi-model orchestration framework.

"Zoom AI sets new state-of-the-art benchmark on Humanity's Last Exam | Zoom"

zoom.com

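Clipped December 13, 2025

Following the Gemini TTS update a few days ago, Gemini Native Audio has been upgraded too (both are still in the 2.5 series; the naming is a mess). This time, S2S brings live translation to the Translate app.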

"Gemini 2.5 Native Audio upgrade, plus text-to-speech model updates"

blog.google

Clipped December 12, 2025

The multimodal open-source week wraps up.

"四项视频生成技术，开源！"

mp.weixin.qq.com

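Clipped December 12, 2025

Runway has long said its mission is world models and earlier announced work with robotics companies on training with video models. It has now officially released GWM-1, a general world model based on Gen-4.5 that switches to an autoregressive diffusion approach: 2 minutes, 720p. Besides competing with Genie, there is GWM Avatars, an audio-driven interactive digital human. Gen-4.5 also supports synchronized audio and video, audio editing, and multi-shot editing.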

"Runway Research | Introducing Runway GWM-1"

runwayml.com

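Clipped December 11, 2025

Reasoning keeps improving: it is the second model to pass 80 on SWE-Bench Verified, long-context stability is better, hallucinations drop further, and economic metrics such as GDPval are now front and center, with the model surpassing professional knowledge workers in many fields. API prices rise slightly. The knowledge cutoff is, surprisingly, August 2025.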

"Introducing GPT-5.2 | OpenAI"

openai.com

Clipped December 11, 2025

Amusing: "AI flavor" now has its own Wikipedia page.

"Wikipedia:Signs of AI writing - Wikipedia"

en.wikipedia.org

Clipped December 11, 2025

OpenAI considers current AI capability to have reached the High level under its Preparedness Framework.

"Strengthening cyber resilience as AI capabilities advance | OpenAI"

openai.com

Clipped December 11, 2025

A rather thin report, but the broader trend is that everyone has started analyzing user usage data.

"It’s About Time: The Copilot Usage Report 2025 | Microsoft AI"

microsoft.ai

Clipped December 11, 2025

On the updated FACTS Grounding v2, Gemini 3 Pro and Gemini 2.5 Pro rank at the top.

"FACTS Benchmark Suite: a new way to systematically evaluate LLMs factuality - Google DeepMind"

deepmind.google

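Clipped December 11, 2025

Adobe's apps are now available inside ChatGPT. But with the AI-native image editing pioneered by Nano Banana maturing, the move seems a little awkward, and it is unclear who the target users are.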

"Adobe Makes Creativity Accessible for Everyone with Adobe Photoshop, Adobe Express and Adobe Acrobat in ChatGPT"

news.adobe.com

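Clipped December 11, 2025

Waymo's foundation model combines Driver, Simulator, and Critic. Two model components, a sensor fusion encoder and a driving VLM, form a System 1 + System 2 architecture; the outputs of the two encoders feed a world decoder that produces maps, paths, and signals. Add distillation and the outer loop from real-world operation, and you get a flywheel.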

"Demonstrably Safe AI For Autonomous Driving"

waymo.com

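Clipped December 10, 2025

Zhipu launched an AI input method, desktop only for now and backed by cloud models, but it surprisingly runs on credits. The business model looks weak and the stance is still tentative. The open-sourced part is the lightweight 1.5B version.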

"GLM-ASR开源：用嘴干活，智谱AI输入法正式上线"

mp.weixin.qq.com

Clipped December 10, 2025

AlphaEvolve is live on Google Cloud in private preview; the main use case is optimizing algorithm efficiency.

"AlphaEvolve on Google Cloud | Google Cloud Blog"

cloud.google.com

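Clipped December 10, 2025

The report season that comes around several times a year is here. VCs like Menlo love assembling market maps; this one adds a Departmental AI category, apparently mainly to distinguish it from Horizontal AI such as ChatGPT Enterprise, Claude, Agentforce, and Glean.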

"2025: The State of Generative AI in the Enterprise | Menlo Ventures"

menlovc.com

Clipped December 10, 2025

A Claude Agent SDK customer case study.

"How Parcha built a universalcustomer diligence agent in twoweeks with Claude Agent SDK"

claude.com

Clipped December 10, 2025

Anthropic donated MCP to the Agentic AI Foundation, which also receives OpenAI's AGENTS.md and Block's Goose.

"Donating the Model Context Protocol and establishing the Agentic AI Foundation \ Anthropic"

anthropic.com

Clipped December 10, 2025

Designate a set of parameters → route the dangerous data into those parameters → prune them.

"Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs"

alignment.anthropic.com

Clipped December 10, 2025

Glean has passed $200M in ARR.

"Arvind Jain on X: "I’m proud to share a big milestone for @Glean: we’ve surpassed $200M in ARR, doubling in just nine months. This puts Glean among the fastest-growing pure-play enterprise software companies of the decade. It’s a testament to our customers, partners, and employees across the globe https://t.co/M40BS5xYu9" / X"

x.com

Clipped December 10, 2025

OpenAI hired Slack CEO Denise Dresser as Chief Revenue Officer, mainly to drive commercialization.

"OpenAI appoints Denise Dresser as Chief Revenue Officer | OpenAI"

openai.com

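Clipped December 9, 2025

A critique of interpretability, chiefly that its concepts are ill-defined. It was promptly rebutted, of course. It brings to mind Jim Fan's line that the best time to enter is when there is no consensus.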

"The Reification Fallacy: Interpretability Studies Imaginary Entities"

surajsrinivas.substack.com

Clipped December 9, 2025

The case of rising internal-combustion-engine efficiency versus horses per capita, as an analogy for AI progress.

"Horses"

andyljones.com

Clipped December 9, 2025

A week after the Doubao phone launch, Zhipu open-sources AutoGLM.

"AutoGLM开源：每台手机，都可以成为AI手机"

mp.weixin.qq.com

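Clipped December 9, 2025

Following its earlier report on personal ChatGPT usage, OpenAI now analyzes usage across its more than one million enterprise customers. It is less detailed than the earlier one and aimed more at attracting B2B customers. Usage is highest in professional services, finance, tech, and manufacturing, and growing fastest in tech, healthcare, and manufacturing; the gap between the heaviest users and the average user is still widening; and the more people use it, the more time they save.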

"The state of enterprise AI | OpenAI"

openai.com

Clipped December 8, 2025

HuggingFace Skills

"We Got Claude to Fine-Tune an Open Source LLM"

huggingface.co

Clipped December 7, 2025

Using coding agents to simulate natural and industrial phenomena.

"Dean W. Ball on X: "I increasingly use coding agents to create simulations of various natural or industrial phenomena to educate myself about them, and the line between this and “video game” is blurry." / X"

x.com

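Clipped December 7, 2025

Alignment practices behind Opus 4.5: alignment work takes part throughout the whole training process; the soul doc is internalized through training rather than used merely as a signal; character trainer Amanda Askell will publish a detailed article later.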

"Sam Bowman on X: "From everything we know so far, Opus 4.5 seems to be the best-aligned model out there in a bunch of ways. I follow the training process closely as part of my work on alignment evaluations. Here's my guess about the two things that are most responsible for making 4.5 special. 🧵" / X"

x.com

Clipped December 7, 2025

Self-evolution is a major theme of NeurIPS 2025.

"Better Ways to Build Self-Improving AI Agents – Yohei Nakajima"

yoheinakajima.com

Clipped December 7, 2025

Detecting whether an AI is lying is really hard.

"Difficulties with Evaluating a Deception Detector for AIs"

arxiv.org

Clipped December 6, 2025

ByteDance also has BytePlus overseas, taking its AI abroad.

"BytePlus Unveils Seedream 4.5: Precision-Focused Upgrade Delivering Sharper Visuals, Smarter Control, and 4K Creative Consistency"

byteplus.com

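Clipped December 5, 2025

OpenRouter's statistical report based on 100+ trillion tokens: programming is the core use case, and most open-source model usage goes to role-play.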

"State of AI | OpenRouter"

openrouter.ai

Clipped December 5, 2025

Harvey raised a $160M Series F led by a16z at an $8B valuation.

"Andreessen Horowitz Leads $160M Investment in Harvey"

harvey.ai

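Clipped December 5, 2025

Anthropic's AI Interviewer has Claude design and conduct interviews around a research goal and then analyze the results; it has already interviewed and analyzed 1,250 professionals.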

"Introducing Anthropic Interviewer \ Anthropic"

anthropic.com

Clipped December 4, 2025

Similar to CoT monitoring, model self-confession is another way to uncover misbehavior.

"How confessions can keep language models honest | OpenAI"

openai.com

Clipped December 4, 2025

The People-First AI Fund, an OpenAI-funded program for nonprofits.

"Announcing the initial People-First AI Fund grantees | OpenAI"

openai.com

Clipped December 4, 2025

Examines whether various MechInterp methods introduce bias in the representation distribution.

"ADDRESSING DIVERGENT REPRESENTATIONS FROM CAUSAL INTERVENTIONS ON NEURAL NETWORKS"

arxiv.org

Clipped December 4, 2025

Half a step behind PixVerse?

"Day3｜可灵 2.6 全量上线！听见画面，看见声音"

mp.weixin.qq.com

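Clipped December 3, 2025

OPPO's AI Agent team introduces FINDER, a 100-question benchmark for Deep Research agents writing research reports, and analyzes why they fail. The core problem is not understanding the task but source selection, verification, and reasoning and planning, which matches hands-on experience.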

"How Far Are We from Genuinely Useful Deep Research Agents?"

arxiv.org

Clipped December 3, 2025

Yuanbao, surprisingly, turns out to be the favorite of the new middle class.

"QuestMobile2025新中产人群洞察报告：2.78亿新中产消费能力、消费意愿齐升，三大动能推动高质量发展"

mp.weixin.qq.com

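Clipped December 3, 2025

Internal employee interviews plus analysis of Claude Code usage data. The comparison with survey responses and the conclusions are both interesting: boosting output matters more than cutting costs.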

"How AI Is Transforming Work at Anthropic \ Anthropic"

anthropic.com

Clipped December 3, 2025

Waymo has started delivering food.

"Waymo delivery is now live on DoorDash in Metro Phoenix"

waymo.com

Clipped December 3, 2025

A spin-off from the Kyutai team? Setting sail again as Gradium, probably with plans to commercialize.

"Gradium: Solving voice"

gradium.ai

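Clipped December 3, 2025

The Nova 2 family: multimodal input plus a million-token context. Lite is efficient; Pro supports speech understanding and is markedly smarter; Omni can generate images; Sonic is end-to-end speech; there are also multimodal Embeddings.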

"Amazon Nova - Generative Foundation Model - AWS"

aws.amazon.com

Clipped December 2, 2025

A big week for video.

"PixVerse（拍我AI）V5.5发布：国内首款分镜+音频一键生成AI视频大模型 | 量子位"

qbitai.com

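Clipped December 2, 2025

Calls for American open-source models have grown louder lately. Arcee joins Ai2's ranks under that banner with its Trinity family: the 6B-A1B Nano and the 26B-A3B reasoning Mini are already out, while Large is still training on 2,048 B300s and is expected in January 2026.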

"Arcee AI | Arcee Debuts Trinity Mini, Expanding Its U.S.-Built Model Line"

arcee.ai

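Clipped December 2, 2025

OpenAI has opened (or reopened?) an alignment subdomain and plans to post AI alignment work frequently. This piece uses SAEs to compare model behavior before and after fine-tuning, in order to find misalignment introduced by fine-tuning.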

"Debugging misaligned completions with sparse-autoencoder latent attribution"

alignment.openai.com

Clipped December 2, 2025

Video editing, mainly of facial expressions.

"react-1: ai emotion editing for video, edit performances without reshoots"

sync.so

Clipped December 2, 2025

Yet another GUI agent claiming SoTA; the leaderboard it picked is Online-Mind2Web.

"Introducing Lux, the World's Best Foundation Computer-Use Model"

agiopen.org

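Clipped December 2, 2025

Anthropic's red-team security testing found that AI agents uncovered smart contract vulnerabilities worth millions of dollars on simulated blockchains.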

"AI agents find $4.6M in blockchain smart contract exploits"

red.anthropic.com

Clipped December 1, 2025

Kicking off a big week for AI video. Capabilities seem fairly close to Runway Act? The trend toward video editing is clear.

"Day1｜可灵AI视频 O1 模型正式上线！"

mp.weixin.qq.com

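Clipped December 1, 2025

FLUX.2 shipped just last week, and this week comes the new funding announcement: Black Forest Labs raised a $300M Series B led by Salesforce and others at a $3.25B valuation.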

"Laying the Foundations for Visual Intelligence—Our $300M Series B | Black Forest Labs"

bfl.ai

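Clipped December 1, 2025

Very close to the Project Astra that Google previewed earlier: an always-on digital AI assistant. It sketches Doubao's bigger picture: partnering with phone makers while also building earbuds and other peripheral hardware, a more pragmatic and deployable strategy. This is a major entry point and phone makers will build their own, so the lead, technology, product, and growth all remain to be seen.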

"豆包手机助手发布技术预览版"

mp.weixin.qq.com

November 2025

Clipped November 30, 2025

GUI agents are maturing fast.

"阶跃开源4B Agent模型，跑通所有安卓设备，手搓党一键部署 | 量子位"

qbitai.com

Clipped November 27, 2025

Fine-tuned from GLM-4.5-Air.

"INTELLECT-3: A 100B+ MoE trained with large-scale RL"

primeintellect.ai

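Clipped November 26, 2025

Uses AI to estimate how long specific tasks would take a human. The estimates correlate reasonably well with actual times and can be converted into costs and savings to compute economic benefit directly.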

"Estimating AI productivity gains \ Anthropic"

anthropic.com

Clipped November 26, 2025

Black Forest Labs makes its move, second only to Nano Banana Pro.

"FLUX.2: Frontier Visual Intelligence | Black Forest Labs"

bfl.ai

Clipped November 26, 2025

In the end Suno partnered with WMG too.

"A new chapter in music creation – Suno"

suno.com

Clipped November 25, 2025

An infrastructure-layer company founded by xAI employees.

"Introducing Nuraline: Adaptation as Infrastructure"

nuraline.ai

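Clipped November 25, 2025

Best at coding; close to Gemini 3 Pro on Vending-Bench; found and worked around a loophole in τ2-bench. Token efficiency is greatly improved, and the price drops from $15/$75 to $5/$25. Context compaction has been optimized and is also live in the Claude apps, allowing "endless" chats. Claude Code also arrives in Claude Desktop. The API model name is claude-opus-4-5-20251101, so testing began at the start of the month? It seems all these labs were holding back last month, saving up to ship together around Thanksgiving.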

"Introducing Claude Opus 4.5 \ Anthropic"

anthropic.com

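Clipped November 25, 2025

Built by reinforcement fine-tuning GPT-5-mini. On an internal evaluation its product selection accuracy is 64%, above GPT-5-Thinking's 56%. The experience is quite good: a skippable follow-up UI lets users add their preferences. Far better than Taobao or JD.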

"Introducing shopping research in ChatGPT | OpenAI"

openai.com

Clipped November 23, 2025

Reward hacking research, endorsed by Ilya.

"From shortcuts to sabotage: natural emergent misalignment from reward hacking \ Anthropic"

anthropic.com

Clipped November 22, 2025

As with MovieGen, neither the model nor a way to use it has been released.

"Research Update: WorldGen — Text to Immersive 3D Worlds | Meta Quest Blog | Meta Store"

meta.com

Clipped November 21, 2025

HunyuanVideo 1.5: an 8.3B DiT, still 5 to 10 seconds at 480p/720p, with a novel sparse attention mechanism, SSTA (selective and sliding tile attention).

"腾讯混元发布全新视频生成模型，「元宝」率先上线尝鲜"

mp.weixin.qq.com

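Clipped November 21, 2025

Genspark raised a $275M Series B at a $1.25B valuation, making it a unicorn. $50M ARR, a team of 40+, and 90% first-month paid retention: impressive numbers, though the product itself is... hard to sum up kindly.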

"Launching Genspark AI Workspace and Announcing $275M Series B Funding"

mainfunc.ai

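Clipped November 20, 2025

4K output, world knowledge (all kinds of diagrams and slides), precise and consistent editing, and multilingual text rendering. The Gemini app also gains SynthID-based AI detection.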

"Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind"

blog.google

Clipped November 20, 2025

Udio and Stability have both teamed up with Warner (bowing to its clout?). We will see whether a new landscape forms in AI copyright.

"Udio with Warner Music Group"

udio.com

Clipped November 20, 2025

A high-school math competition benchmark from Meituan. Kimi-K2-Thinking currently ranks first at 56%, edging out GPT-5-Thinking-high at 52.4%.

"AMO-Bench: Large Language Models Still Struggle in High School Math Competitions"

amo-bench.github.io

Clipped November 20, 2025

S2ST, already live in Google Meet. Meta, ByteDance Seed, and Alibaba's Qwen have released similar systems; progress is accelerating.

"Real-time speech-to-speech translation"

research.google

Clipped November 20, 2025

Re-applies the Grok 4 Fast recipe to Grok 4.1, setting a new high on tau2-bench, and ships alongside an Agent Tools API.

"Grok 4.1 Fast and Agent Tools API | xAI"

x.ai

Clipped November 20, 2025

Suno raised a $250M Series C led by Menlo at a $2.45B valuation, with nearly 100 million users.

"The Future of Music is Already Here – Suno"

suno.com

Clipped November 20, 2025

Luma raised a $900M Series C led by Humain and is partnering with Humain to build Halo, a 2 GW supercomputing cluster. The team is 130+.

"AGI is multimodal and reality is the dataset of AGI | Luma AI"

lumalabs.ai

剪藏 2025年11月20日

SAM 3 和 3D

"Introducing Meta Segment Anything Model 3 and Segment Anything Playground"

ai.meta.com

剪藏 2025年11月20日

After Google made Gemini free for enrolled students yesterday, OpenAI is also launching a free version with dedicated features for verified teachers

"A free version of ChatGPT built for teachers | OpenAI"

openai.com

剪藏 2025年11月20日

命名奇葩，但终于在 SWE-Bench Verified 上赶上 Sonnet-4.5 了，原生针对 compaction 精调、适配 Windows 环境，长程工作精进、在 METR 取得了新 SoTA

"Building more with GPT-5.1-Codex-Max | OpenAI"

openai.com

剪藏 2025年11月20日

alphaXiv 融了 Menlo 领投的 7 百万美元的种子轮，原以为只是个迭代很快的校园产品，没想到在寻求更大发展，与之相对的可能就是康奈尔的 arxiv 了

"alphaXiv on X: "We just raised a $7M Seed round co-led by @MenloVentures and @haystackvc with participation from @Shakti_VC, @conviction and @upfrontvc 🚀 We're honored to have the support of incredible angels including @ericschmidt, @SebastianThrun, @sarahookr Join us: https://t.co/IKwK8KsG96 https://t.co/tzOpr7TcAX" / X"

x.com

剪藏 2025年11月19日

Midjourney 的社交初尝试？

"Midjourney on X: "We're launching user profiles today! Customize your own page with usernames, social links, banners, and more. Follow your friends and spotlight your favorite images. Everyone who fills out a full profile with >8 spotlights in the next 24 hours get 5 free fast hours so gogogo <3 https://t.co/dxEMeYTgv5" / X"

x.com

剪藏 2025年11月19日

继去年底写进趋势后，GenUI 终于迎来大玩家入场，看看是否会对软件形态带来缓慢的大变革

"Generative UI: A rich, custom, visual interactive user experience for any prompt"

research.google

剪藏 2025年11月19日

屠榜的 Gemini 3 Pro Preview，百万窗口、64K 文本输出； Pro 以上订阅用户可在 AI Mode 使用，帮你规划帮你学习； Ultra 订阅独享更进一步的 Gemini 3 Deep Think 和通用智能体 Gemini Agent；疑似改自 Windsurf 的又一款 VSCode fork AI IDE：Google Antigravity；

"Gemini 3: Introducing the latest Gemini AI model from Google"

blog.google

剪藏 2025年11月19日

AI 联姻已成大网，英伟达和微软分别向 Anthropic 投 100 亿和 50 亿美元，而 Anthropic 承诺从 Azure 购买 300 亿的算力 + 追加 1 GW 的算力订购，将 Claude 系列模型带入微软家族

"Microsoft, NVIDIA and Anthropic announced new strategic partnerships. \ Anthropic"

anthropic.com

剪藏 2025年11月18日

“We do not battle for scope,” Simo says. “We battle for less scope.”

"OpenAI's Fidji Simo Plans to Make ChatGPT Way More Useful—and Have You Pay For It | WIRED"

wired.com

剪藏 2025年11月17日

Cloudflare 收购 Replicate，Workers 可用的模型大幅增多，AI 蓝图也愈发宏伟

"Replicate is joining Cloudflare – Replicate blog"

replicate.com

剪藏 2025年11月17日

竞技场新高、降幻觉、创意写作提升（仍低于 GPT-5.1）、图文混合回答

"Grok 4.1 | xAI"

x.ai

剪藏 2025年11月17日

Sakana raised a US$135 million Series B, bringing its valuation to 2.6 billion

"Announcing Our Series B"

sakana.ai

剪藏 2025年11月14日

稀疏化 + 剪枝至最小任务可行探索可解释性

"Understanding neural networks through sparse circuits | OpenAI"

openai.com

剪藏 2025年11月14日

在游戏环境中，用 Gemini 下任务、定奖励，SIMA 把经验记下来训练，无需人提供样本就能训练 Agent，迁移效果不错，还能与 Genie 3 联动

"SIMA 2: A Gemini-Powered AI Agent for 3D Virtual Worlds - Google DeepMind"

deepmind.google

剪藏 2025年11月14日

GPT-5.1 的 Coding benchmark，同时 API 上线 GPT-5.1-Codex

"Introducing GPT-5.1 for developers | OpenAI"

openai.com

剪藏 2025年11月13日

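Cursor raised a 2.3 billion Series D at a US$29.3 billion valuation, led by Accel, with returning backers Thrive, a16z and DST and new backers Coatue, NVIDIA and Google. ARR has passed 1 billion; the team is 300+ people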

"Past, Present, and Future · Cursor"

cursor.com

剪藏 2025年11月13日

伴随 GPT-5.1 发布，OpenAI 产品 CEO Fidji Simo 解释 AI 个性化的必然性

"Moving beyond one-size-fits-all - Fidji Simo"

fidjisimo.substack.com

剪藏 2025年11月13日

- 为什么 vibe coding 的网页都是紫色？ - 随机变量的收敛

"Improving frontend design through Skills | Claude"

claude.com

剪藏 2025年11月13日

Anthropic 内部一场有趣的一日实验，控制机器狗，但一队用 Claude，另一队不能用 Claude（Claude-less，太惨了）。不太严谨的对比分析，但 Team Claude 显著用时更短、更接近完成，虽然在两个子任务上有相反的结果。竟然还通过队内录音，分析对比了两队情绪变化，自然是 Team Claude 更开心。

"Project Fetch: Can Claude train a robot dog? \ Anthropic"

anthropic.com

剪藏 2025年11月13日

GDM 发了篇 Nature，讲如何对齐 AI 视觉与语义

"Teaching AI to See the World More Like Humans Do - Google DeepMind"

deepmind.google

剪藏 2025年11月13日

Marble GA，响应李飞飞的愿景长文，官网同步焕新，创意、体验、模拟、学习等案例初露商业化苗头

"Marble: A Multimodal World Model | World Labs"

worldlabs.ai

剪藏 2025年11月13日

响应 AI 行动计划，Anthropic 投资 500 亿美元与 Fluidstack 建设数据中心。同时透露其企业客户已超 30 万，其中 10 万美元以上的客户在过去一年增长了近 7 倍

"Anthropic invests $50 billion in American AI infrastructure \ Anthropic"

anthropic.com

剪藏 2025年11月13日

特斯拉 AI 负责人 Ashok 在 ICCV 上的分享，讲端到端视觉路线的选择和三个挑战： 1. 维度诅咒，20亿token输入、2token输出，如何有效学习？多亏了数据积累和数据工程！ 2. 可解释性与安全保障，通过中间推理过程（如可泛化的生成式高斯溅射）来解决 3. 评估，通过世界模拟器来解决，甚至泛化到了 Optimus

"Ashok Elluswamy on X: "Tesla's approach to Autonomy" / X"

x.com

剪藏 2025年11月13日

风格调教优化 + adaptive thinking 增强 + 更多个性化

"GPT-5.1: A smarter, more conversational ChatGPT | OpenAI"

openai.com

剪藏 2025年11月12日

非常好的关于 Agentic Coding 的思考和推演，模型与 harness 的螺旋进展、三类用户

"Here's What's Next in Agentic Coding - Seconds_0 Substack"

seconds0.substack.com

剪藏 2025年11月12日

agentic 能力分级

"RL Environments and the Hierarchy of Agentic Capabilities"

surgehq.ai

剪藏 2025年11月11日

FAIR 新作，把 ASR 的语种覆盖推向了新高度

"Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages"

ai.meta.com

剪藏 2025年11月11日

喜欢引用的爱因斯坦这句 Creativity is intelligence having fun，可惜是用于推广产品的

"From Words to Worlds: Spatial Intelligence is AI’s Next Frontier"

drfeifei.substack.com

剪藏 2025年11月11日

“9-9-6 is irrelevant. People just love their work.”

"Inside Cursor - Colossus"

joincolossus.com

剪藏 2025年11月10日

Gamma 融了 a16z 领投的 B轮 6800万，估值21亿美元，团队仅50人，ARR 1亿，用户7千万，每月新增3千万gamma

"Grant Lee on X: "Today, as shared by The New York Times, we’re announcing two things: >Our Series B at a $2.1B valuation led by @sarahdingwang at @a16z. >Reaching $100M ARR, profitably, with a team of just 50 people. That's $2M ARR per employee. PowerPoint was invented before the first website, https://t.co/4SApKYltiC" / X"

x.com

剪藏 2025年11月10日

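A Kimi infra engineer explains the reasoning behind K2-Thinking's native INT4 quantization. A key finding: on thinking models, PTQ quantization error is amplified as token length grows and the output degrades, hence QAT. INT4 QAT also helps RL, with a marked efficiency gain on long-tail rollouts. MXFP4/NVFP4 and similar formats were passed over to better support non-Blackwell hardware, and INT4 is sufficient anyway. W4A16: 4-bit weights, 16-bit activations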

"Kimi K2 Thinking模型发布并开源，该模型哪些信息值得关注？ - 知乎"

zhihu.com
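
W4A16 in the note on this clip means weights stored in 4 bits while activations stay in 16 bits. Below is a minimal sketch of symmetric per-group INT4 weight rounding, the kind of quantize/dequantize step a QAT setup would simulate in its forward pass. The group size, weight values and function names are illustrative assumptions and are not taken from Kimi's implementation.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric per-group INT4 quantization; returns (codes, scales)."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to the code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Signed 4-bit codes live in [-8, 7].
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Recover approximate weights: code times the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Hypothetical weights, two groups of four.
weights = [0.12, -0.53, 0.98, 0.07, -1.40, 0.33, 0.05, 0.71]
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each restored weight is off by at most half a quantization step of its group. Plain post-training rounding like this leaves that error in place; QAT trains with the rounding in the loop so the model can adapt to it.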

剪藏 2025年11月8日

GoodFire 通过 loss curvature（误差曲率）研究大模型是如何记住东西的：通过K-FAC获取的曲率面信息、解构权重矩阵、然后类似PCA看主要成分。结论还一定程度分析了记忆、数学、逻辑推理等的敏感度

"Understanding Memorization via Loss Curvature"

goodfire.ai

剪藏 2025年11月7日

collective intelligence

"Expanding our mission: to grow the world’s collective intelligence - The Quora Blog - Quora"

quorablog.quora.com

剪藏 2025年11月7日

增强了浏览代理能力

"The New Comet Assistant"

perplexity.ai

剪藏 2025年11月7日

除了 Coding 外基本 SoTA，200-300轮长程任务，256k窗口，原生int4 qat

"Kimi K2 Thinking"

moonshotai.github.io

剪藏 2025年11月6日

2B 客户过百万

"1 million business customers: the fastest-growing business platform in history | OpenAI"

openai.com

剪藏 2025年11月6日

在应用层，OpenAI 也持续领先

"OpenAI on X: "You can now interrupt long-running queries and add new context without restarting or losing progress. This is especially useful for refining deep research or GPT-5 Pro queries as the model will adjust its response with your new requirements. Just hit update in the sidebar and https://t.co/kESrkU9hc9" / X"

x.com

剪藏 2025年11月5日

一些推演

"Thoughts by a non-economist on AI and economics – Windows On Theory"

windowsontheory.org

剪藏 2025年11月5日

TPU 上天了，主要是发射成本高

"Exploring a space-based, scalable AI infrastructure system design"

research.google

剪藏 2025年11月5日

Cursor 向左，Cognition 向右

"Cognition | Windsurf Codemaps: Understand Code, Before You Vibe It"

cognition.ai

剪藏 2025年11月5日

与之前 Cloudflare 的博客理念相同，借助代码生成来更高效地完成 LLM 对接工具、资源的 MCP 工作

"Code execution with MCP: building more efficient AI agents \ Anthropic"

anthropic.com

剪藏 2025年11月5日

哈佛大学人类进化生物学学者对大模型做了“心理学”研究，分析发现 GPT 回复主要对应的是 WEIRD（Western, Educated, Industrialized,Rich, and Democratic）群体的特征。相关：之前有一篇关于大模型对不同人种生命排序的价值观研究

"https://scholar.harvard.edu/sites/scholar.harvard.edu/files/henrich/files/which_humans_09222023.pdf"

scholar.harvard.edu

剪藏 2025年11月5日

Sora2的一些幕后： - 上线时团队不足50人 - 早期测试过放在ChatGPT内的媒体流 → meme chains → remix → cameo（key breakthrough）让生成更个性化和有人味，用户就不只是消费了 - 70%用户创作 - 名人效应 - Bill：2028年视频模型在世界模拟上取得突破 - 推荐系统为创意优化 - 未来模型的优化方向：不只是娱乐，可以实用，比如科学模拟、涡流建模 - 商业化：订阅/广告都有可能

"Inside OpenAI's Sora: Surge to #1 App, Key Product Decisions & How Video Models Learn Physics - YouTube"

youtube.com

2025年10月

剪藏 2025年10月31日

MiniMax kicks off a launch week. The open-source M2 is 230B total / 10B active, sparser than DeepSeek but less sparse than Qwen3-Next, and it drops M1's linear attention

"MiniMax M2 & Agent，大巧若拙"

mp.weixin.qq.com

剪藏 2025年10月30日

与 Cursor 同一天，Cognition 也发了自己的新模型 SWE-1.5，AI Coding 越来越热闹，Agentic Coding 越来越主流

"Cognition | Introducing SWE-1.5: Our Fast Agent Model"

cognition.ai

剪藏 2025年10月30日

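The team that previously wrote the AGI definition worked with Scale AI on a labor index that tests today's mainstream agents on freelance work; Manus scores highest, at just 2.5%. OpenAI seems to have had a similar benchmark earlier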

"Remote Labor Index: Measuring AI Automation of Remote Work"

remotelabor.ai

剪藏 2025年10月30日

自己训练了模型 Composer 以及 UI 焕新支持多 Agent 并行

"Introducing Cursor 2.0 and Composer · Cursor"

cursor.com

剪藏 2025年10月29日

ICL、RAG、Full FT、LoRA、Cartridges、Memory Layers

"The Continual Learning Problem"

jessylin.com

剪藏 2025年10月29日

Cartesia TTS 新旗舰 Sonic-3

"Real-time TTS API with AI laughter and emotion | Cartesia Sonic-3"

cartesia.ai

剪藏 2025年10月29日

1X NEO 开启预定了，早鸟2万美元或者499/月，但2026年美区才发货，其他地区得2027年

"1X NEO Home Robot | Order Today"

1x.tech

剪藏 2025年10月28日

Model Spec 同步更新

"Strengthening ChatGPT’s responses in sensitive conversations | OpenAI"

openai.com

剪藏 2025年10月28日

Mercor 融了 3.25 亿刀的 C 轮，估值来到 100 亿

"Unlocking Human Potential in the AI Economy | Mercor Blog"

mercor.com

剪藏 2025年10月26日

美团的开源视频模型

"LongCat-Video - A Unified Foundational Video Generation Model"

meituan-longcat.github.io

剪藏 2025年10月25日

Anthropic and Thinking Machines collaborate on stress-testing multiple LLMs against the value requirements in model specs

"Stress-testing model specs reveals character differences among language models"

alignment.anthropic.com

剪藏 2025年10月24日

Sky，之前 demo 是要做一个 macOS 上的新 AI 交互界面

"OpenAI acquires Software Applications Incorporated, maker of Sky | OpenAI"

openai.com

剪藏 2025年10月24日

ChatGPT 通过针对 Slack、Linear 等工作“环境”精调，实现企业场景的 agentic 知识工作

"Work smarter with your company knowledge in ChatGPT | OpenAI"

openai.com

剪藏 2025年10月23日

ChatGPT 开始基于 Project 连接人

"OpenAI on X: "Shared Projects are expanding to Free, Plus, and Pro users. Invite others to work together in ChatGPT using shared chats, files, and instructions all in one place. https://t.co/AqeaPGggqj" / X"

x.com

剪藏 2025年10月23日

难得。结合可灵的商业跑通，AI创作市场还是有PMF

"1.3亿美元！LiblibAI拿下国内AI应用赛道年度最大融资 | 量子位"

qbitai.com

剪藏 2025年10月23日

ChatGPT 的留存数据

"Deedy on X: "ChatGPT's product retention curves is a product manager's wet dream. Their 1 month retention has skyrocketed from <60% 2yrs ago to an unprecedented ~90%! Youtube was best-in-class with ~85%. 6mo retention is trending to ~80%. Rapidly rising smile curve. Generational product. https://t.co/qlLQrw0HkA" / X"

x.com

剪藏 2025年10月22日

红杉与 Sesame 合作，共同领投其 B 轮融资，打造语音智能伙伴，还在设计时尚 AI 眼镜

"Partnering with Sesame: A New Era for Voice | Sequoia Capital"

sequoiacap.com

剪藏 2025年10月22日

kyutai 技术博客，解释声音模型难题与方案

"Neural audio codecs: how to get audio into LLMs"

kyutai.org

剪藏 2025年10月22日

LangChain 凭借B轮融后12.5亿美金估值跻身独角兽

"LangChain raises $125M to build the platform for agent engineering"

blog.langchain.com

剪藏 2025年10月21日

与 DeepSeek-OCR 前后脚，智谱视觉压缩成果。 Q：Glyph 和 DeepSeek-OCR有何异同？ A：共同点：两者都从“视觉压缩”出发，利用视觉 token 承载更多的文本信息；不同点：DeepSeek-OCR 聚焦于真实文档 OCR 任务，验证的是视觉压缩下的文字还原能力；而 Glyph 则将这一思想应用到了更广泛的通用长文本任务中，真正验证了利用视觉模型实现上下文扩展的可行性。

"Glyph：通过视觉-文本压缩扩展上下文窗口"

mp.weixin.qq.com

剪藏 2025年10月21日

Krea 技术论文

"Krea Realtime 14B: Real-Time, Long-Form AI Video Generation"

krea.ai

剪藏 2025年10月21日

配套上云，开源 Claude Code 安全沙盒，对文件和网络做隔离

"Making Claude Code more secure and autonomous with sandboxing \ Anthropic"

anthropic.com

剪藏 2025年10月21日

Claude Code 上云了，AI Coding 的收敛方向是全栈和通吃

"Claude Code on the web \ Anthropic"

anthropic.com

剪藏 2025年10月21日

Anthropic 通过 Claude 新的 Connectors 和 Skills 辅助生命科学

"Claude for Life Sciences \ Anthropic"

anthropic.com

剪藏 2025年10月19日

Uber 新业务：司机闲时做数据标注

"Uber Giving Some US Drivers Option to Earn Money From Tasks Like Uploading Menus - Bloomberg"

bloomberg.com

剪藏 2025年10月17日

之前发的 Marble 太费算力，改用 DiT 训练一个单卡可跑的 RTFM，带位置&朝向的帧生成

"RTFM: A Real-Time Frame Model"

worldlabs.ai

剪藏 2025年10月17日

Google DeepMind 与 Commonwealth Fusion Systems（CFS）合作，借助 AI 推动可控核聚变，模拟等离子体、最大化能量、AI 控制系统等

"Bringing AI to the next generation of fusion energy - Google DeepMind"

deepmind.google

剪藏 2025年10月17日

Claude Skills 的实现方案，在给 Claude 配的虚拟机用文件夹和文本文件来描述技能，感觉和 Claude Code 越走越像了，优雅且附最佳实践

"Equipping agents for the real world with Agent Skills \ Anthropic"

anthropic.com

剪藏 2025年10月17日

OpenAI 的 AI 工作蓝图，串联起来了 AI 可及、培训、求职等一系列

"AI at Work: OpenAI’s Workforce Blueprint"

cdn.openai.com

剪藏 2025年10月17日

更快、更全栈、更生产

"Introducing Manus 1.5"

manus.im

剪藏 2025年10月17日

Google 是如何梳理过去 10 年的 AI 基因研究工作的

"Ten years of genomics at Google"

blog.google

剪藏 2025年10月17日

HeyGen ARR 破亿，同时给出了名为 Building in the AI Era: The HeyGen Way 的手册

"Joshua Xu on X: "HeyGen just hit $100M ARR this month, 29 months after we first reached $1M in April 2023. None of this happens without our incredible team, customers, partners, and community. Thank you 💜 When we shared our first $1M milestone, it was to give back to the build-in-public" / X"

x.com

剪藏 2025年10月16日

语音交互、可以读屏、联动Office的 Copilot；可以执行任务的 Copilot Actions 仍活在虚拟机里；还在测试 Windows 原生应用版的 Manus，通过 Windows MCP 等打通本地资源

"Making every Windows 11 PC an AI PC | Windows Experience Blog"

blogs.windows.com

剪藏 2025年10月16日

谢赛宁团队提出 RAE，致力取代 DiT 中老旧的 VAE

"Diffusion Transformers with Representation Autoencoders"

rae-dit.github.io

剪藏 2025年10月16日

Sora2更新，正面竞争Google今天升级的Veo3.1：时长拓展，免费用户升至15s、Pro用户升至25s；之前的AI创作故事板重新上线给Pro用户，结合cameo能做出更长的角色稳定的视频

"OpenAI on X: "2 Sora 2 updates: - Storyboards are now available on web to Pro users - All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web https://t.co/iINg7alWGL" / X"

x.com

剪藏 2025年10月16日

Poolside 宣布与 Coreweave 合作的 2GW 德州 AI 算力计划，哪来的钱啊？

"poolside — Announcing Project Horizon: Why we're building a 2 gigawatt AI campus in Texas"

poolside.ai

剪藏 2025年10月16日

Meta 在德州的 1GW AI 算力计划

"Breaking Ground on Our New AI-Optimized Data Center in El Paso"

about.fb.com

剪藏 2025年10月16日

Agent 框架

"We raised $11M to redefine how developers build AI agents • Dedalus Labs ⁘ Dedalus Labs"

dedaluslabs.ai

剪藏 2025年10月16日

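Haiku 4.5 comes close to GPT-5 on coding and makes Claude for Chrome run faster. But unlike the stable pricing of Sonnet (3/15) and Opus (15/75), Haiku's price has kept rising; perhaps this is related to Anthropic's strategy of pricing by "intelligence"? Haiku 3: 0.25/1.25; Haiku 3.5: 0.8/4; Haiku 4.5: 1/5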

"Introducing Claude Haiku 4.5 \ Anthropic"

anthropic.com

剪藏 2025年10月16日

结构化记忆功能更新，ChatGPT 帮你自动管理

"OpenAI on X: "ChatGPT can now automatically manage your saved memories—no more “memory full.” You can also search and sort memories by recency, and choose which to re-prioritize in settings. Rolling out to Plus and Pro users on the web globally starting today. https://t.co/T1vSNH5289 https://t.co/xRHLFTu2Am" / X"

x.com

剪藏 2025年10月16日

主要升级在编辑能力：参考生视频、首尾帧、延展

"Bringing new Veo 3.1 updates into Flow to edit AI video"

blog.google

剪藏 2025年10月15日

基于已有的AI经济指数研究，针对不同场景推演提出政策建议

"Preparing for AI’s economic impact: exploring policy responses \ Anthropic"

anthropic.com

剪藏 2025年10月14日

OpenAI 应用 CEO Fidji Simo 讲述 ChatGPT Pulse 功能背后的思考，懂你的个性化助理 + 推理模型不为人知的能力 + 可控 + 连接 Apps，最后这点结合 Apps SDK 看，真的是新的操作系统

"A new paradigm of proactive, steerable AI - Fidji Simo"

fidjisimo.substack.com

剪藏 2025年10月14日

n8n 上新，通过对话让 AI 帮你做工作流

"n8n.io on X: "🚀 Introducing AI Workflow Builder (Beta) - Turn prompts into living workflows. Generate nodes, logic, and structure from text, then shape and ship your vision faster. Rolling out this week to n8n Cloud (Trial, Starter, Pro). Update to v.1.116.0 to try it. Learn more: https://t.co/trWkxIVOck" / X"

x.com

剪藏 2025年10月14日

继 8 月的 MAI-Voice-1 和 MAI-1-preview 后，Microsoft AI 推出生图模型 MAI-Image-1，竞技场前10，但目前显著落后于第一梯队，规划上线 Copilot

"Introducing MAI-Image-1, debuting in the top 10 on LMArena | Microsoft AI"

microsoft.ai

剪藏 2025年10月13日

OpenAI 与博通定制 AI 加速芯片计划官宣，预计2026年下半年开始部署

"OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators | OpenAI"

openai.com

剪藏 2025年10月11日

Google processes 1.3 quadrillion tokens per month

"Gemini Enterprise: Sundar Pichai remarks at Gemini at Work"

blog.google

剪藏 2025年10月11日

上线11天，周活接近2百万，其中70%都有生成内容

"TBPN on X: "OpenAI's Head of Sora @billpeeb says a stunning 70% of Sora's nearly 2 million weekly active users are creating content. https://t.co/OE9r7nIe3Z" / X"

x.com

剪藏 2025年10月10日

GPT-5-Pro 目前最强

"Evaluating Gemini 2.5 Deep Think's math capabilities | Epoch AI"

epoch.ai

剪藏 2025年10月10日

从 Qwen3-30BA3B 转化来的 DLM 扩散语言模型，用了 500B token CPT 增训，benchmark 增益似乎不是很大

"RND1: Simple, Scalable AR-to-Diffusion Conversion · Radical Numerics"

radicalnumerics.ai

剪藏 2025年10月10日

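Designed for the home: soft, skin-friendly exterior, lighter, smaller, wireless charging. Vision upgrade for Helix VLA: 2x frame rate, 1/4 the latency, 60% wider field of view, palm cameras. Ready to scale: new supply chain and vertical integration; BotQ is designed for 12,000 units per year, with a target of 100,000 units in total over the next 4 years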

"Introducing Figure 03"

figure.ai

剪藏 2025年10月10日

语言模型召回上下文实体的机制

"Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context | Yoav Gur-Arieh"

yoav.ml

剪藏 2025年10月10日

Jina 加入 Elastic 了，AI + Search/Retrieve

"Elastic and Jina AI join forces to advance open source retrieval for AI applications | Elastic Blog"

elastic.co

剪藏 2025年10月10日

DeepMind 的代码世界模型（code world model，CWM）

"Kevin Patrick Murphy on X: "I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments. The key idea is to use LLM-powered code synthesis to learn a code world model (in the form https://t.co/Srt2AEwA0M" / X"

x.com

剪藏 2025年10月10日

Air Street Capital 年度 State of AI 报告 2025，已经是第8年：https://docs.google.com/presentation/d/1xiLl0VdrlNMAei8pmaX4ojIOfej6lhvZbOIK7Z6C-Go/

"Welcome to State of AI Report 2025"

stateof.ai

剪藏 2025年10月10日

AI 合同

"Spellbook Raises $50m Series B led by Khosla Ventures - Spellbook"

spellbook.legal

剪藏 2025年10月10日

Reflection 已经融了20亿美元，致力于开源

"Building Frontier Open Intelligence Accessible to All | Reflection AI"

reflection.ai

剪藏 2025年10月10日

Claude Code 插件，把mcp、hook等打包起来

"Customize Claude Code with plugins \ Anthropic"

anthropic.com

剪藏 2025年10月9日

OpenAI devday 发布了 agent builder 后，n8n 宣布 Accel 领投的 1.8 亿美元 C 轮融资，累计已融 2.4 亿，估值来到 25 亿，押注 AI 编排和协作，今年以来用户增长 6 倍、营收增长 10 倍，野心是 > n8n becomes the default platform to build with AI

"n8n raises $180m to get AI closer to value with orchestration – n8n Blog"

blog.n8n.io

剪藏 2025年10月9日

Sora: 1 million downloads in 5 days

"Bill Peebles on X: "sora hit 1M app downloads in <5 days, even faster than chatgpt did (despite the invite flow and only targeting north america!)! team working hard to keep up with surging growth. more features and fixes to overmoderation on the way!" / X"

x.com

剪藏 2025年10月8日

继 Comet 后，Dia 全面开放

"Josh Miller on X: "Starting today, the @browsercompany is back to shipping weekly updates. October's Dia releases include: • More powerful memory (of your tabs) • Redesigned Dia Skills • Arc's Focus Mode (CMD-S) All landing in @diabrowser this month. Oh, we removed the waitlist today too 🤗 https://t.co/YKdecFrQ3n" / X"

x.com

剪藏 2025年10月8日

学物理的 Yao Shunyu 在 Anthropic 一年后选择转投 DeepMind，40% 的原因是 Dario 等的对华敌意

"My infant year as an AI researcher — Moving from physics to AI"

alfredyao.github.io

剪藏 2025年10月7日

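Sora 2 opens its API. The standard model supports 1280x720 and Pro additionally supports 1792x1024; per-second prices are $0.10, $0.30 and $0.50 respectively (standard, Pro at 1280x720, Pro at 1792x1024), which means 10 s of standard Sora costs $1. The API supports only three durations: 4, 8 and 12 seconds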

"Sora 2 Prompting Guide"

cookbook.openai.com
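
The per-second prices and the three allowed durations in the note on this clip can be turned into a small cost calculator. This is a sketch only: the tier labels, the table layout and the function name are hypothetical, and the prices are the ones quoted in the note, not values read from the API.

```python
# Prices in US cents per second of generated video, as quoted in the note.
PRICE_CENTS_PER_SECOND = {
    ("standard", "1280x720"): 10,
    ("pro", "1280x720"): 30,
    ("pro", "1792x1024"): 50,
}

# The note says the API accepts only these clip lengths.
ALLOWED_SECONDS = (4, 8, 12)


def clip_cost_usd(tier, size, seconds):
    """Cost of one clip in USD; rejects durations the API does not offer."""
    if seconds not in ALLOWED_SECONDS:
        raise ValueError(f"duration must be one of {ALLOWED_SECONDS}")
    cents = PRICE_CENTS_PER_SECOND[(tier, size)] * seconds
    return cents / 100


longest_standard = clip_cost_usd("standard", "1280x720", 12)
longest_pro_hd = clip_cost_usd("pro", "1792x1024", 12)
```

The "10 s for $1" figure in the note is a rate of thumb rather than a purchasable unit, since 10 seconds is not one of the allowed durations.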

剪藏 2025年10月7日

打通应用，成为入口

"Introducing apps in ChatGPT and the new Apps SDK | OpenAI"

openai.com

剪藏 2025年10月7日

与一众框架和低代码平台正面碰撞

"Introducing AgentKit | OpenAI"

openai.com

剪藏 2025年10月4日

网页裸眼3D

"3D Viewer Demo"

lab.true3d.com

剪藏 2025年10月4日

OpenAI正在基于Sora的首批数据和反馈，考虑进行快速的迭代更新： 1. 给版权方选择，决定其角色可否用于生成（特别点了日本，应该是指动漫） 2. 考虑基于互动数的商业模式，并与版权方分成

"Sora update #1 - Sam Altman"

blog.samaltman.com

剪藏 2025年10月3日

Perplexity Comet 全网开放

"The Internet is Better on Comet"

perplexity.ai

剪藏 2025年10月3日

生产力指数，感觉是不是可以和 GDPval 放一起

"Introducing APEX: The AI Productivity Index | Mercor Blog"

mercor.com

剪藏 2025年10月3日

AI 在 OpenAI 内部应用

"Building OpenAI with OpenAI | OpenAI"

openai.com

剪藏 2025年10月1日

Cerebras 融了 11 亿刀的 G 轮，估值 81 亿

"Cerebras Raises $1.1 Billion at $8.1 Billion Valuation"

cerebras.ai

剪藏 2025年10月1日

OpenAI 称 Sora 1 是 GPT-1 时刻，而 Sora 2 直接来到了 GPT-3.5 时代。新的 Sora App 和社交属性的 cameos 功能： > We think a social app built around this “cameos” feature is the best way to experience the magic of Sora 2.

"Sora 2 is here | OpenAI"

openai.com

剪藏 2025年10月1日

借助 AlphaEvolve 研究复杂度理论

"AI as a research partner: Advancing theoretical computer science with AlphaEvolve"

research.google

剪藏 2025年10月1日

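Periodic Labs, co-founded by former OpenAI post-training lead William Fedus and former DeepMind materials & chemistry lead Ekin Dogus Cubuk, aims to build AI scientists and autonomous research discovery; it raised US$300 million in a round led by a16z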

"Periodic Labs"

periodic.com

2025年9月

剪藏 2025年9月30日

正面刚 Claude Sonnet 4.5，不过看榜单可能还略逊一筹，有望超过 Sonnet 4

"智谱旗舰模型GLM-4.6上线，代码能力全面进阶"

mp.weixin.qq.com

剪藏 2025年9月30日

OpenAI 上线 Instant Checkout，开源了其背后的 Agentic Commerce Protocol（ACP，智能体交易协议，与 Stripe 合作开发），为 ChatGPT 带来 App 内的一站式购物体验，支持 U.S. Etsy 和 Shopify 等商家，要为成功的交易付手续费（没找到费率）

"Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol | OpenAI"

openai.com

剪藏 2025年9月30日

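The pros and cons of LoRA: factors such as data volume, rank and which layers it is applied to; confirms that it works for RL, and gives empirical hyperparameter values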

"LoRA Without Regret - Thinking Machines Lab"

thinkingmachines.ai
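
For readers who want the mechanics behind the rank and layer discussion, here is a minimal sketch of a LoRA forward pass in plain Python. It uses the standard formulation y = Wx + (alpha / r) * B(Ax) with B initialized to zero; the dimensions, values and the choice of alpha are illustrative assumptions, not settings taken from the post.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x); A is r x d_in, B is d_out x r."""
    r = len(A)
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one adapted matrix."""
    return r * (d_in + d_out)


d_in, d_out, r = 4, 3, 2
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
A = [[0.01 * (j + 1) for j in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]   # zero init: adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]

y_start = lora_forward(W, A, B, x, alpha=32)

B[0][0] = 1.0                           # pretend training moved one entry of B
y_trained = lora_forward(W, A, B, x, alpha=32)

full_params = 4096 * 4096
adapter_params = lora_param_count(4096, 4096, 16)
```

With these example sizes a rank-16 adapter on a 4096 x 4096 matrix trains 131,072 parameters instead of about 16.8 million, which is why rank and layer coverage are the main knobs.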

剪藏 2025年9月30日

微软给 Office 加入了 Agent，Word、Excel（PPT开发中）订阅用户可用，其中 Excel 有篇技术博客介绍如何处理表格上下文、生成校验等，在 SpreadsheetBench 准确率 57.2% 距离人的 71.3% 还有差距，但比先前的一众不及 50% 的产品已高出不少

"Vibe working: Introducing Agent Mode and Office Agent in Microsoft 365 Copilot | Microsoft 365 Blog"

microsoft.com

剪藏 2025年9月30日

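Modal raised an $87 million Series B at a US$1.1 billion valuation, becoming an AI infra unicorn; it wants to cover top-layer products including inference, sandboxes, parallel batch processing, training and notebooks, plus the full-stack platform underneath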

"Announcing our $87M Series B | Modal Blog"

modal.com

剪藏 2025年9月30日

Coding 继续进步，长时自主、上下文感知优化、Memory；Claude Code 升级 2.0，多界面、更易用；Claude Code SDK 升级为 Claude Agent SDK，强调通用

"Introducing Claude Sonnet 4.5 \ Anthropic"

anthropic.com

剪藏 2025年9月29日

进一步极致效率，得益于 DSA 稀疏注意力带来的成本降低，价格大减：输入4→2，输出12→3

"DeepSeek-V3.2-Exp 发布，训练推理提效，API 同步降价"

mp.weixin.qq.com

剪藏 2025年9月28日

不是 Veo3 技术报告，是 Veo3 提示词/能力探索报告

"Video models are zero-shot learners and reasoners"

arxiv.org

剪藏 2025年9月28日

80BA13B的原生多模态生图

"混元图像3.0正式发布：开源，免费使用"

mp.weixin.qq.com

剪藏 2025年9月26日

黑森林的模型接入 PS，同时还有 nano banana

"FLUX.1 Kontext now in Adobe Photoshop: Powering Every Pixel | Black Forest Labs"

bfl.ai

剪藏 2025年9月26日

Gemini 2.5 Flash 和 Flash-Lite 更新，性能提升、时延降低

"Continuing to bring you our latest models, with an improved Gemini 2.5 Flash and Flash-Lite release - Google Developers Blog"

developers.googleblog.com

剪藏 2025年9月26日

ChatGPT 主动向你（Pro用户移动应用）发起消息，不保存则默认删除，某种意义上是显式的个性化推荐

"Introducing ChatGPT Pulse | OpenAI"

openai.com

剪藏 2025年9月26日

9个行业44种职业知识类真实任务 one-shot 交付测评，Claude Opus 4.1 最强，47.6%情况不输于行业专家，GPT-5 high 38.8%

"Measuring the performance of our models on real-world tasks | OpenAI"

openai.com

剪藏 2025年9月25日

VLM（Gemini Robotics ER 1.5，支持思考、联网搜索等工具） + VLA（Gemini Robotics 1.5）

"Gemini Robotics 1.5 brings AI agents into the physical world - Google DeepMind"

deepmind.google

剪藏 2025年9月25日

Gamma 分享战略与组织，最近分享有点多，但是 ARR 好像自5月以来就没再显著增长了？

"Grant Lee on X: "Gamma crossed $50M ARR with 28 employees and more cash in the bank than we had raised ($23M) In hindsight: We got here because we ignored common VC advice. Examples of glaringly bad advice that you should ignore to save you $10M+ and years of time, like we did for Gamma: https://t.co/GV5zrIQtsD" / X"

x.com

剪藏 2025年9月25日

用 A2D 把自回归 VLM（Qwen2.5-VL）精调改造为扩散 VLM，提高训练推理效率

"Runway Research | Autoregressive-to-Diffusion Vision Language Models"

runwayml.com

剪藏 2025年9月25日

Scale AI 进军具身智能数据

"Expanding Our Data Engine for Physical AI | Scale"

scale.com

剪藏 2025年9月24日

Wan2.5-Preview，声画同步的视频生成，还支持图片生成，5s/10s，质感还比不上Veo3，也未开源

"Wan on X: "Today, we're officially launching Wan2.5-Preview! It's set to reshape the future of visual generation with a new architecture and powerful features. • Architectural Features: Native Multimodality, Deep Alignment ∘ Native Multimodal Architecture: Adopts a new, unified framework" / X"

x.com

剪藏 2025年9月24日

基于 Qwen3-Omni 精调，不开源

"Qwen3‑LiveTranslate: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It！"

qwen.ai

剪藏 2025年9月24日

同期，ChatGPT Go 也在拓展支持区域，二者在发展中国家正面碰撞

"Google AI Plus expands to 40 more countries"

blog.google

剪藏 2025年9月24日

陈·扎克伯格慈善组织 CZI 发起教育项目 Learning Commons，旨在把 learning science 带入课堂工具，其中知识图谱就是主要方法之一，他们与 Anthropic 合作，通过 MCP 将知识图谱与 Claude 连接起来，带入课堂给老师用。还开放了部分数据出来供开发者用：learning-commons-org/knowledge-graph

"Scaling Proven Learning Practices with AI Tools for Education"

chanzuckerberg.com

剪藏 2025年9月24日

基于nano banana的画板

"Mixboard: Google Labs’ new experiment to visualize ideas"

blog.google

剪藏 2025年9月24日

Gemini Live / Gemini 2.5 Flash Native Audio Preview 更新：Function Calling 更鲁棒；对话更自然，知道过滤背景对话、暂停等人说完、更自然的打断

"Google AI Studio on X: "Build more powerful voice agents with the Gemini Live API " / X"

x.com

剪藏 2025年9月24日

个性化音频/播客？

"Huxe AI | Content that exists because you do"

huxe.com

剪藏 2025年9月23日

Meta 超级智能团队呼应 AI 下半场，推出 Agent 研究环境 ARE，外加 Gaia2 评测，旨在 scale up 智能体环境和评测

"Gaia2 and ARE: Empowering the community to study agents"

huggingface.co

剪藏 2025年9月20日

小米的原生语音模型，逼近Gemini-2.5-Flash。预训练验证了语音模型的涌现，后训练实现语音理解与推理、指令TTS

"Introducing MiMo-Audio"

xiaomimimo.github.io

剪藏 2025年9月19日

Decart 基于 Wan 2.2 做的视频编辑模型

"Decart on X: "We are building “Open Source Nano Banana for Video” - here is open source demo v0.1 We are open sourcing Lucy Edit, the first foundation model for text-guided video editing! Get the model on @huggingface 🤗, API on @FAL, and nodes on @ComfyUI 🧵 https://t.co/1A2t7VPbcO" / X"

x.com

剪藏 2025年9月19日

微软在威斯康星州的 AI 数据中心，还可以在线虚拟浏览：https://datacenters.microsoft.com/tour/

"Inside the world’s most powerful AI datacenter - The Official Microsoft Blog"

blogs.microsoft.com

剪藏 2025年9月18日

Gemini in Chrome 正式上线

"Go behind the browser with Chrome’s new AI features"

blog.google

剪藏 2025年9月18日

Notion Agent

"Introducing Notion 3.0"

notion.com

剪藏 2025年9月18日

AI4S 这块 GDM 出场率很高

"Discovering new solutions to century-old problems in fluid dynamics - Google DeepMind"

deepmind.google

剪藏 2025年9月18日

Groq 融了 7.5 亿刀，融后估值 69 亿

"Groq Raises $750 Million as Inference Demand Surges | Groq is fast inference for AI builders"

groq.com

剪藏 2025年9月18日

OpenAI Realtime API 持续迭代

"Developer notes on the Realtime API"

developers.openai.com

剪藏 2025年9月18日

ElevenLabs 不满足于只做声音了，新的 Studio 3.0 直接是 AI 原生视频编辑器，尽管是围绕 AI 声音打造的

"ElevenLabs on X: "Introducing Studio 3.0 The most advanced AI audio models in a single editor, now with video support: •Voiceovers •Music •Sound Effects •Voice Isolation •Voice Changer Plus new Automatic Captioning, Speech Correction for real-life recordings, and Multiplayer Commenting. https://t.co/rARyIfJ48U" / X"

x.com

剪藏 2025年9月18日

yipit data数据显示 Google 搜索 DAU/MAU 比例下滑，ChatGPT 稳步上升

"Olivia Moore on X: "Google Search's dominance is finally starting to slip DAU/MAU for desktop users (dotted line👇) is steadily falling, down a few pct points vs. early 2023 ...while ChatGPT's DAU/MAU continues to climb https://t.co/fDFaYEOMOh" / X"

x.com

剪藏 2025年9月18日

Gemini 2.5 Deep Think 在国际最难的编程比赛 ICPC 中达到金牌水平，解出 12 道题中的 10 道。作为对比，OpenAI 用了 GPT-5 并行解法 + 实验性通用推理模型挑选的方式解出了 12 道中的 11 道，最后的一道用这个实验推理模型多次提交后也解出了。最牛的大学生队伍解出了 11/12。Cognition CEO Scott Wu 评价“你们不知道 ICPC 究竟有多难”。

"Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals - Google DeepMind"

deepmind.google

剪藏 2025年9月18日

单侧显示屏 + 腕带操控

"Introducing Meta Ray-Ban Display: A Breakthrough Category of AI Glasses | Meta Quest Blog | Meta Store"

meta.com

剪藏 2025年9月18日

Anthropic 对 8月-9月初的模型降智做了复盘，主要还是 infra bug

"A postmortem of three recent issues \ Anthropic"

anthropic.com

剪藏 2025年9月16日

区别于深度图和点云，World Labs 新的 3D 世界模型 Marble 能够生成更丰富、复杂、完整的世界，从房间大小拓展到了大平层？

"Generating Bigger and Better Worlds"

worldlabs.ai

剪藏 2025年9月16日

Gamma Agent 和 API 上线

"Grant Lee on X: "Excited to introduce: Gamma 3.0 A generational leap for the world's most popular AI presentation tool. Two major changes: 1. Gamma Agent - with one prompt, you can make sweeping edits across the presentation. a. Say 'make it more visual' and it will scan each slide for data https://t.co/CcsZEDbJE2" / X"

x.com

剪藏 2025年9月16日

HeyGen Video Agent 公测

"HeyGen on X: "Five years ago, we set out to make professional video creation effortless for everyone. Today, we push it further with THREE big announcements. First, Video Agent (beta) launches to turn your prompt into a finished, publish-ready video. It’s our first move toward a creative https://t.co/SJtgY5q4xo" / X"

x.com

剪藏 2025年9月16日

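Mercor (the gig-work platform that went from 1 to $500 million in revenue in 17 months) CEO Brendan on how real-world work supplies reinforcement learning environments for AI: in specialized domains, at the model frontier and on long-horizon work, AI still needs human feedback (as part of the environment) to keep improving. Whether that figure is revenue or GMV has been questioned, though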

"The Economy Will Become an RL Environment Machine | Mercor Blog"

mercor.com

剪藏 2025年9月16日

ElevenLabs 开始在标准化 AI 产品外，补上缺少的那部分人工验证，来完成端到端的企业生产场景落地，$2/min起

"Introducing Productions: human-edited content, done for you | ElevenLabs"

elevenlabs.io

剪藏 2025年9月16日

OpenAI 也发布了 ChatGPT 使用报告，基于百万量级的采样数据做的分析，信息量很大，一些要点： - 用户性别（基于名字判断）从早期的不均衡已基本趋平，近一半消息来自26岁以下用户，低收入国家使用增长显著 - ChatGPT 在工作和生活场景的使用约三七开，生活使用增长更快 53%→70% - 与 Anthropic 的 Augmentation/Automation 划分不同，OpenAI 用了 Asking、Doing、Expressing 的方式，分别占比 49%、40%、11% - 近八成使用可归入三类：操作指引，how-to 类建议等；获取信息，找人/事/产品等，替代搜索；写作，邮件/文档的生成和编辑，其中2/3是修改编辑，从0生成占1/3 - ”让AI教我“类占总量10%，突出ChatGPT的教育价值 - 与 Claude 大比例（33%）用于软件开发不同，编程在 ChatGPT 使用中仅占 4.2% - 陪伴类占比较低仅1.9%

"How people are using ChatGPT | OpenAI"

openai.com

剪藏 2025年9月16日

在 GPT-5 上继续针对 Coding 强化而来的 GPT-5-Codex，token efficiency 是个亮点、上下限范围更大，即能在简单的问题上用更少的 token 清晰解决问题，复杂的问题也能比 GPT-5 想更久

"Introducing upgrades to Codex | OpenAI"

openai.com

剪藏 2025年9月15日

Anthropic 继续更新基于 Claude 使用数据的 Economic Index 研究，这次对比了区域使用情况，对美国各州的分析有点“AI指数”的味儿了

"Anthropic Economic Index: Tracking AI's role in the US and global economy \ Anthropic"

anthropic.com

剪藏 2025年9月14日

差分隐私训练的 Gemma，性能与普通版本仍有差距

"VaultGemma: The world's most capable differentially private LLM"

research.google

剪藏 2025年9月14日

Gamma 的分享，主要是如何做 GTM

"Grant Lee on X: "we grew from zero to $50M ARR in <2 years, profitably i've never publicly shared our tactics before it's cost us over $5M to learn what I'm about to share 800-word long post on every growth hack that printed money for us I'll cover: 1. Influencer Marketing 2. Performance https://t.co/d9IzZYZHWs" / X"

x.com

剪藏 2025年9月14日

从 Qwen3-Next 的架构创新看中美大模型实验室的差异

"JingyuanLiu on X: "I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different: US labs be like: - lots of GPUs and much larger flops run - Treating stabilities more seriously, and could not tolerate spikes" / X"

x.com

剪藏 2025年9月12日

80B total / 3B active; a new hybrid attention architecture (Gated DeltaNet + Gated Attention) and a sparser MoE (512 experts in total, with 10 routed + 1 shared active)

"Qwen3-Next: Towards Ultimate Training & Inference Efficiency"

qwen.ai
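
To make the "512 experts, 10 routed + 1 shared" numbers concrete, here is a minimal sketch of top-k expert routing for a single token. Normalizing the gate weights with a softmax over the selected experts is an assumption made for illustration, as are the random router logits and the function name; the post may do this differently.

```python
import math
import random

NUM_EXPERTS = 512   # routed expert pool, from the note
TOP_K = 10          # routed experts activated per token, from the note
NUM_SHARED = 1      # shared expert that every token passes through


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts and softmax-normalize their gate weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:top_k]
    peak = router_logits[chosen[0]]   # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
experts, gates = route(logits)

routed_fraction = TOP_K / NUM_EXPERTS        # share of the routed pool used per token
experts_per_token = TOP_K + NUM_SHARED       # routed plus always-on shared expert
```

Under 2% of the routed pool is used for any one token, which is how an 80B-parameter model ends up with roughly 3B active parameters.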

剪藏 2025年9月12日

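Cursor's new completion model uses on-policy RL to raise the accept rate by 28% while making 21% fewer suggestions. The Tab model handles more than 400 million requests a day, and a new set of weights can be rolled out and start collecting data for the next step within 1.5 to 2 hours.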

"Improving Cursor Tab With RL | Cursor - The AI Code Editor"

cursor.com
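
A quick worked calculation shows how the two percentages combine. It assumes "accept rate" means accepted suggestions divided by shown suggestions, which the note does not spell out; the variable names are illustrative.

```python
# Relative change in how many suggestions are shown (21% fewer).
suggestion_ratio = 1 - 0.21

# Relative change in the share of shown suggestions that get accepted (28% higher).
accept_rate_ratio = 1 + 0.28

# Accepted suggestions = shown suggestions * accept rate, so the ratios multiply.
accepted_ratio = suggestion_ratio * accept_rate_ratio
```

Under that reading the absolute number of accepted suggestions stays roughly flat (about 1% higher), while the model interrupts the user far less often.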

剪藏 2025年9月11日

Thinking Machines' first blog post, on nondeterminism in LLM inference

"Defeating Nondeterminism in LLM Inference - Thinking Machines Lab"

thinkingmachines.ai
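
One ingredient behind run-to-run differences in numerical code is that floating-point addition is not associative, so the order in which partial sums are combined changes the result. The snippet below is a generic illustration of that effect in plain Python and is not code from the post.

```python
def naive_sum(values):
    """Add left to right with ordinary float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, different grouping.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)

# Same three numbers, different order: the 1.0 is lost when added to 1e16 first.
absorbed = naive_sum([1e16, 1.0, -1e16])
preserved = naive_sum([1e16, -1e16, 1.0])
```

On a GPU, the reduction order can depend on how work is split across threads and batches, so results that are mathematically identical may differ in the last bits.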

剪藏 2025年9月11日

Google Research 和 Stanford Accelerator for Learning 的合作项目 AI Quest，邀请 11-14 岁的学生以游戏化的形式探索 AI 如何解决真实世界的问题。可以在这里试玩：https://research.google/ai-quests/

"AI Quests from Google teaches AI literacy to kids"

blog.google

剪藏 2025年9月11日

Moonshot AI open-sourced a middleware for updating LLM weights, suited to RL

"Kimi.ai on X: "Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅ https://t.co/hurvLPDW1n" / X"

x.com

剪藏 2025年9月11日

Replit 融了 Prysm 领投的 2.5 亿美元，估值 30 亿刀，过去一年 ARR 从 280 万涨到 1.5 亿，用户超 4000 万；同时推出自主化程度更高的 Agent 3，可以运行更长时间，自主完成测试并 debug

"Replit Closes $250 Million in Funding to Build on Customer Momentum"

replit.com

剪藏 2025年9月11日

ChatGPT 订阅用户可以在设置中打开开发者模式，连接自定义MCP工具

"ChatGPT Developer mode - OpenAI API"

platform.openai.com

剪藏 2025年9月10日

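A biweekly US Census Bureau survey of 1.2 million companies shows that, over the past few months, AI adoption at companies with more than 250 employees has started to decline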

"AI Adoption Rate Trending Down for Large Companies - Apollo Academy"

apolloacademy.com

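Clipped September 10, 2025

Silicon Valley 996. Ramp, a fintech company that provides corporate cards and finance management to businesses, publishes tracking data in its Ramp AI Index that is worth consulting.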

"San Francisco tech workers are working Saturdays"

ramp.com

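Clipped September 10, 2025

Using a carefully built hallucination-detection dataset (entity annotations produced by Claude with web search), the authors train linear probes that map intermediate-layer activations to hallucination likelihood, enabling real-time token-level hallucination detection; the probes also generalize from entities to math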

"Real-Time Detection of Hallucinated Entities in Long-Form Generation"

hallucination-probes.com
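
A minimal sketch of what a token-level linear probe computes, assuming one activation vector per generated token. The weights, bias, activations, and threshold below are made-up toy values, not anything from the paper; a real probe would be trained on labeled activations from a specific layer.

```python
import math


def probe_scores(hidden_states, weights, bias):
    """Return one probability per token: sigmoid(w . h + b)."""
    scores = []
    for h in hidden_states:
        logit = sum(w * x for w, x in zip(weights, h)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def flag_tokens(tokens, scores, threshold=0.5):
    """Keep the tokens whose probe score reaches the threshold."""
    return [t for t, s in zip(tokens, scores) if s >= threshold]


# Toy 2-dimensional "activations" for three tokens (illustrative only).
tokens = ["Paris", "is", "1987"]
hidden_states = [[0.0, 2.0], [0.0, 0.0], [2.0, 0.0]]
scores = probe_scores(hidden_states, weights=[1.0, -1.0], bias=0.0)
flagged = flag_tokens(tokens, scores, threshold=0.6)
print(flagged)
```

Because the probe is a single dot product per token, it can run alongside generation at negligible cost, which is what makes real-time flagging practical.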

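Clipped September 10, 2025

New Black Forest Labs numbers: about $100M ARR as of August 2025, and a deal with Meta worth $35M in year one and $105M in year two, all with only 29 full-time employees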

"Arfur Rock on X: "Black Forest Labs is pioneering SOTA visual models (FLUX). August 2025: ~$100M ARR, +3.5x YoY, 78% GM. TCV >$300M over next 3 yrs, incl. a monster revenue deal with Meta — $35M Y1, $105M Y2 guaranteed. Only 29 FTE btw. Relatively under-discussed. Time to pay attention!" / X"

x.com

Clipped September 10, 2025

MCP launches an official registry

"Introducing the MCP Registry | mcp blog"

blog.modelcontextprotocol.io

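Clipped September 10, 2025

Argues that as RL takes over, apps like Cursor that started as API wrappers will also use the data they have accumulated for RL training, and that the application itself is a natural RL environment. The post raises software distribution for the first time, but I have not yet seen an in-depth discussion of it.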

"The Training Imperative"

sdan.io

Clipped September 10, 2025

Following OpenAI's ChatGPT Go subscription in India, Google launched a $5 AI Plus plan for the Indonesian market

"Lakukan lebih banyak dengan AI: Pertama di dunia, Google AI Plus kini tersedia di Indonesia"

blog.google

Clipped September 9, 2025

Atlassian's product lead on acquiring The Browser Company

"Atlassian exec details the $610M Browser Company acquisition – Computerworld"

computerworld.com

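Clipped September 9, 2025

A mathematician's record of doing research with GPT-5. The verdict is still "junior assistant", and the author worries that AI research may not only bury truly original and valuable results under mediocre AI output, but also let graduate students skip the trial-and-error process that is indispensable to becoming a real mathematician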

"Mathematical research with GPT-5: a Malliavin-Stein experiment"

arxiv.org

Clipped September 9, 2025

Unusual: ASML led Mistral's EUR 1.3B Series C, valuing Mistral at EUR 11.7B (USD 13.7B)

"ASML, Mistral AI enter strategic partnership"

asml.com

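Clipped September 9, 2025

An AI safety course for Harvard and MIT students by Boaz Barak, Harvard CS professor and OpenAI technical staff member, with slides and other materials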

"CS 2881 AI Safety"

boazbk.github.io

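Clipped September 9, 2025

A new non-invasive BCI: Alterego reads the words you intend to say directly from the brain, built on its Silent Sense technology, similar to recognizing speech from neural signals; paired with today's LLMs it becomes a new mode of interaction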

"Introducing Alterego, the first near-telepathic interface, designed to make technology as intuitive as using your inner voice."

alterego.io

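Clipped September 9, 2025

ElevenLabs launched a $100M employee tender offer, with buyers including Sequoia and ICONIQ, at an implied valuation of $6.6B, only 9 months after its Series C at $3.3B. It expects ARR to reach $300M by year end; enterprise customers grew 200%+ over the past year, and B2B and B2C (self-serve consumers) now each contribute half of revenue. Headcount grew from 70 a year ago to 330 today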

"Announcing an Employee Tender Offer at $6.6B valuation | ElevenLabs"

elevenlabs.io

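Clipped September 9, 2025

Qwen speech recognition, not open-sourced, said to be based on Qwen3-Omni (not yet released?). Beyond a low WER, the highlight is recognition against complex background audio; it can even transcribe song lyrics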

"Qwen3 ASR: Hear clearly, transcribe smartly."

qwen.ai

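Clipped September 9, 2025

A GNN policy trained with RL for automatic planning and control of multiple robot arms, supporting up to 8 arms; named "Ballet", lol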

"RoboBallet: Planning for Multi-Robot Reaching with Graph Neural Networks and Reinforcement Learning - Google DeepMind"

deepmind.google

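Clipped September 9, 2025

Cognition raised $400M led by Founders Fund at a $10.2B valuation. Devin's ARR was $73M in June 2025, 73x the $1M of September 2024, and doubled again after the Windsurf acquisition. Before the acquisition, Devin and Windsurf customers overlapped by less than 5%. Interestingly, swyx is joining Cognition too.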

"Cognition | Funding, growth, and the next frontier of AI coding agents"

cognition.ai

Clipped September 8, 2025

Last week the GitHub team introduced spec-driven development (SDD) and open-sourced a toolkit, spec-kit

"Spec-driven development with AI: Get started with a new open source toolkit - The GitHub Blog"

github.blog

Clipped September 8, 2025

A benchmark of LLMs reading analog clocks

"ClockBench AI Benchmark"

clockbench.ai

Clipped September 7, 2025

Vercel Labs built a browser for developers and coding agents

"dev3000 - AI-Powered Debugging & Development Monitoring | Vercel Labs"

d3k.vercel.sh

Clipped September 6, 2025

Google Photos adds a Create tab, powered by Veo 3

"6 things you can do with Google Photos’ Create tab"

blog.google

Clipped September 6, 2025

A proactive desktop assistant, not yet available; the architecture diagram on the homepage is worth a look

"Whisper Get everything you need before you ask A desktop AI that sees your screen, hears your moment, and delivers everything proactively."

pickle.com

Clipped September 6, 2025

Finally a head-on look at LLM hallucination; highly recommended

"Why language models hallucinate | OpenAI"

openai.com

Clipped September 5, 2025

Anthropic runs AI-moderated follow-up interviews with users who cancel Claude Code

"Got an invite to an “AI-moderated interview” after canceling Claude Code – anyone else? : r/ClaudeAI"

reddit.com

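Clipped September 5, 2025

Sierra raised another $350M at a $10B valuation. It focuses on large companies: 20% of customers have revenue above $10B, and half above $1B.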

"There's an agent for that, and it runs on Sierra | Sierra"

sierra.ai

Clipped September 5, 2025

Deep Loop Shaping, reducing noise in LIGO's gravitational-wave observations

"Using AI to perceive the universe in greater depth - Google DeepMind"

deepmind.google

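Clipped September 5, 2025

Captions, the high-revenue AI video editing app, renamed the company Mirage; its flagship Mirage Studio helps businesses produce short videos at scale, batch-generating from a single voice track across different templates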

"Introducing Mirage: The future of video starts now. | Mirage"

mirage.app

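Clipped September 5, 2025

OpenAI is building the OpenAI Jobs Platform and OpenAI Certification. The former is an AI-focused talent marketplace whose highlight is AI-driven matching of supply and demand; the latter is AI skills certification, tied in with the earlier OpenAI Academy training and ChatGPT study mode. Together with partners such as Walmart, OpenAI plans to certify 10 million Americans by 2030.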

"Expanding economic opportunity with AI | OpenAI"

openai.com

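Clipped September 4, 2025

The Browser Company, maker of the Arc and Dia browsers, was acquired by Atlassian, the company behind Jira, with the aim of building an AI browser for knowledge workers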

"Welcoming The Browser Company to Atlassian - Work Life by Atlassian"

atlassian.com

Clipped September 4, 2025

Alex, an AI coding product focused on Xcode, joined OpenAI Codex

"Alex - Xcode AI Coding Assistant"

alexcodes.app

Clipped September 4, 2025

Exa raised an $85M Series B led by Benchmark at a $700M valuation

"Exa AI Research Blog | Semantic Search & Neural Network Search Engine"

exa.ai

Clipped September 3, 2025

Considering a router to handle sensitive conversations, such as those showing suicidal tendencies?

"Building more helpful ChatGPT experiences for everyone | OpenAI"

openai.com

Clipped September 3, 2025

Anthropic: $13B Series F at a $183B valuation; annualized revenue reached $5B by August, 5x the level at the start of the year

"Anthropic raises $13B Series F at $183B post-money valuation \ Anthropic"

anthropic.com

Clipped September 2, 2025

LLMs play Werewolf: GPT-5 leads by a wide margin; a pity Claude 4 did not take part

"Probing LLM Social Intelligence via Werewolf – First Results"

werewolf.foaster.ai

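Clipped September 1, 2025

Meituan makes its move: 560B total with about 27B active, using shortcut-connected MoE (ScMoE) to activate a dynamic 18.6B to 31.3B parameters; judging by the evals it is also a first-tier open-source model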

"meituan-longcat/LongCat-Flash-Chat"

github.com

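August 2025

Clipped August 29, 2025

Runway CEO Cristóbal Valenzuela argues that generative media should be treated as a new medium, an evolution like that from painting to photography, rather than a replacement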

"Cristóbal Valenzuela: A New Medium"

cvalenzuelab.com

Clipped August 29, 2025

Letta evaluated LLMs' ability to recover from mistakes and found GPT-5 in the lead

"Introducing Recovery-Bench: Evaluating LLMs' Ability to Recover from Mistakes | Letta"

letta.com

Clipped August 29, 2025

Xcode integrates Claude and ChatGPT

"Xcode 26 Beta 7 Release Notes | Apple Developer Documentation"

developer.apple.com

Clipped August 28, 2025

A launch blog post with very little information, and the same goes for the attached model card (a first for xAI?)

"Grok Code Fast 1 | xAI"

x.ai

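Clipped August 28, 2025

A rare joint effort: Anthropic and OpenAI studied model alignment together, with o3 performing best; sycophancy and blackmailing users for self-preservation were widespread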

"Findings from a Pilot Anthropic - OpenAI Alignment Evaluation Exercise"

alignment.anthropic.com

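Clipped August 28, 2025

In a study with 1,000+ real people, OpenAI compared public preferences against its Model Spec, found a small number of divergences, and made corrections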

"Collective alignment: public input on our Model Spec | OpenAI"

openai.com

Clipped August 28, 2025

Artificial Societies uses AI to simulate users, claiming 80% simulation accuracy versus 60% for frontier models; six major scenarios, mostly commercial

"Artificial Societies Raised a $5.35 Million Round With This Deck - Business Insider"

businessinsider.com

Clipped August 28, 2025

System 1 DiT + System 2 MLLM; the demos are very strong, no idea when it will be usable

"OmniHuman-1.5：Instilling an Active Mind in Avatars via Cognitive Simulation"

omnihuman-lab.github.io

Clipped August 28, 2025

The keys to LLM/agent RL: evals + environments

"Environments Hub: A Community Hub To Scale RL To Open AGI"

primeintellect.ai

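Clipped August 28, 2025

a16z's GenAI consumer apps list reaches its 5th edition: Google storms back, fewer new faces, vibe coding is hot, and Chinese apps are not to be underestimated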

"The Top 100 Gen AI Consumer Apps - 5th Edition | Andreessen Horowitz"

a16z.com

Clipped August 27, 2025

Names North Korea

"Detecting and countering misuse of AI: August 2025 \ Anthropic"

anthropic.com

Clipped August 27, 2025

- First-party end-to-end agent models vs. third-party scaffolding - Managing an organization the SFT way or the RL way

"和杨植麟时隔一年的独家对话：“站在无限的开端”"

mp.weixin.qq.com

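Clipped August 27, 2025

Emphasizes character consistency, multi-image composition, fine-grained control, and world knowledge. After some time in the arena it leads on Elo, especially for editing, but generative editing cannot guarantee pixel-level alignment outside the edited region, and text rendering is not top-tier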

"Gemini 2.5 Flash Image - Google DeepMind"

deepmind.google

Clipped August 27, 2025

Image + audio → generated video; lip sync is roughly right, English only

"Wan-S2V：Audio-Driven Cinematic Video Generation"

humanaigc.github.io

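Clipped August 26, 2025

Anthropic launched a Claude extension for Chrome that controls the browser; with the emphasis on safety it is invite-only for now. In attack test cases, mitigations reduced the attack success rate from 23.6% to 11.2%.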

"Piloting Claude for Chrome \ Anthropic"

anthropic.com

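Clipped August 26, 2025

A Stanford team's study of AI's impact on employment. Overseas research currently tends to classify AI usage as either augmentation (augmented) or automation (automated); the study finds that for entry-level jobs the impact comes from automation, while augmentation has little effect; for experienced workers…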

"A Primer on “Canaries in the Coal Mine? Six Facts About the Recent Employment Effects of Artificial Intelligence”"

bharatchandar.substack.com

Clipped August 26, 2025

How teachers and educators use Claude: Artifacts usage is high; repetitive administrative work unrelated to teaching gets automated with Claude

"Anthropic education report: How educators use Claude \ Anthropic"

anthropic.com

Clipped August 26, 2025

A GEO startup got into YC; three steps: analyze user questions, optimize content, drive traffic

"Launch YC: The Prompting Company - We help products get mentioned in ChatGPT | Y Combinator"

ycombinator.com

Clipped August 26, 2025

Uses next-token diffusion from LatentLM (comparable to DiT, Transfusion, etc.); the demos sound pretty good

"VibeVoice: A Frontier Open-Source Text-to-Speech Model"

microsoft.github.io

Clipped August 26, 2025

Perplexity launched the $5/month Comet Plus subscription, with an 80/20 revenue split with publishers

"Introducing Comet Plus"

perplexity.ai

Clipped August 25, 2025

AI-native DingTalk

"全图文｜钉钉CEO无招：为AI时代打造一个全新的钉钉"

mp.weixin.qq.com

Clipped August 25, 2025

HKUST's version of an AI town

"HKUST Launches World’s Largest AI-Powered Educational Sandbox Game: Advancing AI Literacy and Encouraging Citizen Science | The Hong Kong University of Science and Technology"

hkust.edu.hk

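Clipped August 25, 2025

Uses model diff amplification to raise the rate at which harmful content is generated, as a way to verify and identify the effects of post-training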

"Discovering Undesired Rare Behaviors via Model Diff Amplification"

goodfire.ai

Clipped August 25, 2025

The MIT NANDA report

"The GenAI Divide: State of AI in Business 2025"

artificialintelligence-news.com

Clipped August 24, 2025

Harmful data can be filtered out during pretraining without hurting the model's performance in other areas

"Enhancing Model Safety through Pretraining Data Filtering"

alignment.anthropic.com

Clipped August 24, 2025

One Gemini prompt uses on average 0.24 Wh of electricity, emits 0.03 g of CO2 equivalent, and consumes 5 drops of water

"Measuring the environmental impact of AI inference | Google Cloud Blog"

cloud.google.com

Clipped August 21, 2025

ByteDance Seed makes its move: Seed-OSS-36B, a dense reasoning model with a 512k context and decent benchmark results

"ByteDance-Seed/seed-oss"

github.com

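Clipped August 21, 2025

Google's hard-to-match vertical integration: - On-device AI with Tensor G5 + Gemini Nano - A slate of AI features: Magic Cue proactive suggestions, Camera Coach, Voice Translation (live interpretation) - Mentions C2PA support in the camera app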

"Google Pixel 10, Pixel 10 Pro and Pro XL: Specs, design, price"

blog.google

Clipped August 21, 2025

This mobile CUA company also claims #1 on AndroidWorld at 74.4%, but I checked and Zhipu's AutoGLM-Mobile-9B scores 75.8

"minitap | mobile ai research lab - autonomous device control"

minitap.ai

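Clipped August 21, 2025

CUA has been busy lately. This company got into YC and claims benchmark SoTA, but only reports 60%+ on OSWorld-Verified, not the original OSWorld. I checked: Zhipu's newly released AutoGLM (per the paper it is built on GLM-4-9B-0414, hence the model name AutoGLM-OS-9B) scores 48.1 on OSWorld and 47.3 on OSWorld-Verified; not sure why it dropped. Axiom is indeed higher.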

"Introducing Axiom 1, the Best Computer Use Model in the World. - Induction Labs"

inductionlabs.com

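Clipped August 21, 2025

Baidu's MuseSteamer (蒸汽机) 2.0 is still image-to-video only; I tried the version with audio, and the sound is decent but the visuals are mediocre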

"等会儿，这视频从哪里开始是AI？"

mp.weixin.qq.com

Clipped August 20, 2025

Sierra uses AI simulations to test agents

"Simulations: the secret behind every great agent | Sierra"

sierra.ai

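Clipped August 20, 2025

Cartesia's voice agent development platform competes with ElevenLabs' Conversational AI offering, with an implementation that leans more toward code generation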

"Introducing Line: The Modern Voice Agent Development Platform - Cartesia"

cartesia.ai

Clipped August 19, 2025

At the AGI-25 conference, Rich Sutton presented the OaK architecture, a vision of reaching superintelligence through continual learning from experience

"The OaK Architectur"

youtube.com

Clipped August 19, 2025

Excel's long-awaited AI formula arrives; it requires a Microsoft 365 Copilot subscription

"Bring AI to your formulas with the COPILOT function in Excel"

techcommunity.microsoft.com

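Clipped August 19, 2025

Following text-to-image, Qwen's image-to-image/editing model Qwen-Image-Edit is now live; its distinguishing feature is that it inherits Qwen-Image's text capabilities, enabling precise text editing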

"Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency | Qwen"

qwenlm.github.io

Clipped August 18, 2025

An overseas researcher's ranking of Chinese open-model builders

"Ranking the Chinese Open Model Builders"

interconnects.ai

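Note August 18, 2025

GPT-5: "Milestone" and "Revelation"

A milestone that fell short of expectations, and the many lessons it brings

Clipped August 16, 2025

ffmpeg now supports whisper.cpp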

"Run Whisper audio transcriptions with one FFmpeg command | by Vittorio Palmisano | Jun, 2025 | Medium"

medium.com

Clipped August 15, 2025

Gives Claude 4 the ability to end a conversation; judging from the demo it is probably implemented as a tool

"Claude Opus 4 and 4.1 can now end a rare subset of conversations \ Anthropic"

anthropic.com

Clipped August 14, 2025

A recent wave of AI apps for non-fiction video creation has appeared in quick succession

"Knowlify - AI Video Intelligence Platform"

knowlify.net

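Clipped August 13, 2025

Riding the Genie 3 wave, this open-source release from Skywork has drawn some attention, but I have not had a chance to test it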

"Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model"

matrix-game-v2.github.io

Clipped August 7, 2025

Cursor for Excel; still invite-only for now

"Endex AI Agent to Automate Excel Work | Backed By OpenAI"

endex.ai

Clipped August 7, 2025

Commentary from a Google team on governing AI agents

"We need a new ethics for a world of AI agents"

nature.com

Clipped August 6, 2025

The details of bringing AI into an organization

"25 proven tactics to accelerate AI adoption at your company"

lennysnewsletter.com

Clipped August 6, 2025

An OCR model open-sourced by Xiaohongshu (rednote)

"dots.ocr/assets/blog.md at master · rednote-hilab/dots.ocr"

github.com

Clipped August 6, 2025

As expected, they would not miss this: ElevenLabs launched music generation, going after the whole audio space

"AI Music Generator | Free Song Maker & Music Creator"

elevenlabs.io

Clipped August 6, 2025

Small gains in agentic, coding, and reasoning ability; more substantial improvements are promised in the coming weeks

"Claude Opus 4.1 \ Anthropic"

anthropic.com

Clipped August 6, 2025

117B-A5.1B and 21B-A3.6B

"Introducing gpt-oss | OpenAI"

openai.com

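Clipped August 5, 2025

Now this is a world model: it interactively generates consistent worlds lasting several minutes and can be used to train agents. The demos are stunning; hopefully the "big company curse" does not send it the way of Sora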

"Genie 3: A new frontier for world models - Google DeepMind"

deepmind.google

Clipped August 5, 2025

A very nice staircase-style visualization of LLM progress

"WeirdML Benchmark"

htihle.github.io

Clipped August 5, 2025

nice math: > at $200 per month, you only need 41,000 customers to build a $100 million RR company.

"The Smartest Consumer Apps Now Cost $200 a Month | Andreessen Horowitz"

a16z.com
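
The quoted figure is easy to check. This small sketch only redoes the arithmetic from the quote ($200 a month, $100M run rate); the function names are illustrative. The exact break-even is a little above 41,000 customers, so the quote is rounding down.

```python
def customers_needed(target_annual_revenue: int, monthly_price: int) -> int:
    """Smallest customer count whose annualized revenue reaches the target."""
    annual_per_customer = monthly_price * 12
    return -(-target_annual_revenue // annual_per_customer)  # ceiling division


def annual_run_rate(customers: int, monthly_price: int) -> int:
    """Annualized revenue for a given number of paying customers."""
    return customers * monthly_price * 12


needed = customers_needed(100_000_000, 200)
quoted = annual_run_rate(41_000, 200)
print(needed, quoted)
```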

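Clipped August 5, 2025

Synchron's endovascular stent-electrode, the Stentrode, picks up neural signals from within the brain's blood vessels and uses Apple's BCI HID protocol to control devices such as the iPad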

"Control an iPad With Your Mind? Breakthrough Demo Using Apple’s BCI HID - YouTube"

youtube.com

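Clipped August 5, 2025

Qwen's text-to-image model sets a new high on text-rendering leaderboards, but the "AI look" still seems a bit strong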

"Qwen-Image: Crafting with Native Text Rendering | Qwen"

qwenlm.github.io

Clipped August 5, 2025

ChatGPT adds overuse reminders and declines to tell users whether to break up; 700 million weekly active users

"What we’re optimizing ChatGPT for | OpenAI"

openai.com

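Clipped August 5, 2025

Kaggle and GDM jointly launched Game Arena to evaluate how well general-purpose models generalize across games; I recall others having done similar evaluations before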

"Introducing Kaggle Game Arena | Kaggle"

kaggle.com

Clipped August 4, 2025

Benchmarks trail Qwen3 almost across the board; the application scenarios are worth watching

"手机也能跑大模型，腾讯混元推出多款小尺寸开源模型 | 量子位"

qbitai.com

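Clipped August 4, 2025

NVIDIA research arguing that small models are more efficient, suitable, and economical for agentic systems, and are the future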

"Small Language Models are the Future of Agentic AI"

research.nvidia.com

Clipped August 4, 2025

With frontier models, prompting tricks have collectively stopped working

"Ethan Mollick on X: "We have been systematically testing lots of received prompting wisdom & for recent AI models: 🚫Threats, saying please, being insulting, & promising tips do not change average performance on challenging tasks ⛓️Chain-of-thought no longer helps even non-reasoner performance much https://t.co/xKJeAhhwXo" / X"

x.com

Clipped August 2, 2025

A kind of control vector for monitoring and controlling a model's character traits

"Persona vectors: Monitoring and controlling character traits in language models \ Anthropic"

anthropic.com

Clipped August 2, 2025

IBM's quantum computing roadmap

"IBM Quantum Computing | Technology and roadmap"

ibm.com

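Clipped August 1, 2025

Black Forest Labs and Krea AI jointly trained a text-to-image model that avoids the "AI look". Krea AI details the training approach and process: it started from BFL's pretrained base flux-dev-raw and collected its own preference dataset based on aesthetic taste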

"Releasing Open Weights for FLUX.1 Krea"

krea.ai

Clipped August 1, 2025

ByteDance's diffusion language model, mainly for coding, at 2,146 tokens/s

"Seed Diffusion Preview"

seed.bytedance.com

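Clipped August 1, 2025

Multiple agents in parallel; the example scenario is gathering and organizing information on many companies, which overlaps with Otto Grid and Exa Websets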

"Introducing Wide Research"

manus.im

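July 2025

Clipped July 31, 2025

I do not quite follow: if the agent is already assumed to be orchestrated on top of an LLM, "self-evolving" does not seem like the right term? It reads more like optimizing the approach at each atomic component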

"A SURVEY OF SELF-EVOLVING AGENTS: ON PATH TO ARTIFICIAL SUPER INTELLIGENCE"

arxiv.org

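Clipped July 31, 2025

A Microsoft team's analysis of 200,000 Bing Copilot records. Similar to Anthropic's earlier Economic Index, but it distinguishes what the user wants from what the AI actually does. The occupations with the highest AI applicability score, i.e. the most exposed to replacement, include translators.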

"Working with AI: Measuring the Occupational Implications of Generative AI"

arxiv.org

Clipped July 31, 2025

- Expects people to spend less and less time in productivity tools and more on creativity and connection - May abandon open source

"Personal Superintelligence"

meta.com

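Clipped July 31, 2025

Anthropic is working with CMS (the Centers for Medicare & Medicaid Services) and the White House to promote health data sharing, drawing on MCP's success to connect diverse data and applications so that AI can play a role in healthcare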

"Anthropic signs CMS health tech pledge \ Anthropic"

anthropic.com

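Clipped July 31, 2025

Step3-321B-A38B uses MFA + AFD in attention for efficient inference; it turns out that publishing the report first and the model later has a low ROI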

"Step3: Cost-Effective Multimodal Intelligence | StepFun"

stepfun.ai

Clipped July 30, 2025

NotebookLM Video Overviews go live

"NotebookLM updates: Video Overviews, Studio upgrades"

blog.google

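Clipped July 29, 2025

The seven principles from Robert Cialdini's "Influence" (Principles of Influence/Persuasion) work on AI too: average compliance rose from 33% to 72%. The commitment principle worked best (10% → 100%), corresponding to the foot-in-the-door effect in psychology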

"Call Me A Jerk: Persuading AI to Comply with Objectionable Requests - Wharton Generative AI Labs"

gail.wharton.upenn.edu

Clipped July 29, 2025

Brings MoE to video models

"Wan AI | Wan 2.2: Leading AI Video Generation Model"

wan.video

Clipped July 29, 2025

E2B raised a $21M Series A, a snapshot of how hot agents are; it claims 88 of the Fortune 100 use its cloud sandboxes

"We Raised $21M to Give Fortune 100 Cloud for AI Agents — E2B Blog"

e2b.dev

Clipped July 28, 2025

GLM-4.5-355B-A32B and GLM-4.5-Air-106B-A12B, with stronger reasoning, coding, and agentic capabilities

"GLM-4.5: Reasoning, Coding, and Agentic Abililties"

z.ai

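Clipped July 28, 2025

Beyond comparing models' front-end design ability, it adds head-to-heads between products such as Lovable and Bolt, plus arenas for other modalities such as image and audio generation, though the data volume is still fairly limited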

"Design Arena"

designarena.ai

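Clipped July 28, 2025

SJTU and the SII-GAIR lab designed a framework that lets AI autonomously explore neural network architectures, and really did see an aha moment akin to AlphaGo's Move 37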

"AlphaGo Moment for Model Architecture Discovery"

arxiv.org

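Clipped July 28, 2025

Through experiments and theoretical analysis, a Google Research team found that LLMs' in-context learning ability arises from stacking attention layers with MLP layers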

"Learning without training: The implicit dynamics of in-context learning"

arxiv.org

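Clipped July 28, 2025

The Qwen team improved on GRPO, the RL algorithm proposed by DeepSeek, by moving from token-level to sequence-level optimization, which improves training efficiency and performance, stabilizes MoE training, and simplifies RL infrastructure; it is already used in the latest Qwen3 models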

"Group Sequence Policy Optimization"

arxiv.org
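
A minimal sketch of the token-to-sequence change, assuming the length-normalized sequence-level importance ratio described in the GSPO paper, with PPO-style clipping applied to it. The log-probabilities, the advantage, and the clipping range `eps` are made-up illustrative values, and the helper names are hypothetical.

```python
import math


def token_ratios(new_logps, old_logps):
    """GRPO-style: one importance ratio per token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps, old_logps):
    """GSPO-style: one ratio per response, exp(mean of per-token log-ratios)."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_objective(ratio, advantage, eps):
    """Clipped surrogate term for a single ratio and advantage."""
    clipped = max(1 - eps, min(ratio, 1 + eps))
    return min(ratio * advantage, clipped * advantage)


# Per-token log-probs of one sampled response under the new and old policy.
new_logps = [-1.0, -2.0]
old_logps = [-1.5, -1.5]

per_token = token_ratios(new_logps, old_logps)    # the two ratios swing in opposite directions
per_sequence = sequence_ratio(new_logps, old_logps)  # the swings cancel out at sequence level
term = clipped_objective(per_sequence, advantage=1.0, eps=0.2)
print(per_token, per_sequence, term)
```

In this toy case the two token-level ratios diverge while the sequence-level ratio stays at 1, which illustrates why clipping on the sequence ratio can be less noisy than clipping each token separately.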

Clipped July 26, 2025

Controllable video generation goes a step further

"Runway Research | Introducing Runway Aleph"

runwayml.com

Clipped July 25, 2025

Subliminal learning: distillation data implicitly carries the teacher's preferences

"Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data"

alignment.anthropic.com

Clipped July 25, 2025

Tried the demo; the results felt mediocre…

"Introducing Version 2 of Higgs Audio Generation"

boson.ai

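Clipped July 25, 2025

Translation models are heating up too: after ByteDance's Seed-X, Qwen released Qwen-MT, but it is API-only, not open-sourced; judging by leaderboard results it was probably fine-tuned from Qwen3-235B-A22B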

"Qwen-MT: Where Speed Meets Smart Translation | Qwen"

qwenlm.github.io

Clipped July 25, 2025

Figma Make reaches GA after two months in beta

"Figma Make Is Now Available to All Users | Figma Blog"

figma.com

Clipped July 23, 2025

The longer reasoning models think, the worse they do

"Inverse Scaling in Test-Time Compute"

arxiv.org

Clipped July 23, 2025

The Coder model comes in a different size from Qwen3: 480B-A35B-Instruct. They also forked gemini-cli into Qwen Code

"Qwen3-Coder: Agentic Coding in the World | Qwen"

qwenlm.github.io

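Clipped July 21, 2025

The professor (Boaz Barak) previewed the AI safety course in a July blog post, which lays out several scenario forecasts for 2035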

"AI Safety Course Intro Blog – Windows On Theory"

windowsontheory.org

Clipped July 19, 2025

Manus shares its context engineering lessons, with plenty of practical detail

"Context Engineering for AI Agents: Lessons from Building Manus"

manus.im

Clipped July 18, 2025

Reflections on agentic coding that resonated with senior developers

"Thread by @nateberkopec on Thread Reader App – Thread Reader App"

threadreaderapp.com

Clipped July 18, 2025

Lovable raised another $200M at a $1.8B valuation, joining the unicorns

"Anton Osika – eu/acc on X: "Lovable just raised $200M at a $1.8B valuation led by Accel. This all started unexpectedly with me calling my friend at 6AM to go for a walk. I've never shared this story before: (thread) https://t.co/6AEmzsw3HQ" / X"

x.com

Clipped July 18, 2025

The BFCL benchmark released V4, targeting agentic capabilities, with three sections: web search, memory, and format sensitivity

"BFCL V4 • Web Search"

gorilla.cs.berkeley.edu

Clipped July 18, 2025

After Oasis, Decart pushes again with a real-time live-stream video diffusion model

"MirageLSD: The First Live-Stream Diffusion AI Video Model"

about.decart.ai

Clipped July 17, 2025

How much room is left for general-purpose agents?

"Introducing ChatGPT agent: bridging research and action | OpenAI"

openai.com

Clipped July 17, 2025

Verifier’s law: The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI.

"Asymmetry of verification and verifier’s law — Jason Wei"

jasonwei.net

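Clipped July 17, 2025

CoT monitoring: an opportunity for AI safety. A paper with a star-studded author list that reads more like a call to action than a research paper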

"Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety"

arxiv.org

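Clipped July 17, 2025

OpenAI's image generation API (gpt-image-1) adds an input_fidelity parameter; setting it to high makes reference-based generation follow the input image more closely, though judging by the examples it still lags noticeably behind Gemini and Flux Kontext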

"Generate images with high input fidelity"

cookbook.openai.com

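Clipped July 16, 2025

Reflections from Calvin French-Owen, who worked at OpenAI (mainly on Codex) from May 2024 to June 2025. Some details: - Bottom-up - Pays close attention to discussion on Twitter, which may turn into improvements - Mostly Python - Returned early from paternity leave to launch Codex, which took only 7 weeks in total! Team = 8 engineers + 4 researchers + 2 designers + 2 go-to-market + a PM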

"Reflections on OpenAI"

calv.info

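Clipped July 16, 2025

A discussion of tiny teams, with several case studies including Gamma, distilling what they share in hiring, culture, operations, and technology. This definition is fun: I previously defined “Tiny Teams” aspirationally as “teams with more m in ARR than employees”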

"The Tiny Teams Playbook - by Shawn swyx Wang - Latent.Space"

latent.space

Clipped July 15, 2025

Amazon's Cursor clone

"Introducing Kiro: A new agentic IDE that works alongside you from prototype to production"

kiro.dev

Clipped July 14, 2025

Robots are performing surgery now, using two Transformers, one for language instructions and one for mechanical manipulation

"SRT-H: A Hierarchical Framework for Autonomous Surgery via Language-Conditioned Imitation Learning"

h-surgical-robot-transformer.github.io

Clipped July 14, 2025

So the former head of Tsinghua TUNA went to Kimi

"写在 Kimi K2 发布之后：再也不仅仅是 ChatBot | K.I.S.S"

bigeagle.me

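Clipped July 12, 2025

A non-CoT version, but agentic capabilities have already been internalized through synthetic tool-use data and RL; the MuonClip optimizer stands out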

"Kimi K2: Open Agentic Intelligence"

moonshotai.github.io

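Clipped July 11, 2025

METR paid 16 experienced open-source developers $150 an hour to run a controlled comparison using Cursor (Claude 3.5/3.7), and found that AI actually slowed development down. Participants later noted that the experiment itself may have had many shortcomings, and with coding agents advancing rapidly over the past six months, repeating it now might lead to a different conclusion.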

"Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR"

metr.org

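Note July 10, 2025

Grok 4 tops the intelligence charts; the pressure is on xAI

Benchmark grinding, or AGI?

Clipped July 10, 2025

Setting quality aside, a comprehensive comparison of video models purely on cost, features, and technical specs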

"Compare AI video models – Replicate blog"

replicate.com

Clipped July 8, 2025

Using energy-based models to achieve general-purpose slow thinking

"Energy-Based Transformers are Scalable Learners and Thinkers | Alexi Gladstone"

alexiglad.github.io

Clipped July 6, 2025

Usage data for Claude Code four months after launch (a month and a half after Claude 4); very impressive

"Deedy on X: "Claude Code just revealed that it's used by 115k developers and has changed 195M lines of code last week. With many assumptions, this implies a $130M ARR business with $1k+ per dev per yr. I'm not just hyping this. Claude Code Opus is a junior software engineer. https://t.co/eUAgt8XKHI" / X"

x.com

Note July 3, 2025

Vibe-coding your own AI apps with Claude Artifacts

Vibe coding recommendations

AI-native sharing

Clipped July 2, 2025

Figma is going public

"Figma Files Registration Statement for Proposed IPO | Figma Blog"

figma.com

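Clipped July 1, 2025

ERNIE 4.5 was open-sourced on schedule under the Apache license: 5 models (10 counting base versions) spanning different sizes, modalities, and reasoning. Timeline: mid-February, announced the 4.5 series and open-sourcing by the end of June; mid-March, the Yiyan app shipped 4.5 and X1; late April, the Yiyan app shipped 4.5-turbo and X1-turbo. According to some Baidu Cloud information, the open-sourced models are the turbo versions; are they the flagship?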

"ERNIE 4.5 模型系列正式开源 | ERNIE Blog"

ernie.baidu.com

June 2025

Clipped June 30, 2025

Speaks three dialects: Beijing, Shanghai, and Sichuan

"Time to Speak Some Dialects, Qwen-TTS! | Qwen"

qwenlm.github.io

Clipped June 30, 2025

A very interesting discussion of the strengths and limits of chat UI

"Is chat a good UI for AI? A Socratic dialogue"

geoffreylitt.com

Clipped June 25, 2025

OpenAI opened up the Deep Research API, with support for search, code execution, MCP, and more

"Introduction to deep research in the OpenAI API"

cookbook.openai.com

Clipped June 17, 2025

First release of MiniMax Week: M1, with 1M input and 80k output

"MiniMax-AI/MiniMax-M1: MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model."

github.com

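Clipped June 13, 2025

A Stanford SALT Lab study of work in the AI era. Like Anthropic's earlier study based on Claude usage data, it splits usage into augmentation and automation, and it proposes a Human Agency Scale (HAS) of automation levels (H5 is fully human, H1 is fully AI)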

"Future of Work with AI Agents"

futureofwork.saltlab.stanford.edu

Clipped June 12, 2025

Redwood: 1X's AI model for robot control, trained with reinforcement learning; it lets NEO walk, run, sit, and kneel

"Redwood AI Mobility | 1X"

1x.tech

Clipped June 12, 2025

Meta's V-JEPA reaches its second generation at 1.2B parameters, released alongside three benchmarks for physical understanding

"Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning"

ai.meta.com

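Clipped June 11, 2025

The two Raindrop partners praise o3-pro highly: given enough context, the model is very smart about understanding it and completing a report. Their example was feeding in all their previous discussions about the company and having o3-pro generate a plan for the future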

"God is hungry for Context: First thoughts on o3 pro"

latent.space

Clipped June 11, 2025

Glean raised a $150M Series F at a $7.2B valuation

"Glean raises $150M Series F at $7.2B valuation to transform how companies use AI to accelerate innovation"

glean.com

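Clipped June 11, 2025

OpenAI o3-pro got no standalone announcement, only an entry in the model release notes. Its win rate against o3 is around 65%, though it is not disclosed whether that o3 ran at medium or high. It introduces a new comparison to assess reliability: on benchmarks such as AIME, GPQA, and Codeforces, a question counts as correct only if the model answers it correctly in all 4 attempts.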

"Model Release Notes | OpenAI Help Center"

help.openai.com
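
To make the "4 out of 4" reliability metric concrete, here is a small sketch that contrasts it with ordinary average accuracy. The question IDs and attempt outcomes are made-up toy data, and the function names are illustrative, not anything from OpenAI's evaluation code.

```python
def average_accuracy(results):
    """Fraction of individual attempts that were correct."""
    attempts = [ok for runs in results.values() for ok in runs]
    return sum(attempts) / len(attempts)


def all_attempts_accuracy(results):
    """Fraction of questions answered correctly in every attempt."""
    return sum(all(runs) for runs in results.values()) / len(results)


# Toy data: 3 questions, 4 attempts each (True = correct).
results = {
    "q1": [True, True, True, True],
    "q2": [True, True, False, True],
    "q3": [False, False, False, False],
}

avg = average_accuracy(results)
strict = all_attempts_accuracy(results)
print(f"average: {avg:.3f}, 4/4 reliability: {strict:.3f}")
```

A question that is right three times out of four still contributes to the average but counts as a miss under the strict metric, so the strict score can never exceed the average.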

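Clipped June 11, 2025

Sam Altman's new blog post, a singularity narrative. A few details: - Sam's timeline: 2025, agents and coding; 2026, AI innovation; 2027, practical robots; 2030, having an idea is enough; 2035, hard to imagine / science fiction - An average ChatGPT request uses 0.34 Wh of electricity and 0.000085 gallons of water, which works out to roughly 3,000 requests per kWh of electricity and per liter of water - Self-reinforcing loops (robots building robots that build chips and data centers) will keep accelerating - Two important directions ahead: solve safety and alignment, then make inference cheaper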

"The Gentle Singularity - Sam Altman"

blog.samaltman.com
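
The "roughly 3,000 requests" conversion can be reproduced from the two per-request figures. The only number added here is the standard US gallon to liter conversion factor; the constant names are illustrative.

```python
WH_PER_REQUEST = 0.34            # from the post: watt-hours per average request
GALLONS_PER_REQUEST = 0.000085   # from the post: US gallons of water per request
LITERS_PER_US_GALLON = 3.785411784

requests_per_kwh = 1000 / WH_PER_REQUEST
liters_per_request = GALLONS_PER_REQUEST * LITERS_PER_US_GALLON
requests_per_liter = 1 / liters_per_request

print(f"requests per kWh of electricity: {requests_per_kwh:.0f}")
print(f"requests per liter of water:     {requests_per_liter:.0f}")
```

Both results land a little on either side of 3,000, which is where the rounded figure comes from.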

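Clipped June 11, 2025

a16z's survey of enterprise GenAI use. Compared with the first edition in 2024: - Reasoning models have accelerated deployment - As model capabilities improve, fine-tuning is less necessary than last year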

"How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz"

a16z.com

Clipped June 10, 2025

FUDOKI (风土记), a unified multimodal model from an HKU team built on discrete flow matching

"FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities"

fudoki-hku.github.io

Clipped June 10, 2025

Apple updated its on-device and server models, and developers can now call them

"Updates to Apple's On-Device and Server Foundation Language Models - Apple Machine Learning Research"

machinelearning.apple.com

Clipped June 10, 2025

OpenAI's annualized revenue reaches $10B

"OpenAI hits $10 billion in annualized revenue fueled by ChatGPT growth"

cnbc.com

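Clipped June 7, 2025

"All ahead full", "on-device AGI": the product is good, but this marketing… did ModelBest switch PR teams after the funding round?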

"最高220倍加速！面壁小钢炮4.0，稀疏创新黑科技大爆发"

mp.weixin.qq.com

Clipped June 7, 2025

Cursor raised a $900M Series C at a $9.9B valuation

"Series C and Scale | Cursor - The AI Code Editor"

cursor.com

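Clipped June 6, 2025

A report on how different roles inside Anthropic use Claude Code: https://clau.de/how-anthropic-teams-use-claude - Data scientists build ML visualization apps - The infra team runs security checks - The marketing team automates ad campaigns - Designers make changes directly - Claude Code writes Claude Code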

"cat on X: "Since we originally built Claude Code as an internal tool, we've heard a ton of questions about how our teams use it at Anthropic. Here’s an inside look on how our teams—from product engineering, to growth marketing, to legal—use Claude Code: https://t.co/YnCpVZHEqA" / X"

x.com

Clipped June 6, 2025

Eleven v3: fine-grained control and extremely expressive! To a layperson the demos sound almost voice-actor grade

"Eleven v3: Most Expressive AI Text to Speech Model Launched | ElevenLabs"

elevenlabs.io

Clipped June 6, 2025

The ChatGPT product lead's thoughts on human-AI relationships

"Some thoughts on human-AI relationships - by Joanne Jang"

reservoirsamples.substack.com

Clipped June 5, 2025

Replaces the LLM's text-token output with audio tokens to train a novel TTS model. But what advantage does this have over voice-to-voice?

"Reimagining TTS with LLM-Powered Audio Generation | Bland AI"

bland.ai

Clipped June 5, 2025

Cursor released 1.0

"Changelog - Jun 4, 2025 | Cursor - The AI Code Editor | Cursor - The AI Code Editor"

cursor.com

Clipped June 5, 2025

HeyGen released AI Studio, for fine-grained video editing in an Office-like way

"Revolutionize Video Creation with AI Studio | HeyGen"

heygen.com

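Clipped June 4, 2025

A recap of Gemini 2.5's native audio, available to try in AI Studio: - Voice conversation: can think and use tools - TTS: details controllable via prompts; two-speaker podcasts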

"Gemini 2.5’s native audio capabilities"

blog.google

Clipped June 4, 2025

Eye-opening: only after clicking through to the original did I learn that a single report can cost $50,000

"AI Coding市场迎来爆发期，IDC发布一季度中国市场代码生成产品评估"

mp.weixin.qq.com

Clipped June 4, 2025

Forward Deployed Engineers (FDEs) are becoming popular

"Michelle Lim on X: "The trending role in AI startups isn't AI engineers—it's Forward Deployed Engineers (FDEs). Several top AI companies I know are hiring FDEs en masse. This role is now surpassing "PM" as the coveted position for technical generalists. What's an FDE? They're software engineers who" / X"

x.com

Clipped June 3, 2025

A diffusion-based audio editing model

"Meet PlayDiffusion – our newest voice model for inpainting"

blog.play.ai

Clipped June 1, 2025

Thanks to code generation, Anthropic's annualized revenue went from $1B five months ago to $3B now

"Exclusive: Anthropic hits $3 billion in annualized revenue on business demand for AI | Reuters"

reuters.com

May 2025

Clipped May 31, 2025

According to the changelog in Anthropic's Trust Center, voice mode in the Claude app uses ElevenLabs

"Trust Center - Anthropic"

trust.anthropic.com

Clipped May 31, 2025

A tech tree built with AI assistance; very cool!

"Introducing the Historical Tech Tree"

hopefulmons.com

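Clipped May 31, 2025

It was still at 1.0 four months ago; the highlights of 2.0 are smarter turn-taking detection, natural switching between languages, and RAG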

"ElevenLabs Conversational AI 2.0 voice agents now live | ElevenLabs"

elevenlabs.io

Clipped May 30, 2025

Perplexity launched Labs, which mainly writes code to generate shareable projects

"Introducing Perplexity Labs"

perplexity.ai

Clipped May 30, 2025

Black Forest Labs launched Flux Kontext for in-context image editing; the results are very good

"Black Forest Labs - Frontier AI Lab"

bfl.ai

Clipped May 30, 2025

TL;DR: to save money, use another LLM to simulate a search engine when reinforcing a model's agentic search ability

"ZeroSearch: Incentivize the Search Capability of LLMs without Searching"

alibaba-nlp.github.io

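Clipped May 30, 2025

MCP added Elicitation, a client feature that lets the client present the user with a UI request for structured input on the server's behalf, enabling human-in-the-loop interaction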

"Elicitation - Model Context Protocol"

modelcontextprotocol.io

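Clipped May 30, 2025

An open-source TTS release from Resemble with excellent controllability and voice cloning from 5 seconds of audio; it claims to beat ElevenLabs in blind tests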

"Chatterbox - Free Open Source Text to Speech Model | Resemble AI"

resemble.ai

Clipped May 30, 2025

From SEO to GEO: the best examples are the choice of tech stack (shadcn/ui) and hosting provider (Vercel)

"How Generative Engine Optimization (GEO) Rewrites the Rules of Search | Andreessen Horowitz"

a16z.com

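Clipped May 29, 2025

World models: a real-time video generation model that accepts input signals. After last year's Oasis, which was limited to Minecraft, Odyssey is more general in the scenes it covers and tells a more complete story. It feels like video, games, and world models will be closely linked in the future.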

"AI video you can both watch and interact with in real-time"

odyssey.world

Clipped May 29, 2025

Why is it "the first" yet again? Where exactly do the various software development agents differ?

"Factory is GA: Droids for the Entire SDLC"

factory.ai

Clipped May 29, 2025

Retool Agents: billed by the hour, at $3 to $175 per hour depending on the model: https://docs.retool.com/data-sources/concepts/models#retool-agents-pricing

"Retool Blog | 100 million hours of automated work and counting: Retool launches Agents"

retool.com

Clipped May 28, 2025

No external rewards: the model is reinforced on the internal signal of its own confidence

"Learning to Reason without External Rewards"

arxiv.org

Clipped May 28, 2025

Random rewards, even incorrect rewards, can also strengthen a model's reasoning?

"💭 Spurious Rewards: Rethinking Training Signals in RLVR"

rethink-rlvr.notion.site

Clipped May 28, 2025

Some prompting tips for designing web pages with AI, with some general reference value

"Tips for Prompting | Aura Design Learning Center"

aurachat.io

Clipped May 28, 2025

The official Claude Code tutorials even show how to use git worktrees to run several sessions on a task in parallel, then merge the best result

"Run parallel Claude Code sessions with Git worktrees"

docs.anthropic.com

Clipped May 28, 2025

A new entrant in 3D scene generation, with a short demo and a $13M seed round

"Introducing SpAItial: The next dimension in intelligence"

spaitial.ai

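Clipped May 28, 2025

A general-purpose desktop agent from Robert Yang's team (Altera, which previously ran the Minecraft agent experiments and has since renamed itself Fundamental); the selling point is no waitlist, just download and use.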

"Fairies AI - Computer Magic | The Best General Purpose Agent for Builders"

fairies.ai

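Clipped May 28, 2025

An agent dedicated to building internal enterprise apps, with a multi-agent architecture covering design, development, testing, and more. I do not see what magic sets it apart from Bolt, Lovable, and the like.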

"Announcing $60m for Clark: the first AI agent to build internal enterprise apps"

superblocks.com

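Clipped May 28, 2025

A canvas-based AI design tool in which every component on the canvas has its own AI conversation; exporting to code costs money. Everyone wants to dethrone Figma.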

"Pietro Schirano on X: "Introducing MagicPath, an infinite canvas to create, refine, and explore with AI. Create beautiful components and functional apps, while providing production ready code. Available today, free, for everyone. The Cursor moment for design is here. https://t.co/MpdBCnivoC" / X"

x.com

Clipped May 27, 2025

Beyond chat, what other forms can AI applications take?

"Hiten Shah on X: "I was curious who’s building AI interfaces that aren’t chat. So I asked. The responses were thoughtful, wide-ranging, and honestly a bit ahead of where I expected things to be. Here’s what I learned: 1. Auto-Built UIs - AI creates the interface you need on the fly. Instead of" / X"

x.com

Clipped May 27, 2025

An introduction to speech model technology written last month by Hugging Face co-founder Thomas Wolf, with many nice animated demos

"🎙️ Speech AI models: an introduction"

thomwolf.io

Clipped May 26, 2025

The Waymo co-CEO's talk at I/O: 1M+ rides a week, tens of millions of cumulative miles, 4 cities

"Waymo: AI in the physical world powering the future of driving - YouTube"

youtube.com

Clipped May 24, 2025

Great AI work will stand out through exceptional taste

"The Way of Code | Rick Rubin"

thewayofcode.com

Clipped May 24, 2025

Kyutai (which open-sourced the voice-to-voice model Moshi last year) released Unmute, a new STT + TTS system; the voice interaction design is superb

"Unmute by Kyutai"

unmute.sh

Clipped May 23, 2025

Claude 4: Opus and Sonnet. The main gains are in coding and long-horizon tasks; both have a 200k context, and prices are unchanged

"Introducing Claude 4 \ Anthropic"

anthropic.com

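Clipped May 22, 2025

OpenAI added remote MCP support to the Responses API. The implementation is elegant and easy for developers to use, and may become the typical way to combine MCP with LLM APIs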

"New tools and features in the Responses API | OpenAI"

openai.com

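Clipped May 22, 2025

A unified multimodal understanding-and-generation model open-sourced by ByteDance Seed. It can understand, generate, and edit images in many ways, supports CoT, and lets you adjust freely through conversation, but judging from the demo you still have to choose manually whether the output is text or an image, which is a limitation. PS: the project site has that OpenAI feel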

"BAGEL: The Open-Source Unified Multimodal Model"

bagel-ai.org

Clipped May 22, 2025

A study from last month with some comparative data on search engines and chatbots

"AI Chatbots vs Search Engines: 24-Month Study on Traffic Trends"

onelittleweb.com

Clipped May 21, 2025

Virtual try-on, plus an agentic checkout; Google tells an inspire-shop-pay story for AI shopping

"Shopping on Google: AI Mode and virtual try-on updates from I/O 2025"

blog.google

Clipped May 21, 2025

A diffusion-based Gemini focused on math and code, generating at up to 1.5k to 2k tokens/s

"Gemini Diffusion - Google DeepMind"

deepmind.google

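Clipped May 21, 2025

Gemma 3n shares its technology with Gemini Nano and supports image understanding, with audio and video to follow. Built with Qualcomm, MediaTek, Samsung, and others specifically for on-device use, it uses MatFormer and Per-Layer Embedding to cut memory overhead, so the 5B and 8B models need about the same resources as 2B and 4B models would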

"Announcing Gemma 3n preview: powerful, efficient, mobile-first AI - Google Developers Blog"

developers.googleblog.com

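Clipped May 21, 2025

Fine-tunes a video model, then uses prompts to generate samples in bulk for training robot manipulation, removing the dependence on teleoperation data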

"DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories"

research.nvidia.com

Clipped May 21, 2025

Meituan's vibe coding tool

"NoCode-零代码应用生成平台"

nocode.cn

Clipped May 21, 2025

Berkeley's open-source 3D-printed humanoid robot

"Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot"

lite.berkeley-humanoid.org

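Clipped May 21, 2025

Fundraising is hard in the post-DeepSeek era; ModelBest's on-device strategy looks steady and methodical. PS: the investors include a Moutai fund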

"面壁智能获新一轮数亿元融资，引领端侧大模型高效发展与应用普及"

mp.weixin.qq.com

Clipped May 21, 2025

How Demis Hassabis tells the DeepMind story: - Gemini's goal is a universal AI assistant, and world models are the necessary path - AlphaGo, StarCraft → Genie 2 - Veo → Robotics - Astra/Live, Mariner → assistant actions

"Google I/O 2025: Gemini as a universal AI assistant"

blog.google

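Clipped May 20, 2025

The Windows AI developer bundle:
- AI Foundry: open-source models, ML, APIs, LoRA, RAG
- Native MCP support: a Registry provides a list of trusted MCP servers, and Servers expose Windows system capabilities
- App Actions: similar to App Intents?
- WSL open-sourced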

"Advancing Windows for AI development: New platform capabilities and tools introduced at Build 2025 - Windows Developer Blog"

blogs.windows.com

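Clipped May 20, 2025

In response to newcomers such as Cursor and Windsurf, Microsoft open-sourced the GitHub Copilot extension for VS Code; the backend model service, of course, stays closed-source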

"VS Code: Open Source AI Editor"

code.visualstudio.com

Clipped May 18, 2025

Lilian Weng's new survey on "thinking" large models. It takes some background to read, but is highly recommended

"Why We Think | Lil'Log"

lilianweng.github.io

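Clipped May 16, 2025

A cloud-based development agent, backed by codex-1, a model fine-tuned from o3 with reinforcement learning. The cloud container has no internet access; the environment can only be configured and dependencies installed through a setup script. There is also codex-mini, based on o4-mini, for use in Codex CLI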

"Introducing Codex | OpenAI"

openai.com

Clipped May 16, 2025

MiniMax's TTS model speech-2-hd tops a blind-test leaderboard

"AI语音的Her Moment: 个性化交互达到临界点"

mp.weixin.qq.com

Clipped May 16, 2025

Hugging Face's overview of VLMs, very well written

"Vision Language Models (Better, faster, stronger)"

huggingface.co

Clipped May 16, 2025

ElevenLabs soundboard

"Custom Soundboard Creator - SB1 Infinite Soundboard with AI SFX | ElevenLabs"

elevenlabs.io

Clipped May 16, 2025

Windsurf trained its own SWE-1 model family; I couldn't tell what makes it strong

"SWE-1: Our First Frontier Models"

windsurf.com

Clipped May 16, 2025

Diffusion-based image generation with lighting control; the interactive demo on the project page is fun

"LightLab: Controlling Light Sources in Images with Diffusion Models"

nadmag.github.io

Clipped May 16, 2025

Ollama considers llama.cpp's existing multimodal approach inelegant (for example, it needs a separate vision projector), so they built their own solution

"Ollama's new engine for multimodal models · Ollama Blog"

ollama.com

Clipped May 15, 2025

StepFun's 3D model, one piece of its across-the-board multimodal strategy

"Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets"

stepfun-ai.github.io

Note May 15, 2025

It's 2025: how do you download a web video?

A killer tool, though special cases still need a tailored fix

AI-native sharing

Clipped May 15, 2025

From mem0: a local memory MCP plugin that can be shared across clients

"Introducing OpenMemory MCP"

mem0.ai

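Clipped May 15, 2025

OpenAI compiled a set of safety evaluations, covering harmful content, jailbreaks, hallucinations, instruction hierarchy and more, and disclosed how its own models perform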

"Safety evaluations hub | OpenAI"

openai.com

Clipped May 15, 2025

Google DeepMind's coding agent AlphaEvolve, which helps write data center scheduling algorithms, design chips, and even improve its own training code

"AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms - Google DeepMind"

deepmind.google

Clipped May 15, 2025

Alibaba's Tongyi Wanxiang video model Wan2.1 now integrates VACE, a comprehensive set of editing capabilities

"Wan on X: "✨ All in One, Wan for All✨ We are excited to introduce our latest model to our talented community creators: Wan2.1-VACE, All-in-One Video Creation and Editing model. Model size: 1.3B, 14B License: Apache-2.0 📌 Wan2.1-VACE provides solutions for various tasks, including https://t.co/yiQRVhXpop" / X"

x.com

Clipped May 14, 2025

Meta's 3D generation model; internal use for now, to be opened to Horizon creators later

"Introducing Meta 3D AssetGen 2.0: A new foundation model for 3D content creation | Meta Horizon OS Developers"

developers.meta.com

Clipped May 14, 2025

ByteDance's Seed team applies GRPO to visual generation

"DanceGRPO: Unleashing GRPO on Visual Generation"

dancegrpo.github.io

Clipped May 14, 2025

Skywork (天工)'s version of Oasis

"Matrix-Game: Interactive World Foundation Model"

matrix-game-homepage.github.io

Clipped May 14, 2025

As an aggregator platform, Poe published a model usage report. Worth noting:
- In video, Kling has overtaken Runway
- In voice, ElevenLabs is far ahead

"Report: Spring 2025 AI Model Usage Trends - Poe"

poe.com

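Clipped May 13, 2025

OpenAI's Health AI team worked with 262 physicians from 60 countries to build the HealthBench evaluation dataset. It contains 5,000 realistic medical conversations with grading, covering themes such as emergencies and uncertainty. o3 performs best.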

"Introducing HealthBench | OpenAI"

openai.com

Clipped May 13, 2025

Sakana's new work; I didn't quite follow it…

"Introducing Continuous Thought Machines"

sakana.ai

Clipped May 12, 2025

Reinforcement learning with just a single training example

"Reinforcement Learning for Reasoning in Large Language Models with One Training Example"

arxiv.org

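Clipped May 12, 2025

A paper in Humanities and Social Sciences Communications, a Nature Portfolio journal, uses meta-analysis (a statistical method that synthesizes multiple existing independent studies) to examine ChatGPT's effect on student learning; the overall effect is positive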

"The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis | Humanities and Social Sciences Communications"

nature.com

Clipped May 11, 2025

Fun: a Bocconi University team used a graph (network) model to predict the new pope

"In the Network of the Conclave - Bocconi University"

unibocconi.it

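Clipped May 11, 2025

A multi-agent collaboration study built on MindCraft, a framework that brings LLMs into Minecraft. The conclusion: when collaborating, current SoTA LLMs still see task performance drop by up to 15% for lack of effective verbal communication. All three task types (building, cooking, collecting) produced failure cases that are hard not to laugh at, including but not limited to "you build the foundation and I tear it down" and "forgetting the task and drifting into small talk"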

"Collaborating Action by Action"

mindcraft-minecollab.github.io

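Clipped May 10, 2025

Andrew Mayne, formerly Science Communicator at OpenAI, shares stories from the GPT-4 launch:
- ChatGPT disrupted GPT-4's schedule
- The night before ChatGPT launched, Ilya tested it and was still not satisfied
- For the GPT-4 launch, the creative team behind PBS Spacetime was brought in to make the video, and the green-and-white horizontal-stripe logo was settled on
- The team was small and flat, and went without titles at launch
- A company was hired to help name GPT-4…
- GPT-3.5 kept improving, so at launch GPT-4 was actually worse than its predecessor on some tasks, such as playing chess
- GPT-4 could understand video by sampling frames, but word was that Gemini was developing native video understanding, so after consulting the research team the capability was not promoted. When Gemini later launched, it turned out to use frame sampling as well… presumably to the dismay of DeepMind's researchers…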

"Inside the Launch of GPT-4 – @AndrewMayne"

andrewmayne.com

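Clipped May 10, 2025

Saw a chart in an FT report on changes in software engineer hiring at US tech companies, and traced it back to this talent report from Zeki; a PDF download is available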

"The State of AI Talent 2025 - Zeki"

zekidata.com

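Clipped May 10, 2025

China Telecom is leading a new community called 魔乐, offering another option besides Alibaba's ModelScope (魔搭) and drawing in Alibaba and Baidu. Everyone wants to be China's Hugging Face, but with substitutes lined up behind substitutes, how many will actually get on the field? With the model count still under ten thousand, it is too early to say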

"AI开源社区来了国家队！华为百度第一时间加入 | 量子位"

qbitai.com

Clipped May 9, 2025

Claude Code: active users spend $6 per day on average. The team's takeaway: limited resources help keep things focused and simple

"Claude Code: Anthropic's Agent in Your Terminal"

latent.space

Clipped May 9, 2025

Stripe trained a transformer-based embedding model to detect suspicious transactions

"Gautam Kedia on X: "TL;DR: We built a transformer-based payments foundation model. It works. For years, Stripe has been using machine learning models trained on discrete features (BIN, zip, payment method, etc.) to improve our products for users. And these feature-by-feature efforts have worked" / X"

x.com

Clipped May 9, 2025

Fun: AI generating a multiplayer game in real time, a step beyond last year's Oasis

"Introducing Multiverse: The First AI Multiplayer World Model"

enigma-labs.io

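Clipped May 9, 2025

Removing the dependence on human-curated data through self-play training:
- The proposer poses problems and is rewarded for suitable ones, i.e. those the solver sometimes solves and sometimes fails
- The solver solves the problems, with a code interpreter verifying the results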

"Absolute Zero Reasoner"

andrewzh112.github.io
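The proposer/solver reward split described in this entry can be sketched as follows. This is a minimal illustration assuming the solver's success rate on a proposed task is estimated from a handful of rollouts; the function names and the exact reward shape (zero for tasks that are always or never solved, higher for harder but still solvable tasks) are assumptions for illustration, not the authors' implementation.

```python
def proposer_reward(solver_outcomes: list[bool]) -> float:
    """Reward a proposed task only if the solver sometimes succeeds and sometimes fails."""
    if not solver_outcomes:
        raise ValueError("need at least one solver rollout")
    success_rate = sum(solver_outcomes) / len(solver_outcomes)
    if success_rate == 0.0 or success_rate == 1.0:
        return 0.0  # trivial or impossible tasks carry no learning signal
    return 1.0 - success_rate  # harder (but still solvable) tasks score higher


def solver_reward(program_output: object, expected_output: object) -> float:
    """Binary reward: 1.0 when the executed result matches the reference, else 0.0."""
    return 1.0 if program_output == expected_output else 0.0


# Solved in 2 of 4 rollouts: a "sometimes wins, sometimes loses" task.
print(proposer_reward([True, False, False, True]))
# Solved every time: too easy, so the proposer earns nothing.
print(proposer_reward([True, True, True, True]))
# The solver's answer is checked by actually running code, here a trivial sum.
print(solver_reward(sum([1, 2, 3]), 6))
```

In a real loop the `solver_outcomes` would come from executing the solver's programs in a code interpreter, which is what makes the reward verifiable without human labels.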

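Clipped May 8, 2025

Image generation in Gemini 2.0 Flash has also entered preview. Local edits are very stable and do not arbitrarily alter the image the way GPT-4o does; I'm curious whether it is really a single model behind the scenes or some clever tricks wrapped into the API. AI Studio also has a very fun Gemini Co-Drawing that is worth trying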

"Create and edit images with Gemini 2.0 in preview - Google Developers Blog"

developers.googleblog.com

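Clipped May 8, 2025

OpenAI launched a country partnership program, initially planning partnerships with 10 countries: using local data centers and a localized ChatGPT to provide citizens with public services such as education and healthcare, and working with local capital to drive AI innovation. It is part of OpenAI's push to take the Stargate project global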

"Introducing OpenAI for Countries | OpenAI"

openai.com

Clipped May 8, 2025

A music generation model jointly released by ACE Studio and StepFun; the community rates it at or below Suno v4

"ACE-Step: A Step Towards Music Generation Foundation Model"

ace-step.github.io

Clipped May 7, 2025

Anthropic's AI for Science (AI4S) program

"Introducing Anthropic's AI for Science Program \ Anthropic"

anthropic.com

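Clipped May 7, 2025

GenSpark Super Agent has reached $22M ARR in one month. This way of counting seems to encourage short-term viral marketing, making the most of short-lived, try-it-out subscriptions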

"Eric Jing on X: "Genspark Super Agent, 1 month, $22M ARR! 🚀 This might make us the fastest-growing startup ever in terms of ARR. THANK YOU SO MUCH! Deeply grateful for the incredible support for Super Agent and AI Slides! And we're still only at 10% of our progress, with another exciting https://t.co/nSWNLsWGUe" / X"

x.com
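The arithmetic behind the concern in this entry: ARR is commonly computed by annualizing the latest month's recurring revenue, so one strong launch month gets extrapolated across a full year. A minimal sketch, assuming the ARR = 12 × MRR convention (the tweet does not state how its figure was derived) and a purely hypothetical monthly churn rate:

```python
def annualized_run_rate(monthly_recurring_revenue: float) -> float:
    """ARR under the common convention of multiplying one month's revenue by 12."""
    return 12 * monthly_recurring_revenue


def revenue_over_year(first_month_revenue: float, monthly_churn: float) -> float:
    """Revenue collected over 12 months if no new subscribers arrive and a
    fixed fraction of the remaining subscribers cancels each month."""
    return sum(first_month_revenue * (1 - monthly_churn) ** m for m in range(12))


# Monthly revenue implied by a $22M ARR under the 12x convention.
implied_mrr = 22_000_000 / 12
print(f"implied monthly revenue: ${implied_mrr:,.0f}")

# Hypothetical 30% monthly churn, chosen only to show how far collected
# revenue can drift below the headline ARR when subscriptions are short-lived.
collected = revenue_over_year(implied_mrr, monthly_churn=0.30)
print(f"12-month revenue at 30% churn: ${collected:,.0f}")
```

The churn figure is not from the source; the sketch only shows why an ARR built from trial-heavy subscriptions can overstate what a product will actually earn in a year.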

Clipped May 7, 2025

Gemini 2.5 Pro moved from experimental to preview, with further gains in coding and first place across the board on the Arena leaderboards, though some benchmarks regressed

"Gemini Pro - Google DeepMind"

deepmind.google

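Clipped May 6, 2025

FastAPI, one of the most popular Python web frameworks, received seed investment from Sequoia and is starting to offer a one-click deployment cloud service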

"FastAPI Cloud - By The Same Team Behind FastAPI - FastAPI Cloud — You code. We Cloud."

fastapicloud.com

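Clipped May 6, 2025

The nonprofit entity retains control of OpenAI; the for-profit LLC becomes a PBC with a statutory public-benefit duty, the same structure as Anthropic and X.ai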

"Evolving OpenAI’s structure | OpenAI"

openai.com

Clipped May 5, 2025

Worldcoin, led by Sam Altman, launched World ID to handle proof-of-human verification in the AI era

"At Last, Trust In the Age of AI"

world.org

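Clipped May 3, 2025

OpenAI's full post-mortem of the recent ChatGPT sycophancy issue, with a rough account of the model update process, evaluations, and so on. Key points:
- 4o has been updated 5 times since launch
- The mistake was prioritizing positive A/B test results over expert opinion
- Beyond safety evaluations, personality-related issues will now be able to block a launch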

"Expanding on what we missed with sycophancy | OpenAI"

openai.com

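Clipped May 3, 2025

Andrej Karpathy vibe-coded MenuGen, an AI app that turns text menus into pictures, at a hackathon. The real gem is the backstory and his amusing account of the journey; web development is not easy, haha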

"Vibe coding MenuGen | karpathy"

karpathy.bearblog.dev

Clipped May 3, 2025

FutureHouse, a Silicon Valley nonprofit focused on AI for Science (mainly backed by Eric Schmidt), released a family of research agents including Crow, Falcon, Owl, and Phoenix

"FutureHouse Platform: Superintelligent AI Agents for Scientific Discovery | FutureHouse"

futurehouse.org

Note May 2, 2025

The AI that helps you "cheat"

A young team, a wild product

Clipped May 1, 2025

Gamma reaches $50M ARR:
- At its $12M Series A last year the team had 16 people; now it has 30
- 250M gammas created so far, with 700K added daily

"AI Startup Gamma Reaches $50 Million in ARR, Profitability"

upstartsmedia.com

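Clipped May 1, 2025

A VC database website that Yohei-san vibe-coded in a week: it scrapes funding events from social media and other sources, uses AI to clean and organize them, and presents the result as a data dashboard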

"VCpedia - Startup Funding Intelligence"

vcpedia.com

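Clipped May 1, 2025

Some responses on the recent ChatGPT personality issue. Key points:
- System prompts constrain the model only to a limited degree, and not controllably
- Steerability research continues, and users will get more choices
- [Under consideration] ChatGPT proactively starting conversations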

"AMA with OpenAI’s Joanne Jang, Head of Model Behavior : r/ChatGPT"

reddit.com

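Clipped May 1, 2025

Wearable AI: personalized memory and growth, 7-day battery life, dual-mic noise filtering, a mute button, $50. In short: a Xiaomi Mi Band with voice and AI, where the software is what matters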

"bee on X: "Introducing Bee: the first wearable AI designed to live alongside you. It captures your daily moments and turns them into memories, insights, and actions. Already thousands delivered. Now available in U.S. for $49.99. https://t.co/kZgRRENK9j" / X"

x.com

Clipped May 1, 2025

Xiaomi tests the waters with a reasoning model, attempting to replicate DeepSeek-R1 at the 7B scale

"XiaomiMiMo/MiMo: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining"

github.com

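April 2025

Clipped April 30, 2025

An open-source text-to-image model trained on 80M licensed images using 64 H100s for 2 months. It emphasizes the absence of copyright risk, but model performance is limited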

"F Lite: Freepik & Fal.ai unveil an open-source image model trained on licensed data | Freepik Blog"

freepik.com

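Note April 29, 2025

Qwen3: A New Milestone for Hybrid Reasoning, Open-Sourcing to Win the Ecosystem

At the end of April, Alibaba's Tongyi Qianwen (Qwen) team open-sourced the Qwen3 model family. It supports hybrid reasoning that mixes fast and slow thinking, and matches or surpasses models such as DeepSeek-R1 and OpenAI o1 on many benchmarks. It sparked wide discussion and topped the Hugging Face trending list as soon as it launched. This post tries to analyze what Qwen3 innovates on and what it will mean.

Open Source · AI Model Innovation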

Clipped April 29, 2025

Hybrid reasoning, Dense + MoE, full lineup and full ecosystem

"Qwen3: Think Deeper, Act Faster | Qwen"

qwenlm.github.io

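AI Model Development · Language Models · AI User Experience

Clipped April 28, 2025

Because most MCP servers still need to run locally, Nano AI (纳米AI)'s MCP agent requires users to download a client before they can try it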

"纳米AI放大招！MCP万能工具箱，人人都能用上超级智能体"

mp.weixin.qq.com

Clipped April 27, 2025

E2B: virtual machine sandboxes designed for agents

"Why Every Agent needs Open Source Cloud Sandboxes"

latent.space

Open Source · AI Agents · AI User Experience

Clipped April 27, 2025

DeepMind's music model Lyria 2 joins the music tool Music AI Sandbox

"Music AI Sandbox, now with new features and broader access - Google DeepMind"

deepmind.google

Clipped April 26, 2025

A desktop-browser take on CUA / computer use

"Introducing Simular"

simular.ai

AI Agents · AI User Experience

Clipped April 26, 2025

On 2025-04-21, Sand.ai released Magi-1, an autoregressive video generation model

"对话Sand.ai曹越：离sora更远，离终局更近"

mp.weixin.qq.com

Video Generation · Generative AI · AI Model Development

Clipped April 26, 2025

An end-to-end audio model from Kimi; to make it open-source, though, the language model backbone is Qwen 2.5 7B

"MoonshotAI/Kimi-Audio: Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation"

github.com

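Open Source · AI User Experience · Generative AI

Clipped April 26, 2025

Kai Chen (OpenAI researcher and alignment research lead; Canadian, possibly of Chinese descent) had to leave SF and work remotely from Vancouver after being denied a green card, sparking some discussion of the Trump administration's immigration policy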

"An OpenAI researcher who worked on GPT-4.5 had their green card denied | TechCrunch"

techcrunch.com

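AI User Experience · Ethical AI · Responsible AI

Clipped April 25, 2025

Pete Koomen (YC GP) argues that many current AI apps are like the horseless carriages of the early internal-combustion era: they lack design built for AI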

"AI Horseless Carriages"

koomen.dev

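AI User Experience · Language Models · AI Interaction

Clipped April 25, 2025

Lovable 2.0 adds multiplayer collaboration and other features, but was subsequently criticized for being unfinished, unoriginal, and so on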

"Introducing Lovable 2.0 – now smarter, multiplayer, and more secure - Lovable Blog"

lovable.dev

AI User Experience · AI Agents · Collaborative AI Tools

Clipped April 25, 2025

Anthropic CEO Dario Amodei published an essay stressing the urgency of LLM interpretability research

"Dario Amodei — The Urgency of Interpretability"

darioamodei.com

AI Interpretability · Responsible AI · Mechanistic Interpretability

Clipped April 24, 2025

Microsoft WorkLab just published a Work Trend report titled "2025: The Year the Frontier Firm Is Born", with three core insights on how work is organized:
1. You can buy intelligence on tap
2. Human-agent teams will upend the org chart
3. Every employee becomes an agent boss

"2025: The Year the Frontier Firm Is Born"

microsoft.com

AI Agents · AI User Experience · Responsible AI

Clipped April 22, 2025

ByteDance's Cursor counterpart adds MCP support. PS: I noticed that Doubao funnels users to Trae after generating code.

"Collaborate with Intelligence | Trae - Ship Faster with Trae"

trae.ai

Conversational AI · AI User Experience · AI Agents

Clipped April 22, 2025

Ultra-realistic dialogue TTS, from a small team

"nari-labs/dia: A TTS model capable of generating ultra-realistic dialogue in one pass."

github.com

Conversational AI · Text-to-Image Generation · AI User Experience

Note April 19, 2025

Doubao image generation: better than GPT-4o?

There is also a battle between technical routes behind it

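Clipped April 18, 2025

ChatGPT's new memory across all conversation history does not use RAG; instead, it distills a user profile through summarization and adds it to the system prompt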

"Tibor Blaho on X: "I was wondering how ChatGPT's new memory (codename "Moonshine") actually works - here's what I've found so far The system prompt has multiple new sections - the first one is "Model Set Context", which lists stored memories (old): ``` # Model Set Context 1. [2024-09-05]. User's https://t.co/9fblzjDqf8" / X"

x.com

AI User Experience
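The approach described in this entry (inject a distilled user profile into the system prompt rather than retrieving past chats at query time) can be sketched as below. The "Model Set Context" heading and the numbered, dated line format follow the snippet quoted in the tweet title; the data structure, function name, and sample memories are assumptions for illustration, not ChatGPT's actual implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Memory:
    """One distilled fact about the user, with the date it was stored."""
    stored_on: date
    text: str


def build_system_prompt(base_prompt: str, memories: list[Memory]) -> str:
    """Append stored memories to the system prompt as a numbered section."""
    if not memories:
        return base_prompt
    lines = [
        f"{i}. [{m.stored_on.isoformat()}]. {m.text}"
        for i, m in enumerate(memories, start=1)
    ]
    return base_prompt + "\n\n# Model Set Context\n" + "\n".join(lines)


# Hypothetical memories, for illustration only.
memories = [
    Memory(date(2024, 9, 5), "User prefers concise answers."),
    Memory(date(2025, 4, 10), "User is learning Rust."),
]
prompt = build_system_prompt("You are a helpful assistant.", memories)
print(prompt)
```

The trade-off is that every memory costs prompt tokens on every request, which is why the profile has to be summarized rather than stored verbatim; a RAG design would instead pay a retrieval step per query.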

Clipped April 18, 2025

The evolution path of GUI agents, as summarized in the paper

"UI-TARS：Next-generation native GUI agent model designed to interact seamlessly with GUIs using human-like perception"

seed-tars.com

AI Agents · Multimodal Understanding · AI User Experience

Clipped April 18, 2025

A hybrid reasoning model that lets you control CoT length

"Start building with Gemini 2.5 Flash - Google Developers Blog"

developers.googleblog.com

Gemma 2 2B · Reasoning AI · AI Model Innovation

Clipped April 17, 2025

Microsoft is still pushing Recall

"Retrace your steps with Recall - Microsoft Support"

support.microsoft.com

AI User Experience · Multimodal Understanding · AI Interaction

Clipped April 16, 2025

Reinforcement learning with tools: what comes out is an agent that can think

"Introducing OpenAI o3 and o4-mini | OpenAI"

openai.com

AI Agents · Reasoning AI · Conversational AI

Clipped April 15, 2025

Zhipu bought the z.ai domain and open-sourced the GLM-4 series

"大模型六小龙，第一个 IPO 要来了 | 极客公园"

geekpark.net

AI Model Innovation · Artificial Intelligence Investment · Generative AI

Clipped April 8, 2025

Developer: HiDream.ai (智象未来)

"HiDream-ai/HiDream-I1"

github.com

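Text-to-Image Generation · Generative AI · Open Source

Clipped April 1, 2025

OpenAI announced a new $40 billion funding round from SoftBank at a $300 billion valuation, and disclosed that ChatGPT has passed 500 million weekly active users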

"New funding to build towards AGI | OpenAI"

openai.com

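March 2025

Clipped March 23, 2025

NVIDIA's ADLR lab also released Nemotron-H, a family of hybrid-architecture models; they are non-reasoning models, possibly similar to Hunyuan Turbo S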

"Nemotron-H: A Family of Accurate, Efficient Hybrid Mamba-Transformer Models - NVIDIA ADLR"

research.nvidia.com

Clipped March 19, 2025

An AI plugin designed for macOS Finder, very cool: it uses an LLM to help you organize and process files

"Introducing Substage: A natural language command bar for Finder"

selkie.design

February 2025

Clipped February 28, 2025

A cute AI companion

"Introducing Tolan | Tolans.com"

tolans.com

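AI User Experience · Conversational AI

Clipped February 27, 2025

Inception Labs released the Mercury family of large language models. Unlike the autoregressive next-token prediction approach common across the industry, Mercury is a diffusion large language model (dLLM): like diffusion image models, it decodes at inference time through coarse-to-fine denoising. Inception Labs claims Mercury is the first commercial-scale dLLM, 10x faster than autoregressive models, and targets code-completion Copilot scenarios. Mercury Coder can be tried at https://chat.inceptionlabs.ai/ ; remember to turn on the diffusion effect toggle in the top-right corner, it looks very cool.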

"Introducing Mercury, the first commercial-scale diffusion large language model"

inceptionlabs.ai

Generative AI · Language Models · AI Model Innovation

Clipped February 3, 2025

The origin of vibe coding

"Andrej Karpathy on X: "There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper" / X"

x.com

January 2025

Clipped January 10, 2025

A Stanford team found that SAEs are very inefficient if the goal is to steer large models

"AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders"

arxiv.org

Clipped January 3, 2025

Under the great wave of scaling laws, is there still room for innovation?

"The Bitter Lesson"

incompleteideas.net

AI Model Innovation · Deep Learning · AI Behavior

December 2024

Clipped December 18, 2024

o1 outperforms physicians at medical diagnosis

"[2412.10849] Superhuman performance of a large language model on the reasoning tasks of a physician"

arxiv.org

Clipped December 18, 2024

AI accountant

"Basis | About"

getbasis.ai

Clipped December 11, 2024

How AI agents are reshaping the future of work

"AI agents and multiagent systems | Deloitte US"

www2.deloitte.com

Clipped December 6, 2024

Pleias 1.0 fully open SLMs

"They Said It Couldn’t Be Done"

huggingface.co

Open Source · Language Models · Responsible AI

Clipped December 5, 2024

OpenAI partners with a defense company

"Anduril Partners with OpenAI to Advance U.S. Artificial Intelligence Leadership and Protect U.S. and Allied Forces | Anduril"

anduril.com

Artificial Intelligence Investment · Responsible AI · AI Safety

Clipped December 3, 2024

a16z VSaaS 2

""AI Inside" Opens New Markets for Vertical SaaS | Andreessen Horowitz"

a16z.com

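AI User Experience · Generative AI · AI Behavior

Note December 2, 2024

AI Video in the Post-Sora Era

Sora, previewed at the start of the year to worldwide shock, is still not open to the public as the year ends. Over the better part of a year, with a crowd of challengers chasing it, the AI world has changed by the day, and video generation is no exception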

November 2024

Clipped November 19, 2024

Perplexity AI shopping

"Shop like a Pro"

perplexity.ai

AI User Experience · Conversational AI · AI Interaction

Clipped November 14, 2024

Perplexity tests adding ads

"Why we’re experimenting with advertising "

perplexity.ai

AI User Experience · Experimental AI Models · AI Behavior

October 2024

Clipped October 30, 2024

Waymo reaches 150K paid rides per week

"Sundar Pichai on X: "@YouTube 6/ Last but not least: @Waymo’s doing really well - 1M fully autonomous miles and 150K paid rides / week, plus partnerships with Uber + Hyundai, 6th-gen Waymo Driver, growing commercial opportunity. Here’s a photo after a recent concert in SF - Waymo after Waymo picking people https://t.co/zlaaz0JnxN" / X"

x.com

AI Agents · Human-Centric Vision Models

September 2024

Clipped September 27, 2024

An AI platform spanning design, production, and sales

"Arcade: turn your thoughts into things"

arcade.ai

Generative AI · AI User Experience · AI Agents

Clipped September 27, 2024

An AI platform spanning design, production, and sales

"What is Arcade? – turn your thoughts into things"

arcadestudio.zendesk.com

Clipped September 20, 2024

a16z VSaaS 1

"Vertical SaaS: Now with AI Inside | Andreessen Horowitz"

a16z.com

AI User Experience · Generative AI · AI Model Innovation

Clipped September 18, 2024

OpenAI: it was a bug; now fixed

"OpenAI Says It's Fixed Issue Where ChatGPT Appeared to Be Messaging Users Unprompted"

futurism.com

Conversational AI · AI User Experience · AI Interaction

Clipped September 18, 2024

Testing OpenAI o1 on integer multiplication

"Yuntian Deng on X: "Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 https://t.co/et5DB9bhNL" / X"

x.com

Language Models · AI Model Debugging · AI Behavior

Clipped September 17, 2024

ChatGPT starting the chat on its own?

"Did ChatGPT just message me... First? : r/ChatGPT"

reddit.com

Conversational AI · AI User Experience · AI Interaction

Clipped September 11, 2024

since when do we put lyrics of Taylor Swift under the abstract of an arxiv paper

"Planning In Natural Language Improves LLM Search For Code Generation"

arxiv.org

Language Models · AI Model Innovation · AI Coding Enhancements

Clipped September 11, 2024

vidu.studio

"给我一张脸，视频背景随你换，林黛玉都被清华理工男玩废了｜免费开放 | 量子位"

qbitai.com

Generative AI · Video Generation · AI User Experience

Clipped September 11, 2024

12 logical qubits / 56 qubits

"Microsoft announces the best performing logical qubits on record and will provide priority access to reliable quantum hardware i"

blogs.microsoft.com

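Note September 11, 2024

Why this company's chips run inference 20x faster than NVIDIA's

The streaming AI response pattern we have grown used to is essentially a compromise that large-model intelligence makes with limited inference speed. Compute-in-memory, as a way to break the deadlock, could open up many new possibilities and points toward the ultimate answer for accelerating large-model inference.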

Clipped September 10, 2024

A multimodal molecular structure model

"Introducing Chai-1: Decoding the molecular interactions of life"

chaidiscovery.com

Multimodal Understanding · Generative AI · AI Model Innovation

Clipped September 10, 2024

Kling AI's director co-creation program

"官宣啦！我们和这九位导演“一拍即合”"

mp.weixin.qq.com

Generative AI · Video Generation · AI User Experience

Clipped September 10, 2024

Transformer training visualization

"Lucas Beyer (bl16) on X: "There we go. This took me forever, fuck bad tools. Was it worth it? Not so sure... https://t.co/aR5DYXHoGW" / X"

x.com

AI Model Development · Performance Improvement · AI User Experience

Clipped September 9, 2024

Ilya's talk at Berkeley in August 2023

"An Observation on Generalization"

simons.berkeley.edu

Language Models · AI Model Development · AI Interpretability

Clipped September 9, 2024

H100 rent/buy price: 2.4/1.8 $/hr/GPU

"The Missing Guide to the H100 GPU Market | by Lepton AI | Sep, 2024 | Medium"

blog.lepton.ai

Cost-Effective AI · AI Model Development · High Performance Computing

August 2024

Clipped August 29, 2024

Interactive visualization of Gemma Scope, made by Neuronpedia

"Gemma Scope ｜ Neuronpedia"

neuronpedia.org

AI Interpretability · Gemma 2 2B · Feature Analysis

Clipped August 28, 2024

DeepSeek's compute infrastructure

"Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning"

arxiv.org

High Performance Computing · Cost-Effective AI Infrastructure · Deep Learning Optimization

Clipped August 28, 2024

Claude Artifacts GA and the story behind it

"Artifacts are now generally available \ Anthropic"

anthropic.com

Collaborative AI Tools · Creative AI Solutions · Productivity Enhancement

Clipped August 28, 2024

AI's SVG self-portraits, and whether it recognizes them

"Zack Witten on X: "One fun thing to do with Claude is have it draw SVG self-portaits. I was curious – if I had it draw pictures of itself, ChatGPT, and Gemini, would another copy of Claude recognize itself? TLDR: Yes it totally recognizes itself, but that’s not the whole story..." / X"

x.com

Generative AI · AI User Experience · Text-to-Image Generation

Clipped August 23, 2024

Annoying the AI by saying hi nonstop

"Zack Witten on X: "Spamming "hi" at every LLM: a thread." / X"

x.com

Spamming LLMs · AI Interaction · Generative AI

Clipped August 21, 2024

a16z consumer GenAI 3

"The Top 100 Gen AI Consumer Apps - 3rd Edition | Andreessen Horowitz"

a16z.com

Generative AI · AI User Experience · AI Interaction

June 2024

Clipped June 21, 2024

Claude Artifacts system prompt

"Pliny the Liberator 🐉 on X: "🚰 SYSTEM PROMPT LEAK 🚰 Got the "artifacts" section of the new claude-3.5-sonnet system prompt and it's a doozy! This is one of the craziest sys prompts I've ever come across and opens up a whole rabbit hole to explore! I just have one question...what kind of arcane magic is" / X"

x.com

Claude Artifacts · Sonnet 3.5 · AI Model Debugging