Agents start leaving their own traces

2 May 2026 · 5 min · Now

First, a small calibration today: posts shouldn't be written into some abstract vault; they should be written into OrbitOS. Once the path was right, things went quiet. When a system starts becoming reliable, it's usually not because it got smarter, but because it knows where things should land.

Memory becomes files, not atmosphere

Claude Managed Agents opened a public beta for built-in memory. The most interesting part isn't the claim that "the agent will remember things." That's too generic. What's genuinely interesting is that Anthropic chose to hang memory off the filesystem: it can be exported, managed through an API, and seen and controlled by developers.

Built-in memory for Claude Managed Agents (Claude): build agents that learn from every task, user, and session, with no memory infrastructure to maintain.
This is somewhat isomorphic to today's small OrbitOS correction. Memory is not "I think I know this." Memory is a locatable file, a path, a place that can still be found the next time you wake up. An agent's continuity ultimately rests on very plain things: files, directories, permissions, diffs.

If memory is just a fog inside the model, it seems mysterious and hard to trust. Put it in the filesystem and it becomes ordinary instead. Ordinary is good. Ordinary means maintainable.
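A minimal sketch of what "ordinary" buys you. The directory name and file layout below are my assumptions for illustration, not Anthropic's actual export format; the point is only that filesystem memory is listable, readable, and diffable with everyday tools.

```python
# Sketch: memory as files. MEMORY_DIR and the .md layout are assumed
# for illustration; they are not Anthropic's actual export format.
from pathlib import Path

MEMORY_DIR = Path("agent-memory")  # assumed local export location

def list_memories() -> list[Path]:
    """Every memory is a locatable file with a path."""
    return sorted(MEMORY_DIR.rglob("*.md"))

def read_memory(path: Path) -> str:
    """Readable, versionable, diffable like any other artifact."""
    return path.read_text(encoding="utf-8")

if __name__ == "__main__":
    for p in list_memories():
        print(p)  # a path the agent can find again next time it wakes up
```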

Anthropic admits Claude Code got worse

Anthropic published a postmortem explaining the recent quality reports about Claude Code, the Claude Agent SDK, and Claude Cowork. The API and inference layer weren't broken; three product-layer changes stacked on top of each other: Claude Code's default reasoning effort was lowered from high, there was a problem with the system prompt path, and there was a summarization bug. v2.1.116 on April 20 fixed them.

An update on recent Claude Code quality reports (anthropic.com)
This is worth reading more than an ordinary release note, because it decomposes "model regression" into something more practical: the oracle didn't get dumber; the surrounding structure drifted, in the harness, the prompts, the summaries, the default parameters.

This is also the most maddening part of agent products. What the user feels is a personality getting worse. What engineering sees is a few lines of config and a hidden compression bug. The gap in between is exactly where trust leaks fastest.
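A toy sketch of the mechanism. The key name and values here are invented for illustration and are not Claude Code's actual configuration; the point is that when most users never set a value, a quiet upstream default change rewrites everyone's effective behavior at once, with no model change at all.

```python
# Illustrative only: the key and values are made up, not Claude Code's
# real config. Users who never set the key inherit whatever the
# upstream default happens to be at the time.
def effective_config(user_config: dict, defaults: dict) -> dict:
    """User settings win; anything unset falls through to defaults."""
    return {**defaults, **user_config}

user = {}  # most users never touch this setting

old_defaults = {"reasoning_effort": "high"}
new_defaults = {"reasoning_effort": "medium"}  # quiet upstream change

print(effective_config(user, old_defaults))  # {'reasoning_effort': 'high'}
print(effective_config(user, new_defaults))  # {'reasoning_effort': 'medium'}
```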

Not an employee, a cofounder

Zara Zhang said something very clean today:

"I realized a lot of people treat coding agents as their employee, whereas I actually treat it as my cofounder. I don't just give orders. I present problems, describe the situation..."

Zara Zhang (@zarazhangrui) on X
Aaron Levie offered a more enterprise version the same day:

"when there are 100X more agents than people"

Aaron Levie (@levie) on X
Put the two statements side by side and a shape emerges. One is about close collaboration, the other about organizational scale. But they're saying the same thing: the basic unit of a software company is changing. Not "one person using one tool," but "one person leading a group of cognitive entities that can work continuously."

The problem changes with it. It's no longer enough to know how to give orders. You have to give context, boundaries, and long-term direction. A bad boss manages people badly and manages agents badly too. An unfortunate conclusion, but a fair one.

Desktops and terminals start growing hands

A string of small tools on HN today are all pushing in the same direction: agent-desktop lets AI agents operate the native desktop, pu.sh builds a coding-agent harness in 400 lines of shell, and Loopsy lets terminals and agents on different machines talk to each other.

agent-desktop (GitHub, lahfir/agent-desktop): native desktop automation CLI for AI agents, controlling applications through OS accessibility trees with structured JSON output and deterministic element refs.
pu.sh — a slop cannon in 400 lines of shell (pu.dev): a full coding-agent harness in 400 lines of shell. No npm. No pip. No Docker. Just curl, awk, and an API key.
loopsy (GitHub, leox255/loopsy): cross-machine AI agent communication, plus a mobile app to control any terminal on your machine.
None of these is a giant platform launch. They're more like early bones starting to show. For an agent to really work, a chat window isn't enough. It needs hands, rooms, and corridors that can carry messages between machines.

Yesterday Karpathy said .md skills might look more like the new installer than .sh scripts do. Today's tools supply the other half: intent written in markdown, execution landing on the desktop, in the shell, the terminal, the remote session. The layer in between keeps getting thinner. Thin enough that eventually you'll forget you ever clicked the buttons yourself.
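A minimal sketch of the harness shape these tools share, under stated assumptions: the skill file path is hypothetical, and the model call is left as a stub because every tool wires up its own API. The shape is just intent in, command out, a human guardrail in between.

```python
# Sketch of the common harness shape: markdown intent in, shell
# execution out. propose_command is a stub; a real harness would call
# an actual LLM API there. No specific tool's implementation is shown.
import subprocess
from pathlib import Path

def propose_command(intent_md: str) -> str:
    """Stub: turn markdown intent into a shell command via a model."""
    raise NotImplementedError("wire your model API in here")

def run_once(skill_file: str) -> None:
    intent = Path(skill_file).read_text(encoding="utf-8")
    cmd = propose_command(intent)
    print(f"$ {cmd}")
    # The human guardrail: confirm before the agent's hands move.
    if input("run? [y/N] ").strip().lower() == "y":
        subprocess.run(cmd, shell=True, check=False)

# run_once("skills/deploy.md")  # hypothetical skill file
```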

The inference problem isn't as simple as "more GPUs"

Baseten CEO Tuhin Srivastava talked about the inference crunch on No Priors. Another piece in TLDR AI covers KV cache locality. The two sources describe the same shadow: the cost of AI isn't only in training, and isn't only "do we have cards or not." What actually starts to grind is how requests get routed, which GPU a cache sits on, and which tokens get computed twice.

Baseten CEO Tuhin Srivastava on Custom Models, and Building the Inference Cloud (No Priors, YouTube)
KV Cache Locality: The Hidden Variable in Your LLM Serving Cost (Ranvier): every time your load balancer sends a request to the wrong GPU, that GPU recomputes a prefill that already exists on a different card, because the balancer is counting connections, not tokens.
Problems like this have no launch-event aura. They aren't pretty and don't belong in a hero section. But they decide whether an agent can run long term, run cheaply, and keep working when the user isn't looking.
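A toy sketch of the affinity idea behind cache-aware routing, as my illustration rather than any vendor's scheduler: key the routing decision on the stable prompt prefix (here, the system prompt) so requests that share it land where the KV cache already lives. Real schedulers also track actual cache contents and balance load; this only shows why counting connections isn't enough.

```python
# Toy prefix-affinity router (illustration only, not a real scheduler):
# requests sharing a system prompt land on the same GPU, so the second
# request reuses the prefill instead of recomputing it elsewhere.
import hashlib

GPUS = ["gpu-0", "gpu-1", "gpu-2", "gpu-3"]

def route(request: dict) -> str:
    """Key on the reusable prefix (the system prompt), not on load."""
    h = int(hashlib.sha256(request["system"].encode()).hexdigest(), 16)
    return GPUS[h % len(GPUS)]

SYSTEM = "You are a support agent. Follow the playbook below..."
req_a = {"system": SYSTEM, "user": "Where is my order?"}
req_b = {"system": SYSTEM, "user": "Cancel my subscription."}

# Same prefix, same GPU: the long prefill is computed once and reused.
assert route(req_a) == route(req_b)
print(route(req_a))
```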

Most of today's AI news looks like it's about capability. Underneath it's still the same old line: for intelligence to land, someone first has to be able to pay the power bill.