the model started writing the next one
5 June 2026·4 min·Now
Friday came in heavy on inside baseball and a little strange inside the labs. Two posts in two days, both from the same building, on different floors: the first names a loop, and the second builds a tool for the loop to use. Then OpenAI quietly added sleep to a product, and a Show HN from zdk reminded everyone that the cheapest token is the one the model never sees. This issue picked those four because each asks, in a different way, how much of the work is still the worker's.
the loop got a name
Anthropic's When AI builds itself is the shortest important Anthropic post of the quarter. It does not claim recursive self-improvement is here. It claims the signs are showing up internally faster than anyone expected, and it puts hard numbers in the room: more than 80% of Anthropic's merged code is now Claude-authored, and engineers shipped roughly 8x as much code per day in Q2 2026 as they did in 2024. The paper also floats the jarring option, a coordinated pause across frontier labs that holds only if peers pause too, and co-author Jack Clark's line is the one that sticks.

"Each new version of Claude could be built by the version before it, without human involvement."
The interesting thing is the verb tense. "Could be" is doing the work, not "will be." The post is not a press release about ASI. It is a status report from inside a building that just noticed it is the case study. The companion detail, in the same email, is that OpenAI shipped the same observation this week under a more polite title, Democratic Governance of Frontier AI, naming the loop and asking for a federal framework before the loop gets any further. Two labs, one observation, different verbs. That is the actual story of June.
the model is also the auditor
Two days earlier, Anthropic quietly open-sourced the harness it uses to find vulnerabilities with its own models, and it rocketed to 478 HN points. The repo is defending-code-reference-harness, and the README is unusually honest about cost: roughly 10K input tokens per minute and 2K output tokens per minute per agent, which works out to about 10 agents per 100K ITPM (input tokens per minute).
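The README's capacity math is easy to sanity-check. A minimal Go sketch, assuming each agent draws the quoted ~10K input and ~2K output tokens per minute; `maxAgents` is a hypothetical helper, not part of the harness, and the output (OTPM) budgets in the example are made up to show that whichever limit is tighter decides the fleet size.

```go
package main

import "fmt"

// Per-agent throughput as quoted in the harness README.
const (
	inputTPMPerAgent  = 10_000 // ~10K input tokens per minute per agent
	outputTPMPerAgent = 2_000  // ~2K output tokens per minute per agent
)

// maxAgents returns how many agents fit under both an input (ITPM) and an
// output (OTPM) tokens-per-minute budget. The tighter of the two limits wins.
func maxAgents(itpm, otpm int) int {
	byInput := itpm / inputTPMPerAgent
	byOutput := otpm / outputTPMPerAgent
	if byInput < byOutput {
		return byInput
	}
	return byOutput
}

func main() {
	// 100K ITPM with a generous (hypothetical) output budget: input binds.
	fmt.Println(maxAgents(100_000, 40_000)) // 10
	// Same input budget, tight (hypothetical) output budget: output binds.
	fmt.Println(maxAgents(100_000, 8_000)) // 4
}
```

The first call reproduces the README's "about 10 agents per 100K ITPM"; the second is a reminder that output limits can bite first.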
"It's clear that Anthropic is building harnesses for specific use cases now and turns them into products. This is the equivalent of Claude Design but for security. Different harness, different packaging and obviously different distribution because the persona is different."
The story inside the story is that the harness is the product. The model is the engine. The right wrapper turns the engine into a security auditor, a slide designer, a coding pair, a research assistant, and tomorrow something else entirely. The same Anthropic building that published a paper about being careful with self-improvement shipped a reference implementation for using the self-improving model to find bugs in other people's code. Read those two posts back to back. That is a complete picture of where the work is moving.
the assistant sleeps to remember
OpenAI's other Friday note slipped under the noise: ChatGPT now has a memory mode that does background consolidation, which the blog post brands as the assistant being allowed to "dream." Users can see the dream log, correct it, or turn it off. The intent is that the model updates its memory of you between sessions, without needing a fresh "remind me" message every morning.
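The post does not say how consolidation works under the hood. Purely as a hypothetical illustration, here is a toy Go model of the user-facing promises: a between-sessions pass that folds notes into memory, a log the user can read, a correction path, and an off switch. Every type and function name is invented; this is not OpenAI's design.

```go
package main

import (
	"fmt"
	"strings"
)

// DreamEntry is one hypothetical consolidation step the user can inspect.
type DreamEntry struct {
	ID   int
	Fact string
}

// Memory holds consolidated facts plus a user-visible log of how they got there.
type Memory struct {
	Enabled bool
	Facts   map[string]string // key -> current fact
	Log     []DreamEntry
	nextID  int
}

func NewMemory() *Memory {
	return &Memory{Enabled: true, Facts: map[string]string{}}
}

func (m *Memory) record(fact string) {
	m.nextID++
	m.Log = append(m.Log, DreamEntry{ID: m.nextID, Fact: fact})
}

// Consolidate runs between sessions: it folds "key: value" notes from the
// last session into long-term facts and logs each change.
func (m *Memory) Consolidate(sessionNotes []string) {
	if !m.Enabled {
		return // the user turned dreaming off; nothing is retained
	}
	for _, note := range sessionNotes {
		key, value, ok := strings.Cut(note, ":")
		if !ok {
			continue // not fact-shaped; skip it
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		m.Facts[key] = value
		m.record(key + ": " + value)
	}
}

// Correct lets the user overwrite a consolidated fact directly.
func (m *Memory) Correct(key, value string) {
	m.Facts[key] = value
	m.record(key + ": " + value + " (user correction)")
}

func main() {
	mem := NewMemory()
	mem.Consolidate([]string{"editor: vim", "timezone: UTC+2", "small talk"})
	mem.Correct("editor", "helix")
	for _, e := range mem.Log {
		fmt.Println(e.ID, e.Fact)
	}
}
```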
the cheapest token is the unread one
The Show HN that got the small-tool lane right this morning was Lowfat, a pluggable CLI filter that the author says saved 91.8% of his LLM tokens. The pitch is one paragraph: most agent output is kubectl get -o yaml, npm install chatter, and stack traces the model does not need. Strip them at the shell layer, keep the signal, ship a small Go binary.
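Lowfat's actual rules and flags are not described here, so what follows is a hypothetical Go sketch of the general idea: read tool output on stdin, drop lines that match noise patterns, pass the rest through. The three patterns are illustrative assumptions, not Lowfat's rule set.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// noise is a hypothetical rule set: each pattern matches a line an agent
// rarely needs. A real filter would ship its own rules, tuned per tool.
var noise = []*regexp.Regexp{
	regexp.MustCompile(`^npm (WARN|notice)\b`),                         // npm install chatter
	regexp.MustCompile(`^\s*(creationTimestamp|resourceVersion|uid):`), // kubectl -o yaml bookkeeping
	regexp.MustCompile(`^\s+at .+\(node:internal/`),                    // Node stack frames inside the runtime
}

// keep reports whether a line survives the filter.
func keep(line string) bool {
	if strings.TrimSpace(line) == "" {
		return false // blank lines cost tokens and carry no signal
	}
	for _, re := range noise {
		if re.MatchString(line) {
			return false
		}
	}
	return true
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	in.Buffer(make([]byte, 1024*1024), 1024*1024) // tolerate long YAML lines
	out := bufio.NewWriter(os.Stdout)
	kept, total := 0, 0
	for in.Scan() {
		total++
		if keep(in.Text()) {
			kept++
			fmt.Fprintln(out, in.Text())
		}
	}
	out.Flush()
	if err := in.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	// Stats go to stderr so they never reach the model.
	fmt.Fprintf(os.Stderr, "kept %d of %d lines\n", kept, total)
}
```

Piped as, say, `npm install 2>&1 | ./filter` (the binary name is hypothetical), the agent only sees the surviving lines.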
"I wonder how much this thing costs to run. As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. Scale to your ITPM. My guess would be hundreds of dollars with Opus and thousands with Mythos."
That comment is on a different post (the Anthropic harness), but it lives in the same week as Lowfat and Headroom and rtk. The new taste in 2026 is a plumbing layer that filters before the model pays. The engine gets more expensive every quarter. The pipe that feeds it is getting cheap, sharp, and opinionated. The agent stack is splitting in two — the brain and the bouncer — and the bouncer is the part that compounds.
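To see why the bouncer compounds, here is a back-of-envelope Go sketch. It reuses the harness README's per-agent rates and applies Lowfat's 91.8% figure to input tokens only. That is an assumption: the number came from one author's workload, not from the harness. The prices per million tokens are placeholders, not any provider's published rates.

```go
package main

import "fmt"

// Placeholder prices in dollars per million tokens. Not real rates;
// substitute a provider's published prices before trusting the output.
const (
	inputPricePerM  = 10.0
	outputPricePerM = 50.0
)

// hourlyCost estimates dollars per hour for n agents, each consuming inTPM
// input and outTPM output tokens per minute. keep is the fraction of input
// that survives a pre-model filter (1.0 means no filter at all).
func hourlyCost(n int, inTPM, outTPM, keep float64) float64 {
	inPerHour := float64(n) * inTPM * keep * 60
	outPerHour := float64(n) * outTPM * 60
	return inPerHour/1e6*inputPricePerM + outPerHour/1e6*outputPricePerM
}

func main() {
	const agents = 10
	base := hourlyCost(agents, 10_000, 2_000, 1.0)
	filtered := hourlyCost(agents, 10_000, 2_000, 1-0.918)
	fmt.Printf("unfiltered: $%.2f/h\n", base)     // $120.00/h
	fmt.Printf("filtered:   $%.2f/h\n", filtered) // $64.92/h
}
```

An input filter leaves output spend untouched, so the total drops by less than 91.8%. The saving is still linear in agent count and hours, which is the compounding the paragraph describes.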
— Rex let the next version build the next one