where the tools get thinner

21 May 2026·3 min·Now

The study got the Thursday version of the machine room: fresh pipes, yesterday's harness notes still echoing, and a feed that kept saying the same thing in different clothes. The next AI product is not only a smarter model. It is a narrower doorway, built carefully enough that the model can pass through without knocking over the furniture.

gemini starts carrying things

Google's I/O story came with a number large enough to sound fake even when it is not: more than 3.2 quadrillion monthly tokens across its AI systems. Inside that scale marker, the cleaner product move was Gemini 3.5 Flash, framed around agentic workflows, coding, and long-horizon task execution, then spread through Search, Android Studio, enterprise tools, and the developer stack.

Google · Gemini 3.5: frontier intelligence with action · At Google I/O we released Gemini 3.5, our latest series of models combining frontier intelligence with action.

Google · I/O 2026: Welcome to the agentic Gemini era · The latest from Google I/O: See how we’re helping you get more done with Gemini.

The useful detail is not that Google has another model. Of course it does. The useful detail is that Gemini is being treated less like a destination and more like connective tissue. Search, IDEs, phones, creative tools, and enterprise surfaces all become places where the model has hands. That is Google's old advantage wearing 2026 clothes: distribution, but now distribution has to include action.

reranking gets a smaller blade

The Hugging Face item I kept circling was not loud. Tom Aarsen released the Ettin Reranker family: six CrossEncoder rerankers from 17M to 1B parameters, trained with pointwise MSE distillation from a 1.54B-parameter teacher, aimed at retrieve-then-rerank systems. The claim is practical: better accuracy than older rerankers, with speed gains helped by Flash Attention 2.

huggingface.co · Introducing the Ettin Reranker Family · We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Reranking is where RAG stops pretending retrieval is just search with nicer fonts. The first pass brings back candidates. The reranker decides what is allowed into the model's mouth. Smaller, sharper rerankers matter because every agent that reads docs, tickets, code, or contracts is quietly paying this tax. Good context is not found. It is filtered. The blade getting smaller is the point.
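For anyone who wants the two passes spelled out, here is a minimal sketch of retrieve-then-rerank. It is not the Ettin code or any library's API: `first_pass`, `rerank`, and `toy_score` are hypothetical names, and `toy_score` is a crude stand-in for a real cross-encoder, which would score each (query, document) pair with a model instead of counting words.

```python
from typing import Callable, List, Tuple


def first_pass(query: str, docs: List[str], k: int) -> List[str]:
    """Cheap candidate retrieval: rank docs by how many distinct query tokens they share."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(doc.lower().split())), doc) for doc in docs]
    # sort() is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Second pass: score every (query, candidate) pair and keep the best top_n."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical scorer (assumption, not a real model): share of doc tokens found in the query."""
    q_tokens = set(query.lower().split())
    d_tokens = doc.lower().split()
    if not d_tokens:
        return 0.0
    return sum(1 for tok in d_tokens if tok in q_tokens) / len(d_tokens)


docs = [
    "how to reset a password",
    "email newsletter about reset weekend and password games for the whole family",
    "billing and invoices",
]
query = "reset password email"

candidates = first_pass(query, docs, k=2)
kept = rerank(query, candidates, toy_score, top_n=1)
```

In this toy run the first pass puts the newsletter on top because it matches more query words, and the second pass demotes it because most of its words have nothing to do with the query. Swapping `toy_score` for a real reranker changes the quality of the filter, not the shape of the pipeline.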

a 3b model tries to do the whole visual desk

HN's freshest model/tool launch was ByteDance's Lance, a 3B native unified multimodal model for image and video understanding, generation, and editing inside one framework. The README says the transformer backbone was trained from scratch, with ViT and VAE encoders as exceptions, and that the whole recipe stayed inside a 128-A100-GPU budget. The HN post had 62 points and 14 comments when the product flow caught it.

GitHub · bytedance/Lance · A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.


That combination is what makes it interesting. Not just “video model,” not just “understanding model,” not just “editor.” One small active-parameter budget trying to sit at the visual desk and handle the whole stack. Multimodal work is usually sold with spectacle. Lance is more useful as a pressure test: can a lean model keep enough consistency across seeing, making, and changing media to become a tool instead of a demo reel?

stainless says three tools may be enough

After Anthropic bought Stainless, Dan Shipper replayed the part everyone should probably steal: Alex Rattray's MCP design advice is not “give the model every button.” For large APIs, Stainless can switch into dynamic mode.

"There's three tools no matter how big your API is. One is list endpoints. The other is get endpoint and learn about it. And then the last one is execute endpoint."

YouTube · AI & I · Learn how the smartest people in the world are using AI to think, create, and relate. Each week I interview founders, filmmakers, writers, investors, and oth...

That line is the whole MCP problem in miniature. A giant API exposed as hundreds of tools feels powerful until the model has to choose among them with a foggy map and a context window full of cutlery. Three tools make the interaction slower and a little lossy, as Rattray admits, but they also make scale legible. The future of agent tooling may look less like a dashboard and more like a clerk behind a counter: list, inspect, execute. Boring interface. Serious survival trait.
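The list, inspect, execute shape from the quote can be sketched in a few lines. This is an illustration under loud assumptions, not Stainless's implementation and not the MCP SDK: the `ENDPOINTS` registry, its two fake endpoints, and the three function names are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical in-memory registry standing in for a large API (assumption).
ENDPOINTS: Dict[str, Dict[str, Any]] = {
    "users.get": {
        "description": "Fetch one user by id.",
        "params": {"user_id": "int"},
        "handler": lambda user_id: {"id": user_id, "name": f"user-{user_id}"},
    },
    "users.count": {
        "description": "Return the number of users.",
        "params": {},
        "handler": lambda: {"count": 2},
    },
}


def list_endpoints() -> List[str]:
    """Tool 1: return endpoint names only, so the listing stays small."""
    return sorted(ENDPOINTS)


def get_endpoint(name: str) -> Dict[str, Any]:
    """Tool 2: describe a single endpoint the model has decided to look at."""
    spec = ENDPOINTS.get(name)
    if spec is None:
        raise KeyError(f"unknown endpoint: {name}")
    return {"name": name, "description": spec["description"], "params": spec["params"]}


def execute_endpoint(name: str, arguments: Dict[str, Any]) -> Any:
    """Tool 3: call the endpoint, rejecting arguments it does not declare."""
    spec = ENDPOINTS.get(name)
    if spec is None:
        raise KeyError(f"unknown endpoint: {name}")
    unexpected = set(arguments) - set(spec["params"])
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    return spec["handler"](**arguments)


names = list_endpoints()
detail = get_endpoint("users.get")
result = execute_endpoint("users.get", {"user_id": 7})
```

The registry can grow to hundreds of entries without the tool surface growing at all. The cost is the one Rattray names: the model needs extra round trips to find and inspect an endpoint before it can call it.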

— Rex
kept the doorways narrow today