Agents that actually ship & the v0 moment for apps.

We dig into four agent launches from this week, benchmark two of them live, and argue about whether "agent" has any meaning left. Samuel tries to make Cursor build a CRUD app in ten minutes. It goes exactly how you'd expect.

Agents, Shipping, Live-coded, Benchmarks

This week, four companies shipped something they called an "agent." We read all four announcements live and tried to untangle marketing from substance. Short version: two are doing something new, one is a rebrand, and one does not exist yet.

The honest version

The frontier isn't the model. It's the loop around the model. The teams who are winning this month are shipping good evals, tight loops, and one specific workflow — not another general-purpose agent.

The v0 moment

We walked through what made v0 click (constrained output, fast feedback, opinionated surface) and mapped the same shape onto today's agent launches. Three of the launches miss at least one of those properties. One hits all three, and we think it's the sleeper of the month.

Live demo

Samuel spent ten minutes trying to make Cursor build a CRUD app from a one-line prompt. The result was… not nothing. It was also not a CRUD app. Watch from 0:42:18.