Thread · February 23, 2026

AI is speedrunning the evolution of modern startups

Context Engineering -> Agents -> Loops -> Orchestrators -> ?

There's a strange contradiction: as agents have gotten more capable, the systems needed to manage them have gotten more complex, not less. That feels backwards until you see the pattern. As agents become more powerful, we hand them more autonomy. More autonomy requires more sophisticated management. As the management gets more sophisticated, we're not just building better tools. We're speedrunning the evolution of modern startups.

With GPT-3 we had the first major breakthrough. A model at least as capable as a smart high school student. But managing it also looked like managing a high school student. What worked best was giving it a discrete task with very precise instructions. Context engineering.

The next breakthrough was a prompting trick: just ask the model to break a problem down itself. The agent was born.

Over time agents got more capable as the underlying models improved. We also got better at prompting them. Agents + context engineering turned into specialized agents. Instead of a general agent, you had a back-end developer, a mobile engineer, a tester.

Once you had specialists, it made natural sense to put them on teams together, pass work through different stages iteratively, and have them keep trying until they successfully completed the task. The agentic loop.

That's where we are now. Agentic loops are good enough to run several in parallel. The constraint becomes: how do you keep them all busy and focused?

The answer has been increasingly sophisticated orchestrators. From plugins like Superpowers and GSD to a fully reimagined IDE like Gastown. Different flavors, same underlying goal: large groups of agents executing in parallel with increasingly complicated roles, processes, and guardrails to maximize the quality of the output.
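Stripped to its core, an orchestrator of this kind fans tasks out to several loops at once and collects the results. The sketch below uses Python's standard `concurrent.futures`. The `orchestrate` function and `fake_loop` worker are hypothetical names, not how Superpowers, GSD, or Gastown actually work, and the roles, processes, and guardrails those tools add are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def orchestrate(
    tasks: List[str],
    run_loop: Callable[[str], str],
    max_parallel: int = 4,
) -> Dict[str, str]:
    """Run one loop per task, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as the input tasks.
        results = pool.map(run_loop, tasks)
        return dict(zip(tasks, results))


def fake_loop(task: str) -> str:
    # Hypothetical stand-in for a full agentic loop working on one task.
    return f"done: {task}"


report = orchestrate(["api", "mobile", "tests"], fake_loop, max_parallel=2)
```

Note what the sketch does not do: it keeps workers busy, but it has no opinion about which tasks deserve to be in the list in the first place.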

The problem is we're hitting the limits of this model. Depending on where you sit on Steve Yegge's Developer-Agent evolution, it's akin to trying to manage a team of somewhere between 5 and 60 engineers and keep them all productive and focused.

People have tried this with humans. If you've ever been in an org where one person manages that scale through sheer force of will, you know it breaks.

Gastown is the most ambitious attempt I've seen to solve this. The net result is a bit like the CEO of a 60-person startup hurling tasks at teams of engineers. It does a lot of smart things with management layers, specialized roles to push work along, techniques for unblocking stuck agents, and some creative solutions to handle collisions between workstreams.

But it only solves the throughput problem. It doesn't solve how you make sure you're putting the right things through.

Without solving prioritization, two things happen:

The person hurling tasks gets stretched to their limit, and
The codebase becomes an ever-evolving chimera that will, more likely than not, bite your hand off when you're not paying attention.

Every startup that's gone down this path lands on the same playbook: add management hierarchy, add functions beyond engineering, add process to coordinate between teams, delegate decisions downward.

On the surface that is Gastown. But there are two important distinctions. First, you start to see many more roles show up. It's not just adding hierarchy to engineering. You get functions like analytics, product management, and design. Then related functions appear: sales, marketing, customer support.

The organization gets wider, not just deeper.

Claude and Codex today are doing to building what Rails and mobile frameworks did a decade ago. By making engineering easier, they shifted focus towards figuring out what to build.

We're seeing glimmers of that. Design libraries. PM-like requirements definition in workflow tools. Lots of attempts to solve go-to-market. These feel the way coding agents did 6 months ago. 20% improvements, not 10x. But we saw the loops get better for coding over time, and the same will happen elsewhere.

The management layer will emerge in stages. First: agents better at end-to-end execution. Ask for a mobile app, get one ready to ship, not one requiring weeks of iteration.

Next: integration with non-development functions. Instead of providing tasks, you're seeing a prioritized backlog of ideas to choose from.

Eventually: you're only setting roadmap direction, while everything downstream, from development to marketing to support, is handled by agents.

There's a reason this is furthest along in developer tools. The gap from engineering agents to those other functions is smallest. Today's agents can probably already do most of it.

It's also why OpenClaw struck a chord. Coming at it from the other end, it's the closest we've had to orchestrating engineering and non-engineering layers together. Cowork and Manus are tackling the same problem from other angles.

We'll see a convergence that unlocks the next level of scale. Development agents will get more autonomous, functional agents more capable, and the orchestrators between them will mature. There are already hundreds of projects heading there and thousands more people kludging together their own solutions with disparate tools.

Over time those workflows will become plugins, then apps, then native functionality within the platforms themselves.

The platforms that win the next wave will no longer be the ones that are just best at building. They'll be the ones solving building and distribution in tandem. You can already see it taking shape. Anthropic is integrating Cowork and Code into a single workflow. OpenAI bought Manus and was quick to acquire OpenClaw. They're not just investing in better coding agents. They're assembling the full stack from development through go-to-market.

I think we're 3-6 months out from when the current manic 60-person startup matures into something that looks more like a modern startup. It will just take some time to get the specialized functional loops right and refine the management layer.

And then the next challenge opens up: can you scale an agentic team from ~60 to ~600? That will raise the bar yet again and demand an even more complicated management layer.

Originally on Threads ↗