We Need a Bigger IDE — Karpathy's Vision for the Agent Command Center
The age of the IDE isn't over — we need a bigger one. Karpathy nailed the future of dev tools in a single tweet thread.
"tmux grids are awesome. But this is the vi of 2026."
On March 11th, Andrej Karpathy dropped a tweet that cut straight through the developer tools discourse. Someone showed off Claude Code's experimental Agent Teams running inside tmux panes. Karpathy replied: he wanted a proper "Agent Command Center" — an IDE where you could maximize agent teams per monitor, see which ones are idle, toggle related tools like terminals, and track usage stats. Then he quote-tweeted himself with the punchline:
"Expectation: the age of the IDE is over. Reality: we're going to need a bigger IDE."
The basic unit of interest shifts from files to agents, he argued. But it's still programming. 8,500 likes. 3,700 bookmarks. 1.38 million views. Everyone felt the same itch.
The tmux Grid: A Glorious Band-Aid
It started with Numman Ali, a developer who tweeted a demo of CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=true claude running inside tmux. Teammate agents automatically spawned into new panes. Slick demo — 168K views, 1,300 likes.
But Karpathy's response was the real signal. He acknowledged the "awesome" factor and immediately pointed at the ceiling. tmux arranges text streams into rectangles. That works for 3 agents. What about 10? 30? Which agent is stuck? How many tokens is it burning? What files is it touching? tmux answers none of these questions.
This feels eerily similar to the 1990s, when sysadmins managed servers by opening 20 terminal windows running top. What did we build to fix that? Kubernetes dashboards, Grafana, Datadog — mission control for infrastructure. Agents need the same thing now.
What Exists Today: A Fragmented Front
The current landscape of agent management tools is remarkably scattered.
Claude Code Agent Teams is Anthropic's experimental feature. A lead agent delegates to teammates, who share a task list and communicate directly with each other — unlike subagents, which only report back to the parent. It's promising but still behind an experimental flag, with known limitations around session resumption and shutdown. And the UI? It's a terminal.
Cursor built a multi-agent harness for their "self-driving codebases" research, running thousands of agents simultaneously. Their findings are revealing. When agents self-coordinate freely, they fail — holding locks too long, avoiding risky tasks, losing sight of the big picture. The solution? Clear role separation: Planner → Executor → Worker. This is the same lesson software teams learned over decades. Agents need org charts too.
OpenClaw approaches from the infrastructure angle — a self-hosted gateway connecting agents to messaging apps, with cron jobs and session management built in. It tracks when agents ran, what they did, and what errors occurred. But it's closer to "agent infrastructure" than the visual command center Karpathy envisions.
What Karpathy Actually Wants
Distilling his feature requests:
Per-agent monitor maximization — one screen, one agent, full focus. An agent status dashboard — who's active, who's idle, at a glance. Tool toggles — pop open a specific agent's terminal, browser, or file explorer on demand. Live usage stats — tokens, cost, runtime in real time.
Put these together and the picture crystallizes. This isn't an extension of VSCode or Cursor. This is mission control for agent teams — closer to NASA's Houston than to a text editor.
And here's where Karpathy's insight really lands: "The basic unit of interest is not one file but one agent." IDEs have always been designed around files. File explorer, tabs, diff views, search — files are the atomic unit. When agents become the atomic unit, the IDE's atoms change. A tab holds an agent session, not a file. The sidebar shows agent status, not directory trees. The status bar shows token burn rate, not line count.
The Gap: Imagination vs. Reality
The clearest illustration of the gap between Karpathy's vision and reality comes from Cursor's own research process.
They invested in "proper observability" before scaling agents — logging every agent message, system action, and command output with timestamps. Why? Because without visibility into what agents are doing, debugging is impossible. They even fed these logs back into Cursor to find patterns — using agents to observe agents.
Most developers today are nowhere near this level. The reality is three Claude Code instances in tmux, scrolling up and down trying to figure out what each one is doing. No alerts when an agent stalls. No warning when tokens spike. One agent overwrites another's work and you don't find out until it's all done.
This gap is the next unicorn's territory.
What to Watch Tomorrow
Cursor has started opening their multi-agent harness to some users. Anthropic's Agent Teams remains behind an experimental flag. The fact that Karpathy's tweet hit 1.38M views says demand is crushing supply. Whoever builds the "Kubernetes dashboard for agents" first — that's the next dev tools war.
🔗 Sources
| # | Source | Confidence |
|---|---|---|
| 1 | Karpathy - "bigger IDE" tweet (2026-03-11) | 🟢 Observed |
| 2 | Karpathy - "agent command center" reply (2026-03-11) | 🟢 Observed |
| 3 | Numman Ali - Claude Code Agent Teams + tmux demo (2026-03-10) | 🟢 Observed |
| 4 | Anthropic - Agent Teams docs | 🟢 Observed |
| 5 | Cursor - "Towards self-driving codebases" blog (2026-02-05) | 🔵 Supported |
| 6 | OpenClaw official docs | 🟢 Observed |
| 7 | Cursor official page - agent features | 🔵 Supported |
Confidence levels:
- 🟢 Observed: Directly verifiable facts (official announcements, tweets, product pages)
- 🔵 Supported: Backed by credible sources (blog analysis, research reports)
- 🟡 Speculative: Inference or prediction (analyst opinions, trend interpretation)
- ⚪ Unknown: Uncertain sourcing
HypeProof Daily Research | 2026-03-12
Share