Agentic Alpha Factory: The Reality of AI Trading Automation
\"Built it in a day\"? Here's the unfiltered truth from someone dogfooding it for 7 months. Three bottlenecks of building an agentic quant harness, and the vision to replicate a Pod Shop with AI.
"Built it in a day"? Here's the unfiltered truth from someone dogfooding it for 7 months.
"I fully automated trading with an AI agent — did it in a day." Posts like these pop up every week. Honestly, I was envious. But as someone who's been building this for 7 months straight, let me be real: that's not how it works.
Today I want to share the unfiltered reality of this grind: the 3 major bottlenecks I've faced, my mindset through the struggles, and my ultimate vision for this Agentic Alpha Factory.
What Is a Pod Shop?
First, some context. The world's top hedge funds — Citadel, Millennium, Point72, Balyasny — operate under a structure called a "Pod Shop." Inside one massive fund, there are dozens to hundreds of independent small teams (Pods), each running its own strategies. A typical Pod consists of about 3 people: a Portfolio Manager (PM), 1–2 quant researchers, and a developer. Headquarters provides risk management, capital allocation, and infrastructure. If a Pod breaches its loss limit (typically -3% to -5%), it gets shut down immediately.
The core of this structure is diversification and competition — independent strategies running in parallel, with centralized risk control. What I'm trying to build is a replication of this Pod Shop structure using AI agents.
The Bottleneck 7 Months Ago: Code Generation
Seven months ago, I unveiled a demo of VibeTrading — an agentic quant harness that took an idea from chat → strategy research → backtest (simulating whether a strategy would have made money using historical data) → self-improvement (AI reviews results and refines the strategy on its own).
At the time, I thought the bottleneck would be converting research results into code. So I created a shorthand language (DSL) that let the AI write simple formulas instead of complex code — something like ts_rank(close, 20) to say "rank by closing price over the last 20 days." I spent a lot of time reducing the AI's burden this way.
I then pitched the project to my company and ended up leading a tiny team of two (including myself). My manager suggested: "AI will keep getting smarter — why not ditch DSL and go with raw code generation?" He was exactly right.
Then Claude Code arrived. Skill, Ralph Loop, harness engineering... We migrated to 100% Claude Code. With the right guidelines, code generation is a breeze. We're in an era where "paper to click" is real.
The Current Bottleneck: Agent Operations
Once code generation was solved, the next bottleneck appeared: "How do I keep the agent from stalling?"
This was before Ralph Loop existed, so I struggled quite a bit and ended up solving it with a similar but more situation-specific approach. Then came multi-agent management and context persistence — areas where Claude Code updates have been steadily improving things.
I focus on understanding the essence of features Claude Code ships — Skill, Task, Teams — and applying them properly. I also spend a lot of time connecting concepts from other fields that share the same essence but look different on the surface. (I even do humanities studies on weekends for this... haha.) And there are well-built community harnesses like OMC(oh-my-claudecode) and OMX(oh-my-codex) that I'm still learning from.
The Inevitable Bottleneck: Domain
The latest bottleneck, after going full circle, is squarely in the domain.
Migrating to a new data vendor and things break everywhere. Quality checks at the model level. How do you handle suspended or delisted stocks? Will look-ahead bias creep in? Data gaps have multiple causes — how do you communicate with infrastructure? Checkpoints when connecting agents to event-driven backtesters. Will you get orthogonal alpha when synthesizing long-only strategies? How do you manage and synthesize generated alphas? With limited time, what do you delegate to agents and what do you handle yourself? Ideas are running dry — where do you find and augment new ones? Which evaluation metrics should you use?
The questions never end.
No matter how well AI writes code, deciding "whether missing price data for a suspended stock is genuinely missing or a data vendor error" requires understanding how the historical simulation infrastructure works — and worrying about whether you're accidentally using future information that wouldn't have been available at the time (look-ahead bias). That's domain knowledge. Agents can't substitute for that.
Mindset: Fighting FOMO
Keeping up with new features is already overwhelming, and then there's the domain... Harness development is more frustrating than expected. Every weekend, someone posts about what they built, how everything works perfectly — and I keep thinking, "Am I the only one struggling?"
FOMO hits, but I remind myself that everyone who built something probably went through similar struggles. I decided to focus on my own work with the mindset: to build a machine that carves wood, you first need to carve it by hand.
The Vision: An Asymmetric Structure
What I want is for all these problems to be solved and to rapidly build a Pod Shop — or Alpha Factory — with AI agents.
The numbers make the structure clear.
Running a single quant Pod at a global hedge fund requires a quant researcher + developer + PM — about 3 people. Their average annual compensation is roughly $150K each. One Pod's labor cost: ~$450K/year.
Meanwhile, the structure I'm currently testing has 4 Claude Max 20x accounts each running 2 agentic loops, for a total of 8 AI Pods operating independently as teams. Claude Max 20x costs $200/month per account. Annual cost for 8 AI Pods: ~$9,600.
Running the same 8 Pods with humans would cost ~$3.6M/year. With AI, it's $9,600. Cost ratio: roughly 370:1. Of course, there's no guarantee that AI Pod performance matches human Pods. There's still a long way to go. But I believe this asymmetric structure itself is worth betting on.
And even when I rest, the brilliant people in Silicon Valley building Foundation Models don't. As model performance improves, harness performance improves with it. If I build the harness well, time becomes my ally.
References
| # | Source | URL |
|---|---|---|
| 1 | Original Post — LinkedIn | https://www.linkedin.com/feed/update/urn:li:activity:7438980199616184321/ |
| 2 | VibeTrading Demo — Agentic Quant Research Tool | https://www.linkedin.com/posts/jiwoong-kim-b9934417a_introducing-the-prototype-of-my-new-personal-activity-7361167113148878848-quDk/ |
| 3 | Claude Code — Anthropic Official Docs | https://docs.anthropic.com/en/docs/claude-code |
| 4 | Building Effective Agents (Ralph Loop) — Anthropic Research | https://www.anthropic.com/research/building-effective-agents |
| 5 | OMC / OMX Community | https://discord.com/invite/JkT8CT5gq |
📚 You might also enjoy
Share
