Best AI coding Agents with some crazy upsets | GPT 5, Grok Code Fast, Claude, Qwen 3 Coder
A lot happened in August: Grok Code Fast, GPT 5, Kiro, Qoder, Augment CLI, and more. I do my best to put them all through the testing gauntlet and share the results here. Some massive surprises, and exciting times ahead.

Links:
My Recommended AI Engineer course is Scrimba: https://scrimba.com/the-ai-engineer-path-c02v?via=GosuCoder

My Links
Subscribe: https://www.youtube.com/@GosuCoder
Twitter/X: https://x.com/GosuCoder
LinkedIn: https://www.linkedin.com/in/adamwilliamlarson/
Discord: https://discord.gg/YGS4AJ2MxA

My computer specs
GPU: RTX 5090 (sometimes an AMD 7900xtx)
CPU: 7800x3d
RAM: DDR5 6000 MHz

Media/Sponsorship Inquiries: gosucoderyt@gmail.com
Video Chapters
- 0:00 Welcome to the AI Agent Arena: August's Epic Showdown!
- 1:21 Decoding the Tests: How We Pushed AI Agents to Their Limits
- 4:24 Meet the Brains: Unpacking the AI Models Driving Our Agents
- 6:41 Claude 4 Sonnet's Champions: Who Dominated the Arena?
- 9:08 GPT 5's Elite Performers: The Agents Leading the Pack
- 15:18 Claude Opus 4.1's Triumvirate: Unexpected Heroes Emerge
- 17:16 The Rookies' Report Card: How Did the New Agents Stack Up?
- 18:32 The Grand Overview: Unveiling the Ultimate Agent Rankings
- 19:32 The Verdict Is In: Major Insights and Surprising Revelations
- 21:09 Gearing Up for September: What's Next in AI Agent Testing?
Unprocessed Timestamp Content
- 0:00 Diving into August's wild AI agent landscape and extensive testing
- 0:14 Kiro agent review: reasonable pricing per request, but another VS Code clone
- 0:29 Qoder agent performance: good job overall, but missing pricing and model details
- 0:42 Augment CLI agent: initially seemed like Gemini, but it's forging its own path
- 0:56 Claude Code Router challenges: a tricky setup, logged out for success
- 1:21 Testing methodology insights: focusing on complex instruction following and unit tests
- 2:23 Model knowledge is key: some AI agents prioritize knowledge over tool-calling
- 3:52 Retired test example: building a complex image processing application from scratch
- 4:24 The Models lineup: Opus is expensive, 2.5 Pro is budget-friendly, Qwen is favorite
- 6:41 Claude 4 Sonnet Top 3: Augment, Cursor, and a surprisingly high-scoring Warp
- 8:00 Claude 4 Sonnet rankings: almost all models score well, ClaudeCode lags behind
- 9:08 GPT 5 Top 3: Cursor, Warp, and the powerful Codex IDE taking the lead
- 10:19 GPT 5 performance: environmental issues affect some agents, Open-code struggled
- 12:20 Qwen 3 Coder Top 3: OpenCode with Fireworks, ClaudeCode, and QwenCode's triumph
- 13:11 Qwen 3 Coder breakdown: Zed and Copilot struggle, while others shine brightly
- 13:58 Grok Code Fast Top 3: RooCode, Cursor, and an unexpectedly strong GitHub Copilot
- 14:58 Grok Code Fast scores: Copilot takes top, Zed struggles to even participate
- 15:18 Claude Opus 4.1 Top 3: Aider, RooCode, and Warp's surprising lead in tests
- 16:33 Claude Opus 4.1 chart: Claude Code is oddly at the bottom, Warp is soaring high
- 17:16 Newcomer agent scores: Augment leads, Kiro shows solid performance, Qoder trailing
- 18:32 Comprehensive test scores: a lot of data, but clear contenders emerge for top spots
- 19:32 Key takeaways: GrokCode's potential, Warp's unexpected rise, GPT-5 improving quickly
- 21:09 September's tools: Codex, RooCode, and Claude Code for upcoming projects and tests

Timestamps by StampBot