Best AI coding Agents with some crazy upsets | GPT 5, Grok Code Fast, Claude, Qwen 3 Coder

So much happened in August: Grok Code Fast, GPT 5, Kiro, Qoder, Augment CLI, and more. I do my best to put them all through the testing gauntlet and share the results here. Some massive surprises, and exciting times ahead.

Links:
🧑‍💻 My Recommended AI Engineer course is Scrimba: https://scrimba.com/the-ai-engineer-path-c02v?via=GosuCoder

My Links 🔗
👉🏻 Subscribe: https://www.youtube.com/@GosuCoder
👉🏻 Twitter/X: https://x.com/GosuCoder
👉🏻 LinkedIn: https://www.linkedin.com/in/adamwilliamlarson/
👉🏻 Discord: https://discord.gg/YGS4AJ2MxA

My computer specs
GPU: RTX 5090 (sometimes an AMD 7900xtx)
CPU: 7800x3d
RAM: DDR5 6000 MHz

Media/Sponsorship Inquiries ✅ gosucoderyt@gmail.com

Channel: GosuCoder • Generated by halston • Duration: 25m • Published Sep 01, 2025

Video Chapters

Original Output

0:00 Welcome to the AI Agent Arena: August's Epic Showdown!
1:21 Decoding the Tests: How We Pushed AI Agents to Their Limits
4:24 Meet the Brains: Unpacking the AI Models Driving Our Agents
6:41 Claude 4 Sonnet's Champions: Who Dominated the Arena?
9:08 GPT 5's Elite Performers: The Agents Leading the Pack
15:18 Claude Opus 4.1's Triumvirate: Unexpected Heroes Emerge
17:16 The Rookies' Report Card: How Did the New Agents Stack Up?
18:32 The Grand Overview: Unveiling the Ultimate Agent Rankings
19:32 The Verdict Is In: Major Insights and Surprising Revelations
21:09 Gearing Up for September: What's Next in AI Agent Testing?

Timestamps by StampBot 🤖

Unprocessed Timestamp Content

0:00 Diving into August's wild AI agent landscape and extensive testing
0:14 Kiro agent review: reasonable pricing per request, but another VS Code clone
0:29 Qoder agent performance: good job overall, but missing pricing and model details
0:42 Augment CLI agent: initially seemed like Gemini, but it's forging its own path
0:56 Claude Code Router challenges: a tricky setup, logged out for success
1:21 Testing methodology insights: focusing on complex instruction following and unit tests
2:23 Model knowledge is key: some AI agents prioritize knowledge over tool-calling
3:52 Retired test example: building a complex image processing application from scratch
4:24 The model lineup: Opus is expensive, 2.5 Pro is budget-friendly, Qwen is a favorite
6:41 Claude 4 Sonnet Top 3: Augment, Cursor, and a surprisingly high-scoring Warp
8:00 Claude 4 Sonnet rankings: almost all models score well, Claude Code lags behind
9:08 GPT 5 Top 3: Cursor, Warp, and the powerful Codex IDE taking the lead
10:19 GPT 5 performance: environmental issues affect some agents, OpenCode struggled
12:20 Qwen 3 Coder Top 3: OpenCode with Fireworks, Claude Code, and Qwen Code's triumph
13:11 Qwen 3 Coder breakdown: Zed and Copilot struggle, while others shine brightly
13:58 Grok Code Fast Top 3: RooCode, Cursor, and an unexpectedly strong GitHub Copilot
14:58 Grok Code Fast scores: Copilot takes the top spot, Zed struggles to even participate
15:18 Claude Opus 4.1 Top 3: Aider, RooCode, and Warp's surprising lead in tests
16:33 Claude Opus 4.1 chart: Claude Code is oddly at the bottom, Warp is soaring high
17:16 Newcomer agent scores: Augment leads, Kiro shows solid performance, Qoder trailing
18:32 Comprehensive test scores: a lot of data, but clear contenders emerge for top spots
19:32 Key takeaways: Grok Code Fast's potential, Warp's unexpected rise, GPT 5 improving quickly
21:09 September's tools: Codex, RooCode, and Claude Code for upcoming projects and tests
