Can a Local LLM REALLY be your daily coder? Framework Desktop with GLM 4.5 Air and Qwen 3 Coder

With the arrival of my new Framework Desktop, I decided to move to coding with local LLMs only, without touching Claude, GPT-5, or any other cloud models. I learned a lot while running GLM 4.5 Air, Qwen 3 Coder, and GPT OSS 120B, and I think I ultimately landed in a good spot.

Links:
πŸ§‘β€πŸ’» My recommended AI Engineer course is Scrimba: https://scrimba.com/the-ai-engineer-path-c02v?via=GosuCoder

My Links πŸ”—
πŸ‘‰πŸ» Subscribe: https://www.youtube.com/@GosuCoder
πŸ‘‰πŸ» Twitter/X: https://x.com/GosuCoder
πŸ‘‰πŸ» LinkedIn: https://www.linkedin.com/in/adamwilliamlarson/
πŸ‘‰πŸ» Discord: https://discord.gg/YGS4AJ2MxA

My computer specs
GPU: RTX 5090 (sometimes an AMD 7900 XTX)
CPU: 7800X3D
RAM: DDR5 6000 MHz

Media/Sponsorship Inquiries βœ… gosucoderyt@gmail.com

Channel: GosuCoder β€’ Duration: 17m β€’ Published Aug 27, 2025

Video Chapters


0:00 Kickstarting the Local LLM Revolution: Say Goodbye to Cloud Costs!
0:30 Unveiling the Powerhouse: The Framework Desktop's AI Muscle
1:45 Putting Models to the Test: Real-World Benchmarks Revealed
3:20 Optimizing for Peak Performance: Essential Settings for Speed & Memory
6:30 The Early Struggles: Battling Repetitive Code Generation
7:55 Facing the Frustration: Long Prompts & Persistent Timeouts
9:45 A Game-Changer Arrives: Solving Timeouts with Crush
11:25 Exploring New Horizons: Jan AI's Creative UI Insights
15:15 The Ultimate Strategy: Smart Models & Swift Agents Working Together
16:00 Finding the Perfect Balance: Matching Model Speed to Task Size

Timestamps by StampBot πŸ€–

Detailed Timestamps

0:00 Starting the local LLM coding journey; no more cloud dollars.
0:30 Meet the Framework Desktop: 96GB VRAM for serious AI tasks.
1:45 Benchmarking local models: Qwen 3 Coder and others on display.
3:20 Crucial settings for speed and memory: batch size and quantization types.
5:30 AMD runtime challenges: ROCm versus Vulkan, a memory adventure.
6:30 Initial coding pain: GPT OSS 120B's repetitive code generation blues.
7:55 Prompt processing struggles: long waits and frustrating timeouts.
9:45 Discovering Crush: the timeout-free oasis for persistent coding tasks.
11:25 Back to basics with Jan AI: getting smarter UI design ideas.
13:59 Stress-testing LLM models: a Python script reveals performance insights.
14:15 Framework Desktop in action: GPU working hard, yet surprisingly quiet.
15:15 The grand theory: combining slow, smart models with faster worker agents.
16:00 The sweet spot: smart, slower models for big tasks, fast for small.