Why Is Nobody Talking About AI World Models?
AI isn’t just generating videos anymore: it’s generating entire worlds. 🌍 In this deep dive, I’ll break down the rise of AI World Models (Google’s Genie 3, NVIDIA’s Cosmos, and OpenAI’s Sora) and why they’re the closest thing yet to a real-life Holodeck. As an ex-Googler who spent years building real-world models, I’ll show you how these systems work, where they’re headed, and why this race will reshape robotics, virtual reality, and the future of content itself. IMO this tech is bigger than LLMs, and it's a key part of AGI that people miss completely.
Covered in this video:
• Google Genie 3: Interactive video generation and single-image 3D reconstruction.
• NVIDIA Cosmos & Omniverse: The hybrid approach to building digital twins for Embodied AI using game engine and 3D simulation technology.
• The World Model Wars: How Google, OpenAI (Sora), Runway, and NVIDIA are competing.
• Synthetic Data Revolution: A look at startups like Parallel Domain, Bifrost, and Sky Engine AI.
• The Rendering Stack of Reality: The grand challenge of creating simulations that scale from a single object to an entire city.
Chapters:
00:00 Introduction
00:39 1. Dream Worlds at 24 FPS
01:46 2. Painting the Third Dimension
03:03 3. The World Model Wars
05:27 4. Robot Jungle Gyms
08:23 5. Synthetic Data Revolution
11:06 6. Cities That Think
14:07 7. The Holodeck Approaches
17:49 8. The Rendering Stack of Reality
#WorldModels #Genie3 #NVIDIACosmos
Subscribe for more in-depth AI & creative tech videos! 👉 @bilawalsidhu
Join My Newsletter: https://spatialintelligence.ai
Connect with me on X/Twitter here: https://x.com/bilawalsidhu
Everywhere else here: https://bilawal.ai
Business inquiries: team@metaversity.us
Bio:
Bilawal Sidhu is a creator, engineer, and product builder obsessed with blending reality and imagination using art and science. Bilawal is the technology curator for TED Talks and a venture scout for Andreessen Horowitz. With more than a decade of experience in the tech industry, he spent six years as a product manager at Google, where he worked on spatial computing and 3D maps. His work has been featured in major publications including Bloomberg, Forbes, BBC, CNBC, and Fortune, among others. Bilawal’s journey into computer graphics began at 11, when he fell in love with seamlessly blending 3D into real-life footage. Since then, he's captivated over 1.5M subscribers, garnering more than 500M views across his platforms. Driven by a mission to empower the next generation of artists and entrepreneurs, Bilawal openly shares AI-assisted workflows and industry insights on social media. When he’s not working, you can find Bilawal expanding his collection of electric guitars.
TED: https://www.ted.com/speakers/bilawal_sidhu
#aitools #googlegenie3 #aivideo #googledeepmind
Video Chapters
- 0:00 Step into a world where simulation and reality dance on the edge.
- 1:15 Unleash the magic of Genie 3: dreaming with controlled hallucinations.
- 1:50 Witness ancient art leap into explorable 3D realms.
- 3:06 NVIDIA Cosmos: laying the groundwork for intelligent physical AI.
- 4:05 OpenAI Sora: crafting powerful new world simulators through video.
- 4:30 Is Star Trek's Holodeck becoming our new reality?
- 7:40 "Robot Jungle Gyms": How world models empower advanced skills.
- 13:40 AlphaEarth Foundations: imagining a "ChatGPT" for our entire planet.
- 17:55 The monumental challenge: forging the future of 3D.
- 19:50 The ultimate digital twin: just the beginning of a grand journey.
Unprocessed Timestamp Content
- 0:00 Welcome to new realities, where simulation and reality start to blur
- 0:42 What is Genie 3 and how does it work its magic
- 0:50 Taking control of AI-generated worlds like a video game
- 1:15 Genie 3: Playing a dream, a controlled hallucination of reality
- 1:35 A simulation within a simulation, the rabbit hole deepens
- 1:50 Turning ancient 2D paintings into wild, explorable 3D worlds
- 2:30 Step inside Socrates' painting with surprisingly good 3D depth
- 2:58 Nighthawks painting reborn as an explorable 3D Gaussian splat
- 3:06 NVIDIA Cosmos: developing foundational models for advanced physical AI
- 3:15 Runway and Google's long-term research into general world models
- 3:55 Elon Musk predicts real-time Grok video generation in two months
- 4:05 OpenAI's Sora and video generation as powerful world simulators
- 4:30 Recreating Star Trek's holodeck: blurring reality and simulation
- 5:30 NVIDIA's hybrid approach: creating simulated digital twins of reality
- 5:50 Robot simulations: training self-driving cars with world models
- 7:15 Generative simulation: creating infinite variations for robot training
- 7:40 World models as "robot jungle gyms" for advanced skill generalization
- 8:15 Your robot dreams of better cleaning, powered by 3D scans
- 8:25 Robotics is a data problem; physical world data is surprisingly scarce
- 9:10 Parallel Domain: high-fidelity simulation API for autonomous systems
- 9:55 Bifrost: hybrid game engine for precise physical AI training
- 10:30 Sky Engine AI: synthetic data for in-cabin driver monitoring systems
- 11:10 Physical AI extends to smart cities and industrial infrastructure
- 11:30 Data From Sky: an omniscient traffic god for urban management
- 12:20 Blender's City Generator plugin creates detailed urban simulations
- 12:40 Meta SceneScript: real-world objects imported into interactive simulations
- 12:55 Runway Aleph & Luma V2V: transforming real-world into simulated environments
- 13:20 Google Earth: Literally a "wayback machine" for our planet
- 13:40 AlphaEarth Foundations: building a ChatGPT for the entire Earth
- 14:10 Fully interactive worlds in VR: the Holodeck approaches reality
- 15:30 Jetset: Gaussian splats for real-time previz on your phone
- 15:55 My vision for merging 3D software and generative AI
- 16:30 Convai Sim: create AI agents and direct virtual tours in Unreal Engine
- 17:10 Intangible: natural language description for 3D scene creation
- 17:55 The future of 3D: a hard, unsolved problem of epic scale
- 18:05 Simulating a bowl of strawberries to an entire city: complexity explodes
- 18:40 We need entirely new tools for generating complex 3D content
- 19:00 Unknown future: will neural or hybrid approaches win the rendering race
- 19:50 We're just scratching the surface of the ultimate digital twin
- 20:00 Appreciating whoever wrote the incredible rendering engine for reality
Timestamps by StampBot 🤖