Google DeepMind has unveiled Genie 3, a powerful new foundation world model designed to train general-purpose AI agents, marking what the lab calls a significant step toward achieving artificial general intelligence (AGI).
“Genie 3 is the first real-time, interactive, general-purpose world model,” said Shlomi Fruchter, Research Director at DeepMind, during a press briefing. “Unlike earlier, narrow world models, Genie 3 isn’t tied to a specific environment. It can generate both photorealistic and imaginative 3D worlds, and everything in between.”
Still in research preview and not yet available to the public, Genie 3 builds on the foundations of Genie 2 and DeepMind’s recent video model Veo 3, combining dynamic visual generation with physical reasoning.
From a simple text prompt, Genie 3 can generate interactive 3D environments at 720p resolution and 24 frames per second that remain consistent for several minutes, far exceeding the 10–20 second limit of its predecessor. It also supports “promptable world events,” which let users alter an environment in real time using natural language.
What sets Genie 3 apart is its temporal consistency. The model remembers previous frames to maintain physical continuity, despite not being explicitly programmed to do so. This ability, DeepMind says, enables more realistic physics and long-term reasoning, vital for teaching AI agents to operate in dynamic, open-ended environments.
“World models are crucial for training embodied agents, especially when simulating the real world is complex and resource-heavy,” added Jack Parker-Holder, a research scientist at DeepMind.
DeepMind showcased Genie 3’s capabilities using SIMA, its Scalable Instructable Multiworld Agent. In a test, SIMA was instructed to complete tasks like “walk to the packed red forklift” or “approach the green trash compactor” in a virtual warehouse. Genie 3 simulated the environments, and the agent successfully completed all tasks, thanks to the model’s internal consistency.
Genie 3 doesn’t rely on hard-coded physics engines. Instead, it learns how objects move and interact by predicting one frame at a time, auto-regressively, looking back to decide what happens next. This allows it to learn physical rules in a way that’s comparable to human reasoning, such as anticipating when a glass might fall or when to dodge a moving object.
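The frame-by-frame loop described above can be illustrated with a toy sketch. This is not Genie 3's architecture or code: the "frame" here is just a single number (an object's height), and `predict_next` is a hypothetical hand-written stand-in for a learned model. The only point is the auto-regressive pattern, in which each new frame is predicted from a window of earlier frames and then fed back in as context for the next one.

```python
from typing import Callable, List

# Toy "frame": the height of a single object. A real world model would
# predict full images; this stand-in only illustrates the control flow.
Frame = float


def predict_next(history: List[Frame]) -> Frame:
    """Hypothetical stand-in for a learned model.

    Extrapolates the next frame from the last three frames by estimating
    velocity and acceleration from their differences.
    """
    if len(history) < 3:
        # Not enough context to estimate motion: repeat the last frame.
        return history[-1]
    a, b, c = history[-3:]
    velocity = c - b
    acceleration = (c - b) - (b - a)
    return c + velocity + acceleration


def rollout(
    seed: List[Frame],
    steps: int,
    predict: Callable[[List[Frame]], Frame] = predict_next,
    context: int = 8,
) -> List[Frame]:
    """Generate `steps` new frames, one at a time, auto-regressively."""
    frames = list(seed)
    for _ in range(steps):
        # Look back at a bounded window of past frames, predict the next
        # frame, and append it so it becomes context for later steps.
        frames.append(predict(frames[-context:]))
    return frames


if __name__ == "__main__":
    # Three seed frames of an object that is falling faster and faster.
    print(rollout([10.0, 9.0, 7.0], steps=3))
```

No gravity constant appears anywhere in the sketch; the downward acceleration is inferred entirely from the seed frames, which loosely mirrors the idea of picking up physical regularities from observed frames rather than from a hard-coded physics engine.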
While Genie 3 is groundbreaking, it isn’t without limitations. DeepMind notes:
- It can sustain only a few minutes of continuous interaction, whereas long-form training would require hours.
- Complex multi-agent interactions are still hard to model realistically.
- Though it shows an understanding of physics, certain simulations (like skiing) lack environmental realism.
Still, the implications are immense. Genie 3 paves the way for AI agents that learn through exploration, trial and error, and long-term planning, all hallmarks of human intelligence.
“We haven’t had our Move 37 moment for embodied agents yet,” said Parker-Holder, referencing the legendary Go move by AlphaGo that stunned experts in 2016. “But Genie 3 may be the beginning of a new era, where AI agents start thinking, planning, and learning like humans.”