
World Models: The Future of AI?
World models, also known as world simulators, are being touted by some as the next big thing in AI. AI pioneer Fei-Fei Li's World Labs has raised $230 million to build “large world models,” and DeepMind hired one of the creators of OpenAI's video generator, Sora, to work on “world simulators.” But what the heck are these things?
This blog post will take a deep dive into world models, explaining their potential for revolutionizing AI and how they can be applied in various domains. Let's explore this exciting frontier.
Table of Contents
- World Models and AI
- Understanding World Models
  - Inspiration from Human Mental Models
  - Example of a Baseball Batter
- World Models and Video Generation
  - Limitations of Generative Models
  - World Models for Improved Video Realism
  - Training Data and Internal Representations
- Beyond Video: Potential Applications
  - Sophisticated Forecasting and Planning
  - Example: Cleaning a Room
  - Yann LeCun's Vision for World Models
- Current Progress and Future Prospects
  - Elementary Physics Simulators
  - Timeline for Advanced World Models
World Models and AI
World models take inspiration from the mental models of the world that humans develop naturally. Our brains take the abstract representations from our senses and form them into a more concrete understanding of the world around us, producing what we called “models” long before AI adopted the phrase. The predictions our brains make based on these models influence how we perceive the world.
Understanding World Models
Inspiration from Human Mental Models
World models are essentially AI systems designed to mimic this human ability to understand and predict the world. They are trained on vast amounts of data, including images, videos, text, and sensor readings. This data helps the model build an internal representation of the world, enabling it to reason about how things work and what might happen next.
Example of a Baseball Batter
A paper by AI researchers David Ha and Jürgen Schmidhuber gives the example of a baseball batter. Batters have milliseconds to decide how to swing their bat, which is less than the time it takes for visual signals to reach the brain. They are able to hit a 100-mile-per-hour fastball because they can instinctively predict where the ball will go, Ha and Schmidhuber say. “For professional players, this all happens subconsciously,” the research duo writes. “Their muscles reflexively swing the bat at the right time and location in line with their internal models' predictions. They can quickly act on their predictions of the future without the need to consciously roll out possible future scenarios to form a plan.”
It's these subconscious reasoning aspects of world models that some believe are prerequisites for human-level intelligence.
World Models and Video Generation
Limitations of Generative Models
Most, if not all, AI-generated videos veer into uncanny valley territory. Watch them long enough and something bizarre will happen, like limbs twisting and merging into each other.
While a generative model trained on years of video might accurately predict that a basketball bounces, it doesn't actually have any idea why, just as language models don't really understand the concepts behind words and phrases. A world model with even a basic grasp of why the basketball bounces the way it does should be better at depicting that bounce.
World Models for Improved Video Realism
World models promise a solution to this limitation. By modeling the underlying physics and principles that govern how objects interact, they could generate videos that are more realistic and less prone to uncanny glitches. The idea is that they do not just replicate patterns but reason about the world in a deeper way.
Training Data and Internal Representations
To enable this kind of insight, world models are trained on a range of data, including photos, audio, videos, and text, with the intent of creating internal representations of how the world works, and the ability to reason about the consequences of actions.
Beyond Video: Potential Applications
Sophisticated Forecasting and Planning
But better video generation is only the tip of the iceberg for world models. Researchers including Meta chief AI scientist Yann LeCun say the models could someday be used for sophisticated forecasting and planning in both the digital and physical realms.
Example: Cleaning a Room
In a talk earlier this year, LeCun described how a world model could help achieve a desired goal through reasoning. A model with a base representation of a “world” (e.g. a video of a dirty room), given an objective (a clean room), could come up with a sequence of actions to achieve that objective (deploy vacuums to sweep, clean the dishes, empty the trash) not because that's a pattern it has observed but because it knows at a deeper level how to go from dirty to clean.
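The dirty-room example can be pictured as a search over predicted states. The sketch below is only a toy illustration, not LeCun's proposal: the state labels, the `ACTIONS` table, and the `predict_next_state` function are all hypothetical, and the hand-written transition rule stands in for what a real world model would have to learn from data. What it shows is the shape of the idea: the planner never looks up a memorized "cleaning routine"; it asks the model what each action would do and searches for a sequence that reaches the goal.

```python
from collections import deque

# Hypothetical actions: each one resolves a single problem in the room.
ACTIONS = {
    "vacuum_floor": "dusty_floor",
    "wash_dishes": "dirty_dishes",
    "empty_trash": "full_trash",
}


def predict_next_state(state, action):
    """Toy stand-in for a world model: predict the state after an action."""
    return state - {ACTIONS[action]}


def plan(start, goal):
    """Breadth-first search over predicted states; returns a list of actions."""
    start, goal = frozenset(start), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action in ACTIONS:
            nxt = predict_next_state(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no sequence of actions reaches the goal


dirty_room = {"dusty_floor", "dirty_dishes", "full_trash"}
clean_room = set()
print(plan(dirty_room, clean_room))
```

Swapping the hand-written rule for a learned predictor would leave the search loop unchanged, which is the appeal of separating "how the world behaves" from "what I want to achieve."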
Yann LeCun's Vision for World Models
“We need machines that understand the world; [machines] that can remember things, that have intuition, have common sense — things that can reason and plan to the same level as humans,” LeCun said. “Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this.”
Current Progress and Future Prospects
Elementary Physics Simulators
While LeCun estimates that we're at least a decade away from the world models he envisions, today's world models are showing promise as elementary physics simulators. They can be used to predict how simple objects will move and interact with each other, offering a glimpse into the potential of these models to learn and reason about the world.
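To make "elementary physics simulator" a little more concrete, here is a deliberately tiny sketch. It assumes a single object in free fall and a model with one learnable parameter (a constant acceleration), estimated from observed positions and then rolled forward to predict frames it has not seen. The function names and the setup are illustrative assumptions; real world models learn far richer dynamics from images and video rather than from clean position readings.

```python
def simulate_fall(y0, v0, g, dt, steps):
    """Generate stand-in 'observations': heights of an object in free fall."""
    ys = [y0]
    y, v = y0, v0
    for _ in range(steps):
        v -= g * dt      # gravity changes the velocity
        y += v * dt      # velocity changes the position
        ys.append(y)
    return ys


def fit_acceleration(ys, dt):
    """Estimate a constant acceleration from second differences of position."""
    second_diffs = [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]
    return sum(second_diffs) / len(second_diffs) / dt ** 2


def predict(y_prev, y_curr, accel, dt, steps):
    """Roll the fitted model forward from the last two observed positions."""
    out = []
    for _ in range(steps):
        y_next = 2 * y_curr - y_prev + accel * dt ** 2
        out.append(y_next)
        y_prev, y_curr = y_curr, y_next
    return out


dt = 0.01
observed = simulate_fall(y0=10.0, v0=0.0, g=9.81, dt=dt, steps=20)
accel = fit_acceleration(observed[:11], dt)   # "train" on the first 11 frames
future = predict(observed[9], observed[10], accel, dt, steps=10)
print(round(accel, 2))
```

The model is never told the value of `g`; it recovers the downward acceleration from the data and uses it to anticipate the remaining frames, which is the prediction loop in miniature.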
Timeline for Advanced World Models
The development of world models is still in its early stages, but researchers see considerable potential. As the field matures, world models could become markedly more capable in the coming years, leading to AI systems that understand, reason, and plan far better than today's systems do.
Summary
- World models are AI systems that aim to replicate human understanding of the world by building internal representations and reasoning about how things work.
- Inspired by human mental models, world models can improve AI's ability to predict and generate realistic videos, going beyond simple pattern recognition.
- Beyond video generation, world models hold the potential for sophisticated forecasting, planning, and problem-solving in various domains.
- While still in their early stages, world models are already demonstrating capabilities as elementary physics simulators, with significant advancements expected in the coming years.