A Comprehensive Survey on World Models for Embodied AI

This repository accompanies our survey, A Comprehensive Survey on World Models for Embodied AI. World models function as internal simulators of environmental dynamics, enabling forward and counterfactual rollouts that unify perception, prediction, and control across tasks and domains. For a brief overview of the survey, please refer to the two slide decks: πŸ“‘ English (PDF) Β· πŸ“‘ Chinese (PDF)
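The sketch below is a minimal illustration of the rollout idea described above, not an implementation from the survey or any listed paper: a toy latent dynamics model is unrolled over a candidate action sequence to score it entirely in imagination, and rerunning it with a different action sequence from the same start state gives a counterfactual rollout. All names and dimensions (`LatentWorldModel`, `rollout`, `latent_dim`, `action_dim`) are illustrative assumptions.

```python
# Minimal sketch of a latent world-model rollout (illustrative only).
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Toy latent dynamics model: z_{t+1} = f(z_t, a_t), with a reward head."""
    def __init__(self, latent_dim: int = 32, action_dim: int = 4):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.reward_head = nn.Linear(latent_dim, 1)

    def step(self, z: torch.Tensor, a: torch.Tensor):
        # Predict the next latent state and its reward from (state, action).
        z_next = self.dynamics(torch.cat([z, a], dim=-1))
        return z_next, self.reward_head(z_next)

@torch.no_grad()
def rollout(model: LatentWorldModel, z0: torch.Tensor, actions: torch.Tensor) -> float:
    """Unroll the model over a candidate action sequence ("imagination")
    and accumulate predicted reward, with no real environment interaction."""
    z, total = z0, 0.0
    for a in actions:  # actions: (T, action_dim)
        z, r = model.step(z, a)
        total += r.item()
    return total

model = LatentWorldModel()
z0 = torch.zeros(32)       # initial latent state (e.g., an encoded observation)
plan = torch.randn(10, 4)  # one candidate 10-step action sequence
print(rollout(model, z0, plan))
```

In a model-predictive-control setting, this scoring loop would run over many candidate action sequences, and only the first action of the best-scoring plan would be executed in the real environment.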

Papers πŸ“š

Icon legend β€” πŸš— Autonomous Driving Β· πŸ€– Robotic Manipulation Β· 🧭 Navigation Β· 🎬 Video Generation (icons indicate the predominant domain; categories are non-exclusive, e.g., robotics and driving work may also involve generative modeling.)

Decision-Coupled / Sequential / Global Latent Vector

2025

  • πŸ€– DisWM: Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. [ICCV'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making. [ICML'25] [Paper] [Project Page]

  • πŸ€– SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models. [ICML'25] [Paper] [Project Page] [Code]

  • πŸ€– SR-AIF: Solving Sparse-Reward Robotic Tasks From Pixels with Active Inference and World Models. [ICRA'25] [Paper] [Code]

  • πŸ€– LUMOS: Language-Conditioned Imitation Learning with World Models. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸ€– WMP: World Model-Based Perception for Visual Legged Locomotion. [ICRA'25] [Paper] [Project Page] [Code]

  • 🧭 X-MOBILITY: End-to-End Generalizable Navigation via World Modeling. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸš— AdaWM: Adaptive World Model based Planning for Autonomous Driving. [ICLR'25] [Paper]

  • πŸ€– DreamerV3: Mastering diverse control tasks through world models. [Nature'25] [Paper] [Project Page] [Code]

  • πŸ€– GLAM: Global-Local Variation Awareness in Mamba-based World Model. [AAAI'25] [Paper] [Code]

  • πŸ€– WMR: Learning Humanoid Locomotion with World Model Reconstruction. [arXiv'25] [Paper]

  • πŸš— VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving. [arXiv'25] [Paper] [Project Page] [Code] [Poster] [Video]

  • πŸš— CALL: Ego-centric Learning of Communicative World Models for Autonomous Driving. [arXiv'25] [Paper]

  • πŸ€– Latent Policy Steering with Embodiment-Agnostic Pretrained World Models. [arXiv'25] [Paper]

  • πŸ€– ReDRAW: Adapting World Models with Latent-State Dynamics Residuals. [arXiv'25] [Paper] [Project Page]

  • πŸ€– OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation. [arXiv'25] [Paper]

  • πŸ€– Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics. [arXiv'25] [Paper]

2024

  • πŸ€– PreLAR: World Model Pre-training with Learnable Action Representation. [ECCV'24] [Paper] [Code] [Video]

  • πŸ€– DWL: Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. [RSS'24] [Paper]

  • πŸ€– HRSSM: Learning Latent Dynamic Robust Representations for World Models. [ICML'24] [Paper] [Project Page] [Code] [Poster]

  • πŸš— SEM2: Enhance Sample Efficiency and Robustness of End-to-End Urban Autonomous Driving via Semantic Masked World Model. [TITS'24] [Paper]

  • πŸš— Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. [arXiv'24] [Paper] [Video]

Earlier

  • πŸ€– DayDreamer: World Models for Physical Robot Learning. [CoRL'22] [Paper] [Project Page] [Code]

  • πŸ€– TransDreamer: Reinforcement Learning with Transformer World Models. [arXiv'22] [Paper] [Code]

  • πŸš— MILE: Model-Based Imitation Learning for Urban Driving. [NeurIPS'22] [Paper] [Code]

  • πŸ€– Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. [NeurIPS'22] [Paper] [Code]

  • πŸ€– DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. [ICML'22] [Paper] [Project Page] [Code]

  • πŸ€– Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. [ICRA'21] [Paper]

  • πŸ€– DreamerV2: Mastering Atari with Discrete World Models. [ICLR'21] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– GLAMOR: Planning from Pixels using Inverse Dynamics Models. [ICLR'21] [Paper] [Code]

  • πŸ€– Dreamer: Dream to Control: Learning Behaviors by Latent Imagination. [ICLR'20] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– PlaNet: Learning Latent Dynamics for Planning from Pixels. [ICML'19] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– Recurrent World Models Facilitate Policy Evolution. [NeurIPS'18] [Paper] [Project Page] [Video]

Decision-Coupled / Sequential / Token Feature Sequence

2025

  • πŸ€– EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds. [ICCV'25] [Paper] [Project Page] [Code] [Video]

  • 🧭 NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments. [ICCV'25] [Paper] [Code]

  • πŸ€– DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— Epona: Autoregressive Diffusion World Model for Autonomous Driving. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸ€– MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control. [IROS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– $\text{D}^2\text{PO}$: World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning. [ACL'25] [Paper] [Code] [Dataset]

  • πŸ€– ReOI: Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control. [RSSW'25] [Paper]

  • πŸ€– WoMAP: World Models For Embodied Open-Vocabulary Object Localization. [RSSW'25] [Paper]

  • πŸ€– TWM: Improving Transformer World Models for Data-Efficient RL. [ICML'25] [Paper]

  • πŸ€– TrajWorld: Trajectory World Models for Heterogeneous Environments. [ICML'25] [Paper] [Code] [Dataset]

  • πŸš— SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model. [CVPR'25] [Paper]

  • 🧭 NWM: Navigation World Models. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— Learning to Drive from a World Model. [CVPRW'25] [Paper]

  • πŸš— LatentDriver: Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸš— Planning with Adaptive World Models for Autonomous Driving. [ICRA'25] [Paper]

  • πŸ€– TWISTER: Learning Transformer-based World Models with Contrastive Predictive Coding. [ICLR'25] [Paper] [Code]

  • πŸ€– DCWM: Discrete Codebook World Models for Continuous Control. [ICLR'25] [Paper] [Project Page] [Code] [Video]

  • πŸ€– Object-Centric World Model for Language-Guided Manipulation. [ICLRW'25] [Paper]

  • 🧭 NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning. [TPAMI'25] [Paper] [Code]

  • πŸ€– Dyn-O: Building Structured World Models with Object-Centric Representations. [arXiv'25] [Paper]

  • πŸ€– MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft. [arXiv'25] [Paper] [Code]

  • πŸ€– EvoAgent: Self-evolving Agent with Continual World Model for Long-Horizon Tasks. [arXiv'25] [Paper]

  • πŸ€– RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation. [arXiv'25] [Paper]

  • πŸ€– WorldVLA: Towards Autoregressive Action World Model. [arXiv'25] [Paper] [Code]

  • πŸš— FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving. [arXiv'25] [Paper]

  • πŸ€– Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents. [arXiv'25] [Paper]

  • πŸ€– RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy. [arXiv'25] [Paper]

  • πŸ€– Language Agents Meet Causality -- Bridging LLMs and Causal World Models. [ICLR'25] [Paper] [Project Page] [Code]

2024

  • πŸ€– ECoT: Robotic Control via Embodied Chain-of-Thought Reasoning. [CoRL'24] [Paper] [Project Page] [Code]

  • πŸ€– PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. [NeurIPS'24] [Paper] [Project Page] [Code]

  • πŸš— CarFormer: Self-driving with Learned Object-Centric Representations. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸ€– $\Delta$-IRIS: Efficient World Models with Context-Aware Tokenization. [ICML'24] [Paper] [Code]

  • πŸ€– Statler: State-Maintaining Language Models for Embodied Reasoning. [ICRA'24] [Paper] [Project Page] [Code]

  • πŸš— DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT. [arXiv'24] [Paper] [Project Page] [Code] [Video]

  • πŸš— Doe-1: Closed-Loop Autonomous Driving with Large World Model. [arXiv'24] [Paper] [Project Page] [Code]

  • πŸš— DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers. [arXiv'24] [Paper] [Project Page] [Code]

Earlier

  • πŸ€– TWM: Transformer-based World Models Are Happy With 100k Interactions. [ICLR'23] [Paper] [Code]

  • πŸ€– IRIS: Transformers are Sample-Efficient World Models. [ICLR'23] [Paper] [Code]

  • πŸ€– Inner Monologue: Embodied Reasoning through Planning with Language Models. [CoRL'22] [Paper] [Project Page] [Video]

  • πŸ€– MWM: Masked World Models for Visual Control. [CoRL'22] [Paper] [Project Page] [Code]

Decision-Coupled / Sequential / Spatial Latent Grid

2025

  • πŸ€– ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation. [CoRL'25] [Paper] [Project Page]

  • πŸš— WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model. [ICCV'25] [Paper] [Code]

  • 🧭 WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation. [IROS'25] [Paper] [Project Page] [Code] [Video]

  • πŸ€– DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. [ICML'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— RenderWorld: World Model with Self-Supervised 3D Label. [ICRA'25] [Paper]

  • πŸš— PreWorld: Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving. [ICLR'25] [Paper] [Code]

  • πŸš— SSR: Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving. [ICLR'25] [Paper] [Code]

  • πŸš— LAW: Enhancing End-to-End Autonomous Driving with Latent World Model. [ICLR'25] [Paper] [Code]

  • πŸš— Drive-OccWorld: Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. [AAAI'25] [Paper] [Project Page] [Code]

  • πŸš— Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2). [arXiv'25] [Paper]

  • πŸš— FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models. [arXiv'25] [Paper] [Code]

  • πŸ€– RoboOccWorld: Occupancy World Model for Robots. [arXiv'25] [Paper]

  • πŸ€– EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation. [arXiv'25] [Paper] [Project Page]

2024

  • πŸš— DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸš— GenAD: Generative End-to-End Autonomous Driving. [ECCV'24] [Paper] [Code] [Dataset]

  • πŸš— OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. [ECCV'24] [Paper] [Code]

  • πŸš— NeMo: Neural Volumetric World Models for Autonomous Driving. [ECCV'24] [Paper]

  • πŸš— DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. [CVPR'24] [Paper]

  • πŸš— OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving. [arXiv'24] [Paper]

Decision-Coupled / Sequential / Decomposed Rendering Representation

2025

  • πŸ€– ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model. [IROS'25] [Paper] [Code]

  • πŸ€– PIN-WM: Learning physics-informed world models for non-prehensile manipulation. [RSS'25] [Paper] [Project Page] [Code]

  • πŸ€– PWTF: Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins. [RSS'25] [Paper] [Project Page] [Code]

  • πŸ€– DreMa: Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸ€– GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation. [arXiv'25] [Paper] [Project Page]

  • πŸš— DTT: Delta-Triplane Transformers as Occupancy World Models. [arXiv'25] [Paper]

2024

  • πŸ€– Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics. [CoRL'24] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸ€– $\text{DexSim2Real}^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. [arXiv'24] [Paper] [Project Page] [Code] [Video]

Decision-Coupled / Global / Token Feature Sequence

2025

  • πŸ€– LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation. [CoRL'25] [Paper] [Project Page] [Code]

  • πŸ€– FLARE: Robot Learning with Implicit World Modeling. [RSSW'25] [Paper] [Project Page] [Code]

  • πŸš— GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control. [arXiv'25] [Paper] [Code]

  • πŸ€– villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models. [arXiv'25] [Paper] [Project Page] [Code]

2024

  • πŸ€– VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. [NeurIPS'24] [Paper]

  • πŸš— TOKEN: Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. [CoRL'24] [Paper] [Project Page]

Decision-Coupled / Global / Spatial Latent Grid

2025

  • πŸ€– TesserAct: Learning 4D Embodied World Models. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model. [ICCV'25] [Paper] [Code]

  • πŸš— Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies. [IROS'25] [Paper] [Project Page] [Video]

  • πŸ€– COMBO: Compositional World Models for Embodied Multi-Agent Cooperation. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸ€– EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance. [arXiv'25] [Paper]

  • πŸ€– 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model. [arXiv'25] [Paper] [Code]

2024

  • πŸ€– RoboDreamer: Learning Compositional World Models for Robot Imagination. [ICML'24] [Paper] [Project Page] [Code]

  • πŸš— Drive-WM: Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. [CVPR'24] [Paper] [Project Page] [Code]

  • πŸš— DFIT-OccWorld: An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training. [arXiv'24] [Paper]

General-Purpose / Sequential / Token Feature Sequence

2025

  • πŸš— Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models. [NeurIPS'25] [Paper] [Project Page] [Code]

  • πŸ€– RLVR-World: Training World Models with Reinforcement Learning. [NeurIPS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment. [ACMMM'25] [Paper] [Code]

  • πŸ€– Long-Context State-Space Video World Models. [ICCV'25] [Paper] [Project Page]

  • πŸš— World model-based end-to-end scene generation for accident anticipation in autonomous driving. [Nat. Commun. Eng.'25] [Paper] [Code] [Dataset]

  • πŸ€– EVA: Empowering World Models with Reflection for Embodied Video Prediction. [ICML'25] [Paper] [Project Page]

  • πŸ€– AdaWorld: Learning Adaptable World Models with Latent Actions. [ICML'25] [Paper] [Project Page] [Code]

  • 🎬 DINO-World: Back to the Features: DINO as a Foundation for Video World Models. [arXiv'25] [Paper]

  • πŸ€– RoboScape: Physics-informed Embodied World Model. [arXiv'25] [Paper] [Code]

  • 🎬 Yume: An Interactive World Generation Model. [arXiv'25] [Paper] [Project Page] [Code] [Video] [Dataset]

  • πŸ€– World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation. [arXiv'25] [Paper] [Project Page]

  • πŸ€– Vid2World: Crafting Video Diffusion Models to Interactive World Models. [arXiv'25] [Paper] [Project Page]

  • 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling. [arXiv'25] [Paper] [Project Page] [Code]

  • 🎬 DeepVerse: 4D Autoregressive Video Generation as a World Model. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– VRAG: Learning World Models for Interactive Video Generation. [arXiv'25] [Paper]

  • πŸ€– StateSpaceDiffuser: Bringing Long Context to Diffusion World Models. [arXiv'25] [Paper]

  • πŸš— LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸš— MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving. [arXiv'25] [Paper] [Code]

  • πŸ€– S2-SSM: Learning Local Causal World Models with State Space Models and Attention. [arXiv'25] [Paper]

  • πŸ€– WorldGym: World Model as An Environment for Policy Evaluation. [arXiv'25] [Paper] [Project Page] [Code] [Demo]

  • πŸ€– WorldEval: World Model as Real-World Robot Policies Evaluator. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– World-in-World: World Models in a Closed-Loop World. [arXiv'25] [Paper] [Project Page] [Code] [Dataset]

2024

  • πŸ€– iVideoGPT: Interactive VideoGPTs are Scalable World Models. [NeurIPS'24] [Paper] [Project Page] [Code] [Poster]

  • 🎬 Genie: Generative Interactive Environments. [ICML'24] [Paper] [Code]

  • πŸš— GenAD: Generalized Predictive Model for Autonomous Driving. [CVPR'24] [Paper] [Dataset] [Poster] [Video]

  • 🎬 Owl-1: Omni World Model for Consistent Long Video Generation. [arXiv'24] [Paper] [Code]

  • 🎬 Pandora: Towards General World Model with Natural Language Actions and Video States. [arXiv'24] [Paper] [Project Page] [Code] [Video]

  • πŸš— InfinityDrive: Breaking Time Limits in Driving World Models. [arXiv'24] [Paper] [Project Page]

Earlier

  • 🧭 PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training. [IROS'23] [Paper]

General-Purpose / Sequential / Spatial Latent Grid

2025

  • πŸš— STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation. [IROS'25] [Paper] [Project Page]

  • 🎬 GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— LidarDM: Generative LiDAR Simulation in a Generated World. [ICRA'25] [Paper] [Project Page] [Code]

  • 🎬 FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution. [arXiv'25] [Paper]

  • 🧭 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning. [arXiv'25] [Paper] [Project Page] [Code]

  • 🧭 Learning 3D Persistent Embodied World Models. [arXiv'25] [Paper]

2024

  • πŸš— Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. [NeurIPS'24] [Paper] [Project Page] [Code]

  • πŸš— Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. [ICLR'24] [Paper]

  • πŸš— ViDAR: Visual Point Cloud Forecasting enables Scalable Autonomous Driving. [CVPR'24] [Paper] [Code]

  • πŸš— DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. [arXiv'24] [Paper] [Project Page] [Code]

  • πŸš— Delphi: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. [arXiv'24] [Paper] [Project Page] [Code]

Earlier

  • 🎬 PhyDNet: Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction. [CVPR'20] [Paper] [Code]

General-Purpose / Sequential / Decomposed Rendering Representation

2025

  • πŸš— InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction. [CVPR'25] [Paper] [Code]

  • 🎬 Video World Models with Long-term Spatial Memory. [arXiv'25] [Paper] [Project Page]

General-Purpose / Global / Token Feature Sequence

2025

  • 🎬 MarsGen: Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions. [NeurIPS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction. [CVPR'25] [Paper] [Code] [Video]

  • 🎬 EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance. [CVPR'25] [Paper] [Code]

  • 🎬 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. [arXiv'25] [Paper] [Project Page] [Code] [Blog]

  • πŸš— AD-L-JEPA: Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection. [arXiv'25] [Paper]

  • 🎬 AirScape: An Aerial Generative World Model with Motion Controllability. [arXiv'25] [Paper] [Project Page]

  • πŸ€– ForeDiff: Consistent World Models via Foresight Diffusion. [arXiv'25] [Paper]

2024

  • 🎬 V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video. [TMLR'24] [Paper] [Code] [Blog] [Video]

  • 🎬 WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. [arXiv'24] [Paper] [Project Page] [Code]

  • 🎬 Sora: Video generation models as world simulators. [OpenAI'24] [Project Page]

General-Purpose / Global / Spatial Latent Grid

2025

  • πŸš— HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation. [ICCV'25] [Paper] [Project Page] [Code]

  • 🎬 Aether: Geometric-Aware Unified World Modeling. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth. [IROS'25] [Paper]

  • πŸš— DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸš— DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. [AAAI'25] [Paper] [Project Page] [Code]

  • πŸš— UniFuture: Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸš— Towards foundational LiDAR world models with efficient latent flow matching. [arXiv'25] [Paper] [Project Page]

  • πŸš— COME: Adding Scene-Centric Forecasting Control to Occupancy World Model. [arXiv'25] [Paper] [Code]

  • πŸ€– Geometry-aware 4D Video Generation for Robot Manipulation. [arXiv'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— EOT-WM: Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space. [arXiv'25] [Paper]

  • πŸ€– ORV: 4D Occupancy-centric Robot Video Generation. [arXiv'25] [Paper] [Project Page] [Code]

2024

  • πŸš— Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. [CVPR'24] [Paper] [Code]

  • πŸš— BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents. [arXiv'24] [Paper]

  • πŸš— OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. [arXiv'24] [Paper] [Code]

  • πŸš— DrivePhysica: Physical Informed Driving World Model. [arXiv'24] [Paper] [Code]

Earlier

  • πŸš— Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting. [CVPR'23] [Paper] [Project Page] [Code] [Video]

  • πŸš— Differentiable Raycasting for Self-Supervised Occupancy Forecasting. [ECCV'22] [Paper] [Project Page] [Code] [Video]

  • πŸš— Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks. [CoRL'21] [Paper] [Code]

General-Purpose / Global / Decomposed Rendering Representation

2025

  • πŸš— DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. [CVPR'25] [Paper] [Project Page] [Code]

2024

  • πŸš— UnO: Unsupervised Occupancy Fields for Perception and Forecasting. [CVPR'24] [Paper]

  • πŸš— MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes. [arXiv'24] [Paper] [Project Page] [Code]
