A Comprehensive Survey on World Models for Embodied AI

This repository accompanies our survey, A Comprehensive Survey on World Models for Embodied AI. World models function as internal simulators of environmental dynamics, enabling forward and counterfactual rollouts that unify perception, prediction, and control across tasks and domains. For a brief overview of the survey, please refer to the two slide decks: πŸ“‘ English (PDF) Β· πŸ“‘ Chinese (PDF)
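The sketch below is a minimal illustration of the rollout idea described above, not an implementation from the survey or any listed paper: a toy latent dynamics model is unrolled over a candidate action sequence to score it entirely in imagination, and rerunning it with a different action sequence from the same start state gives a counterfactual rollout. All names and dimensions (`LatentWorldModel`, `rollout`, `latent_dim`, `action_dim`) are illustrative assumptions.

```python
# Minimal sketch of a latent world-model rollout (illustrative only).
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Toy latent dynamics model: z_{t+1} = f(z_t, a_t), with a reward head."""
    def __init__(self, latent_dim: int = 32, action_dim: int = 4):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.reward_head = nn.Linear(latent_dim, 1)

    def step(self, z: torch.Tensor, a: torch.Tensor):
        # Predict the next latent state and its reward from (state, action).
        z_next = self.dynamics(torch.cat([z, a], dim=-1))
        return z_next, self.reward_head(z_next)

@torch.no_grad()
def rollout(model: LatentWorldModel, z0: torch.Tensor, actions: torch.Tensor) -> float:
    """Unroll the model over a candidate action sequence ("imagination")
    and accumulate predicted reward, with no real environment interaction."""
    z, total = z0, 0.0
    for a in actions:  # actions: (T, action_dim)
        z, r = model.step(z, a)
        total += r.item()
    return total

model = LatentWorldModel()
z0 = torch.zeros(32)       # initial latent state (e.g., an encoded observation)
plan = torch.randn(10, 4)  # one candidate 10-step action sequence
print(rollout(model, z0, plan))
```

In a model-predictive-control setting, this scoring loop would run over many candidate action sequences, and only the first action of the best-scoring plan would be executed in the real environment.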

Papers πŸ“š

Icon legend β€” πŸš— Autonomous Driving Β· πŸ€– Robotic Manipulation Β· 🧭 Navigation Β· 🎬 Video Generation (icons indicate the predominant domain; categories are non-exclusive, e.g., robotics and driving work may also involve generative modeling.)

Decision-Coupled / Sequential / Global Latent Vector

2025

  • πŸ€– DisWM: Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. [ICCV'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making. [ICML'25] [Paper] [Project Page]

  • πŸ€– SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models. [ICML'25] [Paper] [Project Page] [Code]

  • πŸ€– SR-AIF: Solving Sparse-Reward Robotic Tasks From Pixels with Active Inference and World Models. [ICRA'25] [Paper] [Code]

  • πŸ€– LUMOS: Language-Conditioned Imitation Learning with World Models. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸ€– WMP: World Model-Based Perception for Visual Legged Locomotion. [ICRA'25] [Paper] [Project Page] [Code]

  • 🧭 X-MOBILITY: End-to-End Generalizable Navigation via World Modeling. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸš— AdaWM: Adaptive World Model based Planning for Autonomous Driving. [ICLR'25] [Paper]

  • πŸ€– DreamerV3: Mastering diverse control tasks through world models. [Nature'25] [Paper] [Project Page] [Code]

  • πŸ€– GLAM: Global-Local Variation Awareness in Mamba-based World Model. [AAAI'25] [Paper] [Code]

  • πŸ€– WMR: Learning Humanoid Locomotion with World Model Reconstruction. [arXiv'25] [Paper]

  • πŸš— VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving. [arXiv'25] [Paper] [Project Page] [Code] [Poster] [Video]

  • πŸš— CALL: Ego-centric Learning of Communicative World Models for Autonomous Driving. [arXiv'25] [Paper]

  • πŸ€– Latent Policy Steering with Embodiment-Agnostic Pretrained World Models. [arXiv'25] [Paper]

  • πŸ€– ReDRAW: Adapting World Models with Latent-State Dynamics Residuals. [arXiv'25] [Paper] [Project Page]

  • πŸ€– OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation. [arXiv'25] [Paper]

  • πŸ€– Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics. [arXiv'25] [Paper]

2024

  • πŸ€– PreLAR: World Model Pre-training with Learnable Action Representation. [ECCV'24] [Paper] [Code] [Video]

  • πŸ€– DWL: Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. [RSS'24] [Paper]

  • πŸ€– HRSSM: Learning Latent Dynamic Robust Representations for World Models. [ICML'24] [Paper] [Project Page] [Code] [Poster]

  • πŸš— SEM2: Enhance Sample Efficiency and Robustness of End-to-End Urban Autonomous Driving via Semantic Masked World Model. [TITS'24] [Paper]

  • πŸš— Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. [arXiv'24] [Paper] [Video]

Earlier

  • πŸ€– DayDreamer: World Models for Physical Robot Learning. [CoRL'22] [Paper] [Project Page] [Code]

  • πŸ€– TransDreamer: Reinforcement Learning with Transformer World Models. [arXiv'22] [Paper] [Code]

  • πŸš— MILE: Model-Based Imitation Learning for Urban Driving. [NeurIPS'22] [Paper] [Code]

  • πŸ€– Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. [NeurIPS'22] [Paper] [Code]

  • πŸ€– DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. [ICML'22] [Paper] [Project Page] [Code]

  • πŸ€– Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. [ICRA'21] [Paper]

  • πŸ€– DreamerV2: Mastering Atari with Discrete World Models. [ICLR'21] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– GLAMOR: Planning from Pixels using Inverse Dynamics Models. [ICLR'21] [Paper] [Code]

  • πŸ€– Dreamer: Dream to Control: Learning Behaviors by Latent Imagination. [ICLR'20] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– PlaNet: Learning Latent Dynamics for Planning from Pixels. [ICML'19] [Paper] [Project Page] [Code] [Blog] [Poster]

  • πŸ€– Recurrent World Models Facilitate Policy Evolution. [NeurIPS'18] [Paper] [Project Page] [Video]

Decision-Coupled / Sequential / Token Feature Sequence

2025

  • πŸ€– EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds. [ICCV'25] [Paper] [Project Page] [Code] [Video]

  • 🧭 NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments. [ICCV'25] [Paper] [Code]

  • πŸ€– DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— Epona: Autoregressive Diffusion World Model for Autonomous Driving. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸ€– MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control. [IROS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– $\text{D}^2\text{PO}$: World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning. [ACL'25] [Paper] [Code] [Dataset]

  • πŸ€– ReOI: Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control. [RSSW'25] [Paper]

  • πŸ€– WoMAP: World Models For Embodied Open-Vocabulary Object Localization. [RSSW'25] [Paper]

  • πŸ€– TWM: Improving Transformer World Models for Data-Efficient RL. [ICML'25] [Paper]

  • πŸ€– TrajWorld: Trajectory World Models for Heterogeneous Environments. [ICML'25] [Paper] [Code] [Dataset]

  • πŸš— SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model. [CVPR'25] [Paper]

  • 🧭 NWM: Navigation World Models. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— Learning to Drive from a World Model. [CVPRW'25] [Paper]

  • πŸš— LatentDriver: Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. [ICRA'25] [Paper] [Project Page] [Code]

  • πŸš— Planning with Adaptive World Models for Autonomous Driving. [ICRA'25] [Paper]

  • πŸ€– TWISTER: Learning Transformer-based World Models with Contrastive Predictive Coding. [ICLR'25] [Paper] [Code]

  • πŸ€– DCWM: Discrete Codebook World Models for Continuous Control. [ICLR'25] [Paper] [Project Page] [Code] [Video]

  • πŸ€– Object-Centric World Model for Language-Guided Manipulation. [ICLRW'25] [Paper]

  • 🧭 NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning. [TPAMI'25] [Paper] [Code]

  • πŸ€– Dyn-O: Building Structured World Models with Object-Centric Representations. [arXiv'25] [Paper]

  • πŸ€– MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft. [arXiv'25] [Paper] [Code]

  • πŸ€– EvoAgent: Self-evolving Agent with Continual World Model for Long-Horizon Tasks. [arXiv'25] [Paper]

  • πŸ€– RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation. [arXiv'25] [Paper]

  • πŸ€– WorldVLA: Towards Autoregressive Action World Model. [arXiv'25] [Paper] [Code]

  • πŸš— FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving. [arXiv'25] [Paper]

  • πŸ€– Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents. [arXiv'25] [Paper]

  • πŸ€– RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy. [arXiv'25] [Paper]

  • πŸ€– Language Agents Meet Causality -- Bridging LLMs and Causal World Models. [ICLR'25] [Paper] [Project Page] [Code]

2024

  • πŸ€– ECoT: Robotic Control via Embodied Chain-of-Thought Reasoning. [CoRL'24] [Paper] [Project Page] [Code]

  • πŸ€– PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. [NeurIPS'24] [Paper] [Project Page] [Code]

  • πŸš— CarFormer: Self-driving with Learned Object-Centric Representations. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸ€– $\Delta$-IRIS: Efficient World Models with Context-Aware Tokenization. [ICML'24] [Paper] [Code]

  • πŸ€– Statler: State-Maintaining Language Models for Embodied Reasoning. [ICRA'24] [Paper] [Project Page] [Code]

  • πŸš— DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT. [arXiv'24] [Paper] [Project Page] [Code] [Video]

  • πŸš— Doe-1: Closed-Loop Autonomous Driving with Large World Model. [arXiv'24] [Paper] [Project Page] [Code]

  • πŸš— DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers. [arXiv'24] [Paper] [Project Page] [Code]

Earlier

  • πŸ€– TWM: Transformer-based World Models Are Happy With 100k Interactions. [ICLR'23] [Paper] [Code]

  • πŸ€– IRIS: Transformers are Sample-Efficient World Models. [ICLR'23] [Paper] [Code]

  • πŸ€– Inner Monologue: Embodied Reasoning through Planning with Language Models. [CoRL'22] [Paper] [Project Page] [Video]

  • πŸ€– MWM: Masked World Models for Visual Control. [CoRL'22] [Paper] [Project Page] [Code]

Decision-Coupled / Sequential / Spatial Latent Grid

2025

  • πŸ€– ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation. [CoRL'25] [Paper] [Project Page]

  • πŸš— WoTE: End-to-End Driving with Online Trajectory Evaluation via BEV World Model. [ICCV'25] [Paper] [Code]

  • 🧭 WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation. [IROS'25] [Paper] [Project Page] [Code] [Video]

  • πŸ€– DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. [ICML'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— RenderWorld: World Model with Self-Supervised 3D Label. [ICRA'25] [Paper]

  • πŸš— PreWorld: Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving. [ICLR'25] [Paper] [Code]

  • πŸš— SSR: Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving. [ICLR'25] [Paper] [Code]

  • πŸš— LAW: Enhancing End-to-End Autonomous Driving with Latent World Model. [ICLR'25] [Paper] [Code]

  • πŸš— Drive-OccWorld: Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. [AAAI'25] [Paper] [Project Page] [Code]

  • πŸš— Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2). [arXiv'25] [Paper]

  • πŸš— FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models. [arXiv'25] [Paper] [Code]

  • πŸ€– RoboOccWorld: Occupancy World Model for Robots. [arXiv'25] [Paper]

  • πŸ€– EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation. [arXiv'25] [Paper] [Project Page]

2024

  • πŸš— DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸš— GenAD: Generative End-to-End Autonomous Driving. [ECCV'24] [Paper] [Code] [Dataset]

  • πŸš— OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. [ECCV'24] [Paper] [Code]

  • πŸš— NeMo: Neural Volumetric World Models for Autonomous Driving. [ECCV'24] [Paper]

  • πŸš— DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. [CVPR'24] [Paper]

  • πŸš— OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving. [arXiv'24] [Paper]

Decision-Coupled / Sequential / Decomposed Rendering Representation

2025

  • πŸ€– ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model. [IROS'25] [Paper] [Code]

  • πŸ€– PIN-WM: Learning physics-informed world models for non-prehensile manipulation. [RSS'25] [Paper] [Project Page] [Code]

  • πŸ€– PWTF: Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins. [RSS'25] [Paper] [Project Page] [Code]

  • πŸ€– DreMa: Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸ€– GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation. [arXiv'25] [Paper] [Project Page]

  • πŸš— DTT: Delta-Triplane Transformers as Occupancy World Models. [arXiv'25] [Paper]

2024

  • πŸ€– Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics. [CoRL'24] [Paper] [Project Page] [Code] [Dataset]

  • πŸ€– ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. [ECCV'24] [Paper] [Project Page] [Code]

  • πŸ€– $\text{DexSim2Real}^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. [arXiv'24] [Paper] [Project Page] [Code] [Video]

Decision-Coupled / Global / Token Feature Sequence

2025

  • πŸ€– LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation. [CoRL'25] [Paper] [Project Page] [Code]

  • πŸ€– FLARE: Robot Learning with Implicit World Modeling. [RSSW'25] [Paper] [Project Page] [Code]

  • πŸš— GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control. [arXiv'25] [Paper] [Code]

  • πŸ€– villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models. [arXiv'25] [Paper] [Project Page] [Code]

2024

  • πŸ€– VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. [NeurIPS'24] [Paper]

  • πŸš— TOKEN: Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. [CoRL'24] [Paper] [Project Page]

Decision-Coupled / Global / Spatial Latent Grid

2025

  • πŸ€– TesserAct: Learning 4D Embodied World Models. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model. [ICCV'25] [Paper] [Code]

  • πŸš— Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies. [IROS'25] [Paper] [Project Page] [Video]

  • πŸ€– COMBO: Compositional World Models for Embodied Multi-Agent Cooperation. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸ€– EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance. [arXiv'25] [Paper]

  • πŸ€– 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model. [arXiv'25] [Paper] [Code]

2024

  • πŸ€– RoboDreamer: Learning Compositional World Models for Robot Imagination. [ICML'24] [Paper] [Project Page] [Code]

  • πŸš— Drive-WM: Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. [CVPR'24] [Paper] [Project Page] [Code]

  • πŸš— DFIT-OccWorld: An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training. [arXiv'24] [Paper]

General-Purpose / Sequential / Token Feature Sequence

2025

  • πŸš— Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models. [NeurIPS'25] [Paper] [Project Page] [Code]

  • πŸ€– RLVR-World: Training World Models with Reinforcement Learning. [NeurIPS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment. [ACMMM'25] [Paper] [Code]

  • πŸ€– Long-Context State-Space Video World Models. [ICCV'25] [Paper] [Project Page]

  • πŸš— World model-based end-to-end scene generation for accident anticipation in autonomous driving. [Nat. Commun. Eng.'25] [Paper] [Code] [Dataset]

  • πŸ€– EVA: Empowering World Models with Reflection for Embodied Video Prediction. [ICML'25] [Paper] [Project Page]

  • πŸ€– AdaWorld: Learning Adaptable World Models with Latent Actions. [ICML'25] [Paper] [Project Page] [Code]

  • 🎬 DINO-World: Back to the Features: DINO as a Foundation for Video World Models. [arXiv'25] [Paper]

  • πŸ€– RoboScape: Physics-informed Embodied World Model. [arXiv'25] [Paper] [Code]

  • 🎬 Yume: An Interactive World Generation Model. [arXiv'25] [Paper] [Project Page] [Code] [Video] [Dataset]

  • πŸ€– World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation. [arXiv'25] [Paper] [Project Page]

  • πŸ€– Vid2World: Crafting Video Diffusion Models to Interactive World Models. [arXiv'25] [Paper] [Project Page]

  • 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling. [arXiv'25] [Paper] [Project Page] [Code]

  • 🎬 DeepVerse: 4D Autoregressive Video Generation as a World Model. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– VRAG: Learning World Models for Interactive Video Generation. [arXiv'25] [Paper]

  • πŸ€– StateSpaceDiffuser: Bringing Long Context to Diffusion World Models. [arXiv'25] [Paper]

  • πŸš— LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸš— MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving. [arXiv'25] [Paper] [Code]

  • πŸ€– S2-SSM: Learning Local Causal World Models with State Space Models and Attention. [arXiv'25] [Paper]

  • πŸ€– WorldGym: World Model as An Environment for Policy Evaluation. [arXiv'25] [Paper] [Project Page] [Code] [Demo]

  • πŸ€– WorldEval: World Model as Real-World Robot Policies Evaluator. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸ€– World-in-World: World Models in a Closed-Loop World. [arXiv'25] [Paper] [Project Page] [Code] [Dataset]

2024

  • πŸ€– iVideoGPT: Interactive VideoGPTs are Scalable World Models. [NeurIPS'24] [Paper] [Project Page] [Code] [Poster]

  • 🎬 Genie: Generative Interactive Environments. [ICML'24] [Paper] [Code]

  • πŸš— GenAD: Generalized Predictive Model for Autonomous Driving. [CVPR'24] [Paper] [Dataset] [Poster] [Video]

  • 🎬 Owl-1: Omni World Model for Consistent Long Video Generation. [arXiv'24] [Paper] [Code]

  • 🎬 Pandora: Towards General World Model with Natural Language Actions and Video States. [arXiv'24] [Paper] [Project Page] [Code] [Video]

  • πŸš— InfinityDrive: Breaking Time Limits in Driving World Models. [arXiv'24] [Paper] [Project Page]

Earlier

  • 🧭 PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training. [IROS'23] [Paper]

General-Purpose / Sequential / Spatial Latent Grid

2025

  • πŸš— STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation. [IROS'25] [Paper] [Project Page]

  • 🎬 GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— LidarDM: Generative LiDAR Simulation in a Generated World. [ICRA'25] [Paper] [Project Page] [Code]

  • 🎬 FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution. [arXiv'25] [Paper]

  • 🧭 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning. [arXiv'25] [Paper] [Project Page] [Code]

  • 🧭 Learning 3D Persistent Embodied World Models. [arXiv'25] [Paper]

2024

  • πŸš— Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. [NeurIPS'24] [Paper] [Project Page] [Code]

  • πŸš— Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. [ICLR'24] [Paper]

  • πŸš— ViDAR: Visual Point Cloud Forecasting enables Scalable Autonomous Driving. [CVPR'24] [Paper] [Code]

  • πŸš— DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. [arXiv'24] [Paper] [Project Page] [Code]

  • πŸš— Delphi: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. [arXiv'24] [Paper] [Project Page] [Code]

Earlier

  • 🎬 PhyDNet: Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction. [CVPR'20] [Paper] [Code]

General-Purpose / Sequential / Decomposed Rendering Representation

2025

  • πŸš— InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction. [CVPR'25] [Paper] [Code]

  • 🎬 Video World Models with Long-term Spatial Memory. [arXiv'25] [Paper] [Project Page]

General-Purpose / Global / Token Feature Sequence

2025

  • 🎬 MarsGen: Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions. [NeurIPS'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction. [CVPR'25] [Paper] [Code] [Video]

  • 🎬 EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance. [CVPR'25] [Paper] [Code]

  • 🎬 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. [arXiv'25] [Paper] [Project Page] [Code] [Blog]

  • πŸš— AD-L-JEPA: Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection. [arXiv'25] [Paper]

  • 🎬 AirScape: An Aerial Generative World Model with Motion Controllability. [arXiv'25] [Paper] [Project Page]

  • πŸ€– ForeDiff: Consistent World Models via Foresight Diffusion. [arXiv'25] [Paper]

2024

  • 🎬 V-JEPA: Revisiting Feature Prediction for Learning Visual Representations from Video. [TMLR'24] [Paper] [Code] [Blog] [Video]

  • 🎬 WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. [arXiv'24] [Paper] [Project Page] [Code]

  • 🎬 Sora: Video generation models as world simulators. [OpenAI'24] [Project Page]

General-Purpose / Global / Spatial Latent Grid

2025

  • πŸš— HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation. [ICCV'25] [Paper] [Project Page] [Code]

  • 🎬 Aether: Geometric-Aware Unified World Modeling. [ICCV'25] [Paper] [Project Page] [Code]

  • πŸš— PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth. [IROS'25] [Paper]

  • πŸš— DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes. [ICLR'25] [Paper] [Project Page] [Code]

  • πŸš— DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. [AAAI'25] [Paper] [Project Page] [Code]

  • πŸš— UniFuture: Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception. [arXiv'25] [Paper] [Project Page] [Code]

  • πŸš— Towards foundational LiDAR world models with efficient latent flow matching. [arXiv'25] [Paper] [Project Page]

  • πŸš— COME: Adding Scene-Centric Forecasting Control to Occupancy World Model. [arXiv'25] [Paper] [Code]

  • πŸ€– Geometry-aware 4D Video Generation for Robot Manipulation. [arXiv'25] [Paper] [Project Page] [Code] [Dataset]

  • πŸš— EOT-WM: Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space. [arXiv'25] [Paper]

  • πŸ€– ORV: 4D Occupancy-centric Robot Video Generation. [arXiv'25] [Paper] [Project Page] [Code]

2024

  • πŸš— Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. [CVPR'24] [Paper] [Code]

  • πŸš— BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents. [arXiv'24] [Paper]

  • πŸš— OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. [arXiv'24] [Paper] [Code]

  • πŸš— DrivePhysica: Physical Informed Driving World Model. [arXiv'24] [Paper] [Code]

Earlier

  • πŸš— Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting. [CVPR'23] [Paper] [Project Page] [Code] [Video]

  • πŸš— Differentiable Raycasting for Self-Supervised Occupancy Forecasting. [ECCV'22] [Paper] [Project Page] [Code] [Video]

  • πŸš— Self-supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks. [CoRL'21] [Paper] [Code]

General-Purpose / Global / Decomposed Rendering Representation

2025

  • πŸš— DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. [CVPR'25] [Paper] [Project Page] [Code]

  • πŸš— ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. [CVPR'25] [Paper] [Project Page] [Code]

2024

  • πŸš— UnO: Unsupervised Occupancy Fields for Perception and Forecasting. [CVPR'24] [Paper]

  • πŸš— MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes. [arXiv'24] [Paper] [Project Page] [Code]
