Dream less Act more
National University of Singapore
A structured survey of predictive-action models that make a forecast of the future available to action. The homepage mirrors the paper’s two central views: a philosophy-level taxonomy and a component-level anatomy.
Community
Use GitHub Discussions to ask questions, suggest missing WAM papers, debate taxonomy placement, and connect with researchers following World Action Models, video world models, VLAs, robot learning, and embodied predictive-action methods.
GitHub Discussions
Introduce yourself, ask questions, or suggest updates to the WAM paper list and taxonomy.
WeChat Group
Scan the QR code in the discussion thread to join the Chinese community chat for WAM survey readers.
Definition
In the survey, a World Action Model begins when a predicted future becomes action-facing. The future may be rendered pixels, latents, features, flow, affordance maps, audio, or tokens. What matters is that the predicted future helps produce, score, or train the action path.
o: observation
l: language or goal
a: action
o′: future observation
p(x | y): conditional model
VLA: acts from the current context, with no predicted future.
p(a | o, l)
World model: a what-if simulator that takes an action in and returns a future; it emits no action.
p(o′ | o, a, l)
WAM: the predicted future helps produce, score, or train the action.
p(o′, a | o, l)
The diagram uses three shapes for three things: a camera frame is what the model observes now (o), a thought bubble is the future it predicts (o′), and a joystick is the action it outputs (a). A world model stops at the bubble; a WAM routes that bubble forward, drawn in the accent color, into the action.
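As a rough illustration of the "score" role of a predicted future, the sketch below ranks candidate actions by the futures a toy predictor assigns to them. Everything in it is an assumption made for the example: the one-dimensional state, the additive dynamics in predict_future, the distance-based score_future, and the function names are hypothetical and do not come from the survey.

```python
from typing import Sequence


def predict_future(o: float, a: float) -> float:
    """Toy stand-in for p(o' | o, a, l): the state shifts by the action (assumed dynamics)."""
    return o + a


def score_future(o_next: float, l: float) -> float:
    """Higher is better: negative distance between the predicted future and the goal l."""
    return -abs(o_next - l)


def act_by_scoring(o: float, l: float, candidates: Sequence[float]) -> float:
    """Return the candidate action whose predicted future scores best against the goal."""
    return max(candidates, key=lambda a: score_future(predict_future(o, a), l))


if __name__ == "__main__":
    # Observation 0.0, goal 2.0: the action 2.0 lands exactly on the goal.
    print(act_by_scoring(0.0, 2.0, candidates=[-1.0, 0.0, 1.0, 2.0, 3.0]))
```

Here the predictor alone would only be a what-if simulator; it becomes action-facing because its output decides which action is returned.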
Boundary: the predicted future must help produce, score, or train action.
Not WAM: a future head discarded before action use, or a simulator used only outside the policy path.
VLA: maps observation and instruction directly to action. World model: predicts a future observation or state. Either can be useful without being a WAM.
A WAM keeps the predicted future in the action path. The action may come after prediction, be scored by prediction, or be generated jointly with prediction.
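Two of these routes can be written out with the symbols from the legend. The first line is only the chain rule applied to the joint model, read as "predict, then act". The second is one possible way to express action scoring; the score function S and the candidate set \mathcal{A} are notation assumed here for illustration, not taken from the survey.

```latex
\begin{align*}
\text{Predict, then act:} \quad
  & p(o', a \mid o, l) = p(o' \mid o, l)\, p(a \mid o', o, l) \\
\text{Score with prediction:} \quad
  & a^{\star} = \arg\max_{a \in \mathcal{A}} \;
    \mathbb{E}_{o' \sim p(o' \mid o, a, l)} \big[ S(o', l) \big]
\end{align*}
```

In both cases the predicted future o′ sits between the context (o, l) and the chosen action, which is what keeps it in the action path.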
A direct VLA with an auxiliary future loss, a simulator used only for RL training, or a future head discarded before action use does not satisfy the WAM definition.
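The boundary can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the survey's classification procedure: the Method fields, the classify function, and the returned labels are hypothetical, and deciding whether a given future truly helps produce, score, or train the action still takes human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Illustrative description of a method; all field names are hypothetical."""
    name: str
    predicts_future: bool        # does it predict some o' (pixels, latents, flow, ...)?
    emits_action: bool           # does it output an action a?
    future_in_action_path: bool  # does o' help produce, score, or train the action?


def classify(m: Method) -> str:
    """Apply the boundary rule: a WAM needs a predicted future that is action-facing."""
    if m.predicts_future and m.emits_action and m.future_in_action_path:
        return "WAM"
    if m.emits_action:
        # Covers a direct VLA, including one whose future head is discarded before use.
        return "direct policy"
    if m.predicts_future:
        # A simulator that emits no action or is used only outside the policy path.
        return "world model"
    return "neither"


if __name__ == "__main__":
    examples = [
        Method("direct VLA", False, True, False),
        Method("VLA with discarded future head", True, True, False),
        Method("what-if simulator", True, False, False),
        Method("predict-then-act model", True, True, True),
    ]
    for m in examples:
        print(f"{m.name}: {classify(m)}")
```

Only the last example is labeled a WAM, because it is the only one where the prediction stays in the action path.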
Chronological map
The timeline groups representative works by design philosophy. Render-and-Decode appears first, followed by Latent-Only shortcuts and Video-Generation-Free methods that carry predictive supervision outside video generation.
Fine-grained paper list
The list is generated from the current Section 4 paper table, then enriched with arXiv links, first-version dates, and citation counts that can be updated weekly.
Taxonomy × substrate