Treat every source capability as a velocity field, then learn where and how to query those fields on the student's own rollout.
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image generation, local editing, and global editing. These abilities are rarely naturally aligned: editing can degrade T2I performance, while global and local editing can interfere with each other.
DanceOPD is an on-policy generative field distillation framework for flow-matching models. Each sample is routed to one frozen capability field, one low-noise student-induced state is queried, and the student is trained with a simple velocity MSE objective. The same formulation also absorbs operator-defined fields such as classifier-free guidance.
Once each frozen source is viewed as a velocity field over the shared flow state space, capability synthesis depends on three choices: which field supervises a sample, where the field is queried, and how many states from a rollout are used.
Target-field ambiguity
Softly averaging several source fields can destroy the semantic identity of a capability query. The update may point to no real teacher behavior.
DanceOPD: hard-routed sample-wise field matching
Data states or teacher trajectories are off-policy for the student. They miss the states the deployed model actually visits at inference time.
DanceOPD: query on stop-gradient student rollout states
Dense states from the same rollout share prompt, noise, dynamics, and history. More states can over-weight one correlated path.
DanceOPD: one low-noise semantic-side query
Add editing ability while retaining text-to-image prompt following and visual quality.
Fuse preservation-heavy local editing with transformation-heavy global editing.
Move the student toward a quality or style field while keeping base T2I behavior.
Internalize classifier-free guidance as an operator-defined velocity field.
DanceOPD keeps each local target semantically well-defined, queries the target where the current student actually goes, and avoids dense correlated supervision. The full update is a local field-matching step on a stop-gradient rollout state.
Keep one semantic target per sample instead of averaging teachers.
Ask the frozen field at a state from the current student rollout.
The selected field and student velocity meet in one local MSE.
Classifier-free guidance is another velocity field to distill.
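One way to write this update down, as a sketch in assumed notation (the symbols are illustrative and not taken from the paper): let v_θ be the student velocity, v^{(k)} the frozen field that the sample was routed to, c the sample's conditioning, ε the initial noise, and t a low-noise time on the student's own rollout.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{(c,k),\,\epsilon,\,t}
    \left\| v_\theta\big(\mathrm{sg}(z_t^\theta),\, t,\, c\big)
          - v^{(k)}\big(\mathrm{sg}(z_t^\theta),\, t,\, c\big) \right\|_2^2
```

Here sg(·) is stop-gradient: the state z_t^θ is produced by the student's solver but treated as a constant, so the gradient reaches θ only through the explicit v_θ term.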
Each sample chooses exactly one frozen capability field. Unless stated otherwise, active capability buckets use a uniform route ratio.
The target field is queried at sg(z_t^θ), where sg is stop-gradient, exposing the teacher to student-visited states without backpropagating through the solver.
Low-noise states concentrate edit, style, and visual-attribute signals; one query avoids within-rollout correlation.
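The three choices above can be sketched in a toy 1-D setting. This is a minimal illustration under loud assumptions, not the authors' implementation: every name (`FIELD_TARGETS`, `frozen_field`, `student_velocity`, `rollout_state`, `train_step`, `param_error`) is hypothetical, the fields are hand-written linear functions, the time convention (t = 0 is noise, t = 1 is data) is assumed, and the gradient is written by hand instead of using an autodiff library.

```python
import random

# Hypothetical toy setup: nothing here comes from the DanceOPD code base.
# Assumed time convention: t = 0 is pure noise, t = 1 is data,
# so a "low-noise" query means t close to 1.
FIELD_TARGETS = {"t2i": 1.0, "edit": -1.0}  # each frozen field pulls z toward its own mode


def frozen_field(name, z):
    """Frozen capability field: a fixed 1-D velocity pointing at that capability's mode."""
    return FIELD_TARGETS[name] - z


def student_velocity(theta, name, z):
    """Student velocity, linear in z, with one (bias, slope) pair per capability condition."""
    bias, slope = theta[name]
    return bias + slope * z


def rollout_state(theta, name, z0, t_query=0.9, dt=0.1):
    """Euler-integrate the student's own flow from noise up to the low-noise query time."""
    z = z0
    for _ in range(round(t_query / dt)):
        z += dt * student_velocity(theta, name, z)
    return z  # a plain float: downstream code treats it as a constant (stop-gradient)


def train_step(theta, rng, lr=0.01):
    """One update: hard route, single on-policy query, velocity MSE."""
    name = rng.choice(sorted(FIELD_TARGETS))             # uniform hard routing: exactly one field
    z = rollout_state(theta, name, rng.gauss(0.0, 1.0))  # one query state per rollout (K = 1)
    residual = student_velocity(theta, name, z) - frozen_field(name, z)
    bias, slope = theta[name]
    # Gradient of residual**2 w.r.t. (bias, slope). z is held fixed, so no
    # gradient flows back through the Euler solver.
    theta[name] = (bias - lr * 2.0 * residual, slope - lr * 2.0 * residual * z)
    return name, residual ** 2


def param_error(theta):
    """Squared distance between the student parameters and the frozen fields they imitate."""
    return sum((b - FIELD_TARGETS[k]) ** 2 + (s + 1.0) ** 2 for k, (b, s) in theta.items())


# Usage: start from a zero student and run a few hundred routed updates.
rng = random.Random(0)
theta = {name: (0.0, 0.0) for name in FIELD_TARGETS}
error_before = param_error(theta)
for _ in range(400):
    train_step(theta, rng)
error_after = param_error(theta)  # shrinks as the student matches each routed field
```

Because the rollout state is returned as a plain number, the update cannot differentiate through the Euler steps, which mirrors the stop-gradient query. Each step also sees exactly one teacher, so the regression target is always a value that some frozen field actually produces.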
The desired behavior is not a midpoint between specialists. A single student should strengthen the target capability while preserving the anchor capability under the same deployment model.
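When adding editing to T2I, DanceOPD gains +8.1% over the best reproduced OPD baseline and +8.5% over the edit source on GEditBench.
When fusing local and global editing, it gains +16.1% over the best competing composition baseline and +7.9% over the local edit source on GEditBench.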
| Method | GEditBench Avg ↑ | GenEval Overall ↑ | Takeaway |
|---|---|---|---|
| Joint training | 4.617 | 0.808 | Mixed supervision dilutes edit capability. |
| Weight merge | - | 0.836 | Preserves T2I but collapses editing. |
| Off-policy distill. | 4.528 | 0.818 | Teacher states leave a train–inference mismatch. |
| DiffusionOPD | 4.947 | 0.833 | Improves editing but below DanceOPD. |
| Flow-OPD | 4.854 | 0.814 | OPD baseline still suffers capability interference. |
| DanceOPD | 5.347 | 0.849 | Best edit score and best GenEval in this block. |

| Method | GEditBench Avg ↑ | GenEval Overall ↑ | Takeaway |
|---|---|---|---|
| Joint training | 4.546 | 0.821 | Conflict between preservation and transformation. |
| Weight merge | 4.715 | 0.811 | Static parameter interpolation remains a compromise. |
| Off-policy distill. | 4.736 | 0.798 | Target ability improves less and T2I drops. |
| DiffusionOPD | 4.661 | 0.822 | Below DanceOPD on both metrics. |
| Flow-OPD | 4.679 | 0.827 | Stable but not enough to fuse local/global behaviors. |
| DanceOPD | 5.498 | 0.848 | Best capability synthesis in the harder conflict setting. |

| Source | GEditBench Avg ↑ | GenEval Overall ↑ | Role |
|---|---|---|---|
| T2I | - | 0.832 | Anchor generation field. |
| Edit | 4.930 | 0.711 | General edit source. |
| Local Edit | 5.095 | 0.793 | Preservation-heavy source. |
| Global Edit | 3.750 | 0.808 | Transformation-heavy source. |
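The relative gains quoted before the tables can be recomputed from the GEditBench Avg column. The short check below is only arithmetic on the table entries; the helper name `relative_gain` and the dictionary keys are illustrative.

```python
def relative_gain(ours, baseline):
    """Percentage improvement of `ours` over `baseline`, rounded to one decimal place."""
    return round((ours / baseline - 1.0) * 100.0, 1)


# GEditBench Avg values copied from the three tables above.
gains = {
    "first table, vs DiffusionOPD": relative_gain(5.347, 4.947),            # 8.1
    "first table, vs Edit source": relative_gain(5.347, 4.930),             # 8.5
    "second table, vs Off-policy distill.": relative_gain(5.498, 4.736),    # 16.1
    "second table, vs Local Edit source": relative_gain(5.498, 5.095),      # 7.9
}
```

In both tables the comparison point is the strongest non-DanceOPD row on GEditBench Avg: DiffusionOPD in the first, off-policy distillation in the second.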
The latest ablations show that failures are not simply about loss naming or training length. They trace back to query construction: ambiguous targets, off-policy states, and correlated dense trajectory samples.
5.751 hard-routed MSE vs. 4.994 soft-teacher MSE. Averaging all teachers erases capability identity.
At 2k steps, low-t reaches 5.751, above median-t 4.649 and high-t 4.813.
K=1 reaches 5.751; weighted K=4 drops to 5.330, and weighted K=16 drops to 5.127.
At 2k training steps, 8/16/20/28 rollout steps stay in a practical band; 16 rollout steps gives 5.751 GEditBench Avg / 0.858 GenEval Overall.
Velocity MSE reaches 5.751, outperforming timestep weighting, KL weighting, DMD-style, SDS-style, and consistency variants.
The training guidance scale α and the inference guidance scale β compose approximately multiplicatively. The best measured composition reaches 5.833; over-guided αβ=49 drops to 4.015.
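One reading of why α and β multiply, sketched under an idealized assumption that is not stated on this page: classifier-free guidance with scale w forms v_u + w (v_c - v_u), the student's conditional branch reproduces that field exactly at scale α, and its unconditional branch stays at v_u. Applying guidance β at inference then lands on an effective scale of αβ. The function names and the scalar velocities below are hypothetical.

```python
import math


def cfg_field(v_uncond, v_cond, scale):
    """Standard classifier-free guidance: extrapolate from unconditional toward conditional."""
    return v_uncond + scale * (v_cond - v_uncond)


def student_then_inference(v_uncond, v_cond, alpha, beta):
    """Apply inference-time guidance beta to a student that internalized guidance alpha.

    Idealized assumption: the student's conditional branch equals
    cfg_field(v_uncond, v_cond, alpha) exactly and its unconditional branch is unchanged.
    """
    student_cond = cfg_field(v_uncond, v_cond, alpha)
    return cfg_field(v_uncond, student_cond, beta)


# Usage with made-up scalar velocities; alpha = beta = 7 is just one
# hypothetical way to reach a product of 49.
combined = student_then_inference(0.2, 0.5, alpha=7.0, beta=7.0)
direct = cfg_field(0.2, 0.5, scale=49.0)
scales_multiply = math.isclose(combined, direct)
```

In practice a distilled student only approximates the guided field and its unconditional branch may drift, which is consistent with the composition being approximate.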
The gallery follows the manuscript organization: global edits, local/global edits, additional material and style edits, pure T2I preservation, same-object transformations, and local/global training progression.
arXiv:2606.27377 and code are now available.
@article{zhou2026danceopd,
title={DanceOPD: On-Policy Generative Field Distillation},
author={Zhou, Wei and Zhu, Xiongwei and Xu, Zelin and Dong, Bo and Gong, Lixue and Liang, Yongyuan and Chu, Meng and Qu, Leigang and Kong, Lingdong and Liu, Wei and others},
journal={arXiv preprint arXiv:2606.27377},
year={2026}
}