Dual-Policy-Preference-Optimization

The codebase for the preprint `Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model'

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md

Provide feedback