This repository contains a regularly updated paper list for Efficient Reasoning.
- Content
- Keywords Convention
- Papers
  - Survey
  - Efficient Training
  - Latent Chain-of-Thought
  - Long-to-Short Chain-of-Thought
  - Adaptive Thinking
  - Reasoning Shortcuts
  - Reasoning Step Decomposition
  - Small Reasoning Models & CoT Distillation
  - Small & Large Reasoning Model Collaboration
  - Speculative Decoding for CoT Efficiency
  - Parallel Thinking
  - Sparse Attention & KV Cache
  - Optimal Test-Time Scaling
  - Efficient Sampling
  - Efficient Self-Consistency
  - Long-Context Reasoning Efficiency
  - Multimodal Reasoning Efficiency
  - Other Work
- Benchmarks
- Analysis
- Applications
- Blog & Project
- Talks
- Resources
- Contributors
- Contributing to this paper list
## Survey

- **Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models**
  Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen (Henry) Zhong, Hanjie Chen, Xia Hu. [pdf], [paper list], 2025.03.
- **A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond**
  Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng. [pdf], [paper list], 2025.03.
- **Efficient Inference for Large Reasoning Models: A Survey**
  Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi. [pdf], [paper list], 2025.03.
- **Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models**
  Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong. [pdf], [paper list], 2025.03.
- **Efficient Reasoning Models: A Survey**
  Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang. [pdf], [paper list], 2025.04.
- **Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning**
  Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen. [pdf], [paper list], 2025.05.
- **Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs**
  Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates. [pdf], 2025.07.
- **A Survey on Latent Reasoning**
  Rui-Jie Zhu, Tianhao Peng, Tianhao Cheng, Xingwei Qu, Jinfa Huang, Dawei Zhu, Hao Wang, Kaiwen Xue, Xuanliang Zhang, Yong Shan, Tianle Cai, Taylor Kergan, Assel Kembay, Andrew Smith, Chenghua Lin, Binh Nguyen, Yuqi Pan, Yuhong Chou, Zefan Cai, Zhenhe Wu, Yongchi Zhao, Tianyu Liu, Jian Yang, Wangchunshu Zhou, Chujie Zheng, Chongxuan Li, Yuyin Zhou, Zhoujun Li, Zhaoxiang Zhang, Jiaheng Liu, Ge Zhang, Wenhao Huang, Jason Eshraghian. [pdf], [paper list], 2025.07.
- **Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey**
  Jason Zhu, Hongyu Li. [pdf], 2025.07.
- **Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models**
  Linan Yue, Yichao Du, Yizhi Wang, Weibo Gao, Fangzhou Yao, Li Wang, Ye Liu, Ziyu Xu, Qi Liu, Shimin Di, Min-Ling Zhang. [pdf], [paper list], 2025.07.
- **Implicit Reasoning in Large Language Models: A Comprehensive Survey**
  Jindong Li, Yali Fu, Li Fan, Jiahong Liu, Yao Shu, Chengwei Qin, Menglin Yang, Irwin King, Rex Ying. [pdf], [paper list], 2025.09.
- **A Survey on Parallel Reasoning**
  Ziqi Wang, Boye Niu, Zipeng Gao, Zhi Zheng, Tong Xu, Linghui Meng, Zhongli Li, Jing Liu, Yilong Chen, Chen Zhu, Hua Wu, Haifeng Wang, Enhong Chen. [pdf], [paper list], 2025.10.
- **From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models**
  Chao Wu, Baoheng Li, Mingchen Gao, Zhenyi Wang. [pdf], 2025.11.
## Efficient Training

- **s1: Simple test-time scaling**
  Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto. [pdf], [code], 2025.01.
- **LIMO: Less is More for Reasoning**
  Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu. [pdf], [code], 2025.02.
- **TreeRL: LLM Reinforcement Learning with On-Policy Tree Search**
  Zhenyu Hou, Ziniu Hu, Yujiang Li, Rui Lu, Jie Tang, Yuxiao Dong. [pdf], [code], 2025.02.
- **Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond**
  Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang. [pdf], [code], 2025.03.
- **DAPO: An Open-Source LLM Reinforcement Learning System at Scale**
  Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang. [pdf], [code], [homepage], 2025.03.
- **FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models**
  Mingyang Song, Mao Zheng, Zheng Li, Wenjie Yang, Xuan Luo, Yue Pan, Feng Zhang. [pdf], [code], 2025.03.
- **Understanding R1-Zero-Like Training: A Critical Perspective**
  Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin. [pdf], [code], 2025.03.
- **Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training**
  Brian R. Bartoldson, Siddarth Venkatraman, James Diffenderfer, Moksh Jain, Tal Ben-Nun, Seanie Lee, Minsu Kim, Johan Obando-Ceron, Yoshua Bengio, Bhavya Kailkhura. [pdf], 2025.03.
- **CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models**
  Zhihang Lin, Mingbao Lin, Yuan Xie, Rongrong Ji. [pdf], [code], 2025.03.
- **Efficient Reinforcement Finetuning via Adaptive Curriculum Learning**
  Taiwei Shi, Yiyang Wu, Linxin Song, Tianyi Zhou, Jieyu Zhao. [pdf], [code], 2025.04.
- **VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks**
  Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang, Yonghui Wu, Lin Yan. [pdf], 2025.04.
- **Accelerating RL for LLM Reasoning with Optimal Advantage Regression**
  Kianté Brantley, Mingyu Chen, Zhaolin Gao, Jason D. Lee, Wen Sun, Wenhao Zhan, Xuezhou Zhang. [pdf], [code], 2025.05.
- **AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning**
  Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu. [pdf], [code], [homepage], 2025.05.
- **Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning**
  Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin. [pdf], [homepage], 2025.06.
- **Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts**
  Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen. [pdf], [homepage], [code], 2025.06.
- **EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation**
  Jinghan Jia, Hadi Reisizadeh, Chongyu Fan, Nathalie Baracaldo, Mingyi Hong, Sijia Liu. [pdf], [code], 2025.06.
- **SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning**
  Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette. [pdf], 2025.05.
- **Truncated Proximal Policy Optimization**
  Tiantian Fan, Lingjun Liu, Yu Yue, Jiaze Chen, Chengyi Wang, Qiying Yu, Chi Zhang, Zhiqi Lin, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Bole Ma, Mofan Zhang, Gaohong Liu, Ru Zhang, Haotian Zhou, Cong Xie, Ruidong Zhu, Zhi Zhang, Xin Liu, Mingxuan Wang, Lin Yan, Yonghui Wu. [pdf], 2025.06.
- **QFFT, Question-Free Fine-Tuning for Adaptive Reasoning**
  Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang. [pdf], [code], 2025.06.
- **TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling**
  Yizhi Li, Qingshui Gu, Zhoufutu Wen, Ziniu Li, Tianshun Xing, Shuyue Guo, Tianyu Zheng, Xin Zhou, Xingwei Qu, Wangchunshu Zhou, Zheng Zhang, Wei Shen, Qian Liu, Chenghua Lin, Jian Yang, Ge Zhang, Wenhao Huang. [pdf], 2025.08.
- **History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL**
  Jingkai He, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yubin Xia, Haibo Chen. [pdf], 2025.08.
- **FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning**
  Yizhou Zhang, Ning Lv, Teng Wang, Jisheng Dang. [pdf], [code], 2025.09.
- **SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts**
  Bingshuai Liu, Ante Wang, Zijun Min, Liang Yao, Haibo Zhang, Yang Liu, Anxiang Zeng, Jinsong Su. [pdf], [code], 2025.09.
- **Self-Aligned Reward: Towards Effective and Efficient Reasoners**
  Peixuan Han, Adit Krishnan, Gerald Friedland, Jiaxuan You, Chris Kong. [pdf], 2025.09.
- **CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models**
  Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu. [pdf], 2025.09.
- **On Predictability of Reinforcement Learning Dynamics for Large Language Models**
  Yuchen Cai, Ding Cao, Xin Xu, Zijun Yao, Yuqing Huang, Zhenyu Tan, Benyi Zhang, Guiquan Liu, Junfeng Fang. [pdf], [code], 2025.10.
- **ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems**
  Qiaoling Chen, Zijun Liu, Peng Sun, Shenggui Li, Guoteng Wang, Ziming Liu, Yonggang Wen, Siyuan Feng, Tianwei Zhang. [pdf], 2025.10.
- **SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache**
  Chi-Chih Chang, Siqi Zhu, Zhichen Zeng, Haibin Lin, Xin Liu, Jiaxuan You, Mohamed S. Abdelfattah, Ziheng Jiang, Xuehai Qian. [pdf], 2025.10.
- **Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning**
  Ruoyu Qin, Weiran He, Weixiao Huang, Yangkun Zhang, Yikai Zhao, Bo Pang, Xinran Xu, Yingdi Shan, Yongwei Wu, Mingxing Zhang. [pdf], 2025.11.
- **Beat the long tail: Distribution-Aware Speculative Decoding for RL Training**
  Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang. [pdf], 2025.11.
- **Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter**
  Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han. [pdf], [code], 2025.11.
- **Fast LLM Post-training via Decoupled and Best-of-N Speculation**
  Rongxin Cheng, Kai Zhou, Xingda Wei, Siyuan Liu, Mingcong Han, Mingjing Ai, Yeju Zhou, Baoquan Zhong, Wencong Xiao, Rong Chen, Haibo Chen. [pdf], 2025.11.
- **RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting**
  Siqi Wang, Hailong Yang, Junjie Zhu, Xuezhu Wang, Yufan Xu, Depei Qian. [pdf], 2025.12.
## Latent Chain-of-Thought

- **Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning**
  Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen. [pdf], [paper list], 2025.05.
- **Think before you speak: Training Language Models With Pause Tokens**
  Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan. [pdf], 2023.10.
- **Guiding Language Model Reasoning with Planning Tokens**
  Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni. [pdf], 2023.10.
- **Implicit Chain of Thought Reasoning via Knowledge Distillation**
  Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber. [pdf], 2023.11.
- **Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models**
  Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong. [pdf], 2024.02.
- **Let's Think Dot by Dot: Hidden Computation in Transformer Language Models**
  Jacob Pfau, William Merrill, Samuel R. Bowman. [pdf], 2024.04.
- **From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step**
  Yuntian Deng, Yejin Choi, Stuart Shieber. [pdf], 2024.05.
- **Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding**
  Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo. [pdf], 2024.09.
- **Do LLMs Really Think Step-by-step In Implicit Reasoning?**
  Yijiong Yu. [pdf], 2024.11.
- **Disentangling Memory and Reasoning Ability in Large Language Models**
  Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang. [pdf], [code], 2024.11.
- **Training Large Language Models to Reason in a Continuous Latent Space**
  Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian. [pdf], [code], 2024.12.
- **Compressed Chain of Thought: Efficient Reasoning Through Dense Representations**
  Jeffrey Cheng, Benjamin Van Durme. [pdf], 2024.12.
- **Efficient Reasoning with Hidden Thinking**
  Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu. [pdf], 2025.01.
- **Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking**
  Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang. [pdf], 2025.02.
- **LightThinker: Thinking Step-by-Step Compression**
  Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang. [pdf], [code], 2025.02.
- **Reasoning with Latent Thoughts: On the Power of Looped Transformers**
  Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi. [pdf], 2025.02.
- **CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation**
  Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He. [pdf], 2025.02.
- **Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach**
  Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein. [pdf], [code], 2025.02.
- **LLM Pretraining with Continuous Concepts**
  Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li. [pdf], [code], 2025.02.
- **Scalable Language Models with Posterior Inference of Latent Thought Vectors**
  Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu. [pdf], 2025.02.
- **Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning**
  Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He. [pdf], [code], 2025.02.
- **Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning**
  DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng. [pdf], 2025.02.
- **Implicit Reasoning in Transformers is Reasoning through Shortcuts**
  Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang. [pdf], 2025.03.
- **Reasoning to Learn from Latent Thoughts**
  Yangjun Ruan, Neil Band, Chris J. Maddison, Tatsunori Hashimoto. [pdf], 2025.03.
- **Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation**
  Jiakai Tang, Sunhao Dai, Teng Shi, Jun Xu, Xu Chen, Wen Chen, Wu Jian, Yuning Jiang. [pdf], 2025.03.
- **Efficient Pretraining Length Scaling**
  Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou. [pdf], 2025.04.
- **Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space**
  Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang. [pdf], [code], 2025.05.
- **Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains**
  Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Ruihua Song. [pdf], [homepage], 2025.05.
- **Hybrid Latent Reasoning via Reinforcement Learning**
  Zhenrui Yue, Bowen Jin, Huimin Zeng, Honglei Zhuang, Zhen Qin, Jinsung Yoon, Lanyu Shang, Jiawei Han, Dong Wang. [pdf], [code], 2025.05.
- **Efficient Post-Training Refinement of Latent Reasoning in Large Language Models**
  Xinyuan Wang, Dongjie Wang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Sixun Dong, Kunpeng Liu, Yanjie Fu. [pdf], 2025.06.
- **DART: Distilling Autoregressive Reasoning to Silent Thought**
  Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian. [pdf], 2025.06.
- **Parallel Continuous Chain-of-Thought with Jacobi Iteration**
  Haoyi Wu, Zhihao Teng, Kewei Tu. [pdf], [code], 2025.06.
- **Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models**
  Tan-Hanh Pham, Chris Ngo. [pdf], 2025.08.
- **LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking**
  Chünhung Wu, Jinliang Lu, Zixuan Ren, Gangqiang Hu, Zhi Wu, Dai Dai, Hua Wu. [pdf], 2025.08.
- **Soft Tokens, Hard Truths**
  Natasha Butt, Ariel Kwiatkowski, Ismail Labiad, Julia Kempe, Yann Ollivier. [pdf], 2025.09.
- **SIM-CoT: Supervised Implicit Chain-of-Thought**
  Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Jiaqi Wang, Xipeng Qiu, Dahua Lin. [pdf], [code], 2025.09.
- **KaVa: Latent Reasoning via Compressed KV-Cache Distillation**
  Anna Kuzina, Maciej Pioro, Paul N. Whatmough, Babak Ehteshami Bejnordi. [pdf], 2025.10.
- **SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs**
  Dachuan Shi, Abedelkadir Asi, Keying Li, Xiangchi Yuan, Leyan Pan, Wenke Lee, Wen Xiao. [pdf], [code], 2025.10.
- **Towards Inference-time Scaling for Continuous Space Reasoning**
  Minghan Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari. [pdf], 2025.10.
- **Latent Reasoning in LLMs as a Vocabulary-Space Superposition**
  Jingcheng Deng, Liang Pang, Zihao Wei, Shichen Xu, Zenghao Duan, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng. [pdf], [code], 2025.10.
- **Continuous Autoregressive Language Models**
  Chenze Shao, Darren Li, Fandong Meng, Jie Zhou. [pdf], [code], 2025.10.
- **Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models**
  Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang. [pdf], [code], 2025.11.
- **Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought**
  Zhikang Chen, Sen Cui, Deheng Ye, Yu Zhang, Yatao Bian, Tingting Zhu. [pdf], 2025.11.
- **Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space**
  Chengzhi Liu, Yuzhe Yang, Yue Fan, Qingyue Wei, Sheng Liu, Xin Eric Wang. [pdf], [homepage], [code], 2025.12.
## Long-to-Short Chain-of-Thought

- **Chain-of-Symbol Prompting Elicits Planning in Large Language Models**
  Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang. [pdf], [code], 2023.05.
- **The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models**
  Matthew Renze, Erhan Guven. [pdf], [code], 2024.01.
- **Efficiently Serving LLM Reasoning Programs with Certaindex**
  Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang. [pdf], 2024.12.
- **C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness**
  Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou. [pdf], 2024.12.
- **Token-Budget-Aware LLM Reasoning**
  Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen. [pdf], [code], 2024.12.
- **O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning**
  Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao. [pdf], [code], 2025.01.
- **Kimi k1.5: Scaling Reinforcement Learning with LLMs**
  Kimi Team. [pdf], 2025.01.
- **Training Language Models to Reason Efficiently**
  Daman Arora, Andrea Zanette. [pdf], [code], [homepage], 2025.02.
- **Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models**
  Yuan Sui, Yufei He, Tri Cao, Simeng Han, Bryan Hooi. [pdf], 2025.02.
- **CoT-Valve: Length-Compressible Chain-of-Thought Tuning**
  Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. [pdf], [code], 2025.02.
- **TokenSkip: Controllable Chain-of-Thought Compression in LLMs**
  Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. [pdf], [code], 2025.02.
- **Self-Training Elicits Concise Reasoning in Large Language Models**
  Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun. [pdf], [code], 2025.02.
- **Chain of Draft: Thinking Faster by Writing Less**
  Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He. [pdf], [code], 2025.02.
- **L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning**
  Pranjal Aggarwal, Sean Welleck. [pdf], [code], [homepage], 2025.03.
- **DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models**
  Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Shiguo Lian. [pdf], 2025.03.
- **How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach**
  Ayeong Lee, Ethan Che, Tianyi Peng. [pdf], [code], 2025.03.
- **Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching**
  Simon A. Aytes, Jinheon Baek, Sung Ju Hwang. [pdf], [code], 2025.03.
- **Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning**
  Chen Li, Nazhou Liu, Kai Yang. [pdf], 2025.03.
- **Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging**
  Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, Xing Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan. [pdf], [code], 2025.03.
- **Think When You Need: Self-Adaptive Chain-of-Thought Learning**
  Junjie Yang, Ke Lin, Xing Yu. [pdf], 2025.04.
- **ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning**
  Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang. [pdf], [code], 2025.04.
- **Reasoning Models Can Be Effective Without Thinking**
  Wenjie Ma, Jingxuan He, Charlie Snell, Tyler Griggs, Sewon Min, Matei Zaharia. [pdf], 2025.04.
- **ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning**
  Jingyang Yi, Jiazheng Wang. [pdf], 2025.04.
- **Dynamic Early Exit in Reasoning Models**
  Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Zheng Lin, Li Cao, Weiping Wang. [pdf], 2025.04.
- **AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization**
  Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen. [pdf], [code], 2025.04.
- **Concise Reasoning via Reinforcement Learning**
  Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, Kartik Talamadupula. [pdf], 2025.04.
- **Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models**
  Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05.
- **ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning**
  Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang. [pdf], 2025.05.
- **Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens**
  Xixian Yong, Xiao Zhou, Yingying Zhang, Jinlin Li, Yefeng Zheng, Xian Wu. [pdf], 2025.05.
- **Scalable Chain of Thoughts via Elastic Reasoning**
  Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong. [pdf], 2025.05.
- **Let LRMs Break Free from Overthinking via Self-Braking Tuning**
  Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang. [pdf], [code], 2025.05.
- **S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models**
  Muzhi Dai, Chenxu Yang, Qingyi Si. [pdf], 2025.05.
- **Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement**
  Xuechen Zhang, Zijian Huang, Chenchun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak. [pdf], 2025.05.
- **Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping**
  Ren Zhuang, Ben Wang, Shuifa Sun. [pdf], 2025.05.
- **SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning**
  Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui. [pdf], 2025.05.
- **Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning**
  Yansong Ning, Wei Li, Jun Fang, Naiqiang Tan, Hao Liu. [pdf], [code], 2025.05.
- **Fractured Chain-of-Thought Reasoning**
  Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong. [pdf], 2025.05.
- **Efficient RL Training for Reasoning Models via Length-Aware Optimization**
  Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao. [pdf], 2025.05.
- **Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning**
  Shangziqi Zhao, Jiahao Yuan, Guisong Yang, Usman Naseem. [pdf], 2025.05.
- **DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models**
  Yuxuan Jiang, Dawei Li, Frank Ferraro. [pdf], 2025.05.
- **SEAL: Steerable Reasoning Calibration of Large Language Models for Free**
  Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, Zhangyang Wang. [pdf], [code], 2025.05.
- **FlashThink: An Early Exit Method For Efficient Reasoning**
  Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, Zheng Hu. [pdf], 2025.05.
- **Optimizing Anytime Reasoning via Budget Relative Policy Optimization**
  Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin. [pdf], [code], 2025.05.
- **VeriThinker: Learning to Verify Makes Reasoning Model Efficient**
  Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang. [pdf], [code], 2025.05.
- **Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning**
  Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim. [pdf], [code], 2025.05.
- **ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy**
  Gengyang Li, Yifeng Gao, Yuming Li, Yunfang Wu. [pdf], 2025.05.
- **Learn to Reason Efficiently with Adaptive Length-based Reward Shaping**
  Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He. [pdf], [code], 2025.05.
- **R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search**
  Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao. [pdf], [code], 2025.05.
- **Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning**
  Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang. [pdf], 2025.05.
- **Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning**
  Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou. [pdf], 2025.05.
- **ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models**
  Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, Liangming Pan. [pdf], [code], 2025.05.
- **TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling**
  Weizhe Lin, Xing Li, Zhiyuan Yang, Xiaojin Fu, Hui-Ling Zhen, Yaoyuan Wang, Xianzhi Yu, Wulong Liu, Xiaosong Li, Mingxuan Yuan. [pdf], 2025.05.
- **Not All Tokens Are What You Need In Thinking**
  Hang Yuan, Bin Yu, Haotian Li, Shijun Yang, Christina Dan Wang, Zhou Yu, Xueyin Xu, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05.
- **LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling**
  Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu. [pdf], [code], 2025.05.
- **Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning**
  Mingyang Song, Mao Zheng. [pdf], [code], 2025.05.
- **AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting**
  Shijue Huang, Hongru Wang, Wanjun Zhong, Zhaochen Su, Jiazhan Feng, Bowen Cao, Yi R. Fung. [pdf], [code], 2025.05.
- **CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models**
  Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, Aixin Sun. [pdf], 2025.05.
- **Stable Reinforcement Learning for Efficient Reasoning**
  Muzhi Dai, Shixuan Liu, Qingyi Si. [pdf], 2025.05.
- **Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models**
  Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh. [pdf], 2025.05.
- **LLMs Can Reason Faster Only If We Let Them**
  Bilgehan Sel, Lifu Huang, Naren Ramakrishnan, Ruoxi Jia, Ming Jin. [pdf], 2025.05.
- **DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning**
  Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, Pavlo Molchanov. [pdf], 2025.05.
- **A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings**
  Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He. [pdf], [code], 2025.05.
- **TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression**
  Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Ying Nian Wu, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu. [pdf], [code], 2025.06.
- **Answer Convergence as a Signal for Early Stopping in Reasoning**
  Xin Liu, Lu Wang. [pdf], 2025.06.
- **How Far Are We from Optimal Reasoning Efficiency?**
  Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu. [pdf], [code], 2025.06.
- **Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs**
  Roy Eisenstadt, Itamar Zimerman, Lior Wolf. [pdf], [homepage], [code], 2025.06.
- **Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning**
  Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang. [pdf], 2025.06.
- **Brevity is the soul of sustainability: Characterizing LLM response lengths**
  Soham Poddar, Paramita Koley, Janardan Misra, Sanjay Podder, Navveen Balani, Niloy Ganguly, Saptarshi Ghosh. [pdf], 2025.06.
- **Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency**
  Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou. [pdf], 2025.06.
- **Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning**
  Xiangning Yu, Zhuohan Wang, Linyi Yang, Haoxuan Li, Anjie Liu, Xiao Xue, Jun Wang, Mengyue Yang. [pdf], 2025.06.
- **PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models**
  Ye Yu, Yaoning Yu, Haohan Wang. [pdf], 2025.06.
- **Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning**
  Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber. [pdf], 2025.06.
- **ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization**
  Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu. [pdf], [code], 2025.06.
- **Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty**
  Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Yuan Cheng. [pdf], 2025.06.
- **Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models**
  Kaiyuan Liu, Chen Shen, Zhanwei Zhang, Junjie Liu, Xiaosong Yuan, Jieping ye. [pdf], 2025.06.
- **Steering LLM Thinking with Budget Guidance**
  Junyan Li, Wenshuo Zhao, Yang Zhang, Chuang Gan. [pdf], [code], 2025.06.
- **Optimizing Length Compression in Large Reasoning Models**
  Zhengxiang Cheng, Dongping Chen, Mingyang Fu, Tianyi Zhou. [pdf], [code], 2025.06.
- **Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement**
  Weixiang Zhao, Jiahe Guo, Yang Deng, Xingyu Sui, Yulin Hu, Yanyan Zhao, Wanxiang Che, Bing Qin, Tat-Seng Chua, Ting Liu. [pdf], 2025.06.
- **ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation**
  Siao Tang, Xinyin Ma, Gongfan Fang, Xinchao Wang. [pdf], 2025.06.
- **AdapThink: Adaptive Thinking Preferences for Reasoning Language Model**
  Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun. [pdf], 2025.06.
- **Less Data Less Tokens: Multilingual Unification Learning for Efficient Test-Time Reasoning in LLMs**
  Kang Chen, Mengdi Zhang, Yixin Cao. [pdf], 2025.06.
- **AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control**
  Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du. [pdf], [code], 2025.06.
- **Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model**
  Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin. [pdf], [code], 2025.07.
- **EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning**
  Sanchit Ahuja, Praneetha Vaddamanu, Barun Patra. [pdf], [code], 2025.07.
- **Activation Steering for Chain-of-Thought Compression**
  Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram. [pdf], [code], 2025.07.
- **SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control**
  Xingyang He, Xiao Ling, Jie Liu. [pdf], 2025.07.
- **Controlling Thinking Speed in Reasoning Models**
  Zhengkai Lin, Zhihang Fu, Ze Chen, Chao Chen, Liang Xie, Wenxiao Wang, Deng Cai, Zheng Wang, Jieping Ye. [pdf], 2025.07.
- **Verbosity-Aware Rationale Reduction: Sentence-Level Rationale Reduction for Efficient and Effective Reasoning**
Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, Hwanjo Yu. [pdf], 2025.07. - Test-time Prompt Intervention
Chenxu Yang, Qingyi Si, Muzhi Dai, Dingyu Yao, Mingyu Zheng, Minghui Chen, Zheng Lin, Weiping Wang. [pdf], 2025.08. - Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning
Jialiang Hong, Taihang Zhen, Kai Chen, Jiaheng Liu, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Boyan Wang, Fanyu Meng. [pdf], 2025.08. - Compressing Chain-of-Thought in LLMs via Step Entropy
Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu. [pdf], [code], 2025.08. - Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression
Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, Lu Hou. [pdf], 2025.08. - Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, Bernard Ghanem. [pdf], [code], 2025.08. - Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Vaishnavi Shrivastava, Ahmed Awadallah, Vidhisha Balachandran, Shivam Garg, Harkirat Behl, Dimitris Papailiopoulos. [pdf], 2025.08. - SABER: Switchable and Balanced Training for Efficient LLM Reasoning
Kai Zhao, Yanjun Zhao, Jiaming Song, Shien He, Lusheng Zhang, Qiang Zhang, Tianjiao Li. [pdf], 2025.08. - Promoting Efficient Reasoning with Verifiable Stepwise Reward
Chuhuai Yue, Chengqi Dong, Yinan Gao, Hang He, Jiajun Chai, Guojun Yin, Wei Lin. [pdf], 2025.08. - Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che. [pdf], [code], 2025.08. - Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit
Zihao Wei, Liang Pang, Jiahao Liu, Jingcheng Deng, Shicheng Xu, Zenghao Duan, Jingang Wang, Fei Sun, Xunliang Cai, Huawei Shen, Xueqi Cheng. [pdf], 2025.08. - BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li. [pdf], [code], 2025.08. - DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
Kaiwen Yan, Xuanqing Shi, Hongcheng Guo, Wenxuan Wang, Zhuosheng Zhang, Chengwei Qin. [pdf], 2025.08. - CAC-CoT: Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks
Sunguk Choi, Yonghoon Kwon, Heondeuk Lee. [pdf], 2025.08. - ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Qianyu He, Siyu Yuan, Xuefeng Li, Mingxuan Wang, Jiangjie Chen. [pdf], 2025.08. - Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation
Abdul Waheed, Chancharik Mitra, Laurie Z. Wang, Deva Ramanan, Bhiksha Raj. [pdf], 2025.09. - From Long to Short: LLMs Excel at Trimming Own Reasoning Chains
Wei Han, Geng Zhan, Sicheng Yu, Chenyu Wang, Bryan Hooi. [pdf], 2025.09. - Hierarchical Budget Policy Optimization for Adaptive Reasoning
Shangke Lyu, Linjuan Wu, Yuchen Yan, Xingyu Wu, Hao Li, Yongliang Shen, Peisheng Jiang, Weiming Lu, Jun Xiao, Yueting Zhuang. [pdf], [code], 2025.09. - Early Stopping Chain-of-thoughts in Large Language Models
Minjia Mao, Bowen Yin, Yu Zhu, Xiao Fang. [pdf], 2025.09. - Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
Aniket Didolkar, Nicolas Ballas, Sanjeev Arora, Anirudh Goyal. [pdf], 2025.09. - Revisiting Model Interpolation for Efficient Reasoning
Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Ngai Wong. [pdf], [code], 2025.10. - From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement
Jianzhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, Buzhou Tang. [pdf], [code], 2025.09. - Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Jinyi Han, Ying Huang, Ying Liao, Zishang Jiang, Xikun Lu, Haiquan Zhao, Xinyi Wang, Guanghao Zhou, Sihang Jiang, Jiaqing Liang, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao. [pdf], [code], 2025.09. - Entropy After ⟨/Think⟩ for reasoning model early exiting
Xi Wang, James McInerney, Lequn Wang, Nathan Kallus. [pdf], [code], 2025.10. - SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
Haoming Wen, Yushi Bai, Juanzi Li, Jie Tang. [pdf], [huggingface], 2025.10. - Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models
Canhui Wu, Qiong Cao, Chang Li, Zhenfang Wang, Chao Xue, Yuwei Fan, Wei Xi, Xiaodong He. [pdf], 2025.10. - Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
Tianyi Jiang, Yi Bin, Yujuan Ding, Kainian Zhu, Fei Ma, Jingkuan Song, Heng Tao Shen. [pdf], [code], 2025.10. - Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression
Joykirat Singh, Justin Chih-Yao Chen, Archiki Prasad, Elias Stengel-Eskin, Akshay Nambi, Mohit Bansal. [pdf], [code], 2025.10. - DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Gang Li, Yan Chen, Ming Lin, Tianbao Yang. [pdf], [code], 2025.10. - Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression
Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Shaochu Zhang, Shengchao Liu, Guoxin Ma, Yu Lan, Chao Shen. [pdf], 2025.10. - Mitigating Overthinking through Reasoning Shaping
Feifan Song, Shaohang Wei, Bofei Gao, Yejie Wang, Wen Luo, Wei Li, Linli Yao, Weimin Xiong, Liang Chen, Tianyu Liu, Houfeng Wang. [pdf], 2025.10. - PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning
Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An. [pdf], 2025.10. - Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning
Renliang Sun, Wei Cheng, Dawei Li, Haifeng Chen, Wei Wang. [pdf], 2025.10. - Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting
Heming Xia, Cunxiao Du, Rui Li, Chak Tou Leong, Yongqi Li, Wenjie Li. [pdf], [code], 2025.10. - Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning
Yujian Zhang, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun. [pdf], 2025.10. - Concise Reasoning in the Lens of Lagrangian Optimization
Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu. [pdf], 2025.10. - Towards Flash Thinking via Decoupled Advantage Policy Optimization
Zezhong Tan, Hang Gao, Xinhong Ma, Feng Zhang, Ziqiang Dong. [pdf], 2025.10. - DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng. [pdf], 2025.11. - e1: Learning Adaptive Control of Reasoning Effort
Michael Kleinman, Matthew Trager, Alessandro Achille, Wei Xia, Stefano Soatto. [pdf], 2025.11. - Efficient Reasoning via Reward Model
Yuhao Wang, Xiaopeng Li, Cheng Gong, Ziru Liu, Suiyun Zhang, Rui Liu, Xiangyu Zhao. [pdf], 2025.11. - ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
Kumar Tanmay, Kriti Aggarwal, Paul Pu Liang, Subhabrata Mukherjee. [pdf], 2025.11. - Dual-Density Inference for Efficient Language Model Reasoning
Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Huimin Wang, Binyang Li, Kam-Fai Wong. [pdf], 2025.12.
- Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu, Jiahao Lin, Qichao Zhang, Xiangyu Tian, Linjing Li, Xiangyuan Lan, Dongbin Zhao. [pdf], [code], 2025.05. - AdaptThink: Reasoning Models Can Learn When to Think
Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li. [pdf], [code], 2025.05. - Thinkless: LLM Learns When to Think
Gongfan Fang, Xinyin Ma, Xinchao Wang. [pdf], [code], 2025.05. - Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, Furu Wei. [pdf], 2025.05. - ThinkSwitcher: When to Think Hard, When to Think Fast
Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan. [pdf], 2025.05. - Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang. [pdf], 2025.05. - ARM: Adaptive Reasoning Model
Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao. [pdf], [homepage], [code], 2025.05. - When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Haodong Zhao, Hao Li, Jiansong Chen, Ke Zeng, Xunliang Cai. [pdf], 2025.05. - AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
Chenwei Lou, Zewei Sun, Xinnian Liang, Meng Qu, Wei Shen, Wenqi Wang, Yuntao Li, Qingping Yang, Shuangzhi Wu. [pdf], 2025.05. - AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models
Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu. [pdf], 2025.05. - AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang. [pdf], [homepage], [code], 2025.05. - OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Shengjia Zhang, Junjie Wu, Jiawei Chen, Changwang Zhang, Xingyu Lou, Wangchunshu Zhou, Sheng Zhou, Can Wang, Jun Wang. [pdf], [code], 2025.05. - Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models
Ruiqi Zhang, Changyi Xiao, Yixin Cao. [pdf], 2025.06. - Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
Peijie Liu, Fengli Xu, Yong Li. [pdf], [code], 2025.06. - Flexible Realignment of Language Models
Wenhong Zhu, Ruobing Xie, Weinan Zhang, Rui Wang. [pdf], [code], 2025.06. - SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun. [pdf], 2025.07. - Large Reasoning Models Know How to Think Efficiently
Zeyu Xing, Xing Li, Huiling Zhen, Xianzhi Yu, Mingxuan Yuan, Sinno Jialin Pan. [pdf], 2025.07. - Think in Blocks: Adaptive Reasoning from Direct Response to Deep Reasoning
Yekun Zhu, Guang Chen, Chengjun Mao. [pdf], 2025.08. - Gold-Switch: Training-Free Superposition of Slow- and Fast- Thinking LLMs
Jaeseong Lee, Dayoung Kwon, Seung-won Hwang. [pdf], 2025.10. - MixReasoning: Switching Modes to Think
Haiquan Lu, Gongfan Fang, Xinyin Ma, Qi Li, Xinchao Wang. [pdf], 2025.10. - When to Reason: Semantic Router for vLLM
Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen. [pdf], 2025.10. - DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi. [pdf], 2025.10. - DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
Tian Liang, Wenxiang Jiao, Zhiwei He, Jiahao Xu, Haitao Mi, Dong Yu. [pdf], 2025.11. - MuTIS: Enhancing Reasoning Efficiency through Multi-Turn Intervention Sampling in Reinforcement Learning
Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, Linchao Zhu. [pdf], 2025.11. - Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
Renos Zabounidis, Aditya Golatkar, Michael Kleinman, Alessandro Achille, Wei Xia, Stefano Soatto. [pdf], 2025.11. - Efficient Reasoning via Thought-Training and Thought-Free Inference
Canhui Wu, Qiong Cao, Chao Xue, Wei Xi, Xiaodong He. [pdf], 2025.11.
- Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts
Emanuele Marconato, Stefano Teso, Antonio Vergari, Andrea Passerini. [pdf], 2023.05. - Break the Chain: Large Language Models Can be Shortcut Reasoners
Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang. [pdf], 2024.06. - Can Language Models Learn to Skip Steps?
Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang. [pdf], [code], 2024.11. - TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. [pdf], [code], 2025.02. - Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Yue Xing, Jiliang Tang, Qi He. [pdf], 2025.02. - Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
Ren Zhuang, Ben Wang, Shuifa Sun. [pdf], 2025.05. - DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
Yuxuan Jiang, Dawei Li, Frank Ferraro. [pdf], 2025.05. - R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao. [pdf], [code], 2025.05. - Not All Tokens Are What You Need In Thinking
Hang Yuan, Bin Yu, Haotian Li, Shijun Yang, Christina Dan Wang, Zhou Yu, Xueyin Xu, Weizhen Qi, Kai Chen. [pdf], [code], 2025.05. - LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu. [pdf], [code], 2025.05. - Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou, Cho-Jui Hsieh. [pdf], 2025.05. - Compressing Chain-of-Thought in LLMs via Step Entropy
Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu. [pdf], [code], 2025.08. - Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu. [pdf], [code], 2025.08. - Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models
Canhui Wu, Qiong Cao, Chang Li, Zhenfang Wang, Chao Xue, Yuwei Fan, Wei Xi, Xiaodong He. [pdf], 2025.10. - Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression
Joykirat Singh, Justin Chih-Yao Chen, Archiki Prasad, Elias Stengel-Eskin, Akshay Nambi, Mohit Bansal. [pdf], [code], 2025.10.
- Markov Chain of Thought for Efficient Mathematical Reasoning
Wen Yang, Minpeng Liao, Kai Fan. [pdf], [code], 2024.10. - Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo. [pdf], [code], 2025.02. - DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light, Wei Cheng, Wu Yue, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen. [pdf], 2025.02. - Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang. [pdf], [code], 2025.03. - From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu, Yan Zheng, Rong Cheng, Qiyu Wu, Wei Guo, Fei Ni, Hebin Liang, Yifu Yuan, Hangyu Mao, Fuzheng Zhang, Jianye Hao. [pdf], 2025.03.
- Teaching Small Language Models to Reason
Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn. [pdf], 2022.12. - Mixed Distillation Helps Smaller Language Model Better Reasoning
Chenglin Li, Qianglong Chen, Liangyue Li, Caiyu Wang, Yicheng Li, Zulong Chen, Yin Zhang. [pdf], 2023.12. - Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang. [pdf], [code], 2024.04. - Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu. [pdf], 2024.04. - Teaching Small Language Models Reasoning through Counterfactual Distillation
Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang. [pdf], 2024.11. - Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
Xunyu Zhu, Jian Li, Can Ma, Weiping Wang. [pdf], 2024.11. - Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y. Li, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao. [pdf], 2025.02. - Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen. [pdf], [code], 2025.02. - Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou. [pdf], [code], [homepage], 2025.02. - Small Models Struggle to Learn from Strong Reasoners
Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran. [pdf], [code], [homepage], 2025.02. - Towards Reasoning Ability of Small Language Models
Gaurav Srivastava, Shuxiang Cao, Xuan Wang. [pdf], 2025.02. - Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng. [pdf], [code], 2025.03. - SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He. [pdf], [code], 2025.03. - Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang, Chris Ngo. [pdf], [code], 2025.03. - TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance
Jingxian Xu, Mengyu Zhou, Weichang Liu, Hanbing Liu, Shi Han, Dongmei Zhang. [pdf], 2025.03. - When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang. [pdf], 2025.04. - A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Chengyu Wang, Taolin Zhang, Richang Hong, Jun Huang. [pdf], 2025.04. - Tina: Tiny Reasoning Models via LoRA
Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger. [pdf], [code], 2025.04. - A Technical Study into Small Reasoning Language Models
Xialie Zhuang, Peixian Ma, Zhikai Jia, Zheng Cao, Shiwei Liu. [pdf], 2025.06. - Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Xin Xu, Cliveb AI, Kai Yang, Tianhao Chen, Yang Wang, Saiyong Yang, Can Yang. [pdf], [code], 2025.09. - ThinkSLM: Towards Reasoning in Small Language Models
Gaurav Srivastava, Shuxiang Cao, Xuan Wang. [pdf], 2025.11. - Teach Small Models to Reason by Curriculum Distillation
Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun. [pdf], 2025.11.
- Hawkeye: Efficient Reasoning with Model Collaboration
Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho. [pdf], 2025.04. - Guiding Reasoning in Small Language Models with LLM Assistance
Yujin Kim, Euiin Yi, Minu Kim, Se-Young Yun, Taehyeon Kim. [pdf], [code], 2025.04. - Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He. [pdf], 2025.04. - SplitReason: Learning To Offload Reasoning
Yash Akhauri, Anthony Fei, Chi-Chih Chang, Ahmed F. AbouElhamayed, Yueying Li, Mohamed S. Abdelfattah. [pdf], 2025.04. - ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez. [pdf], [code], 2025.05. - What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Kevin Lin, Tianyi Zhou, Lijuan Wang. [pdf], 2025.06. - Collaborative LLM Inference via Planning for Efficient Reasoning
Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin. [pdf], 2025.06. - R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang. [pdf], 2025.07.
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong. [pdf], [code], 2025.01. - SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali. [pdf], [code], 2025.04. - Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang, Xiang Yue, Vipin Chaudhary, Xiaotian Han. [pdf], [code], 2025.04. - Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang, Juntao Li, Lijun Wu, Min Zhang. [pdf], [code], 2025.04. - Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu. [pdf], 2025.05. - R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You, Enshu Liu, Zhihang Yuan, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang. [pdf], 2025.05. - Accelerated Test-Time Scaling with Model-Free Speculative Sampling
Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati. [pdf], 2025.06. - Scaling Speculative Decoding with Lookahead Reasoning
Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang. [pdf], [code], 2025.06. - SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Rubing Yang, Huajun Bai, Song Liu, Guanghua Yu, Runzhi Fan, Yanbin Dang, Jiejing Zhang, Kai Liu, Jianchen Zhu, Peng Chen. [pdf], [code], 2025.09. - SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration
Junhan Shi, Yijia Zhu, Zhenning Shi, Dan Zhao, Qing Li, Yong Jiang. [pdf], 2025.11. - Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Yilong Zhao, Jiaming Tang, Kan Zhu, Zihao Ye, Chi-Chih Chang, Chaofan Lin, Jongseok Park, Guangxuan Xiao, Mohamed S. Abdelfattah, Mingyu Gao, Baris Kasikci, Song Han, Ion Stoica. [pdf], [code], 2025.11.
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang. [pdf], [code], [homepage], 2023.06. - Adaptive Skeleton Graph Decoding
Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo. [pdf], 2024.02. - Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
Yijiong Yu. [pdf], [code], 2024.03. - Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr. [pdf], [code], 2025.04. - Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu. [pdf], 2025.05. - Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen. [pdf], [homepage], [code], 2025.05. - SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Emil Biju, Shayan Talaei, Zhemin Huang, Mohammadreza Pourreza, Azalia Mirhoseini, Amin Saberi. [pdf], 2025.06. - Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning
Qihang Ai, Haiyun Jiang. [pdf], 2025.09. - A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning
Ziqi Wang, Boye Niu, Zhongli Li, Linghui Meng, Jing Liu, Zhi Zheng, Tong Xu, Hua Wu, Haifeng Wang, Enhong Chen. [pdf], 2025.09.
- R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration
Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, Junjie Hu. [pdf], 2025.06. - SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang. [pdf], 2025.06. - Think Clearly: Improving Reasoning via Redundant Token Pruning
Daewon Choi, Jimin Lee, Jihoon Tack, Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati. [pdf], 2025.07. - Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali. [pdf], [code], 2025.08. - ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran, Marina Neseem, Charbel Sakr, Rangharajan Venkatesan, Brucek Khailany, Tushar Krishna. [pdf], 2025.10. - Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
Wenjie Du, Li Jiang, Keda Tao, Xue Liu, Huan Wang. [pdf], [homepage], 2025.10.
- Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. [pdf], 2024.08. - Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang. [pdf], [code], [homepage], 2024.08. - Scaling Test-Time Compute Without Verification or RL is Suboptimal
Amrith Setlur, Nived Rajaraman, Sergey Levine, Aviral Kumar. [pdf], 2025.02. - Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu. [pdf], [code], 2025.02. - Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei. [pdf], 2025.02. - Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar. [pdf], [code], [homepage], 2025.03. - Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Dylan J. Foster, Akshay Krishnamurthy. [pdf], 2025.03. - What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Irwin King, Xue Liu, Chen Ma. [pdf], 2025.03. - When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach. [pdf], [code], 2025.04. - Z1: Efficient Test-time Scaling with Code
Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang. [pdf], [code], 2025.04. - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou. [pdf], 2025.04. - Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini. [pdf], 2025.05. - Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu. [pdf], [code], 2025.05. - Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun. [pdf], [code], 2025.05. - Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz. [pdf], 2025.05. - First Finish Search: Efficient Test-Time Scaling in Large Language Models
Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty. [pdf], 2025.05. - LLM-First Search: Self-Guided Exploration of the Solution Space
Nathan Herr, Tim Rocktäschel, Roberta Raileanu. [pdf], [code], 2025.06. - Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Xinglin Wang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Yueqi Zhang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li. [pdf], 2025.06. - SPECS: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri, Nived Rajaraman, Rishabh Tiwari, Xiaoxuan Liu, Kurt Keutzer, Ion Stoica, Kannan Ramchandran, Ahmad Beirami, Ziteng Sun. [pdf], 2025.06. - BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V.S. Lakshmanan, Qingyun Wu, Victor Rühle. [pdf], 2025.07. - Deep Think with Confidence
Yichao Fu, Xuewei Wang, Yuandong Tian, Jiawei Zhao. [pdf], [homepage], 2025.08. - ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Hao Wen, Yifan Su, Feifei Zhang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li. [pdf], 2025.09. - A1: Asynchronous Test-Time Scaling via Conformal Prediction
Jing Xiong, Qiujiang Chen, Fanghua Ye, Zhongwei Wan, Chuanyang Zheng, Chenyang Zhao, Hui Shen, Alexander Hanbo Li, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Lingpeng Kong, Ngai Wong. [pdf], 2025.09. - AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding
Shuqing Luo, Yilin Guan, Pingzhi Li, Hanrui Wang, Tianlong Chen. [pdf], 2025.10. - DeepPrune: Parallel Scaling without Inter-trace Redundancy
Shangqing Tu, Yaxuan Li, Yushi Bai, Lei Hou, Juanzi Li. [pdf], [code], [homepage], 2025.10. - MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Hongwei Chen, Yishu Lei, Dan Zhang, Bo Ke, Danxiang Zhu, Xuyi Chen, Yuxiang Lu, Zhengjie Huang, Shikun Feng, Jingzhou He, Yu Sun, Hua Wu, Haifeng Wang. [pdf], 2025.10. - Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Martina G. Vilas, Safoora Yousefi, Besmira Nushi, Eric Horvitz, Vidhisha Balachandran. [pdf], 2025.10. - Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling
Shiyu Ji, Yixuan Wang, Yijun Liu, Qingfu Zhu, Wanxiang Che. [pdf], [code], 2025.11.
Efficient Sampling
- Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette. [pdf], [code], 2024.10. - Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong. [pdf], [code], 2024.10. - FastMCTS: A Simple Sampling Strategy for Data Synthesis
Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo. [pdf], 2025.02. - Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao. [pdf], 2025.02. - Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu. [pdf], 2025.02. - Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang. [pdf], 2025.03. - Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes, Alan Ritter. [pdf], 2025.03. - ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu. [pdf], [code], 2025.03. - Lost at the Beginning of Reasoning
Baohao Liao, Xinyi Chen, Sara Rajaee, Yuhui Xu, Christian Herold, Anders Søgaard, Maarten de Rijke, Christof Monz. [pdf], 2025.06. - Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework
Zenan Xu, Zexuan Qiu, Guanhua Huang, Kun Li, Siheng Li, Chenchen Zhang, Kejiao Li, Qi Yi, Yuhao Jiang, Bo Zhou, Fengzong Lian, Zhanhui Kang. [pdf], 2025.07. - MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Hang Yan, Fangzhi Xu, Rongman Xu, Yifei Li, Jian Zhang, Haoran Luo, Xiaobao Wu, Luu Anh Tuan, Haiteng Zhao, Qika Lin, Jun Liu. [pdf], [code], 2025.07. - Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Bohong Wu, Mengzhao Chen, Xiang Luo, Shen Yan, Qifan Yu, Fan Xia, Tianqi Zhang, Hongrui Zhan, Zheng Zhong, Xun Zhou, Siyuan Qiao, Xingyan Bin. [pdf], 2025.10.
Efficient Self-Consistency
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li. [pdf], [code], 2024.01. - Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li. [pdf], [code], 2024.08. - Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou. [pdf], 2024.08. - Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li. [pdf], [code], 2024.08. - Efficient Test-Time Scaling via Self-Calibration
Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang. [pdf], [code], 2025.02. - Confidence Improves Self-Consistency in LLMs
Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, Gal Yona. [pdf], 2025.02. - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
Zhi Zhou, Tan Yuhao, Zenan Li, Yuan Yao, Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li. [pdf], 2025.02. - Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
Colin Hong, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov. [pdf], 2025.09. - Optimal Self-Consistency for Efficient Reasoning with Large Language Models
Austin Feng, Marius Alonso, Ambroise Odonnat. [pdf], 2025.11.
Long-Context Reasoning Efficiency
- OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, Sheng Guo. [pdf], 2024.10. - InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang. [pdf], 2025.03. - DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram. [pdf], 2025.10.
Multimodal Reasoning Efficiency
- PixelThink: Towards Efficient Chain-of-Pixel Reasoning
Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang. [pdf], [code], [homepage], 2025.05. - R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Jie Jiang, Qi Yang, Bolin Ni, Shiming Xiang, Han Hu, Houwen Peng. [pdf], [code], [huggingface], 2025.08. - Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng, Shilin Zhou, Chen Gong, Zhenghua Li. [pdf], 2025.09. - Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Luozheng Qin, Jia Gong, Yuqing Sun, Tianjiao Li, Mengping Yang, Xiaomeng Yang, Chao Qu, Zhiyu Tan, Hao Li. [pdf], [code], [homepage], 2025.09. - ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code
Jian Xie, Zhendong Chu, Aoxiao Zhong, Kai Zhang, Mingzhe Han, Xing Fan, Jialie Shen, Qingsong Wen. [pdf], 2025.10. - ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Shuang Chen, Yue Guo, Yimeng Ye, Shijue Huang, Wenbo Hu, Haoxi Li, Manyuan Zhang, Jiayu Chen, Song Guo, Nanyun Peng. [pdf], [code], 2025.10. - Towards Efficient Multimodal Unified Reasoning Model via Model Merging
Qixiang Yin, Huanjin Yao, Jianghao Chen, Jiaxing Huang, Zhicheng Zhao, Fei Su. [pdf], [code], 2025.10. - Learning to Think Fast and Slow for Visual Language Models
Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou. [pdf], 2025.11. - ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better
Yuan Zhang, Ming Lu, Junwen Pan, Tao Huang, Kuan Cheng, Qi She, Shanghang Zhang. [pdf], 2025.11.
Other Work
- PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li. [pdf], 2025.03. - Fast-Slow-Thinking: Complex Task Solving with Large Language Models
Yiliu Sun, Yanfang Zhang, Zicheng Zhao, Sheng Wan, Dacheng Tao, Chen Gong. [pdf], 2024.04. - M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao. [pdf], [code], 2025.04. - Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu. [pdf], 2025.05. - Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
Chenyang Shao, Xinyang Liu, Yutang Lin, Fengli Xu, Yong Li. [pdf], 2025.06. - Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling
Seyyed Saeid Cheshmi, Azal Ahmad Khan, Xinran Wang, Zirui Liu, Ali Anwar. [pdf], 2025.06. - Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu, Young Jin Kim, Adam Atkinson, Zheng Zhan, Jiankai Sun, Baolin Peng, Liyuan Liu, Shuohang Wang, Hao Cheng, Jianfeng Gao, Weizhu Chen, Yelong Shen. [pdf], [code], 2025.07. - Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
Ammar Ahmed, Azal Ahmad Khan, Ayaan Ahmad, Sheng Di, Zirui Liu, Ali Anwar. [pdf], 2025.09. - Intra-request branch orchestration for efficient LLM reasoning
Weifan Jiang, Rana Shahout, Yilun Du, Michael Mitzenmacher, Minlan Yu. [pdf], 2025.09. - The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville, Siva Reddy. [pdf], [code], 2025.10. - StreamingThinker: Large Language Models Can Think While Reading
Junlong Tong, Yingqi Fan, Anhao Zhao, Yunpu Ma, Xiaoyu Shen. [pdf], 2025.10.
Benchmarks
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
Masoud Hashemi, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudhan, Jishnu Sethumadhavan Nair, Aman Tiwari, Vikas Yadav. [pdf], 2024.12. - S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu. [pdf], [code], 2025.04. - THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua, William Yang Wang. [pdf], 2025.04. - THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li, Yi Chang, Yuan Wu. [pdf], [homepage], [code], 2025.04. - LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models
Gaurav Srivastava, Aafiya Hussain, Sriram Srinivasan, Xuan Wang. [pdf], 2025.07. - OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
Pranjal Aggarwal, Seungone Kim, Jack Lanchantin, Sean Welleck, Jason Weston, Ilia Kulikov, Swarnadeep Saha. [pdf], [code], 2025.08. - EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models
Junquan Huang, Haotian Wu, Yubo Gao, Yibo Yan, Junyan Zhang, Yonghua Hei, Song Dai, Jie Zhang, Puay Siew Tan, Xuming Hu. [pdf], 2025.11.
Analysis
- The Impact of Reasoning Step Length on Large Language Models
Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du. [pdf], [code], 2024.01. - Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
Sania Nayab, Giulio Rossolini, Marco Simoni, Andrea Saracino, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli. [pdf], 2024.07. - Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che. [pdf], [code], 2024.10. - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. [pdf], 2024.12. - When More is Less: Understanding Chain-of-Thought Length in LLMs
Yuyang Wu, Yifei Wang, Tianqi Du, Stefanie Jegelka, Yisen Wang. [pdf], 2025.02. - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez. [pdf], [code], 2025.02. - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. [pdf], 2025.02. - The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon, Andres Algaba, Vincent Ginis. [pdf], 2025.02. - Long Is More Important Than Difficult for Training Reasoning Models
Si Shen, Fei Huang, Zhixiao Zhao, Chang Liu, Tiansheng Zheng, Danhao Zhu. [pdf], [code], 2025.03. - Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge, Shenghua Liu, Yiwei Wang, Lingrui Mei, Lizhe Chen, Baolong Bi, Xueqi Cheng. [pdf], 2025.03. - Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He. [pdf], 2025.04. - Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou. [pdf], [code], 2025.04. - Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint
Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu. [pdf], 2025.04. - Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie. [pdf], 2025.04. - When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning
Rongzhi Zhu, Yi Liu, Zequn Sun, Yiwei Wang, Wei Hu. [pdf], 2025.05. - On Reasoning Strength Planning in Large Reasoning Models
Leheng Sheng, An Zhang, Zijian Wu, Weixiang Zhao, Changshuo Shen, Yi Zhang, Xiang Wang, Tat-Seng Chua. [pdf], [code], 2025.06. - Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs
Shu Yang, Junchao Wu, Xuansheng Wu, Derek F. Wong, Ninghao Liu, Di Wang. [pdf], 2025.06. - Thought Anchors: Which LLM Reasoning Steps Matter?
Paul C. Bogdan, Uzay Macar, Neel Nanda, Arthur Conmy. [pdf], 2025.06. - What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Ming Li, Yanhong Li, Tianyi Zhou. [pdf], [code], 2025.06. - Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu. [pdf], [code], 2025.07. - First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Liwei Kang, Yue Deng, Yao Xiao, Zhanfeng Mo, Wee Sun Lee, Lidong Bing. [pdf], [code], 2025.10. - Rethinking Thinking Tokens: LLMs as Improvement Operators
Lovish Madaan, Aniket Didolkar, Suchin Gururangan, John Quan, Ruan Silva, Ruslan Salakhutdinov, Manzil Zaheer, Sanjeev Arora, Anirudh Goyal. [pdf], 2025.10. - Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
Xinliang Frederick Zhang, Anhad Mohananey, Alexandra Chronopoulou, Pinelopi Papalampidi, Somit Gupta, Tsendsuren Munkhdalai, Lu Wang, Shyam Upadhyay. [pdf], 2025.10. - Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?
Shouren Wang, Wang Yang, Xianxuan Long, Qifan Wang, Vipin Chaudhary, Xiaotian Han. [pdf], [code], 2025.10.
Applications
- Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu. [pdf], [code], 2025.08. - Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning
Shu Wu, Chenxing Li, Wenfu Wang, Hao Zhang, Hualei Wang, Meng Yu, Dong Yu. [pdf], 2025.08. - ThinkBrake: Mitigating Overthinking in Tool Reasoning
Minjae Oh, Sangjun Song, Seungkyu Lee, Sungmin Jo, Yohan Jo. [pdf], 2025.10.
Blog & Project
Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem. CMU, University of Toronto. [blog], 2025.01.
Understanding R1-Zero-Like Training: A Critical Perspective. Sea AI Lab. [paper], [code], 2025.03.
Talks
The Key Ingredients for Scaling Test-Time Compute. Aviral Kumar. Carnegie Mellon University. [homepage], [video], 2025.03.
Resources
Reading lists related to Efficient Reasoning
- Eclipsess/Awesome-Efficient-Reasoning-LLMs
- fscdc/Awesome-Efficient-Reasoning-Models
- XiaoYee/Awesome_Efficient_LRM_Reasoning
- Blueyee/Efficient-CoT-LRMs
- yueliu1999/Awesome-Efficient-Inference-for-LRMs
- DevoAllen/Awesome-Reasoning-Economy-Papers
- Hongcheng-Gao/Awesome-Long2short-on-LRMs
- EIT-NLP/Awesome-Latent-CoT
- yzhangchuck/awesome-llm-reasoning-long2short-papers
- yuelinan/Awesome-Efficient-R1-style-LRMs
Contributing to this paper list
- We may have missed important works in this field; please feel free to contribute and promote your own work or other related works here! Thanks in advance for your efforts.