
NLPJCL/SearchAgent-Zero


SearchAgent-Zero

SearchAgent-Zero is built on Qwen3-8B and trained with pure reinforcement learning using the Verl framework. Among models of comparable size, it achieves state-of-the-art (SOTA) performance on the BrowseComp-Plus dataset with a score of 37.95, surpassing many large commercial models (such as Gemini 2.5 Pro and Kimi-K2).

The model performs multi-turn search, averaging over 20 searches on the training set and over 40 searches on the test set. It also generalizes well to shallow search tasks.
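To illustrate what "multi-turn search" means in practice, the following is a minimal sketch of a search loop, not the project's actual implementation. Every name here (`Episode`, `toy_search`, `run_episode`, `scripted_policy`), the action format, and the `max_turns` cap are assumptions made for illustration. The trained model is replaced by a scripted stand-in, and the retriever is a toy word-overlap ranker over an in-memory corpus.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action format: ("search", query) or ("answer", text).
Action = Tuple[str, str]


@dataclass
class Episode:
    question: str
    observations: List[str] = field(default_factory=list)
    num_searches: int = 0
    answer: str = ""


def toy_search(query: str, corpus: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def run_episode(
    question: str,
    policy: Callable[[Episode], Action],
    corpus: List[str],
    max_turns: int = 50,  # assumed cap on turns, not a documented setting
) -> Episode:
    """Alternate between policy actions and search results until an answer is given."""
    episode = Episode(question)
    for _ in range(max_turns):
        kind, text = policy(episode)
        if kind == "answer":
            episode.answer = text
            break
        # Any other action is treated as a search query.
        episode.observations.extend(toy_search(text, corpus))
        episode.num_searches += 1
    return episode


def scripted_policy(episode: Episode) -> Action:
    """Stand-in for the trained model: issue two fixed queries, then answer."""
    queries = ["framework used for training", "base model name"]
    if episode.num_searches < len(queries):
        return ("search", queries[episode.num_searches])
    return ("answer", " | ".join(episode.observations))


corpus = [
    "the base model is Qwen3-8B",
    "the training framework is Verl",
]
episode = run_episode(
    "Which base model and framework are used?", scripted_policy, corpus
)
print(episode.num_searches, episode.answer)
```

In this sketch, `num_searches` is the per-episode count whose average would correspond to the "over 20" and "over 40" figures above. A real agent would replace `scripted_policy` with model generations and `toy_search` with an actual retriever.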

This model is still under development; technical details will be updated here and on Zhihu later.

https://www.zhihu.com/people/li-jia-cheng-63-47

About

A search agent trained purely by RL
