Zhibin Gou 苟志斌
Hi! I’m Zhibin Gou, currently a third-year M.S. student at SIGS, Tsinghua University, advised by Prof. Yujiu Yang. Before that, I received my bachelor’s degree in Computer Science from Beijing University of Posts and Telecommunications in Jun. 2022.
My long-term research interest lies in AGI, with a current focus on reasoning and pre-training in LLMs. I am dedicated to developing simple, general, and scalable pathways to achieve AGI.
Education
Aug. 2022 - Jun. 2025 (Expected) M.Sc., Div. of Information Science and Technology, SIGS, Tsinghua University, Beijing, China.
GPA: 4.0/4.0
Sep. 2018 - Jun. 2022 B.Sc., School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China.
GPA: 3.84/4.0, Rank: Top 1%
Selected Publications
(* indicates equal contribution)
[ICLR 2024] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [website][code] (Stars: 900+)
Zhibin Gou*, Zhihong Shao*, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
ToRA solves math problems by integrating chain-of-thought reasoning with program-based tool use. ToRA-34B is the first open-source LLM to score over 50% on MATH, competitive with GPT-4 solving problems with programs.

[ICLR 2024] CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [code] (Citations: 200+)
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
The first paper to show that current LLMs struggle with self-verification and self-correction; it proposes tool-augmented critiquing for reliable self-improvement.

[NeurIPS 2024 Oral] Rho-1: Not All Tokens Are What You Need [code]
Zhenghao Lin*, Zhibin Gou*, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen
Rho-1 introduces Selective Language Modeling (SLM), a method for token-level pretraining data selection. Applied to math continual pretraining, SLM improves math reasoning by over 16% and reaches baseline performance 5-10x faster.

[EMNLP 2024] SciAgent: Tool-augmented Language Models for Scientific Reasoning
Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning [website] [code]
Zicheng Lin*, Zhibin Gou*, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang

[COLM 2024] Exploring the Mystery of Influential Data for Mathematical Reasoning
Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

[ACL 2023] MvP: Multi-view Prompting Improves Aspect Sentiment Tuple Prediction [code]
Zhibin Gou*, Qingyan Guo*, Yujiu Yang
MvP is a simple, unified generative framework for structured prediction, achieving state-of-the-art performance on 10 datasets across 4 ABSA tasks.

[ACL 2022 Findings] Long Time No See! Open-Domain Conversation with Long-Term Persona Memory [data]
Xinchao Xu*, Zhibin Gou*, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, Shihang Wang
Introduces the first long-term memory conversation task and the largest multi-turn mutual persona chat dataset in Chinese.
Preprints
[arXiv 2024.06] DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
DeepSeek-AI

[arXiv 2024.03] Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen
Experience
(Jan. 2023 - May 2024) Research Intern, NLC Group, Microsoft Research Asia, Beijing, China.
Mentors: Yeyun Gong, Weizhu Chen
Working on large language models, focusing on reasoning and tool use.

(Sep. 2021 - May 2022) Research Intern, General Dialogue Group, Baidu Inc., Beijing, China.
Mentors: Xinchao Xu, Hua Wu
Working on open-domain dialogue: long-term memory, personalized and safe chatbots.
Competitions
- (Nov. 2022) 1st Place (1/54) in NLP task of NeurIPS 2022 IGLU Challenge
- (Aug. 2022) 2nd Place (2/684) in WAIC 2022 Midu CTC Competition
- (May 2021) 1st Place (auto eval: 1/862, human eval: 3/862) in Multi-skill Dialog task of NLPCC 2021 Language and Intelligence Challenge
Honors & Awards
- Outstanding Graduate Thesis, Beijing, 2022
- Outstanding Graduate, Beijing, 2022
- National Scholarship (top 1%), Ministry of Education, China, 2020
- National Scholarship (top 1%), Ministry of Education, China, 2019
- National Scholarship (top 1%), Ministry of Education, China, 2018