Zhibin Gou 苟志斌

Hi! I’m Zhibin Gou, currently a second-year M.S. student at SIGS, Tsinghua University, advised by Prof. Yujiu Yang. Before that, I received my bachelor’s degree in Computer Science from Beijing University of Posts and Telecommunications in Jun. 2022. My current research interests lie in LLMs, especially reasoning and tool use.

Education

Aug. 2022 - Jun. 2025 (Expected) M.Sc., Div. of Information Science and Technology, SIGS, Tsinghua University, Beijing, China.
GPA: 4.0/4.0
Sep. 2018 - Jun. 2022 B.Sc., School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China.
GPA: 3.84/4.0, Rank: 1%

Selected Publications

(* indicates equal contribution)

[arXiv 2024] Rho-1: Not All Tokens Are What You Need [code]
Zhenghao Lin*, Zhibin Gou*, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen
Rho-1 introduces Selective Language Modeling (SLM), a method for token-level pretraining data selection. By applying SLM to math continual pretraining, it enhances math reasoning by over 16%, reaching baseline performance 5-10x faster.
[ICLR 2024] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [website][code] (Stars: 800+)
Zhibin Gou*, Zhihong Shao*, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
ToRA solves math problems by integrating chain-of-thought reasoning with program-based tool use. ToRA-34B is the first open-source LLM to achieve over 50% on MATH, making it competitive with GPT-4 when GPT-4 solves problems with programs.
[ICLR 2024] CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [code] (Citations: 100+)
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
The first paper to find that current LLMs struggle with self-verification and self-correction, and to propose tool-augmented critiquing for reliable self-improvement.
[ACL 2023] MvP: Multi-view Prompting Improves Aspect Sentiment Tuple Prediction [code]
Zhibin Gou*, Qingyan Guo*, Yujiu Yang
MvP is a simple unified generative framework for structure prediction, achieving state-of-the-art performance on 10 datasets across 4 ABSA tasks.
[ACL 2022 Findings] Long Time No See! Open-Domain Conversation with Long-Term Persona Memory [data]
Xinchao Xu*, Zhibin Gou*, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, Shihang Wang
The first long-term memory conversation task and the largest multi-turn mutual persona chat dataset in Chinese.

Please see my Google Scholar profile for more papers.

Experience

(Jan. 2023 - Present) Research Intern, NLC Group, Microsoft Research Asia, Beijing, China.
Mentors: Yeyun Gong, Weizhu Chen
Working on large language models, focusing on reasoning and tool use.
(Sep. 2021 - May 2022) Research Intern, General Dialogue Group, Baidu Inc., Beijing, China.
Mentors: Xinchao Xu, Hua Wu
Worked on open-domain dialogue: long-term memory and personalized, safe chatbots.

Competitions

(Nov. 2022) 1st Place (1/54) in NLP task of NeurIPS 2022 IGLU Challenge
(Aug. 2022) 2nd Place (2/684) in WAIC 2022 Midu CTC Competition
(May 2021) 1st Place (auto eval: 1/862, human eval: 3/862) in Multi-skill Dialog task of NLPCC 2021 Language and Intelligence Challenge

Honors & Awards

Outstanding Graduate Thesis, Beijing, 2022
Outstanding Graduate, Beijing, 2022
National Scholarship (top 1%), Ministry of Education, China, 2020
National Scholarship (top 1%), Ministry of Education, China, 2019
National Scholarship (top 1%), Ministry of Education, China, 2018