Zhibin Gou 苟志斌

Hi! I’m Zhibin Gou, currently a second-year M.S. student at SIGS, Tsinghua University, advised by Prof. Yujiu Yang. Before that, I received my bachelor’s degree in Computer Science from Beijing University of Posts and Telecommunications in Jun. 2022. My current research interests lie in LLMs, especially reasoning and tool use.

Education

  • Aug. 2022 - Jun. 2025 (Expected) M.Sc., Div. of Information Science and Technology, SIGS, Tsinghua University, Beijing, China.
    GPA: 4.0/4.0

  • Sep. 2018 - Jun. 2022 B.Sc., School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China.
    GPA: 3.84/4.0, Rank: 1%

Selected Publications

(* indicates equal contribution)

  • [arXiv 2024] Rho-1: Not All Tokens Are What You Need [code]
    Zhenghao Lin*, Zhibin Gou*, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen
    Rho-1 introduces Selective Language Modeling (SLM), a method for token-level pretraining data selection. By applying SLM to math continual pretraining, it enhances math reasoning by over 16%, reaching baseline performance 5-10x faster.

  • [ICLR 2024] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [website][code] (Stars: 800+)
    Zhibin Gou*, Zhihong Shao*, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
    ToRA solves math problems by integrating chain-of-thought reasoning with program-based tool use. ToRA-34B is the first open-source LLM to achieve over 50% on MATH, making it competitive with GPT-4 when GPT-4 solves problems with programs.

  • [ICLR 2024] CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [code] (Citations: 100+)
    Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
    The first paper to find that current LLMs struggle with self-verification and self-correction, and to propose tool-augmented critiquing for reliable self-improvement.

  • [ACL 2023] MvP: Multi-view Prompting Improves Aspect Sentiment Tuple Prediction [code]
    Zhibin Gou*, Qingyan Guo*, Yujiu Yang
    MvP is a simple unified generative framework for structure prediction, achieving state-of-the-art performance on 10 datasets across 4 ABSA tasks.

  • [ACL 2022 Findings] Long Time No See! Open-Domain Conversation with Long-Term Persona Memory [data]
    Xinchao Xu*, Zhibin Gou*, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, Shihang Wang
    The first long-term memory conversation task and the largest multi-turn mutual persona chat dataset in Chinese.

Please see my Google Scholar profile for more papers.

Experience

  • (Jan. 2023 - Current) Research Intern, NLC Group, Microsoft Research Asia, Beijing, China.
    Mentors: Yeyun Gong, Weizhu Chen
    Working on large language models, focusing on reasoning and tool use.

  • (Sep. 2021 - May 2022) Research Intern, General Dialogue Group, Baidu Inc., Beijing, China.
    Mentors: Xinchao Xu, Hua Wu
    Worked on open-domain dialogue: long-term memory and personalized, safe chatbots.

Competitions

Honors & Awards

  • Outstanding Graduate Thesis, Beijing, 2022
  • Outstanding Graduate, Beijing, 2022
  • National Scholarship (top 1%), Ministry of Education, China, 2020
  • National Scholarship (top 1%), Ministry of Education, China, 2019
  • National Scholarship (top 1%), Ministry of Education, China, 2018