I am a Ph.D. candidate in Computer Science at the University of Notre Dame (graduating May 2027), advised by Prof. Xiangliang Zhang. I build RL post-training methods, agentic reasoning systems, and reward-aligned generative models.

My work focuses on the hard problems that arise when LLMs must act, not just predict: how to shape reward signals that remain informative over long horizons, how to build agents that verify their own tool-use chains before committing (multi-agent orchestration, NeurIPS 2025), and how to dynamically allocate inference-time compute to match problem difficulty (AdaReasoner, NeurIPS 2025 Spotlight). I use scientific reasoning as a demanding testbed, one where tool errors cascade and outputs require formal verification. The benchmarks I created for this domain (NeurIPS 2023, 300+ citations; NeurIPS 2024 Spotlight) have become community standards.

Research

1. RL Post-Training and Inference-Time Reasoning. I develop methods that improve LLM behavior through reinforcement learning — at both training time and inference time. This includes principled reward-shaping frameworks for stable policy optimization (CEPO), adaptive inference-time reasoning that dynamically selects computation depth to reduce cost by 35% with no accuracy loss (AdaReasoner, NeurIPS 2025 Spotlight), and ongoing work unifying reward-aligned generation with KL-regularized RL via optimal-transport coupling.

2. Agentic Reasoning and Structured Tool Use. I build multi-agent architectures with explicit Router–Planner–Executor–Verifier pipelines where each tool call is verified before downstream steps proceed. Our orchestration framework (ChemOrch, NeurIPS 2025) integrates 74+ structured tools and dramatically improves reliability on expert-level, multi-step reasoning. A follow-on system adds knowledge-graph grounding and self-evolving memory.

3. Rigorous Evaluation of Frontier Models. I have led or co-led benchmark projects that identified fundamental capability gaps: ChemLLMBench (NeurIPS 2023, 300+ citations) revealed silent failure modes in LLMs; MolPuzzle (NeurIPS 2024 Spotlight) exposed a multimodal perception gap (GPT-4o: 1.4% on expert tasks). These benchmarks are now community standards.

News

Seeking Internship: I am actively looking for Research Scientist / Applied Scientist internships for Spring, Summer, or Fall 2026 in RL post-training, agentic systems, and inference-time reasoning. Reach out!
2026.03: Passed Ph.D. Candidacy Exam. Officially a Ph.D. candidate!
2025.12: New preprint: co-first author on Evaluating Large Language Models in Scientific Discovery — a comprehensive multi-domain assessment across 10 scientific fields.
2025.09: Two papers accepted at NeurIPS 2025: ChemOrch (multi-agent orchestration for chemistry) and AdaReasoner (Spotlight, adaptive inference-time reasoning).
2025.08: Completed Applied Scientist Internship at Amazon AWS AI (Deep Engine Team) in NYC.
2025.04: Survey paper accepted at IJCAI 2025 (Survey Track): AI in Spectroscopy.
2024.12: Selected for OpenAI Researcher Access Program.
2024.09: MolPuzzle accepted at the NeurIPS 2024 Datasets and Benchmarks Track as a Spotlight (top ~3%).
2024.09: One paper accepted at the EMNLP 2024 main conference.
2024.06: Passed Ph.D. Qualifying Exam.
2023.09: First-author paper accepted at NeurIPS 2023: ChemLLMBench (300+ citations).

Selected Publications (Full Publications)

Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation

Kehan Guo, Bozhao Nan, Yujun Zhou, Taicheng Guo, Zhichun Guo, Mihir Surve, Zhenwen Liang, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

NeurIPS, 2024. Spotlight. [Paper] [Code]

What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Taicheng Guo*, Kehan Guo*, Bozhao Nan, Zhenwen Liang, Zhichun Guo, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

NeurIPS, 2023. 300+ citations. [Paper] [Code]

Evaluating Large Language Models in Scientific Discovery

Kehan Guo*, Ziqian Song*, Jiuding Lu*, and others

arXiv preprint, 2025. [Paper] [Code] [Dataset] [Oracle]

Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond

Kehan Guo, Yifan Shen, Gil A. Gonzalez-Montiel, Yue Huang, Yujun Zhou, Mihir Surve, Zhichun Guo, ...

IJCAI, 2025. Survey Track. [Paper] [Code]

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang

NeurIPS, 2025. Spotlight. [Paper] [Code]

ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions

Yue Huang, Zhengzhe Jiang, Xiaonan Luo, Kehan Guo, Haomin Zhuang, Yujun Zhou, Zhengqing Yuan, Xiaoqi Sun, Jules Schleinitz, Yanbo Wang, Shuhao Zhang, Mihir Surve, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

NeurIPS, 2025. [Paper] [Code]

Causally-Enhanced Reinforcement Policy Optimization

Xiangqi Wang, Yue Huang, Yujun Zhou, Xiaonan Luo, Kehan Guo, Xiangliang Zhang

arXiv preprint, 2025. [Paper] [Code]

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?

Haomin Zhuang, Yizhuo Zhang, Kehan Guo, Jinghan Jia, Gang Liu, Sijia Liu, Xiangliang Zhang

ACL, 2025. [Paper]

Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis

Yuang Lang, Kehan Guo, Yue Huang, Yujun Zhou, Haomin Zhuang, Tianyu Yang, Yuxuan Su, Xiangliang Zhang

ACL Findings, 2025. [Paper]

Unveiling the Power of Language Models in Chemical Research Question Answering

Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Jingwei Zhou, Hualin Li, Ziqian Song, Xiang Gao, Xiangliang Zhang

Communications Chemistry, 2025. [Paper] [Code]

Defending Jailbreak Prompts via In-Context Adversarial Game

Yujun Zhou, Yufei Han, Haomin Zhuang, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang

EMNLP, 2024. [Paper] [Code]

SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, ...

ACL, 2024. [Paper] [Code]

Uncertainty-Aware Yield Prediction with Multimodal Molecular Features

Jiayuan Chen, Kehan Guo, Zhen Liu, Olexandr Isayev, Xiangliang Zhang

AAAI, 2024. [Paper] [Code]

Graph-based Molecular Representation Learning

Zhichun Guo, Kehan Guo, Bozhao Nan, Yijun Tian, Roshni G. Iyer, Yihong Ma, Olaf Wiest, Xiangliang Zhang, Wei Wang, ...

IJCAI, 2023. [Paper]

Education

2022.09 - 2027.05 (Expected), Ph.D., University of Notre Dame, Advisor: Prof. Xiangliang Zhang
2020.09 - 2022.05, M.S., Boston University

Awards

2025 NeurIPS Spotlight (top ~3%) — AdaReasoner
2024 NeurIPS Spotlight (top ~3%) — MolPuzzle
2024 OpenAI Researcher Access Program

Professional Service

Reviewer: NeurIPS (2024–25), ICLR (2025–26), ICML, AAAI, IJCAI, KDD, ACL Rolling Review, WWW, EMNLP