Kehan Guo

Building Foundation Models for System 2 Reasoning.

I build Self-Evolving Scientific Agents: systems that move beyond generative fluency to rigorous, System 2 understanding.

My research lies at the intersection of Agentic Reasoning, RL Post-Training, and Generative Planning. By aligning LLMs with structured verification loops and non-autoregressive mechanisms (Diffusion), I aim to enable AI systems to perform reliable, multi-step deduction in open-ended domains.

System 2 Reasoning Architecture
Figure 1: The Blueprint for System 2 AI. Integrating Multi-Agent Planning, Knowledge Graphs, and RL Alignment for rigorous discovery.

News

  • Status 🚀 Actively seeking Research Scientist Internships for Spring/Summer/Fall 2026.
  • 2025.12 New Preprint: Co-first author on Evaluating Large Language Models in Scientific Discovery, a large community benchmark for AI in Science.
  • 2025.09 Two papers accepted to NeurIPS 2025: ChemOrch (Agentic Chemistry) and AdaReasoner (NeurIPS Spotlight).
  • 2025.08 Completed Applied Scientist Internship at Amazon AWS AI (Deep Engine Team) in NYC.
  • 2024.12 Thrilled to be selected for OpenAI’s Researcher Access Program.
  • 2024.09 MolPuzzle has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track as a Spotlight!
  • 2024.09 One paper has been accepted to the main conference of EMNLP 2024!
  • 2024.06 I passed my Ph.D. qualifying exam!
  • 2023.09 One first-author paper has been accepted to NeurIPS 2023.

Research Pillars

System 2 Reasoning & Evaluation

Defining and measuring rigorous thinking. I build comprehensive benchmarks (e.g., MolPuzzle) and evaluation protocols to stress-test LLMs on multi-step deduction, identifying the gap between fluency and true reasoning.

Agentic Alignment & RAG

Orchestrating self-correcting agents. I combine Retrieval-Augmented Generation (RAG) with Reinforcement Learning (RL) post-training to align agentic workflows with structured knowledge, ensuring verifiable and robust planning.

Generative Planning & Diffusion

Beyond autoregressive limits. I explore Diffusion Models as non-autoregressive planners for structured generation. This enables global context modeling and precise controllability in complex design spaces.

Selected Publications
