Google Scholar / Twitter / BlueSky / GitHub
I'm a 4th-year PhD student in CS at Princeton University, advised by Danqi Chen, and affiliated with Princeton Language and Intelligence. Previously, I was an undergraduate at the University of Cambridge, where I was fortunate to work with Adrian Weller. During my PhD, I have also interned at Ai2.
My research interests revolve around building and understanding large language models, with a particular focus on their training data (QuRating, WebOrganizer, ProLong, Masking Rates). I have also worked on understanding why LMs are easy to adapt (via Kernel Behavior) and how we can interpret their internal workings (Transformer Programs, Edge Pruning).
I am also part of the team that built SWE-bench and SWE-agent.
(* indicates equal contribution)
How to Train Long-Context Language Models (Effectively) Pre-print 2024
Finding Transformer Circuits with Edge Pruning NeurIPS 2024 (Spotlight)
QuRating: Selecting High-Quality Data for Training Language Models ICML 2024 (Spotlight)
Language Models as Science Tutors ICML 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024 (Oral)
Learning Transformer Programs NeurIPS 2023 (Oral)
A Kernel-Based View of Language Model Fine-Tuning ICML 2023; ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models (Spotlight)