Building Expert-Level Language Models from Decomposed Weak Validations

Ruiqi Zhong

Language models (LMs) can process large volumes of information and perform complex reasoning. They hold the promise of executing expert-level tasks, such as brainstorming scientific hypotheses or developing complex software. However, building these LMs requires humans to validate their outputs, which is challenging; e.g., developers cannot easily validate whether complex software is bug-free. If our validation is fallible, LMs may learn to "hack" the validators, convincing us that they are right even when they are wrong.

To address this, I show how to decompose complex validation tasks into "weaker" ones that are easier for humans or LMs: e.g., validating return values rather than entire programs, or validating discoveries on individual samples rather than on entire datasets. Through several examples, I show how these techniques allow us to use LMs for expert-level tasks more reliably. Looking forward, I discuss how to use LMs to automate these task decompositions, and how we can use these frameworks to monitor both individual AI systems and their broader impact within society.

Bio: Ruiqi Zhong is a final-year Ph.D. student at UC Berkeley, co-advised by Jacob Steinhardt and Dan Klein. He was previously a part-time member of the technical staff at Anthropic, where he worked on the automated red teaming team. His research is at the intersection of machine learning and NLP, and he develops language model systems to advance the frontier of human capabilities. He developed the earliest prototype of instruction-tuning, and his research contributions have been scaled up by leading language model companies such as Google, OpenAI, and Anthropic.


To request accommodation for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Date and Time
Thursday, March 6, 2025, 12:30pm–1:30pm
Location
Computer Science Small Auditorium (Room 105)
Host
Danqi Chen

Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers, or views presented.
