04-10
Towards Autonomous Language Model Systems

Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google's AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end-to-end? In this talk, I'll discuss our efforts to build autonomous LM systems, focusing on the software engineering domain. I'll present SWE-bench, our novel method for measuring the ability of AI systems to fix real issues in popular software libraries. I'll then discuss SWE-agent, our system for solving SWE-bench tasks. SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community towards building autonomous systems that can complete challenging tasks.

Bio: Ofir Press is a postdoc at Princeton University, where he mainly works with Karthik Narasimhan's lab. He previously completed his PhD at the University of Washington in Seattle, where he was advised by Noah Smith. During his PhD, he spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team.

To request accommodation for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Date and Time

Thursday, April 10, 2025, 12:30pm - 1:30pm

Location

Computer Science Small Auditorium (Room 105)

Event Type

CS Colloquium Series

Speaker

Ofir Press, from Princeton University

Host

Peter Henderson

Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers, or views presented.
