COS 597k: Systems for Serving Generative AI, Fall 2024

Instructor: Ravi Netravali
Lectures: Wednesday 10:00-11:20am (ET) in Computer Science Building, Room TBD
Office Hours: by appointment (rnetravali@cs.princeton.edu)

Course Overview

Generative machine learning models, from large language models for chatbots to diffusion models for text-to-image generation, have become key players in important decisions and tasks across societal sectors. Unfortunately, their abilities to engage in conversation and spark creativity are coupled with high costs (in both money and compute resources) and sometimes sluggish performance, especially given the millions of requests that such models must serve in production. This research-centric course examines a wide range of systems optimizations that enable large-scale serving of generative models. The course will be heavily based on studying, presenting, and actively discussing research papers, and will also involve a semester-long research project.

Grading

  • 40% Participation in paper discussions

  • 25% Paper presentation/lecture

  • 35% Research project (report and presentation)

Paper Reading and Discussion

A major component of this course is reading and discussing research papers in depth. To ensure a lively and focused discussion, you should closely read each paper multiple times prior to the lecture in which it will be discussed. You should aim to come to class prepared with several points that will substantially contribute to the group discussion. General tips on reading research papers can be found here, and we strongly encourage you to review and follow the discussion suggestions from a prior COS 561 offering. Your participation grade will be determined based on attendance and, more importantly, substantial contributions to paper discussions in class; as a rule of thumb, given the small class size, you should aim for at least two discussion contributions (deep questions, observations, etc.) per lecture.

Paper Presentation/Lecture

In each class, 1-2 students will be expected to present the scheduled paper and lead the discussion for it. Presentations should start with a (roughly) 20-25 minute overview of the paper; in many cases, especially for the first paper in a given topic, presenters are responsible for providing background for the given area (please reach out to the instructor for background pointers). The format of this part of the presentation should be "conference style," i.e., covering the domain and relevant background for the paper, the problem statement and challenges, the solution, results, and potential limitations and improvements. However, the presentation should go into more detail than a typical conference talk would, particularly on the design of the proposed solution; for this reason, while public conference slides for the paper can be used as an aid, they will not suffice for the lecture. The remainder of the lecture will involve leading the discussion by both fielding and posing questions. Non-presenters are expected to actively participate in the discussions and bring discussion points (including questions) of their own. Active participation will lead to a lively discussion that will benefit everyone.

Research Project

In addition to paper reading, this course will also include a semester-long research project. Students will carry out projects in pairs. The goal of this research project is not necessarily to fully implement a research idea. Instead, students are encouraged to pick a problem that is new (i.e., previously unsolved) and exciting to them, and focus primarily on building (small-scale) prototypes and collecting measurements to motivate the problem and their solution. Thus, implementation is a key aspect of the project, but students are encouraged to aim high, and not feel restricted to topics or ideas that could be 100% implemented before the course concludes. The scope of acceptable topics is quite large: anything related to improving inference pipelines for generative models (language, image, multi-modal, etc.) is fair game. Extensions to ongoing research projects can be used if in scope; please see the instructor to discuss your specific ongoing project and how you would like to extend it for the course. Students are strongly encouraged to begin thinking about project topics early in the semester by reviewing the reading list/topics and discussing ideas with the instructor. The timeline and deliverables for the project are:

  • Team formation (due Wednesday 9/11 at 5pm ET)

  • 1-2 page project proposal and plan (due Friday 10/11 at 5pm ET)

  • Final project presentation (during lecture, Wednesday 12/4)

  • 5-6 page final project report (due Dean's Date: 12/13 at 5pm ET); this should be submitted as a PDF generated using the USENIX conference research paper format.