Instructor | Prof. Ellen Zhong |
Time | Thursdays 3:00-5:00p, |
"Precept" / student-only discussion | Wednesdays 1:00-2:00p, CS 301 |
Office hours | Mondays 4:00-5:00p, CS 314 |
Slack | Link |
Syllabus | Link |
Recent breakthroughs in machine learning algorithms have transformed the study of the 3D structure of proteins and other biomolecules. This seminar class will survey recent papers on ML applied to tasks in protein structure prediction, structure determination, computational protein design, physics-based modeling, and more. We will take a holistic approach to each paper, discussing its historical context, algorithmic contributions, and potential impact on scientific discovery and applications such as drug discovery.
For more information on the discussion format, expectations, and grading, see the course syllabus.
A non-exhaustive list of topics we will cover includes:
Selected papers will cover a broad range of algorithmic concepts and machine learning techniques including:
In addition to the assigned papers, optional primers or reviews on relevant topics will be made available for background reading.
Assignment 1. Due 11am, Friday, September 30th via Canvas
Assignment 2. Due 11am, Friday, October 14th via Canvas
Assignment 3. Due 3pm, Thursday, November 3rd in class and via Canvas
Assignment 4. Due 3pm, Thursday, December 8th via Canvas
Assignment / Quiz 5. 3pm, Thursday, December 15th in class
Thursday, September 22nd, 3pm ET
Dr. Michael Figurnov (DeepMind)
Title: Highly accurate protein structure prediction with AlphaFold
Abstract: Predicting a protein’s structure from its primary sequence has been a grand challenge in
biology for the past 50 years, holding the promise to bridge the gap between the pace of genomics
discovery and resulting structural characterization. In this talk, we will describe work at DeepMind to
develop AlphaFold, a new deep learning-based system for structure prediction that achieves high accuracy
across a wide range of targets. We demonstrated our system in the 14th biennial Critical Assessment of
Protein Structure Prediction (CASP14) across a wide range of difficult targets, where the assessors judged
our predictions to be at an accuracy “competitive with experiment” for approximately two-thirds of proteins.
The talk will focus on the underlying machine learning ideas, while also touching on the implications for
biological research.
Bio: Michael Figurnov is a Staff Research Scientist at DeepMind. He has been working with the
AlphaFold team for the past four years. Before joining DeepMind, he did his Ph.D. in Computer Science at
the Bayesian Methods Research Group under the supervision of Dmitry Vetrov. His research interests include
deep learning, Bayesian methods, and machine learning for biology.
Thursday, November 10th, 12:30p ET (CS 105)
Dr. John Ingraham (Generate Biomedicines)
Title: Illuminating protein space with a programmable generative model
Abstract: Three billion years of evolution have produced a tremendous diversity of protein
molecules, but it is yet unknown how thoroughly evolution has sampled the space of possible protein folds
and functions. Here, by introducing a new, scalable generative prior for proteins and protein complexes,
we provide further evidence that Earth's extant molecular biodiversity represents only a small fraction of
what is possible for polypeptides. To enable this, we introduce customized neural networks that enable
long-range reasoning, that respect the statistical structures of polymer ensembles, and that can
efficiently realize 3D structures of proteins from predicted geometries. We show how this framework
broadly enables protein design under auxiliary constraints, which can be any composition of semantics,
substructure, symmetries, shape, and even natural language prompts.
Bio: John Ingraham is the Head of Machine Learning at Generate Biomedicines, Inc., where he leads a
team of scientists and engineers developing new kinds of machine learning systems for protein design. He
has spent most of his career developing structured statistical models of the rich diversity found in
protein sequences and structures. As a postdoc at MIT CSAIL with Tommi Jaakkola and Regina
Barzilay, he worked on some of the first generative models for structure-based sequence design. Before
that, in his PhD with Debora Marks at Harvard Medical School, he developed deep learning and
statistical-physics-inspired models of deep evolutionary sequence variation and protein folding.