Princeton University | Computer Science 597G | Fall 2018
This is a graduate course focused on research in theoretical aspects of deep learning. In recent years, deep learning has become the central paradigm of machine learning and related fields such as computer vision and natural language processing. But mathematical understanding of many aspects of this endeavor is still lacking. When and how fast does training succeed, and using how many examples? What are the strengths and limitations of various architectures?
The course is geared towards graduate students in computer science and allied fields, but may be suitable for undergraduates who are appropriately prepared (knowledge of machine learning as well as algorithm design/analysis; ideally, they will have taken at least one of COS 521 and COS 511).
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To receive a grade in the class, students should also be prepared to present a topic at some point during the course, and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.
This course does not satisfy any undergrad requirements in the COS major (BSE or AB).
Instructor: Sanjeev Arora - 407 CS Building - 609-258-3869 - arora AT the domain name cs.princeton.edu
TA: Yuping Luo - yupingl AT the domain name cs.princeton.edu
Date/Topic | Required Reading | Additional Reading
Sept 13: Basic setup of deep learning. Gradient descent, back propagation, and training methods. | Michael Nielsen's book for the basics. |
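(Illustrative aside, not part of the course materials: a minimal NumPy sketch of one gradient-descent/back-propagation step for a one-hidden-layer network with squared loss; the data, sizes, and learning rate below are made up for illustration.)

    import numpy as np

    # Toy synthetic data: n examples, d features, scalar targets.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 10)), rng.normal(size=(32, 1))

    # One-hidden-layer net: x -> relu(x W1) W2 (arbitrary sizes).
    W1, W2 = 0.1 * rng.normal(size=(10, 50)), 0.1 * rng.normal(size=(50, 1))
    lr = 0.01

    # Forward pass.
    h_pre = X @ W1            # pre-activation
    h = np.maximum(h_pre, 0)  # ReLU
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)

    # Backward pass (chain rule, i.e. back propagation).
    d_pred = 2 * (pred - y) / len(X)
    d_W2 = h.T @ d_pred
    d_h = d_pred @ W2.T
    d_hpre = d_h * (h_pre > 0)
    d_W1 = X.T @ d_hpre

    # One gradient-descent step.
    W1 -= lr * d_W1
    W2 -= lr * d_W2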
Sept 18: Representing functions using deep nets. Barron's theorem. | Scribe notes by Neelesh Kumar and Yuting Wang. | Barron's paper. Holden Lee's notes (a bit rough). Matus Telgarsky's lecture slides.
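(Illustrative aside: a rough paraphrase of Barron's theorem; see Barron's paper for the precise statement and constants.)

    % Rough paraphrase; constants approximate.
    If $C_f = \int_{\mathbb{R}^d} \|\omega\|\,|\hat f(\omega)|\,d\omega < \infty$, then for every $k$
    there is a one-hidden-layer network $f_k$ with $k$ sigmoidal units such that
    \[ \int_{\|x\|\le r} \big(f(x) - f_k(x)\big)^2\, \mu(dx) \;\le\; \frac{(2\, r\, C_f)^2}{k} \]
    for any probability measure $\mu$ supported on the ball of radius $r$.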
Sept 20: Basics of optimization concepts. | Scribe notes by Abhishek Bhrushundi, Pritish Sahu, and Wenjia Zhang. |
Sept 25: Introduction to practical aspects of deep learning. (Lecturer: Yuping Luo.) | Slides |
Sept 27: No class. | |
Oct 2: Understanding adaptive regularization/preconditioners/learning rates (and how they are related). | Scribe notes by Ethan Tseng and DiJia (Andy) Su. | Description of AdaGrad, pp. 89-94 of Hazan's OCO book. Catalog of adaptive learning algorithms with brief discussion (Sebastian Ruder's blog article).
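(Illustrative aside, not from the course materials: a minimal NumPy sketch of the diagonal AdaGrad update; the hyperparameters and toy loss are made up.)

    import numpy as np

    def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
        """One diagonal-AdaGrad update: per-coordinate learning rates
        shrink according to the accumulated squared gradients."""
        accum += grad ** 2
        w -= lr * grad / (np.sqrt(accum) + eps)
        return w, accum

    # Toy made-up usage: quadratic loss 0.5 * ||w||^2, whose gradient is w.
    w = np.array([1.0, -2.0, 3.0])
    accum = np.zeros_like(w)
    for _ in range(100):
        grad = w
        w, accum = adagrad_step(w, grad, accum)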
Oct 4: Concluding remarks on adaptive regularization. Escaping saddle points via perturbed gradient descent. | Scribe notes by Pranay Manocha and Mohammed T. Islam. | Blog post by Chi Jin and Mike Jordan. 2017 lecture notes by Misha Khodak.
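(Illustrative aside: a simplified NumPy sketch of the idea behind perturbed gradient descent, adding a small random perturbation when the gradient is tiny so the iterate can escape saddle points. The thresholds, step counts, and toy loss are made up; see the Jin et al. work referenced above for the actual algorithm and guarantees.)

    import numpy as np

    def perturbed_gd(grad_f, w, lr=0.01, g_thresh=1e-3, radius=1e-2, steps=1000, seed=0):
        """Gradient descent that injects noise from a small ball whenever the
        gradient norm is tiny (a possible saddle point). Thresholds are arbitrary."""
        rng = np.random.default_rng(seed)
        for _ in range(steps):
            g = grad_f(w)
            if np.linalg.norm(g) < g_thresh:
                # Perturb: sample a direction and move a small distance.
                noise = rng.normal(size=w.shape)
                w = w + radius * noise / np.linalg.norm(noise)
            else:
                w = w - lr * g
        return w

    # Toy saddle: f(x, y) = x^2 - y^2 has a saddle at the origin;
    # the perturbation lets the iterate escape along the y direction.
    grad = lambda w: np.array([2 * w[0], -2 * w[1]])
    w_final = perturbed_gd(grad, np.array([0.0, 0.0]))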
Oct 9: Generalization: basic concepts. The ubiquitous union bound. Rademacher complexity. | Updated scribe notes by Charlie Hou. | Blog post. 2017 scribe notes by Hrishikesh Khanderparker.
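(Illustrative aside: the standard definition of empirical Rademacher complexity and the generic bound it yields, stated roughly; constants and the exact confidence term are omitted.)

    % Standard definitions; constants suppressed.
    For a class $\mathcal{F}$ of functions with values in $[0,1]$, a sample $S = (z_1,\dots,z_n)$,
    and i.i.d. signs $\sigma_i \in \{\pm 1\}$,
    \[ \hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i) \right], \]
    and with probability at least $1-\delta$ over the sample, every $f \in \mathcal{F}$ satisfies
    \[ \mathbb{E}[f(z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(z_i) \;+\; 2\,\hat{\mathcal{R}}_S(\mathcal{F}) \;+\; O\!\left(\sqrt{\tfrac{\log(1/\delta)}{n}}\right). \]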
Oct 11: PAC-Bayes bounds and some applications. | |
Oct 16: Generalization: proving generalization via compression arguments. Linear classifiers with margin. ReLU deep nets. | Stronger Generalization Bounds via a Compression Approach (ICML'18). See also blog post. |
Oct 18: Finish generalization via compression. Start unsupervised learning. Various notions of unsupervised learning and how they're connected. | Unsupervised learning and VAEs, by Nataly Brukhim, Arushi Gupta, and Edgar Minasyan. | Blog post.
Oct 23: Variational auto-encoders (VAEs). Start GANs. | |
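(Illustrative aside: the evidence lower bound (ELBO) that VAE training maximizes, in standard notation.)

    % Standard ELBO.
    \[ \log p_\theta(x) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big), \]
    where $q_\phi$ is the encoder, $p_\theta$ the decoder, and $p(z)$ the prior (typically $\mathcal{N}(0, I)$).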
Oct 25: Metrics on distributions and GANs. Limitations of GANs: the training objective cannot prevent mode collapse. | Blog post 1, post 2, post 3. |
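(Illustrative aside: the original GAN minimax objective of Goodfellow et al., for reference.)

    % Original GAN objective.
    \[ \min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]. \]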
Nov 6: More on gradient-based training and "shaping" the gradient. Proximal gradient descent. Architectures for "shaping" the gradient: ResNets, DenseNets, and BatchNorm. | Blog post on Towards Data Science describing various ResNet models. |
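(Illustrative aside, not from the course: a minimal PyTorch sketch of a residual block with batch normalization, showing the skip connection through which gradients flow directly; channel counts and input sizes are arbitrary placeholders.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """y = ReLU(x + F(x)), where F is conv -> BN -> ReLU -> conv -> BN."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # Skip connection: gradients pass through the identity path unchanged.
            return F.relu(out + x)

    # Placeholder usage: 16 channels, 32x32 inputs (arbitrary).
    block = ResidualBlock(16)
    y = block(torch.randn(8, 16, 32, 32))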
Nov 8: Proving gradient descent attains zero training error with overparametrized nets (in simple architectures). | Paper by Du, Zhai, Poczos, and Singh. |
Nov 13: Language models. Introduction to n-grams, word embeddings, LSTMs. | Language Modeling (book chapter by Mike Collins). Lecture notes by Socher et al. (from Stanford CS224). |
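(Illustrative aside: a tiny Python sketch of a bigram language model with add-one (Laplace) smoothing; the corpus is made up.)

    from collections import Counter

    # Toy made-up corpus.
    corpus = "the cat sat on the mat . the cat ate .".split()
    vocab = set(corpus)

    # Count unigrams and bigrams.
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(w_prev, w, alpha=1.0):
        """P(w | w_prev) with add-one smoothing."""
        return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * len(vocab))

    print(bigram_prob("the", "cat"))  # frequently seen pair, higher probability
    print(bigram_prob("cat", "on"))   # unseen pair, but smoothed to be nonzero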
Nov 15 (guest lecture by Sham Kakade): Recent trends in language models: ELMo and his friends from Sesame Street. | Rough notes by Sham. Scribe notes coming. | Google news releases on Transformers and BERT.
Nov 20: Sham Kakade guest lecture. Continuation of language modeling. | More notes by Sham. |
Nov 27: Understanding word and sentence embeddings. | Blog post 1, post 2, post 3, post 4. |
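(Illustrative aside, not from the readings: a NumPy sketch of the simplest kind of sentence embedding, a weighted average of word vectors where rarer words get larger weight; the word vectors and frequencies below are random stand-ins.)

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in word vectors (in practice, e.g. GloVe or word2vec) and made-up frequencies.
    word_vecs = {w: rng.normal(size=50) for w in "the cat sat on mat".split()}
    word_freq = {"the": 0.05, "cat": 0.001, "sat": 0.002, "on": 0.02, "mat": 0.0005}

    def sentence_embedding(words, a=1e-3):
        """Weighted average of word vectors; rarer words get larger weight."""
        vecs = [a / (a + word_freq[w]) * word_vecs[w] for w in words]
        return np.mean(vecs, axis=0)

    emb = sentence_embedding("the cat sat on the mat".split())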
Nov 29: Finish sentence embeddings. Start adversarial examples. | Blog post 1. Paper 1 (Liu et al), Paper 2 (Tramer et al), Paper 3 (Madry et al). |
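(Illustrative aside: a minimal PyTorch sketch of the fast gradient sign method (FGSM) for crafting an adversarial example; the model, inputs, and epsilon below are placeholders.)

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.03):
        """Perturb x by eps in the direction of the sign of the loss gradient.
        eps is an arbitrary placeholder budget."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()

    # Placeholder usage: a toy linear "classifier" on flattened 28x28 inputs.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
    x_adv = fgsm(model, x, y)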
Dec 4: Certified defense against adversarial examples. | Paper 1 (Wong and Kolter '18) and Paper 2 (Wong et al., Nov '18). |
Dec 11: Recent analyses of teacher-student net training where the student net is overparametrized. | Paper by Simon Du et al. |
Dec 13: Teacher-student nets, contd. Generalization in the presence of vast overparametrization. | |
Please use this style file for scribing notes. Sample files: a source file and a compiled file.