Princeton University | Computer Science 597B | Fall 2019
This is a graduate course focused on research in theoretical aspects of deep learning. In recent years, deep learning has become the central paradigm of machine learning and related fields such as computer vision and natural language processing. But mathematical understanding of many aspects of this endeavor is still lacking. When and how fast does training succeed, and using how many examples? What are the strengths and limitations of various architectures?
The course is geared towards graduate students in computer science and allied fields. Prerequisites: knowledge of machine learning as well as algorithm design/analysis; ideally, students will have taken at least one of COS 521 and COS 511. Auditors are welcome, provided there is space in the classroom. We will prepare detailed notes on the lectures, and the plan is to convert them into a monograph. There will be many guest lecturers from the ongoing IAS special year on Optimization, Statistics and Machine Learning.
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. Students who fail to do this will not get credit for the course.
This course does not satisfy any undergrad requirements in the COS major (BSE or AB) and undergrads are not allowed to take this course for a grade.
Instructor: Sanjeev Arora, 407 CS Building, 609-258-3869, arora AT the domain name cs.princeton.edu
Date | Topic and main reading | Additional Reading
Sept 13 | Basic framework, intro to optimization and generalization. Lecture notes. Also book draft above. |
Sept 20 | Nonconvex landscapes. Generalized linear models, PCA, etc. (Awaiting notes.) |
Sept 27 | Basic generalization theory. Escaping saddle points in the landscape. Reading: see book draft above. |
Oct 4 | Towards understanding the generalization puzzle (part 1): infinitely wide deep nets and associated Neural Tangent Kernels. |
Oct 11 | Current ways to understand generalization of finite but overparametrized nets (+ their limitations). |
Oct 18 | Possibly no lecture; instead attend the IAS workshop on theory of deep learning that week (as your schedule permits). |
Oct 25 | Implicit regularization in the algorithm. |
Nov 1 | Fall break. |
Nov 8 | Understanding the effect of Dropout regularization + |
Nov 15 | Variational auto-encoders and log-likelihood objectives. Generative Adversarial Nets and their limitations. Optimization of min-max objectives. | GANs blogpost1 and blogpost2. Paper that Chi Jin talked about.
Nov 22 | Empirically successful tricks (e.g., momentum, AdaGrad, Adam) and efforts to understand them. Connections to second-order methods. Batchnorm and its bizarre effects on the landscape. | Blog post listing variants of GD. Paper by Zhiyuan and Sanjeev. Blog post explaining backpropagation and Pearlmutter's trick. Paper on LiSSA.
Nov 29 | Thanksgiving. |
Dec 6 | Implicit regularization and acceleration by going deeper: understanding via dynamics of gradient descent on linear nets. | Nadav Cohen's lecture slides.
Dec 13 | Adversarial examples and approaches towards certified defense. Min-max algorithms. |
Please use this style file for scribing notes. Sample files are a source file and a compiled file.