Princeton University | Computer Science 598D | Spring 2015

This is a seminar course that will focus on the following phenomenon: many problems in machine learning are formally intractable (e.g., NP-hard), yet in practice they are solved by heuristics. Can we design algorithms with provable guarantees (running time, solution quality)? The course will cover some successes in this direction and focus on discussing open problems (of which there are many).
The course is geared towards graduate students but may be suitable for undergraduates who are appropriately prepared (undergraduate-level training in algorithms and machine learning; knowledge of the material in COS 521 as taught in Fall 2014 is highly desirable, especially the lectures on high-dimensional geometry and SVD).
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To receive a grade, students should also be prepared to present a topic at some point during the course and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending on student interest.
Instructor: Sanjeev Arora, 307 CS Building, 609-258-3869, arora AT the domain name cs.princeton.edu

Feb 5: Whirlwind intro to ML (supervised, unsupervised, discriminative, generative, max-entropy models, maximum likelihood, etc.). Intro to themes of the course.
Notes: Lecture 1 notes. Relevant chapter on generative vs. discriminative models in Tom Mitchell's book. Survey by K. Conrad on max entropy.
Readings: On discriminative vs. generative: "A comparison of logistic regression and naive Bayes" by Ng and Jordan (NIPS 2001); "Generative or Discriminative? Getting the Best of Both Worlds" by Bishop and Lasserre (Bayesian Statistics, 2007). On the connection between maximum likelihood and maximum entropy: note by J. Coughlan. More philosophical: Steinhardt's blog post on the old Bayesian vs. frequentist argument (and how it relates to more modern arguments in ML).
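
To make the generative-vs.-discriminative contrast concrete, here is a minimal sketch (mine, not course code) in the spirit of the Ng-Jordan comparison; the synthetic data and scikit-learn estimators are my own choices:

    # Toy comparison: generative (naive Bayes) vs. discriminative (logistic
    # regression) classifiers as the training set grows (cf. Ng & Jordan).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    for n_train in [20, 100, 1000]:
        Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=n_train,
                                              random_state=0)
        nb = GaussianNB().fit(Xtr, ytr)
        lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
        print(n_train, "NB:", round(nb.score(Xte, yte), 3),
              "LR:", round(lr.score(Xte, yte), 3))

The usual pattern, and the paper's main point: the generative model approaches its (higher) asymptotic error faster, while the discriminative model wins given enough data.
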
Feb 12: Linearized models in ML. SVD: a Swiss-army knife for solving many such problems.
Notes: Brief lecture notes. COS 521 lecture notes (Lectures 1 and 2) on SVD. Moitra's chapter on sparse recovery. Hopcroft-Kannan, Section 8.4 (spectral clustering).
Readings: Wikipedia pages on topic modeling, sparse coding, and the restricted isometry property (RIP).
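
As a quick illustration of the SVD-as-Swiss-army-knife theme (a toy example of my own, not from the notes), the best rank-k approximation of a noisy matrix is read off directly from the truncated SVD (Eckart-Young):

    # Toy example: best rank-k approximation via truncated SVD (Eckart-Young).
    import numpy as np

    rng = np.random.default_rng(0)
    k = 3
    M = rng.standard_normal((100, k)) @ rng.standard_normal((k, 80))  # rank k
    noisy = M + 0.1 * rng.standard_normal(M.shape)

    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    M_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]     # keep top k singular values
    print("relative recovery error:",
          np.linalg.norm(M_hat - M) / np.linalg.norm(M))
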
Feb 19: Nonnegative matrix factorization (NMF) and topic modeling.
Notes: Moitra's Chapter 2, except Section 2.3.
Papers: Arora, Ge, Kannan, Moitra on NMF (STOC 2012); Arora, Ge, Moitra on topic models (FOCS 2012); and Arora et al. on efficient versions of the topic modeling algorithm (ICML '13).
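
For hands-on intuition, here is a toy sketch of my own using scikit-learn's generic NMF on a tiny term-document matrix; the papers above give algorithms with provable guarantees under separability assumptions, which this generic solver does not have:

    # Toy topic modeling with generic NMF (no provable guarantees here).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import NMF

    docs = ["stocks fell as markets slid",
            "markets rallied and stocks rose",
            "the team won the game",
            "the team lost the final game"]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)                     # document-word counts
    model = NMF(n_components=2, init="nndsvd", random_state=0)
    W = model.fit_transform(X)                      # document-topic weights
    words = vec.get_feature_names_out()
    for t, row in enumerate(model.components_):     # topic-word weights
        top = row.argsort()[-4:][::-1]
        print("topic", t, ":", [words[i] for i in top])
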
Feb 26: No class.

Mar 5: Sparse coding/dictionary learning, and unsupervised learning of deep nets. (A common thread is a simple algorithm for community detection.)
Notes: Scribe notes by Max Simchowitz.
Papers: Arora, Ge, Moitra on provable algorithms for dictionary learning (COLT '14); Arora, Bhaskara, Ge, Ma on unsupervised learning of deep nets (ICML '14).
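
A small synthetic experiment (my own setup, using scikit-learn's generic alternating-minimization learner rather than the provable algorithm of the COLT '14 paper): plant a dictionary, generate sparse combinations, and check recovery.

    # Toy dictionary recovery with scikit-learn's alternating minimization.
    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    n_atoms, dim, n_samples, k = 8, 20, 400, 3
    D = rng.standard_normal((n_atoms, dim))
    D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
    codes = np.zeros((n_samples, n_atoms))
    for row in codes:                                # k-sparse coefficients
        idx = rng.choice(n_atoms, size=k, replace=False)
        row[idx] = rng.standard_normal(k)
    X = codes @ D                                    # observed samples

    learner = DictionaryLearning(n_components=n_atoms,
                                 transform_algorithm="omp",
                                 random_state=0).fit(X)
    # Each learned atom should correlate strongly with some planted atom.
    match = np.abs(learner.components_ @ D.T).max(axis=1)
    print("best correlation per learned atom:", np.round(match, 2))
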
Mar 12: Learning latent variable models by tensor decomposition. Learning topic models under l1 error (new manuscript by Bansal, Bhattacharya, Kannan; presented by Andrej Risteski).
Notes: Scribe notes by Holden Lee.
Papers: "Tensor Decompositions for Learning Latent Variable Models" by Anandkumar, Ge, Hsu, Kakade, Telgarsky. Provable SVD-based algorithm for topic modeling by Bansal, Bhattacharya, Kannan.
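
For a feel of why tensor methods work, here is a numpy sketch of Jennrich's simultaneous-diagonalization algorithm (the classical starting point for this literature; the square full-rank setting and dimensions are my simplifications): two random contractions of a rank-r tensor are simultaneously diagonalized by the hidden factors.

    # Jennrich's algorithm: recover the factors of a rank-r 3-tensor from
    # the eigendecomposition of two random slice-contractions.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5                                            # square, full-rank case
    A, B, C = (rng.standard_normal((d, d)) for _ in range(3))
    T = np.einsum('ir,jr,kr->ijk', A, B, C)          # sum of outer products

    x, y = rng.standard_normal(d), rng.standard_normal(d)
    Mx = np.einsum('ijk,k->ij', T, x)                # = A diag(C^T x) B^T
    My = np.einsum('ijk,k->ij', T, y)                # = A diag(C^T y) B^T
    # Mx My^{-1} = A diag((C^T x)/(C^T y)) A^{-1}: eigenvectors ~ columns of A.
    _, vecs = np.linalg.eig(Mx @ np.linalg.inv(My))
    A_hat = np.real(vecs)

    # Each true column of A should match some eigenvector up to sign/scale.
    An = A / np.linalg.norm(A, axis=0)
    Hn = A_hat / np.linalg.norm(A_hat, axis=0)
    print("best match per factor:", np.round(np.abs(An.T @ Hn).max(axis=1), 3))
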
Mar 26: Graphical models: Markov random fields, Bayes nets, the Gibbs distribution. Computational intractability of inference problems.
Notes: Slides from a short course on graphical models by Mark Paskin (Lectures 2 and 3 are especially useful).
Readings: "Graphical Models, Exponential Families, and Variational Inference" by Wainwright and Jordan (a book-length survey). "A Sensitivity Analysis of Pathfinder" by Kevin Ng (evaluates this seminal use of graphical models in medical diagnosis). Chapter 7 (Gibbs states) of Probability on Graphs by G. Grimmett.
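
As a concrete toy illustration (mine) of the Gibbs distribution and why inference is expensive: computing the partition function of even a small Ising-style MRF by brute force takes 2^n terms.

    # Exact Gibbs distribution of a tiny Ising-style MRF by brute force;
    # the 2^n enumeration is what makes exact inference intractable at scale.
    import itertools
    import numpy as np

    n = 6
    rng = np.random.default_rng(0)
    J = np.triu(rng.standard_normal((n, n)), k=1)    # pairwise couplings
    beta = 0.5

    def energy(s):                                   # s in {-1, +1}^n
        return -np.sum(J * np.outer(s, s))

    states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    weights = np.array([np.exp(-beta * energy(s)) for s in states])
    Z = weights.sum()                                # partition function
    probs = weights / Z                              # the Gibbs distribution
    p0 = sum(p for s, p in zip(states, probs) if s[0] == 1)
    print("Z =", Z, "  P(s_0 = +1) =", p0)
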
April 2: Introduction to NLP; Markovian models; smoothing. Semantic word vectors.
Notes: Some well-written book chapters by Mike Collins: (a) language models, (b) log-linear models. Kathy McKeown's lecture notes on hidden Markov models (these also explain part-of-speech tagging and the forward-backward and Viterbi algorithms).
Readings: Lecture notes by Kathy McKeown on n-grams and corpus linguistics.
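
A self-contained toy sketch (corpus mine) of the simplest smoothing method from the readings, a bigram language model with add-one (Laplace) smoothing:

    # Bigram language model with add-one (Laplace) smoothing.
    from collections import Counter

    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    sents = [["<s>"] + line.split() + ["</s>"] for line in corpus]
    unigrams = Counter(w for s in sents for w in s)
    bigrams = Counter((s[i], s[i + 1]) for s in sents
                      for i in range(len(s) - 1))
    V = len(unigrams)                                # vocabulary size

    def prob(w2, w1):
        """P(w2 | w1) with add-one smoothing: unseen bigrams get mass > 0."""
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(prob("cat", "the"))   # seen bigram: relatively large
    print(prob("dog", "cat"))   # unseen bigram: small but nonzero
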
April 9: More NLP. Probabilistic CFGs; lexicalized CFGs. Semantics.
Notes: Book chapters by Mike Collins: probabilistic CFGs, lexicalized CFGs, and IBM translation models.
Readings: "A convex alternative to the IBM Model 2" by Simion, Collins, and Stein (EMNLP 2013).
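
To ground the PCFG material, here is a toy Viterbi-CKY parser (grammar and sentence are my own) that computes the probability of the best parse under a small grammar in Chomsky normal form:

    # Viterbi CKY: probability of the best parse under a toy PCFG in CNF.
    binary = {                      # A -> B C rules as (prob, (B, C))
        "S":  [(1.0, ("NP", "VP"))],
        "VP": [(1.0, ("V", "NP"))],
        "NP": [(0.5, ("Det", "N"))],
    }
    lexical = {                     # A -> word rules as (prob, word)
        "NP":  [(0.5, "she")],
        "V":   [(1.0, "eats")],
        "Det": [(1.0, "a")],
        "N":   [(1.0, "fish")],
    }

    sent = "she eats a fish".split()
    n = len(sent)
    chart = {}                      # chart[(i, j)][A] = (best prob, backptr)
    for i, w in enumerate(sent):    # width-1 spans from lexical rules
        chart[(i, i + 1)] = {A: (p, w) for A, rules in lexical.items()
                             for p, word in rules if word == w}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j, cell = i + width, {}
            for k in range(i + 1, j):               # try every split point
                for A, rules in binary.items():
                    for p, (B, C) in rules:
                        left, right = chart[(i, k)], chart[(k, j)]
                        if B in left and C in right:
                            q = p * left[B][0] * right[C][0]
                            if q > cell.get(A, (0.0, None))[0]:
                                cell[A] = (q, (B, C, k))
            chart[(i, j)] = cell
    print("P(best parse) =", chart[(0, n)].get("S", (0.0, None))[0])
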
April 16: Semantics. Semantic word embeddings.
Papers: Paper by Arora, Li, Liang, Ma, Risteski. Paper in Neuron 2013 showing how to use topic modeling to understand fMRI data (by Stansbury, Naselaris, Gallant).
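
A minimal sketch (toy corpus mine) of the classical PMI-plus-SVD construction of word vectors; note this is the folklore baseline, not the specific generative model analyzed in the Arora et al. paper:

    # Word vectors via positive PMI co-occurrence counts plus truncated SVD.
    import numpy as np

    corpus = ("the cat chased the mouse . the dog chased the cat . "
              "the mouse ate cheese . the dog ate a bone .").split()
    vocab = sorted(set(corpus))
    ix = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    win = 2
    for i, w in enumerate(corpus):                   # symmetric window
        for j in range(max(0, i - win), min(len(corpus), i + win + 1)):
            if j != i:
                counts[ix[w], ix[corpus[j]]] += 1

    total = counts.sum()
    pw = counts.sum(axis=1) / total
    with np.errstate(divide="ignore"):
        pmi = np.log((counts / total) / np.outer(pw, pw))
    ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0), 0.0)  # positive PMI
    U, s, _ = np.linalg.svd(ppmi)
    vecs = U[:, :2] * s[:2]                          # 2-dim embeddings
    for w in ["cat", "dog", "cheese"]:
        print(w, np.round(vecs[ix[w]], 2))
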
April 23: (1) SVD-based algorithms for learning HMMs and continuous-output HMMs (presentation by Alex Smith). (2) Matrix completion via convex relaxation.
Readings: (1) "A Spectral Algorithm for Learning Hidden Markov Models" by Hsu, Kakade, and Zhang. (2) "Hilbert Space Embeddings of Hidden Markov Models" by Song et al. (3) Moitra's lecture notes on matrix completion.
Notes: Slides by Le Song on paper (2).
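
As a companion to item (3), here is a toy soft-impute iteration (parameters mine), a standard proximal method for the nuclear-norm convex relaxation of matrix completion:

    # Matrix completion by iterative singular-value soft-thresholding
    # ("soft-impute"), a proximal method for the nuclear-norm relaxation.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 40, 30, 2
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r
    mask = rng.random((m, n)) < 0.5                  # observe ~half of M

    X, lam = np.zeros((m, n)), 0.1
    for _ in range(200):
        filled = np.where(mask, M, X)                # plug in current guess
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt      # shrink the spectrum
    err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
    print("relative error on unobserved entries:", round(err, 3))
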
April 30: (1) Efficiently learning Ising models on arbitrary graphs (presentation by Holden Lee). (2) Energy-based modeling in ML (including RBMs, word2vec, etc.).
Readings: Paper by Guy Bresler.
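
A small sketch (graph and parameters mine) of Glauber dynamics, i.e., Gibbs sampling, for an Ising model on an arbitrary graph; samples like these are the input to structure-learning algorithms such as Bresler's:

    # Glauber dynamics (Gibbs sampling) for an Ising model on an arbitrary
    # graph; samples of this kind are the input to structure learning.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    J = np.triu(rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.3), k=1)
    J = J + J.T                                      # sparse symmetric couplings
    beta = 0.4

    s = rng.choice([-1, 1], size=n)                  # arbitrary start state
    for _ in range(20000):
        i = rng.integers(n)
        field = J[i] @ s                             # local field at node i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1 if rng.random() < p_plus else -1    # resample spin i
    print("one (approximate) sample from the Gibbs distribution:", s)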