MIT Course 6.858/18.428: Machine Learning
This machine learning course covered the following topics:
- Formal models of machine learning
- Learning concepts from examples
- Learnable classes of concepts
- PAC learning
- VC-dimension
- Bayesian Inference
- Neural Nets
- Learning from queries
- Learning with noise
- Learning finite automata
- Hidden Markov Models
Available Lecture Notes (Fall 1994)
- Lecture 1:
Introduction to the course. Defining models for machine learning.
Learning conjunctions in the mistake-bounded model.
- Lecture 2:
Review of the on-line mistake-bounded model. The halving algorithm.
Winnow algorithm for learning linearly separable boolean functions
(minimal sketches of the halving and Winnow algorithms appear after this list).
- Lecture 3:
Review of the Winnow algorithm. Perceptron Convergence Theorem.
Relationship between VC-dimension and mistake bounds.
- Lecture 4:
Probably Approximately Correct (PAC) learning. PAC learning conjunctions
(a sketch of the elimination algorithm appears after this list).
- Lecture 5:
Intractability of learning 3-term DNF by 3-term DNF. Occam
Algorithms. Learning 3-term DNF by 3-CNF.
- Lecture 6:
Learning k-decision lists. Occam's Razor (general case). Learning
conjunctions with few literals.
- Lecture 7:
VC-dimension and PAC learning.
- Lecture 8:
PAC learnability of infinite concept classes with finite VC-dimension.
- Lecture 9:
Estimating error rates. Uniform convergence and VC-dimension.
- Lecture 10:
Lower bound on sample complexity for PAC learning. Cover's coin problem.
- Lecture 11:
Bayesian learning. Minimum description length principle.
- Lecture 12:
Definition of weak learning. Confidence boosting. Accuracy boosting.
- Lecture 13:
Finish proof that weak learnability implies strong learnability.
- Lecture 14:
Finish discussion of weak learnability. Freund's boosting method.
Introduction to neural networks.
- Lecture 15:
Neural networks and back propagation.
- Lecture 16:
Applications of neural nets. Sejnowski and Rosenberg's NETtalk system
for learning to pronounce English text. Gorman and Sejnowski's network
for learning how to classify sonar targets.
- Lecture 17:
Computational complexity of training neural nets. Training a 3-Node
Neural Network is NP-complete. Expressive power of continuous-valued
neural networks with only two hidden layers.
- Lecture 18:
VC-dimension of neural networks. Asymptotic error rates of neural
networks.
- Lecture 19:
Learning in the presence of noise. Malicious noise. Classification
noise. Minimizing disagreements to handle classification noise.
Minimizing disagreements for conjunctions is NP-hard.
- Lecture 20:
Statistical query model. Statistical query algorithm for conjunctions.
Statistical query learnability implies PAC learnability in the
presence of classification noise.
- Lecture 21:
Learning decision trees using the Fourier spectrum (in the membership
query model, with respect to the uniform distribution).
- Lecture 22:
Finish algorithm for learning decision trees.
Learning DNF with membership queries with respect to the uniform
distribution.
- Lecture 23:
Learning finite automata with membership and equivalence queries.
- Lecture 24:
Finish algorithm for learning finite automata with membership and
equivalence queries. Learning finite automata without resets using
homing sequences.
- Lecture 25:
Hidden Markov models and an application to speech recognition (a sketch
of the forward algorithm appears after this list).
- Lecture 26:
Piecemeal learning of unknown environments.
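The halving algorithm from Lecture 2 is easy to state concretely: keep the set of hypotheses consistent with all examples seen so far, predict by majority vote, and note that every mistake removes at least half of that set, so the number of mistakes is at most log2 of the class size. The sketch below is a minimal illustration, assuming a small toy class (all monotone disjunctions over n = 6 variables), a particular target, and a random example stream; none of these choices come from the lecture notes.

```python
# A minimal sketch of the halving algorithm in the on-line mistake-bounded
# model.  The concept class, target, and example stream below are
# illustrative assumptions.
import itertools, math, random

n = 6
# Concept class C: all monotone disjunctions over n variables (|C| = 2^n).
concepts = [frozenset(s) for r in range(n + 1)
            for s in itertools.combinations(range(n), r)]
evaluate = lambda c, x: 1 if any(x[i] for i in c) else 0

target = frozenset({0, 3})            # hidden target: x0 OR x3
version_space = list(concepts)
mistakes = 0
random.seed(2)
for _ in range(100):
    x = [random.randint(0, 1) for _ in range(n)]
    votes = sum(evaluate(c, x) for c in version_space)
    y_hat = 1 if 2 * votes >= len(version_space) else 0   # majority vote
    y = evaluate(target, x)
    if y_hat != y:
        mistakes += 1
    # Keep only the hypotheses consistent with the revealed label.
    version_space = [c for c in version_space if evaluate(c, x) == y]

print("mistakes:", mistakes, " bound log2|C| =", int(math.log2(len(concepts))))
```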
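Lectures 2 and 3 cover Winnow, a multiplicative-update on-line learner. Below is a minimal sketch of Littlestone's Winnow1 variant for monotone disjunctions; the threshold, promotion factor, target concept, and example distribution are illustrative assumptions rather than details taken from the notes.

```python
# A minimal sketch of Winnow (Winnow1) for learning a monotone disjunction
# over n boolean attributes in the on-line mistake-bounded model.
import random

def winnow_predict(w, x, theta):
    """Predict 1 iff the weighted sum of the active attributes reaches theta."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0

def winnow_update(w, x, y, y_hat, alpha=2.0):
    """Promote on a false negative; eliminate (zero out) on a false positive."""
    if y == 1 and y_hat == 0:          # false negative: multiply active weights
        return [wi * alpha if xi else wi for wi, xi in zip(w, x)]
    if y == 0 and y_hat == 1:          # false positive: zero out active weights
        return [0.0 if xi else wi for wi, xi in zip(w, x)]
    return w                           # correct prediction: no change

# Toy run: target disjunction x0 OR x3 over n = 8 attributes (an assumption).
n = 8
target = lambda x: 1 if (x[0] or x[3]) else 0
w, theta, mistakes = [1.0] * n, float(n), 0
random.seed(0)
for _ in range(200):
    x = [random.randint(0, 1) for _ in range(n)]
    y, y_hat = target(x), winnow_predict(w, x, theta)
    if y_hat != y:
        mistakes += 1
        w = winnow_update(w, x, y, y_hat)
print("mistakes:", mistakes)           # O(k log n) for a k-literal disjunction
```

The multiplicative updates are what let the mistake bound scale with log n rather than n, which is the contrast with the Perceptron bound drawn in Lecture 3.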
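Lecture 4's PAC algorithm for conjunctions is the standard elimination procedure: start with the conjunction of all literals and delete any literal contradicted by a positive example. The sketch below pairs it with the usual consistent-hypothesis sample bound m >= (1/eps)(ln|H| + ln(1/delta)); the restriction to monotone conjunctions (so |H| = 2^n), the target concept, and the uniform example distribution are assumptions made only for illustration.

```python
# A minimal sketch of PAC learning monotone conjunctions by elimination,
# with the standard consistent-hypothesis sample-size bound.
import math, random

def learn_conjunction(samples, n):
    """Keep only the variables that are 1 in every positive example."""
    literals = set(range(n))
    for x, y in samples:
        if y == 1:
            literals &= {i for i in range(n) if x[i] == 1}
    return literals

def sample_size(n, eps, delta):
    """m >= (1/eps)(ln|H| + ln(1/delta)) with |H| = 2^n monotone conjunctions."""
    return math.ceil((1.0 / eps) * (n * math.log(2) + math.log(1.0 / delta)))

# Toy run: target conjunction x1 AND x4 over n = 10 variables (assumptions).
random.seed(1)
n, eps, delta = 10, 0.1, 0.05
target = {1, 4}
m = sample_size(n, eps, delta)
samples = []
for _ in range(m):
    x = [random.randint(0, 1) for _ in range(n)]
    y = 1 if all(x[i] == 1 for i in target) else 0
    samples.append((x, y))
h = learn_conjunction(samples, n)
print("sample size:", m, "hypothesis:", sorted(h))
# The hypothesis always contains the target literals, so it only errs on
# positive examples, and with m examples it is eps-accurate w.p. 1 - delta.
```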
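For Lecture 25, the basic HMM computation behind speech-recognition applications is the forward algorithm, which evaluates the probability of an observation sequence by dynamic programming over hidden states. The two-state model and observation sequence in this sketch are illustrative assumptions, not taken from the lecture.

```python
# A minimal sketch of the forward algorithm for a discrete hidden Markov
# model.  The model parameters and observations below are toy assumptions.

def forward(pi, A, B, obs):
    """Return P(obs | model) by summing over all hidden state paths."""
    n_states = len(pi)
    # alpha[i] = P(o_1..o_t, state_t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n_states)) * B[i][o]
                 for i in range(n_states)]
    return sum(alpha)

# Toy model: 2 hidden states, 2 observation symbols.
pi = [0.6, 0.4]                       # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]          # transition probabilities A[i][j]
B = [[0.9, 0.1], [0.2, 0.8]]          # emission probabilities B[i][symbol]
print(forward(pi, A, B, [0, 1, 0]))   # likelihood of observing 0, 1, 0
```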
Ron Rivest (rivest@theory.lcs.mit.edu)
Mona Singh (mona@cs.princeton.edu)