Time: Tue/Thu 11:00-12:20
Place: Frick Lab room 124
- General course information.
- Preparing scribe notes.
- Final project information.
- Homeworks
Lecture Notes
- Feb. 4: Introduction, basic definitions, the consistency model. [ps | pdf]
- Feb. 6: Review of probability; the PAC learning model. [ps | pdf]
- Feb. 11: Examples of PAC-learnable classes; sample complexity for finite hypothesis spaces; Occam's razor. [ps | pdf]
- Feb. 13: General bounds on sample complexity for infinite hypothesis spaces. [ps | pdf]
- Feb. 18: Sample complexity bound in terms of the growth function; VC-dimension; Sauer's Lemma (statement below). [ps | pdf]
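For quick reference (a standard statement, not reproduced from the notes): Sauer's Lemma bounds the growth function of a hypothesis class with VC-dimension d by

    \Pi_{\mathcal{H}}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;=\; O(m^d),

which is what converts growth-function sample complexity bounds into VC-dimension bounds.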
- Feb. 20: A lower bound on sample complexity using VC-dimension; handling inconsistent hypotheses; beginning of Chernoff bounds. [ps | pdf]
- Feb. 25: Proof of the Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting (key inequality restated below). [ps | pdf]
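The key inequality restated (standard form, from memory rather than from the notes): for the empirical error of a fixed hypothesis h on m i.i.d. examples, Hoeffding's inequality, the form of the Chernoff bound used here, gives

    \Pr\!\left[\, \left| \widehat{\mathrm{err}}(h) - \mathrm{err}(h) \right| \ge \epsilon \,\right] \;\le\; 2 e^{-2 m \epsilon^2},

and a union bound over a finite class \mathcal{H} multiplies the right-hand side by |\mathcal{H}|, yielding the error bounds for inconsistent hypotheses.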
- Feb. 27: Boosting: introduction and bounding the training error (sketch below). [ps | pdf]
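As a companion to the boosting lectures, here is a minimal AdaBoost sketch in Python over single-feature threshold stumps. It is an illustration under standard conventions, not code distributed with the course, and all names are my own. The reweighting step is what drives the training-error bound proved in class.

    import numpy as np

    def adaboost(X, y, rounds):
        """Minimal AdaBoost with single-feature threshold stumps; y must be +/-1."""
        m, d = X.shape
        dist = np.full(m, 1.0 / m)             # distribution D_t over examples
        ensemble = []
        for _ in range(rounds):
            # Weak learner: exhaustive search for the lowest weighted-error stump.
            best = None
            for j in range(d):
                for thr in np.unique(X[:, j]):
                    for sign in (1.0, -1.0):
                        pred = sign * np.where(X[:, j] >= thr, 1.0, -1.0)
                        err = dist[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, thr, sign, pred)
            err, j, thr, sign, pred = best
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
            dist *= np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight hits
            dist /= dist.sum()
            ensemble.append((alpha, j, thr, sign))
        return ensemble

    def predict(ensemble, X):
        votes = sum(a * s * np.where(X[:, j] >= t, 1.0, -1.0)
                    for a, j, t, s in ensemble)
        return np.sign(votes)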
- Mar. 4: Generalization error of boosting. [ps | pdf]
- Mar. 6: Finished generalization error of boosting; support-vector machines. [ps | pdf]
- Mar. 11: Finished support-vector machines; learning with expert advice; the halving algorithm. [ps | pdf]
- Mar. 13: Continued learning with expert advice; the weighted majority algorithm (sketch below). [ps | pdf]
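A minimal sketch of the deterministic weighted majority algorithm from this lecture (Python; the interface and parameter names are my own, not the course's):

    import numpy as np

    def weighted_majority(expert_preds, outcomes, beta=0.5):
        """Deterministic weighted majority over binary {0,1} predictions.

        expert_preds: (T, n) array; row t holds the n experts' predictions.
        outcomes:     length-T array of correct labels.
        Returns the final weights and the algorithm's mistake count.
        """
        w = np.ones(expert_preds.shape[1])   # one weight per expert
        mistakes = 0
        for preds, y in zip(expert_preds, outcomes):
            vote1 = w[preds == 1].sum()      # total weight voting for 1
            vote0 = w[preds == 0].sum()
            if (1 if vote1 >= vote0 else 0) != y:
                mistakes += 1
            w[preds != y] *= beta            # penalize every wrong expert
        return w, mistakes

With beta = 1/2 this satisfies the classic mistake bound of roughly 2.41(m + lg n), where m is the best expert's mistake count and n the number of experts.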
- Mar. 25: Perceptron and Winnow (update rules sketched below). [ps | pdf]
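The two update rules from this lecture side by side, in the same sketch style (the Winnow threshold and learning rate below are one common convention, not necessarily the one used in the notes):

    import numpy as np

    def perceptron_update(w, x, y):
        """Perceptron: additive update on a mistake; y in {-1,+1}."""
        if y * np.dot(w, x) <= 0:                # mistake (or zero margin)
            w = w + y * x
        return w

    def winnow_update(w, x, y, eta=np.log(2)):
        """Winnow: multiplicative update on a mistake; x in {0,1}^n, y in {-1,+1}."""
        n = len(w)
        yhat = 1 if np.dot(w, x) >= n / 2 else -1   # threshold n/2 convention
        if yhat != y:
            w = w * np.exp(eta * y * x)          # promote/demote active features
        return w

With eta = ln 2 the multiplicative step doubles or halves the weights of the active features, recovering the classic promotion/demotion form of Winnow.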
- Mar. 27: Analysis of Winnow; square loss; beginning linear regression. [ps | pdf]
- Apr. 1: Finished linear regression; Widrow-Hoff and other on-line algorithms for linear regression (update sketched below). [ps | pdf]
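The Widrow-Hoff (least-mean-squares) update covered here, in the same sketch style; the learning rate is illustrative:

    import numpy as np

    def widrow_hoff_step(w, x, y, eta=0.1):
        """One LMS step: gradient descent on the square loss (w.x - y)^2."""
        return w - eta * (np.dot(w, x) - y) * x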
- Apr. 3: Converting on-line to batch; neural networks; modeling probability distributions. [ps | pdf]
- Apr. 8: Maximum likelihood and maximum entropy modeling of distributions. [ps | pdf]
- Apr. 10: Analysis of maximum entropy; Bayes algorithm for on-line learning with log loss. [ps | pdf]
- Apr. 15: Bayes algorithm; switching experts; beginning investment portfolios. [ps | pdf]
- Apr. 17: Cover's universal portfolio algorithm; tips on running machine learning experiments. [ps | pdf]
- Apr. 22: Membership and equivalence queries; Angluin's algorithm for learning finite automata. [ps | pdf]
- Apr. 24: Angluin's algorithm continued; began reinforcement learning and Markov decision processes. [ps | pdf]
- Apr. 29: Reinforcement learning. [ps | pdf]
- May 1: More reinforcement learning. [ps | pdf]
Additional Readings
(Related papers that go into more detail on topics discussed in class.)
- Robert E. Schapire. The boosting approach to machine learning: An overview. In MSRI Workshop on Nonlinear Estimation and Classification, 2002. [Postscript | gzipped Postscript]
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. [Postscript | compressed Postscript]
- Avrim Blum. On-line algorithms in machine learning. In Dagstuhl Workshop on On-Line Algorithms, June 1996. [gzipped Postscript]
- Jyrki Kivinen and Manfred K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1-64, January 1997. [Postscript]
- There are a number of excellent papers and other materials on maximum entropy here. The duality theorem given in class is taken from:
- Stephen Della Pietra, Vincent Della Pietra and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April 1997. [Postscript]
- Mark Herbster and Manfred K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151-178, August 1998. [Postscript]
- Avrim Blum and Adam Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35:193-205, 1999. [gzipped Postscript]
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998. See especially Chapters 3, 4 and 6.
Books
Here is a list of optional books for further background reading.
All of these are being placed on reserve at the Engineering Library.
- Luc Devroye, László Györfi and Gábor Lugosi. A probabilistic theory of pattern recognition. Springer, 1996.
- Richard O. Duda, Peter E. Hart and David G. Stork. Pattern classification. Wiley, 2001.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer, 2001.
- Michael J. Kearns and Umesh V. Vazirani. An introduction to computational learning theory. MIT Press, 1994.
- Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.
- Vladimir N. Vapnik. The nature of statistical learning theory. Springer, 1995.
- Vladimir N. Vapnik. Statistical learning theory. Wiley, 1998.