Princeton University
Computer Science 511
Numbers in brackets under "Core reading" refer to chapters or sections of Kearns & Vazirani, available from the library's electronic reserves or on Blackboard. Note that this schedule is continually updated as the semester progresses and is therefore subject to change.
| # | Date | Topic | Core reading | Other (optional) readings and links |
|---|------|-------|--------------|--------------------------------------|
| 1 | M 2/4 | General introduction to machine learning; consistency model | scribe notes | |
| 2 | W 2/6 | Consistency model; review of probability; begin PAC model | scribe notes | For a very brief review of probability, see, for instance, Appendices C.2 and C.3 of Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. |
| 3 | M 2/11 | PAC model; simple special cases; begin Occam's razor | scribe notes; [1], [2.1-2.2] | Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning |
| 4 | W 2/13 | Prove Occam's razor; begin sample complexity for infinite hypothesis spaces | scribe notes; [3.1] | The original "Occam's Razor" paper |
| 5 | M 2/18 | Growth function and sample complexity for infinite hypothesis spaces; VC-dimension | scribe notes; [3.5] | |
| 6 | W 2/20 | Sauer's lemma; upper bound on sample complexity; lower bounds on sample complexity | scribe notes; [3.2-3.4], [3.6] | |
| 7 | M 2/25 | Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds | scribe notes | |
| 8 | W 2/27 | Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting | scribe notes | |
| 9 | M 3/3 | Boosting; training error bound | scribe notes; Robert E. Schapire. | |
| 10 | W 3/5 | Generalization error bounds: naive and margins-based | scribe notes; Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. | |
| 11 | M 3/10 | Finish boosting; begin support-vector machines | scribe notes | |
| 12 | W 3/12 | Support-vector machines | scribe notes; excerpt from Vapnik's The Nature of Statistical Learning Theory | Chris Burges's tutorial on SVMs |
| 13 | M 3/24 | Finish support-vector machines; on-line learning; learning with expert advice; weighted majority algorithm | scribe notes; Avrim Blum. | "Mind-reading" game based on on-line learning |
| 14 | W 3/26 | Finish weighted majority algorithm; perceptron algorithm | scribe notes | |
| | M 3/31 | (no class) | | |
| 15 | W 4/2 | Winnow; begin regression | scribe notes | |
| 16 | M 4/7 | Linear regression; Widrow-Hoff algorithm | scribe notes; Jyrki Kivinen and Manfred K. Warmuth. | |
| 17 | W 4/9 | Finish Widrow-Hoff; EG; converting on-line to batch | scribe notes | |
| 18 | M 4/14 | Modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions | scribe notes; the duality theorem given in class is taken from Stephen Della Pietra, Vincent Della Pietra and John Lafferty. | There are a number of excellent papers and other materials on maximum entropy here. |
| 19 | W 4/16 | Finish maxent; begin on-line log loss | scribe notes | |
| 20 | M 4/21 | On-line log loss and universal compression; Bayes algorithm | scribe notes | |
| 21 | W 4/23 | Shifting experts | scribe notes; Mark Herbster and Manfred K. Warmuth. | |
| 22 | M 4/28 | Portfolio selection | scribe notes; Avrim Blum and Adam Kalai. | For an opposing view to this approach, see the following short paper (and try to detect what is unusual about the writing): Paul A. Samuelson |
| 23 | W 4/30 | Game theory and learning | scribe notes; Yoav Freund and Robert E. Schapire. | For more details, see: Yoav Freund and Robert E. Schapire. |