Princeton University
Computer Science 511
Numbers in brackets under "Core reading" refer to chapters or sections of Mohri et al. Note that this schedule is continually updated as the semester progresses, so check back often. (If I seem to be falling behind in keeping it up-to-date, please send me a reminder.)
| # | Date | Topic | Core reading | Other (optional) readings and links |
|---|------|-------|--------------|--------------------------------------|
| 1 | Tu 2/5 | General introduction to machine learning; consistency model | [1] scribe notes | |
| 2 | Th 2/7 | Consistency model; review of probability; begin PAC model | [2.1] scribe notes | [Appendix C], a review of probability (containing more material than is actually needed in this course) |
| 3 | Tu 2/12 | PAC model; simple special cases; begin Occam's razor | [2.2] scribe notes | Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning |
| 4 | Th 2/14 | Prove Occam's razor; begin sample complexity for infinite hypothesis spaces; growth function | scribe notes | original "Occam's Razor" paper |
| 5 | Tu 2/19 | Sample complexity for infinite hypothesis spaces; VC-dimension | [3.3] scribe notes | |
| 6 | Th 2/21 | Sauer's lemma; upper and lower bounds on sample complexity | [3.4] scribe notes | |
| 7 | Tu 2/26 | Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds | [2.3] scribe notes | |
| 8 | Th 2/28 | Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting | [D.1-D.2] scribe notes | |
| 9 | Tu 3/5 | Rademacher complexity | [3.1-3.2] scribe notes | |
| 10 | Th 3/7 | Finish Rademacher complexity; boosting; training error bound | [6.1-6.2] (okay to skip or skim 6.2.2 and 6.2.3), slides, scribe notes | introductory chapter from Boosting: Foundations and Algorithms by Schapire & Freund |
| 11 | Tu 3/12 | Generalization error bounds: direct and margins-based | [6.3.1-6.3.3], slides, margins "movie", scribe notes | |
| 12 | Th 3/14 | Support-vector machines | [4]; [5.1-5.3] (okay to skip or skim the many parts of these chapters that we are not covering), scribe notes | tutorial on SVMs |
| 13 | Tu 3/26 | Guest lecture: Vladimir Vapnik on "Learning with teacher using privileged information" | "A new learning paradigm: Learning using privileged information" by Vladimir Vapnik & Akshay Vashist (may need to be on Princeton intranet to access), slides, scribe notes | |
| 14 | Th 3/28 | Finish support-vector machines; online learning; learning with expert advice | [7.1-7.2.1] scribe notes | "mind-reading" game based on online learning |
| 15 | Tu 4/2 | Weighted majority algorithm | [7.2.2-7.2.3] scribe notes | |
| 16 | Th 4/4 | Perceptron algorithm; Winnow | [7.3] scribe notes | |
| 17 | Tu 4/9 | Regression; linear regression; begin Widrow-Hoff algorithm | [10.1; 10.3.1; 10.3.6], scribe notes | rest of [10]; Jyrki Kivinen and Manfred K. Warmuth, "Exponentiated Gradient versus Gradient Descent for Linear Predictors" |
| 18 | Th 4/11 | Finish Widrow-Hoff; EG; mirror descent; converting on-line to batch | [7.4] scribe notes | |
| 19 | Tu 4/16 | Modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions | The duality theorem given in class is taken from: Stephen Della Pietra, Vincent Della Pietra and John Lafferty, "Inducing Features of Random Fields" | More readings, software, etc. on maximum entropy are available here. |
| 20 | Th 4/18 | Finish maxent; begin on-line log loss | scribe notes | |
| 21 | Tu 4/23 | On-line log loss and universal compression; Bayes algorithm | scribe notes | |
| 22 | Th 4/25 | Shifting experts | Mark Herbster and Manfred K. Warmuth, "Tracking the Best Expert" | |
| 23 | Tu 4/30 | Portfolio selection | Avrim Blum and Adam Kalai, "Universal Portfolios With and Without Transaction Costs" | For a view that says this is not the way to do things, see this short piece (and try to guess what is weird in how he wrote it): Paul A. Samuelson, "Why We Should Not Make Mean Log of Wealth Big Though Years to Act Are Long" |
| 24 | Th 5/2 | Game theory and learning | Yoav Freund and Robert E. Schapire, "Game Theory, On-line Prediction and Boosting" | For more details, see: Yoav Freund and Robert E. Schapire, "Adaptive Game Playing Using Multiplicative Weights" |