Princeton University |
Computer Science
493
|
|
Week 1: Compression: Models, Information Theory, Huffman coding and Arithmetic coding |
|
Week 2: Lempel-Ziv algorithms, Burrows Wheeler, Lossy compression |
|
Week 3: Indexing: inverted file compression and models, signature files and bitmaps |
|
Week 4: Error correcting codes, Reed Solomon codes |
|
Week 5: Decoding Reed Solomon codes, Tornado codes |
|
Week 6: Classification and clustering: Naive Bayes, support vector machines and k-means. |
|
Week 7: Streaming Algorithms: reservoir sampling, estimating distinct values and frequency moments |
|
Week 8: Sketching algorithms for estimating similarity and applications to clustering. |
|
Week 9: Dimension reduction and nearest neighbor algorithms. |
|
Week 10: Link analysis: Pagerank and Kleinberg's Hubs and Authorities |
|
Week 11: Singular Value Decomposition and models for the web. |
|
Week 12: Web characterization, power laws and research problems. |
Compression and indexing (weeks 1-3):
Managing Gigabytes: Compressing and Indexing Documents and Images, by Ian H. Witten, Alistair Moffat and Timothy C. Bell, Morgan Kauffman. (reference text for weeks 1-3) |
|
Notes on Compression from Guy Blelloch's page at CMU on "Algorithms in the Real World". |
|
David Salomon. Data Compression: The Complete Reference. Springer Verlag, 1998. |
|
Khalid Sayood. Introduction to Data Compression, Second Edition. Morgan Kaufmann, 2000. |
|
Compressed Bloom Filters, Michael Mitzenmacher, in PODC 2001. (ps, pdf) |
Error-correcting codes (weeks 4-5):
Michael Mitzenmacher's slides on codes (Shannon's theorem and introduction to error correcting codes: ps, Reed-Solomon Codes: ps) |
|
Practical Loss-Resilient Codes, M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, and V. Stemann, in STOC 1997 (ps, pdf, slides(ppt)) |
|
Analysis of Random Processes via And-Or Tree Evaluation, M. Luby, M. Mitzenmacher, and A. Shokrollahi, in SODA 98 (ps, pdf) |
|
Analysis of Low Density Codes and Improved Designs Using Irregular Graphs, M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, in STOC 1998 (ps, pdf, slides(ppt)) |
Classification and clustering (week 6):
Material for Bayesian Learning was from Chapter 6 of: Machine Learning, Tom Mitchell, McGraw Hill, 1997. (The website for the book has additional materials such as slides). |
|
Data clustering: a review, A. K. Jain, M. N. Murty, P. J. Flynn, ACM Computing Surveys, 31(3), 1999. |
Streaming and sketching algorithms (weeks 7-8):
Random Sampling with a Reservoir, Jeffrey Scott Vitter, Transactions on Mathematical Software 11(1): 37-57 (1985). |
|
Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments, Journal of Computer and System Sciences, 58(1): 137-147 (1999). (alternate link). |
|
Synopsis data structures for massive data sets, P. B. Gibbons and Y. Matias, in DIMACS Series in Discrete Mathematics and Theoretical Computer Science. |
|
Incremental Clustering and Dynamic Information Retrieval, M. Charikar, C. Chekuri, T. Feder and R. Motwani, in STOC 1997. |
|
Clustering data streams, S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, in FOCS 2000. |
|
Syntactic Clustering of the Web, Andrei Broder, Steven Glassman, Mark Manasse, Geoffrey Zweig, in WWW6 1997. |
|
Min-Wise
Independent Permutations, A. Broder, M. Charikar, A. Frieze and M.
Mitzenmacher, JCSS 60(3): 630-659 (2000). |
Lecture Notes
(Notes for lectures 3 onwards are still being edited. Let me know if you see any errors.)
2/28/02 (ps,pdf) ???
4/02/02 (ps,pdf)
5/02/02 (ps,pdf) Tony Wirth
5/07/02 (ps,pdf)
Template for scribe notes.
Last updated by Moses Charikar, 15-May-2002 10:40 PM