Computers have made it possible, even easy, to collect vast amounts of data from a wide variety of sources. It is not always clear, however, how to use that data, and how to extract useful information from it. This problem is faced in a tremendous range of business, medical and scientific applications. The purpose of this course is to teach some of the best and most general approaches to this broad problem of how to get the most out of data. The course will explore both theoretical foundations and practical applications. Students will gain experience analyzing several kinds of data, including document collections, biological data, and natural images.
Topics will include:
Tuesday/Thursday 11:00AM-12:20PM, Friend 006
David Blei
204 CS Building
blei [at] cs.princeton.edu
659-258-9907
Office hours: by appointment
Robert Schapire
407 CS Building
schapire [at] cs.princeton.edu
609-258-7726
Office hours: by appointment, or just stop by
Jonathan Chang
004 CS Building
jcone [at] princeton.edu
659-258-1785
Office hours: Wednesdays, 3:00PM-5:00PM
Chenwei Zhu
221 Fine Hall
czhu [at] princeton.edu
659-258-5785
Office hours: Fridays, 3:00PM-4:00PM
This list will be used by the course staff for general announcements such as last minute corrections to the homeworks and changes in due dates. This list can also be used by students for discussing course material and homeworks.
The course staff will monitor and respond to questions on this list. If your question is specific to your own work, please contact them directly.
You can post to the list by sending mail to cos424@lists.cs.princeton.edu. Note that you can only post to the list using the email address you used to subscribe to it.
The prerequisites are MAT101, MAT201, COS126, and some exposure to probability and/or statistics (such as what is covered in COS341 or COS402). In general, you should be comfortable with computer programming and basic linear algebra, and have some familiarity with probability and statistics. Come see one of the professors if you are unsure.
There is no textbook that perfectly fits the material in this course. Instead, students will be asked to take "scribe notes" on the lectures, which will be posted on the course website (see below). Additional papers and book chapters will also be provided.
A list of other books for further background reading appears on the Syllabus page; these books are being placed on reserve at the Engineering Library.
The course consists of lectures, readings, homework assignments, and a final project.
There will be about four homework assignments given roughly once every two weeks; these will be a mix of written exercises and programming. Homeworks count for 65% of your grade.
The class project will constitute about one month of work and will count for the remaining 35% of your grade. For the project, you are required to undertake a thorough piece of applied data analysis and clearly report your findings in a written report and poster presentation. You can work alone or in groups of 2-3. See this page for more details.
The homework assignments will be done using R. More information about R can be found here.
Because there is no perfect textbook for this course, students will be asked to take turns preparing "scribe notes" for posting on the course web site (specifically, on the Syllabus page). Each class, one student will be the designated "scribe", taking careful notes during class, writing them up, and sending them to the instructor for posting on the web. Here is more information on how to be a scribe. These will not be graded; however, assuming there are more students than lectures, anyone who volunteers to scribe will receive extra credit.
Failure to complete any significant component of the course may result in a grade of D or F, regardless of performance on the other components. Final grades may be adjusted upward for positive and regular class participation.
The final project cannot be turned in late, nor can written material be turned in beyond "Dean's Date" without a dean's permission.
If you are turning in a late homework after hours when no one is around to accept it, please indicate at the top that it is late, and clearly mark the day and time when it was turned in. Failure to do so may result in the TA treating the homework as submitted at the time it was picked up (which might be many hours, or even a day or two, after you actually submitted it).