Princeton University
|
Computer Science 511
|
|
Machine learning studies automatic methods for learning to make accurate predictions or useful decisions based on past observations. This course introduces theoretical machine learning, including mathematical models of machine learning, and the design and rigorous analysis of learning algorithms. Topics include: bounds on the number of random examples needed to learn; learning from non-random examples in the on-line learning model (for instance, for investment portfolio selection); how to boost the accuracy of a weak learning algorithm; support-vector machines.
Here is a tentative list of topics. (Bullets do not correspond precisely to lectures.)
Lectures:
Monday and Wednesday, 1:30-2:50, Friend Center, room 008
Professor:
Rob Schapire: 407 CS Building,
x8-7726, schapire@cs
Office hours: by appointment (via email), or just stop by.
Graders:
Because the university is so poor, there will not be a TA for this course. However, Zafer Barutcuoglu (zbarutcu@), Wei Dong (wdong@cs) and Umar Syed (usyed@cs) have graciously agreed to help me grade the homeworks. You should only contact them if you have a question about how one of the homeworks was graded. Otherwise, all questions should be sent directly to me.
Graduate Coordinator:
Melissa Lawson: 310 CS Building, x8-5387, mml@cs
This class has no formal prerequisites. However, it is assumed that students will have a general background in computer science (especially theoretical computer science), and an ability to solve mathematical problems rigorously. Background in basic probability would also be helpful. Some calculus and linear algebra will be used.
There is no textbook that perfectly fits the material in this course. However, the one that comes closest is An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani. We will cover about the first third of the book, namely, the first three chapters, all of which are available (free) from the Princeton library in their electronic course reserves. To obtain these, log on here with user name "cos 511" (noting the lowercase lettering, as well as the space between "cos" and "511"); the password is the instructor's last name (all lowercase). You can also access these by logging on to blackboard (assuming you are registered for the course).
I will also post other papers or notes for topics not in the book.
A list of other books for further background reading appears on the Schedule & Readings page, and are being placed on reserve at the Engineering Library.
Grading will be based on homework assignments (65%) and a final project (35%). Homeworks will be given roughly every week or two, and will each consist of a small number of problems (no programming). Students will also be asked to prepare scribe notes for one class (see below). In all cases, failure to complete any significant component of the course may result in a grade of D or F, regardless of performance on the other components. Final grades may be adjusted slightly upward for regular and positive class participation, or slightly downward for conspicuous absence from lecture.
For the final project, you can pick any topic you want for further study. Your project could involve implementing an algorithm, running experiments, doing further reading or trying to theoretically extend a result of interest. In all cases, the project must have a theoretical component, and the end product will be a written report. More details will be given later in the semester.
All work should be turned in at the end of class, or submitted to me in my office, or put in the envelope by my door if I'm not around. If you need to turn in an assignment after hours when the upper floors of the CS building are locked, you can instead submit your homework in the box outside room 001C in the basement of the CS building.
Please submit hard copy only.
All assignments are due at 11:59pm on the due date.
Each student will be allotted seven free days which can be used to turn in homework assignments late without penalty. For instance, you might choose to turn in HW#1 two days late, HW#4 three days late and HW#6 two days late. Once your free days are used up, late homeworks will be penalized 20% per day. (For instance, a homework turned in two days late will receive only 60% credit.) Homeworks will not receive credit more than five days past the deadline, whether or not free days are being used. Even so, all homeworks must be completed and turned in, even if this five-day limit has passed. As noted above, failure to do so may result in a final grade of D or F.
Exceptions to these rules will of course be made for serious illness or other emergency circumstances; in these cases, please contact me as soon as you are aware of the problem.
In counting late days, a weekend (that is, Saturday and Sunday together), count as a single "day". For instance, a homework that is due on Thursday but turned in on Sunday would be considered two days late, rather than three.
If you are turning in a late homework after hours when no one is around to accept it, please indicate at the top that it is late, and clearly mark the day and time when it was turned in. Failure to do so may result in me considering the homework to be submitted at the time when I pick it up (which might be many hours, or even a day or two after when you actually submitted it).
Collaboration on the problem sets is encouraged in this course, subject to the following guidelines:
Before working with someone else, you should first spend a substantial amount of time trying to arrive at a solution by yourself. Some of the problems are easier and should be solved individually from start to finish.
Discussing and solving harder problems with fellow students (who have not already solved it) is allowed and encouraged to the extent that it leads all participants to a better understanding of the problem and the material. Following such discussions, you should only take away your understanding of the problem; you should not take notes, particularly on anything that might have been written down. This is meant to ensure that you understand the discussion well enough to reproduce its conclusions on your own. You should also note on your solution who you worked with.
Needless to say, simply telling the solution to someone else is prohibited, as is showing someone a written solution.
The final writing of the problem set must be done individually and strictly on your own.
If you happen to have already seen one of the assigned problems elsewhere, please indicate this when you write up your solution; I won't take off credit, I just would like to know. However, although background reading is always encouraged, you should not attempt to "solve" these problems by looking for the solutions in the literature. Try to solve the problems on your own, or ask me or fellow students for help if you can't.
Because there is no perfect textbook for this course, students will be asked to take turns preparing "scribe notes" for posting on the course web site (specifically, on the Schedule and Readings page). Each class, one student will be the designated "scribe", taking careful notes during class, writing them up, and sending them to me for posting on the web. Here is more information on how to be a scribe. I will not grade the scribe notes, but I will give credit for doing them (about equal to one problem set).
Auditors are welcome, and are encouraged to take part in class discussions. If you wish to receive official "credit" for auditing this course, you must attend the vast majority of the lectures; this requirement may in exceptional circumstances be waived if you complete a substantial fraction of the course work instead. Depending on enrollment, auditors may also be required to prepare scribe notes for one lecture.
As soon as possible, please join the course mailing list by visiting https://lists.cs.princeton.edu/mailman/listinfo/cos511 and following the instructions for subscribing. I will use this list for general announcements, such as last minute corrections to the homeworks, changes in due date, etc. This list can also be used by students for discussing course material, general-interest questions about the homeworks, etc. Of course, if your question is specific to your own work, you will probably want to contact me directly.
When signing up for the mailing list, please provide your name, especially if you are using a non-Princeton email address. Email addresses that I cannot identify as legitimate will be removed from the list.
You can post to the list by sending mail to cos511@lists.cs.princeton.edu. (Note that you can only post to the list using the email address you used to subscribe to it.)