COS424: Interacting with Data
General information about the course
- Summary
- Lectures
- Mailing List
- Prerequisites
- Reading
- Using R
- Course grades and workload
- Assignment policies
Summary
Computers have made it possible, even easy, to collect vast
amounts of data from a wide variety of sources. It is not always
clear, however, how to use that data, and how to extract useful
information from it. This problem is faced in a tremendous range of
business, medical and scientific applications. The purpose of this
course is to teach some of the best and most general approaches to
this broad problem of how to get the most out of data. The course
will explore both theoretical foundations and practical applications.
Students will gain experience analyzing several kinds of data,
including document collections, biological data, and natural images.
Topics will include:
- Classification
- Clustering
- Regression
- Dimensionality reduction
- Advanced topics and applications
Lectures
Tuesday/Thursday 11:00AM-12:20PM, Computer Science 104
Course staff
David Blei
(Professor)
204 CS Building
blei [at] cs.princeton.edu
659-258-9907
Office hours: by appointment
Indraneel Mukherjee (Teaching Assistant)
103C CS Building
imukherj [at] cs.princeton.edu
Office hours: Monday 6:30PM-8:30PM; 103C Computer Science
Martin Suchara (Teaching Assistant)
103A CS Building
msuchara [at] cs.princeton.edu
Office hours: Wednesday 6:30PM-8:30PM; 103A Computer Science
Mailing list
Join the course mailing list by visiting
here
and following the instructions for subscribing. When signing up for
the mailing list, please provide your name, especially if you
are using a non-Princeton email address. To prevent spam, email
addresses that cannot be identified as legitimate will be removed from
the list.
This list will be used by the course staff for general
announcements such as last minute corrections to the homeworks and
changes in due dates. This list can also be used by students for
discussing course material and homeworks.
The course staff will monitor and respond to questions on this
list. If your question is specific to your own work, please contact
them directly.
You can post to the list by sending mail to
cos424@lists.cs.princeton.edu. Note that you can only post to the
list using the email address you used to subscribe to it.
Prerequisites
The prerequisites are MAT101, MAT201, COS126, and some exposure to
probability and/or statistics (such as what is covered in COS341 or
COS402). In general, you should be comfortable with computer
programming and basic linear algebra, and have some familiarity with
probability and statistics. Contact Prof. Blei if you have concerns about
your prerequisite coursework.
Reading
There is no textbook that perfectly fits the material in this
course. Instead, students will be asked to take "scribe notes"
on the lectures, which will be posted on the course website (see
below). Additional papers and book chapters will also be provided.
A lot of the material will be drawn from these two books, which
are on reserve at the Engineering library.
- Christopher M. Bishop.
Pattern Recognition and Machine Learning.
Springer, 2006.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman.
The Elements of Statistical Learning: Data Mining,
Inference, and Prediction.
Springer, 2001.
Course grades and workload
The course consists of lectures, readings, homework assignments,
and a final project.
There will be about four homework assignments given roughly once
every two weeks; these will be a mix of written exercises and
programming. Homeworks count for 65% of your grade.
The class project will constitute about one month of work, and
count for the remaining 35% of your grade. For the project, you
are required to undertake a thorough piece of applied data analysis
and clearly report your findings in a written report and poster
presentation. You can work alone or in groups of 2-3.
See this page for more details.
Because there is no perfect textbook for this course, students will
be asked to take turns preparing "scribe notes" for posting
on the course web site (specifically, on
the Syllabus page). Each class, one or two
students will be the designated "scribes", taking careful
notes during class, writing them up, and sending them to the
instructor for posting on the
web. Here is more information on
how to be a scribe. These will not be graded; however, assuming
there are more students than lectures, anyone who volunteers to scribe
will receive extra credit.
Failure to complete any significant component of the course may
result in a grade of D or F, regardless of performance on the other
components. Final grades may be adjusted upward for positive and
regular class participation.
Using R
The homework assignments will be done using
R. More information about using R for this course can be
found here.
Assignment policies
Handing in. Please submit code to
moodle.
Written exercises and some code must be handed in as hard copy into one
of the marked boxes outside of 103C (1st floor of the
Computer Science building).
Late days. All assignments are due at 11:59pm on the
due date. Each student will be allotted seven free days which can be
used to turn in homework assignments late without penalty. For
instance, you might choose to turn in the first homework two days
late, and the third homework three days late. Once your free
days are used up, late homeworks will be penalized 20% per day.
(For instance, a homework turned in two days late will receive only
60% credit.) Homeworks will not be accepted more than five days
past the deadline, whether or not free days are being used.
Exceptions to these rules will of course be made for serious illness
or other genuine emergency circumstances, and free late days
should not be used for these purposes; in these cases, please
contact a professor as soon as you are aware of the problem. A
weekend, that is, Saturday and Sunday together, counts as a single late
"day". For instance, a homework that is due on Friday
but turned in on Monday would be considered two days late, rather than
three.
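The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function names `late_days` and `credit_percent` are hypothetical, submission times are reduced to calendar dates, the five-day cutoff is assumed to be counted in late days (with the weekend rule applied), and the number of free days a student applies is passed in explicitly because the choice is theirs. The seven-day total allowance is not tracked here.

```python
from datetime import date, timedelta

PENALTY_PER_DAY = 20  # percent of credit lost per penalized late day
MAX_LATE_DAYS = 5     # assumption: the cutoff is counted in late days


def late_days(due: date, submitted: date) -> int:
    """Count late days, treating Saturday and Sunday together as one day."""
    count = 0
    day = due + timedelta(days=1)
    while day <= submitted:
        # Skip a Sunday when the Saturday just before it was already
        # counted as a late day.
        is_sunday = day.weekday() == 6
        saturday_counted = day - timedelta(days=1) > due
        if not (is_sunday and saturday_counted):
            count += 1
        day += timedelta(days=1)
    return count


def credit_percent(days_late: int, free_days_applied: int) -> int:
    """Percent credit after applying free days; 0 if past the cutoff."""
    if days_late > MAX_LATE_DAYS:
        return 0  # too late to be accepted, even with free days
    penalized = max(0, days_late - free_days_applied)
    return max(0, 100 - PENALTY_PER_DAY * penalized)


# Arbitrary example dates: due on a Friday, turned in the following Monday.
due = date(2024, 3, 1)
submitted = date(2024, 3, 4)
n = late_days(due, submitted)
print(n, credit_percent(n, free_days_applied=0))
```

With no free days applied, the Friday-to-Monday case counts as two late days and receives 60% credit, matching the examples in the policy; applying two free days restores full credit.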
The final project cannot be turned in late, nor can written
material be turned in beyond "Dean's Date" without a dean's
permission.
If you are turning in a late homework after hours when no one is
around to accept it, please indicate at the top that it is late, and
clearly mark the day and time when it was turned in.
Failure to do so may result in the TA considering the homework to be
submitted at the time when it was picked up (which might be many
hours, or even a day or two after when you actually submitted it).
Grading. Homeworks are graded largely on getting the
right answer or getting the program to work. In many homeworks, there
are some more "free form" questions asking for exploration and
experimentation. These questions will be graded more subjectively (as
in the humanities). Ideal answers are thoughtful, perceptive,
critical, clear, and concise.
Collaboration. The collaboration policy for this
course is based on the overarching objective of maximizing your
educational experience, that is, what you gain in knowledge,
understanding and the ability to solve problems. Obviously, you do not
learn anything by copying someone else's solution. On the other hand,
forbidding any and all discussion of course material may deprive you
of the opportunity to learn from fellow students. The middle ground
between these two extremes also needs to be defined with this basic
principle in mind. Before working with another student, you should
ask yourself whether you would gain more by working together or
individually, and then act accordingly. Here are some specific
guidelines based on this principle:
- You are certainly free (and encouraged) to talk to others about
the material in this course, or for general help with R, moodle,
etc.
- Before working with someone else, you should first spend a
substantial amount of time trying to arrive at a solution by
yourself. Easier problems, including many or most of the written
exercises, should be solved individually from start to finish.
- Discussing harder problems or programming assignments with fellow
students is allowed to the extent that it leads all participants
to a better understanding of the problem and the material.
Following such discussions, you should only take away your
understanding of the problem; you should not take notes,
particularly on anything that might have been written down. This
is meant to ensure that you understand the discussion well enough
to reproduce its conclusions on your own. You should also
note on your solution who you worked with.
- Needless to say, simply telling the solution to someone else is
prohibited, as is showing someone a written solution or a portion
of your code. Comparing code or solutions also is not generally
permitted. However, comparing and discussing the results of
experiments is okay if done in the spirit of the guidelines
above.
- All writing and programming must be done strictly
on your own. Copying of any sort is not allowed. Unless
instructed otherwise, you may not use
code or solutions taken from any student, from the web, from prior
year solutions, or from any other source.