Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh, Spring 2016
Directory
General Information | Schedule and Assignments | Project Page | Announcements
Course Summary
This course examines the methods used to gather, organize and search
for information in large digital collections (e.g. web search
engines). It also explores the discovery of information
through the analysis of relationships between items of interest,
including both information items and social objects. We study
classic techniques of indexing documents and searching text and also
algorithms that exploit properties of the Web (e.g. links), of
social networks and of other digital collections, including
multimedia collections. Techniques include those for relevance and
ranking of documents, exploiting user history, clustering and
network analysis. We also examine systems aspects of search
technology: how distributed computing and storage are used to make
information delivery efficient.
Prerequisites
COS 226 and some familiarity with linear algebra.
Administrative Information
Meeting time: Monday and Wednesday, 1:30-2:50pm
Meeting place: Robertson 016
Extra meetings: If we need to make up a class due to my
schedule, we may have a class during reading period and/or an
evening class during the semester. Class participants will be
consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@cs. ...
304 Computer Science Building, 258-4568
Office hours: Mondays 3:00-5:00pm or by appointment.
The easiest way to make an appointment is by email.
Teaching Assistant: Yinda Zhang, yindaz@cs. ...
Office hours: Tuesday, 2:00pm-4:00pm, 418b Computer Science Building
Course secretary: Mitra Kelly, 323 CS building, 258-4562, mkelly@cs. ...
In the email addresses above, "..." stands for princeton.edu.
Reading
Required reading:
Options other than buying the printed books: The print version
of each of the three books below is available online through a
Princeton University Library subscription to Safari Books Online.
You must access these from within the princeton.edu domain.
Each of the books also has a version available for download as a
PDF file. Details are given below.
Primary textbook:
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich,
Introduction to Information Retrieval, Cambridge University
Press, 2008, reprinted 2009. The above link is to
the website for the book, which contains, among other things,
links to complete HTML and PDF (6.6 MB) versions. The Safari
Books Online version is available here.
We will also use selections from the following two books.
- Rajaraman, Anand; Leskovec, Jure; Ullman, Jeffrey D.,
Mining of Massive Datasets, Cambridge University
Press, 2011. You can download a PDF file (2.9 MB) of the latest
version of the book (March 2014 as of this writing). Safari Books
Online offers the earlier printed version (2011) here.
I recommend the latest version.
- Easley, David; Kleinberg, Jon, Networks, Crowds, and Markets:
Reasoning about a Highly Connected World, Cambridge University
Press, July 19, 2010. You can download a PDF file (18.6 MB) of a
draft version dated June 10, 2010. The Safari Books Online
version is available here.
Supplemental reading (check back for additions as we progress in the
semester):
On reserve at Engineering Library:
- Croft, Bruce; Metzler, Donald; Strohman, Trevor,
Search Engines: Information Retrieval in Practice,
Addison Wesley, 2010.
- Langville, Amy N.; Meyer, Carl D., Google's PageRank and Beyond:
The Science of Search Engine Rankings, Princeton
University Press, 2006.
Work of the Course
The course will have the following components weighted as indicated:
- Problem sets 25%
- Midterm exam 15%
- Second exam 20%
- Class participation 5%
- Project 35%
Problem sets
There will be six problem sets distributed throughout the semester.
Exams
There will be two take-home exams during the semester, each covering
roughly half the course material. There is no exam during the final
exam period.
Project
Students will do a final project in pairs. The choice of topic
is up to each pair, but must be related to the material of the
course. The project must be approved in advance by the course
instructor. See the project page for more information and a list of
suggested projects.
Communication
All assignments will be made available on the course Web site (see
Schedule and Assignments). "Handouts" and copies of any slides used
in class will be posted on the course Web site as well.
We will use Piazza for all course announcements and quick
questions. Students are responsible for registering on Piazza and
adding themselves to COS 435 - Spring 2016. Students are also
responsible for monitoring the postings on the Piazza COS 435 site
for important course announcements. Piazza is great for sharing
questions and answers with the class (private questions addressed
only to the instructors are also possible). However, an
old-fashioned face-to-face meeting is still best for addressing
deeper confusions and other technical discussions.
Schedule changes will be made on Schedule and Assignments
and announced on Piazza.
Syllabus
(This is the general list of topics and probably a superset of
what we will have time to cover. Please see Schedule and Assignments
for specific topics and reading assignments as the semester progresses.)
- Modeling information objects
- Query models for searching
- Indexing and inverted files
- Ranking documents
- Using linking structure for Web content analysis
- System design of search engines: distributed storage and computing
- Personalized search
- Recommender systems
- Social networks as sources of meta-information
- Discovering information from social network analysis
- Privacy issues
- Evaluating retrieval systems
- Web crawling
- Document similarity
- Clustering
- Non-text media search, e.g. music and images
- Searching dynamic information sources
A.S. LaPaugh, content last changed Thu Feb 11 16:52:19 EST 2016