Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2011
Directory
General Information | Schedule and Assignments | Project Page | Announcements
Course Summary
This course examines the methods used to search for information in large digital collections (e.g., Google) and how digital content is gathered by search engines. We study classic techniques of indexing documents and searching text, and also new algorithms that exploit properties of the Web (e.g., links) and other digital collections, including multimedia collections. Techniques include those for relevance and ranking of documents, exploiting user history, and information clustering. We also examine systems aspects of search technology: how distributed computing and storage are used to make information delivery efficient.
Prerequisites
COS 226.
Administrative Information
Meeting time: Monday and Wednesday, 1:30-2:50pm
Meeting place: Friend Center 006
Extra meetings: If we need to make up a class due to my schedule, we may have a class during reading period and/or an evening class during the semester. Class participants will be consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@ ...
304 Computer Science Building, 258-4568
Office hours: Monday 3-4:30pm or by appointment. The easiest way to make an appointment is by email.
Teaching Assistant: Siyu Yang, siyuy@ ...
313 Computer Science Building
Office hours: Tuesday 1-2:30pm or by appointment.
Course secretary: Mitra Kelly, 323 Computer Science Building, 258-4562, mkelly@ ...
All email addresses are at cs.princeton.edu
Reading
Required reading:
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich, Introduction to Information Retrieval, Cambridge University Press, 2008.
- The link is to a complete online version of the textbook.
- We will also use reprints and other online material.
Supplemental reading (check back for additions as we progress in the semester):
On reserve at Engineering Library:
- Croft, Bruce; Metzler, Donald; Strohman, Trevor, Search Engines: Information Retrieval in Practice, Addison Wesley, 2010.
- Grossman, David and Frieder, Ophir, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.
- Chakrabarti, Soumen, Mining the Web: Discovering Knowledge from Hypertext Data, Elsevier (Morgan Kaufmann Division), 2003.
- Langville, Amy N. and Meyer, Carl D., Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006.
Work of the Course
The course will have the following components, weighted as indicated (note that these are slightly different from those in Course Offerings):
- Problem sets 25%
- Midterm exam 15%
- Second exam 20%
- Class participation 5%
- Project 35%
Problem sets
There will be 5 to 6 problem sets distributed throughout the semester.
Exam
There will be two take-home exams during the semester, each covering roughly half the course material. There is no exam during final exam period.
Project
Each student or pair of students will do a final project of their choosing related to the material of the course. The project must be approved in advance by the course instructor. See the project page for more information and a list of suggested projects.
Communication
All assignments will be made available on the course Web site (see Schedule and Assignments). "Handouts" and copies of any transparencies used in class will be posted on the course Web site as well. Important announcements on all aspects of the course will be made on the Announcements page. Students are responsible for monitoring the postings under "Announcements". Schedule changes will be made on Schedule and Assignments and announced on Announcements.
You are encouraged to use electronic mail to set up appointments, leave messages, and ask quick questions (like "What was that reference you gave today in class?" or "I've been at McCosh Infirmary all week; can I have an extension on my assignment?"). However, an old-fashioned face-to-face meeting is still best for clearing up confusion and for other technical discussions.
Syllabus
(This is the general list of topics and probably a superset of what we will have time to cover. Please see Schedule and Assignments for specific topics and reading assignments as the semester progresses.)
- Models of documents
- Query models for searching (focus on keyword-based search)
- Indexing and inverted files
- Ranking documents
- Using linking structure for Web content analysis
- Semantic and feedback techniques
- User behavior-based relevance criteria
- Privacy issues
- Manipulating search engine results (SEOs)
- Distributed computation for search engines
- Evaluating retrieval systems
- Web crawling
- Document similarity
- Clustering
- Non-text media search: e.g. music, images
- Adding structure to information: databases, XML, the semantic Web
- System design of search engines: distributed storage and computing
- Searching dynamic information sources
- Information caching
- Reliability and permanence of information
A.S. LaPaugh content last changed Sun Jan 30 12:51:09 EST 2011