Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2006
Course Summary
We study both classic techniques for indexing documents and searching
text and newer algorithms that exploit properties of the Web (e.g.,
links) and of modern digital libraries, including multimedia collections.
We also study techniques for finding relationships and patterns that
have not been explicitly modeled within digital collections, e.g.,
"mining" data in massive databases for new information. Finally,
improvements in network technology alone cannot meet the ever-increasing
demand for more information faster. We examine techniques such as caching
and distributed storage for making information delivery more efficient.
Prerequisites
COS 217 and 226.
Administrative Information
Meeting time: Tuesday, Thursday 11:00am - 12:20pm
Meeting place: 301 Computer Science Building
Extra meetings: We may need to make up a class or two that we
miss due to my schedule. Therefore, we may have a class during reading
period and/or some evening classes during the semester. Class participants
will be consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@cs.princeton.edu, 304 CS Building, 258-4568
Office hours (changed): Tuesday 12:20--1:20PM and Wednesday 10:00--11:00AM, or by
appointment. Please catch me after class or send email to make an
appointment.
Course secretary: Mitra Kelly, 323 CS Building, 258-4562,
mkelly@cs.princeton.edu
Reading
Required text: None
Supplemental reading (check back for additions as we progress in the semester):
On reserve at Engineering Library:
- Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval,
Addison-Wesley, 1999.
- (I have requested that the University Store have in stock a
small number of copies.)
- Grossman, David and Frieder, Ophir, Information Retrieval: Algorithms and
Heuristics, 2nd edition, Springer, 2004.
- (This text has more examples than Baeza-Yates and Ribeiro-Neto,
but the hardcover version is very expensive and the softcover is hard
to get.)
- Chakrabarti, Soumen, Mining the Web: Discovering Knowledge from
Hypertext Data, Elsevier (Morgan Kaufmann Division), 2003.
We will also use reprints and online material.
Syllabus - tentative - work in progress
(This is the general list of topics and probably a superset of what
we will have time to cover. Please see Schedule and Readings for
specific topics and reading assignments.)
Part 1, topics in information retrieval and manipulation:
- Indexing and inverted files
- Keyword-based searching
- Vector space model of documents
- Latent Semantic Indexing
- Ranking documents
- Evaluating retrieval systems
- Using URL structure for Web document categorization
Part 2, topics in document similarity and information discovery:
- Web crawling
- Document similarity
- Clustering
- Pattern recognition
- Semantic and feedback techniques
Part 3, systems issues in delivering digital information:
- Information caching
- Information prefetching
- Distributed storage
- Broadcast-based systems
- Reliability and permanence
A.S. LaPaugh content last changed Fri Feb 24 11:19:00 EST 2006