Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2002
Directory
General Information | Schedule and Readings | Work of the Course | Project Page | Announcements
Course Summary
We study both classic techniques for indexing documents and
searching text and newer algorithms that exploit properties
of the Web (e.g., links) and of modern digital libraries,
including multimedia collections.
We also study techniques for finding relationships and patterns that have
not been explicitly modeled within digital collections,
e.g. "mining" data in massive databases for new information.
Finally, improvements in network technology alone cannot meet the
ever-increasing demand for more information faster. We examine techniques
such as caching and distributed storage for making information delivery more
efficient.
Prerequisites
COS 217 and 226.
Administrative Information
Meeting time: Mon, Wed 11:00 AM--12:20 PM
Meeting place: Room 301 Computer Science Building
Extra meetings: We may need to make up a class or two that we
miss due to my schedule. Therefore, we may have a class during reading
period and/or some evening classes during the semester. Class participants
will be consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@cs.princeton.edu,
304 CS Building, 258-4568,
or Forbes College Office*, 258-5232
Office hours: Monday and Wednesday 12:20--1:00 PM
or by appointment. Please catch me after class or send email to
make an appointment.
* In my "other life" I am Master of Forbes College; you are welcome
to call me at either office.
Course secretary: Mitra Kelly, 323 CS Building, 258-4562,
mkelly@cs.princeton.edu
Reading
Required text: None
Supplemental reading on reserve at the Engineering Library:
- Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval,
Addison-Wesley, 1999.
- Expect additions as the semester progresses.
We will also use reprints and online material.
Syllabus
(This is the general list of topics and probably a superset of what
we will have time to cover. Please see Schedule
and Readings for specific topics and reading assignments.)
Part 1, topics in information retrieval and manipulation:
- Indexing and inverted files
- Keyword-based searching
- Vector space model of documents
- Latent semantic indexing
- Ranking documents
- Evaluating retrieval systems
- Using URL structure for Web document categorizing
Part 2, topics in document similarity and information discovery:
- Web crawling
- Document similarity
- Clustering
- Pattern recognition
- Semantic and feedback techniques
Part 3, systems issues in delivering digital information:
- Information caching
- Information prefetching
- Distributed storage
- Broadcast-based systems
- Reliability and permanence
A.S. LaPaugh
content last changed February 2002