COS 435, Spring 2006: Home Page

Princeton University
Computer Science Dept.

Computer Science 435
Information Retrieval, Discovery, and Delivery

Andrea LaPaugh

Spring 2006


Course Summary

We study both classic techniques for indexing documents and searching text and newer algorithms that exploit properties of the Web (e.g., links) and modern digital libraries, including multimedia collections. We also study techniques for finding relationships and patterns that have not been explicitly modeled within digital collections, e.g., "mining" data in massive databases for new information. Finally, because improvements in network technology alone cannot meet the ever-increasing demand for faster access to more information, we examine techniques such as caching and distributed storage that make information delivery more efficient.

Prerequisites

COS 217 and 226.

Administrative Information

Meeting time: Tuesday, Thursday 11:00am-12:20pm
Meeting place: 301 Computer Science Building
Extra meetings: We may need to make up a class or two that we miss due to my schedule. Therefore, we may have a class during reading period and/or some evening classes during the semester. Class participants will be consulted before any make-up class time is chosen.

Professor: Andrea LaPaugh, aslp@cs.princeton.edu,
304 CS Building, 258-4568,
Office hours (changed): Tuesday 12:20-1:20PM and Wednesday 10:00-11:00AM, or by appointment. Please catch me after class or send email to make an appointment.

Course secretary: Mitra Kelly, 323 CS Building, 258-4562, mkelly@cs.princeton.edu

Reading

Required text: None

Supplemental reading (check back for additions as we progress in the semester):

On reserve at Engineering Library:

Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.

(I have requested that the University Store have in stock a small number of copies.)

Grossman, David and Frieder, Ophir, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

(This text has more examples than Baeza-Yates and Ribeiro-Neto, but the hardcover version is very expensive and the softcover is hard to get.)

Chakrabarti, Soumen, Mining the Web: Discovering Knowledge from Hypertext Data, Elsevier (Morgan Kaufmann Division), 2003.

We will also use reprints and online material.

Syllabus - tentative - work in progress

(This is the general list of topics and probably a superset of what we will have time to cover. Please see Schedule and Readings for specific topics and reading assignments.)

Part 1, topics in information retrieval and manipulation:

Indexing and inverted files
Keyword-based searching
Vector space model of documents
Latent Semantic Indexing
Ranking documents
Evaluating retrieval systems
Using URL structure for Web document categorizing

Part 2, topics in document similarity and information discovery:

Web crawling
Document similarity
Clustering
Pattern recognition
Semantic and feedback techniques

Part 3, systems issues in delivering digital information:

Information caching
Information prefetching
Distributed storage
Broadcast-based systems
Reliability and permanence

A.S. LaPaugh content last changed Fri Feb 24 11:19:00 EST 2006
