Part 1: topics in information retrieval and manipulation
WEEK 1
Tues. Feb. 7: Overview of course topics and organization.
Begin information retrieval topics: How do we model information
retrieval and how do we evaluate it?
*Modern Information Retrieval, Chapter 1, Sections 1-3;
*Modern Information Retrieval, Chapter 3.
Thurs. Feb. 9: Evaluation in the age of the Web
Class presentation on "Relevance by TREC method" now online as PDF.
Reading:
Voorhees, Ellen, "Evaluation by highly relevant documents," Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001, pp. 74-82.
*Information Retrieval: Algorithms and Heuristics, Chapter 2, Section 2.6 (preferable) or *Modern Information Retrieval, Chapter 2, Section 2.7.2
Also of interest:
Telcordia has a Web site for Latent Semantic Indexing, which contains, among other material, pointers to many other papers for those interested in learning more.
WEEK 3
Sunday Feb. 19: Homework 1 is now available; due Feb. 27, 2006.
Tues. Feb. 21: Classic models continued: the Bayesian inference network probabilistic model.
Reading:
*Information Retrieval: Algorithms and Heuristics, Chapter 2, Section 2.4 (preferable) or *Modern Information Retrieval, Chapter 2, Section 2.8
Thurs. Feb. 23: Ranking documents using linked structures (social networks, hypertext, the Web)
Reading:
*Mining the Web: Discovering Knowledge from Hypertext Data, Chapter 7, Sections 7.1-7.5
Thurs. March 9: Homework 2 due.
Thurs. March 9: Guest presentation: Professor Tom Funkhouser speaks on "A Search Engine for 3D Models"
Reading:
Thomas Funkhouser, Patrick Min, Michael Kazhdan, Joyce Chen, Alex Halderman, David Dobkin, and David Jacobs, "A Search Engine for 3D Models" (local copy), ACM Transactions on Graphics, 22(1), January 2003.
WEEK 6
Tues. March 14: Compression of indexes (moved from March 7);
*Information Retrieval: Algorithms and Heuristics, Section 5.4
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig, "Syntactic Clustering of the Web," Proceedings of the Sixth International WWW Conference, 1997.
Also of interest:
*Mining the Web: Discovering Knowledge from Hypertext Data, Chapter 1
*Information Retrieval: Algorithms and Heuristics, Chapter 3, Section 3.2
*Chapter 16, "Clustering," in Information Retrieval: Data Structures and Algorithms, William Frakes and Ricardo Baeza-Yates, editors, Prentice Hall, 1992. (More detailed than the first reading.) Now on reserve at the Engineering Library.
Thurs. March 30: Clustering continued; Martin Makowiecki presents the k-means algorithm
*Mining the Web: Discovering Knowledge from Hypertext Data, Chapter 4, Sections 4.1-4.5. (There is some overlap with the reading for March 28.)
Friday, March 31: Homework 3 due. Note change of due date.
WEEK 8
Tues. April 4: Clustering continued (an example of the use of min-max cut, cluster comparison, cluster applications) and a question-and-answer session for the exam.
Reading:
Class slides on cluster comparison and clustering applications, discussed before spring break
TAKE-HOME EXAM AVAILABLE in class at 12:20pm THURSDAY April 6; DUE 11am in class TUESDAY April 11.
See the March 31 posting on the announcements page for further information.
Thurs. April 6: Clustering based on minimum cut trees
Reading:
"Graph Clustering and Minimum Cut Trees," G.W. Flake, K. Tsioutsiouliklis, R.E. Tarjan, Internet Mathematics, Vol. 1, No. 4: 385-408, 2004. Note that the algorithm presented in this paper is a variation of the one presented in class. I cannot find a reference for the version I presented in class other than Kostas Tsioutsiouliklis' dissertation (citation below).
Also of interest:
This discussion is based on results presented in the Ph.D. dissertation of Konstantinos Tsioutsiouliklis, Department of Computer Science, Princeton University, June 2002.
WEEK 9
Tues. April 11: Take-home exam due in class at 11am.
Tues. April 11: Data mining: associations within data
"Mining Surprising Periodic Patterns," Jiong Yang, Wei Wang, Philip S. Yu, Data Mining and Knowledge Discovery (DMKD), Volume 9, Number 2, September 2004, pp. 189-216.
*Mining the Web: Discovering Knowledge from Hypertext Data, Chapter 8, Section 8.3.1
"Evaluating topic-driven web crawlers," Filippo Menczer, Gautam Pant, Padmini Srinivasan, Miguel E. Ruiz, Proc. Intern. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR Conf.), ACM, 2001, pp. 241-249.
WEEK 11
Tuesday April 25: Presentation by Joseph Bradley on latent Dirichlet allocation; overview of Web caching.
Reading:
PDF of slides from Joseph Bradley's presentation on latent Dirichlet allocation.
"Latent Dirichlet allocation," D. Blei, A. Ng, and M. Jordan, Journal of Machine Learning Research, 3:993-1022, January 2003.
"A Web Caching Primer," B.D. Davison, IEEE Internet Computing, Vol. 5, Issue 4, Jul/Aug 2001, pp. 38-45.
Also of interest:
Brian D. Davison's Web site, Web Caching and Content Delivery Resources: www.web-caching.com
Wednesday April 26: Homework 4 (the last) now available; due Wednesday May 3. Note change of due date.
Thursday April 27: Presentation by Frank Macreery on prediction of Web page access; finish Web caching
Reading:
PDF of slides from Frank Macreery's presentation on prediction of access.
"Using path profiles to predict HTTP requests," S. Schechter, M. Krishnan, and M.D. Smith, Proceedings of the Seventh International World-Wide Web Conference, 1998. Published in Computer Networks and ISDN Systems, Volume 30, Issues 1-7 (1998), Elsevier Science B.V.
WEEK 12
Tuesday May 2: Content distribution networks; very brief publish-subscribe overview
"The many faces of publish/subscribe," Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, Anne-Marie Kermarrec, ACM Computing Surveys, Vol. 35 (2), June 2003, pp. 114-131.
Also of interest:
RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1, in particular Section 13: Caching in HTTP. Obtain it using the FTP link from the W3C Hypertext Transfer Protocol Overview page, under "Specifications, Drafts, Papers and Reports - HTTP Working Group."
Wednesday May 3: Homework 4 due. Note change of due date.
Thursday May 4: Presentations by Michael Wenzel on music classification and Sergio Garza on the Cat-a-Cone search interface; wrap-up.
Reading:
PDF of slides from Sergio Garza's presentation on Cat-a-Cone
PDF of slides from Michael Wenzel's presentation on music classification
Project demonstration: If you have implemented something that lends itself to live demonstration, I would like to see it after I receive your report and before 5pm Mon. May 22, 2006.