Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2002
Schedule and Readings
Mon. Feb. 4: Organization and brief overview of topics
Part 1: topics in information retrieval and manipulation
Wed. Feb. 6: Classic models for information retrieval
- Reading:
- *Modern Information Retrieval Chapter 1, Chapter 2 sections 2.1-2.5.
Mon. Feb. 11: Classic models continued: extended Boolean model
and latent semantic indexing
- Reading:
- *Modern Information Retrieval Chapter 2 sections 2.6 and 2.7.2.
- Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and
Harshman, R. A. (1990),
"Indexing by latent semantic analysis" (PDF, no figures),
Journal of the American Society for Information Science, 41(6), 391-407.
Wed. Feb. 13: Classic models continued: the
Bayesian network probabilistic model
- Reading:
- *Modern Information Retrieval Chapter 2 section 2.8.1
- Turtle, Howard and Croft, W. Bruce,
Evaluation of an inference network-based retrieval model
ACM Transactions on Information Systems (TOIS),
Vol. 9, No. 3 (July 1991), pp. 187 - 222.
This paper is long. Read selectively for the theoretical development and
evaluation (Sections 1-3 and 6-8).
Mon. Feb. 18: Ranking documents using URL structure:
hubs and authorities and Google's PageRank.
- Reading:
- *Modern Information Retrieval Section 13.4.4
- Brin, Sergey and Page, Lawrence
The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proceedings of the
Seventh International WWW Conference (WWW 7) (1998).
- Kleinberg, Jon,
Authoritative sources in a hyperlinked environment, Journal of the ACM, Vol. 46, No. 5 (Sept. 1999), pp. 604-632.
- D. Gibson, J. Kleinberg, and P. Raghavan.
Structural Analysis of the World Wide Web,
WWW Consortium Web Characterization Workshop, November 1998.
Wed. Feb. 20: Evaluation of Retrieval Techniques
- Reading:
- *Modern Information Retrieval Chapter 3
Mon. Feb. 25:
Problem set 1 due today.
Finish Evaluation (start of "Indexing and Searching" postponed).
- Reading:
- Voorhees, Ellen,
Evaluation by highly relevant
documents, Proceedings of the 24th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval,
2001, pp. 74 - 82.
- Also of interest:
- D. K. Harman, "The TREC Conferences," in Proceedings of Hypertext -
Information Retrieval - Multimedia (HIM) '95:
Synergieeffekte Elektronischer Informationssysteme,
Universitaetsverlag Konstanz, Konstanz, Germany, pp. 9-28 (1995).
Wed. Feb. 27: Indexing and Searching
- Reading:
- *Modern Information Retrieval Chapter 4, Chapter 8
sections 8.1-8.4.
Mon. March 4:
Project proposal due today.
Indexing and Searching continued
- Reading:
- *Modern Information Retrieval Section 7.4.5
Wed. March 6: Indexing and Searching continued
- Reading:
- *Modern Information Retrieval Section 6.3.3 and Chapter 10
- review discussion of data structures in
"The Anatomy of a Large-Scale Hypertextual Web Search Engine" posted for 2/18.
Mon. March 11:
Problem set 2 postponed.
!!! Guest Speaker: George Tzanetakis on audio information retrieval.
Wed. March 13: General remarks on media retrieval (postponed from last
time) and image retrieval
- Reading (originally assigned for 3/11):
Mon. March 18 and Wed. March 20: SPRING BREAK
Part 2: topics in document similarity and information discovery
Mon. March 25: Introduction to data mining; document similarity
- Reading:
- Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig,
Syntactic Clustering of the Web
Proceedings of The Sixth International WWW Conference, 1997.
Wed. March 27: Overview of clustering
Thurs. March 28:
Problem set 2 due 3pm today.
Mon. April 1: Clustering, continued.
- Reading:
- Chapter 16 "Clustering" in Information Retrieval: Data Structures and
Algorithms, Frakes, William and Baeza-Yates, Ricardo, editors,
Prentice Hall, 1992. Now on reserve at the Engineering Library.
- David Harel and Yehuda Koren,
Clustering
spatial data using random walks,
Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001,
pp. 281 - 286.
Wed. April 3: Problem set 3 due today.
Question and answer session in preparation for take-home exam. No new material.
Thurs. April 4:
Take-home exam available for pick-up starting at 9am today.
See Announcements for
specific instructions.
Mon. April 8: Clustering based on minimum cut trees
- Reading:
- This discussion is based on results being presented in the Ph.D. dissertation of Konstantinos Tsioutsiouliklis,
Department of Computer Science, Princeton University, June 2002.
- Part of the material discussed today is presented in
G.W. Flake, K. Tsioutsiouliklis, R.E. Tarjan,
Graph Clustering Techniques based on Minimum Cut Trees,
Technical Report 2002-06, NEC, Princeton, NJ, 2002.
(pdf version)
Tues. April 9:
Take-home exam due at 5pm today.
See Announcements for
specific instructions.
Wed. April 10: Issues in Web crawling.
- Reading:
- Edwards, McCurley, and Tomlin,
"An Adaptive Model for Optimizing Performance of an Incremental Web Crawler,"
Tenth International World Wide Web Conference (WWW10), 2001.
- Aggarwal, Al-Garawi, and Yu, "Intelligent Crawling On the World Wide Web
with Arbitrary Predicates," Tenth International World Wide Web Conference (WWW10), 2001.
Mon. April 15: Recommender Systems
Part 3: systems issues in delivering digital information
Wed. April 17: Web caching and prefetching
- Reading:
- J. Wang, "A Survey of Web Caching Schemes for the Internet," ACM Computer Communication
Review, (29):36-46, October 1999
(available from NEC ResearchIndex).
- Kroeger, Long, and Mogul, "Exploring the Bounds of
Web Latency Reduction from Caching and Prefetching," Proc. of the USENIX Symposium
on Internet Technologies and Systems, November 1997
(available from NEC ResearchIndex).
- Also of interest:
- Breslau, Cao, Fan, Phillips, and Shenker, "Web Caching and Zipf-like Distributions:
Evidence and Implications," Proceedings of IEEE INFOCOM, 1999
(available from NEC ResearchIndex).
Mon. April 22:
Problem set 4 due today.
Caching and prefetching, continued.
Wed. April 24: Caching and prefetching, continued.
Mon. April 29: Student presentations
- Students:
- Jeff Bigham
- Ben Haskell
- Bismarck Paliz
- Jonathan Foote,
An Overview of Audio Information Retrieval, Multimedia Systems,
Vol. 7, No. 1 (Jan. 1999) pp. 2-10.
- Ghias, Logan, Chamberlin, and Smith, Query by Humming:
Musical Information Retrieval in an Audio Database
International Multimedia Conference, ACM, 1995, pp. 231-236.
Wed. May 1: Student presentations, content distribution networks,
and concluding remarks
- Reading:
- Krishnamurthy, Wills and Zhang, "On the Use and Performance of
Content Distribution Networks," ACM SIGCOMM Internet Measurement
Workshop, 2001
(available from NEC ResearchIndex).
Tues. May 14, Dean's Date:
Projects due at 5pm today.
Wed. May 15: EXTRA CLASS at NOON for student presentations and
project demonstrations, in Rm 301.
* on reserve in the Engineering Library
A.S. LaPaugh
Mon May 13 23:08:25 EDT 2002