Computer Science 598b
Advanced Topics in Computer Science
Digital Information Access
Andrea LaPaugh
Spring 1999
Schedule and Readings
Tues., Feb. 2: Organization and brief overview of topics
Reading:
Part 1: topics in information retrieval and manipulation
Thurs., Feb. 4: class cancelled.
Tues., Feb. 9: Review of indexing, hashing, basic searching.
Reading:
- Lesk, Practical Digital Libraries, Chapter 5 (also review Chapter 2 material) -- main information source
- * Salton, G. et al. "A Vector Space Model for Automatic
Indexing,"
I don't expect you to read this paper in detail for today, but do get the
main ideas.
Thurs., Feb. 11: Handling large spaces in the vector space model:
Latent Semantic Indexing.
Reading:
- Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and
Harshman, R. A. (1990),
"Indexing by latent semantic analysis." (PDF)
Journal of the American Society for Information Science, 41(6), 391-407.
Also of interest:
Tues., Feb. 16: Handling large collections, continued: inverted
files.
Reading:
Thurs., Feb. 18: No class due to conflict in my schedule
Tues., Feb. 23:
Evaluating retrieval systems:
Reading:
- Lesk, Practical Digital Libraries, Chapter 7, focussing on Section 6.
- * editors K. Sparck Jones and P. Willett, preface to "Evaluation", Chapter 4 of
Readings in Information Retrieval, Morgan Kaufmann, San Francisco, Ca.
(1997)
- * T. Saracevic, P. Kantor, A. Y. Chamis, and D. Trivison,
"A Study of Information Seeking and Retrieving, Part I.
Background and Methodology," J. American Society for Information Science
39 (1988), 161-176.
- * D. K. Harman, "The TREC Conferences," in Proceedings of Hypertext -
Information Retrieval - Multimedia (HIM) '95:
Synergieeffekte Elektronischer Informationssysteme,
Universitaetsverlag Konstanz, Konstanz, Germany, pp. 9-28 (1995).
- Eric Lagergren and Paul Over,
Comparing interactive information retrieval systems across sites:
the TREC-6 interactive track matrix experiment,
Proc. 21st Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '98), pg. 164-172.
- Gordon V. Cormack, Christopher R. Palmer and Charles L. A. Clarke,
Efficient construction of large test collections,
Proc. 21st Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '98), pg. 282-289.
- Justin Zobel, How reliable are the results of large-scale information retrieval experiments?,
Proc. 21st Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '98), pg. 307-314.
Thurs., Feb. 25:
Ranking documents
Reading:
Tues., Mar. 2:
Using URL structure for Web document categorizing:
Reading:
The following three papers are available from
Jon Kleinberg's
publication list. The first paper presents a technique developed by Kleinberg
that uses the URL structure of hypertext documents to deduce which documents
might be authorities for a topic. The other two papers apply this technique.
We will study the first paper in detail and then look more briefly
at its applications as demonstrated by the follow-on papers.
The papers are listed in the order in which you should give them your attention.
- Jon Kleinberg, "Authoritative sources in a hyperlinked environment",
Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, (1998).
- S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and
S. Rajagopalan, "Automatic resource list compilation by analyzing hyperlink
structure and associated text," Proc. 7th International World Wide Web
Conference, (1998).
- D. Gibson, J. Kleinberg, and P. Raghavan,
"Inferring Web communities from link topology,"
Proc. 9th ACM Conference on Hypertext and Hypermedia, (1998).
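As a rough orientation before reading the first paper, the hub/authority idea can be sketched as a mutually reinforcing iteration over the hyperlink graph: a page's authority score is the sum of the hub scores of pages that link to it, and its hub score is the sum of the authority scores of the pages it links to. The sketch below is only an illustration under simplifying assumptions. The function names `hits` and `normalize`, the toy graph `toy_links`, and the fixed iteration count are all hypothetical, and the paper's step of building a focused subgraph from search results is omitted.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Iterate hub and authority scores on a link graph.

    links maps each page to the list of pages it links to.
    The iteration count is an arbitrary choice for this sketch.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: add up the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: add up the authority scores of pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: three "hub" pages pointing at two candidate authorities.
toy_links = {
    "h1": ["a", "b"],
    "h2": ["a", "b"],
    "h3": ["a"],
    "a": [],
    "b": [],
}
hub, auth = hits(toy_links)
```

In this toy graph, page "a" is pointed to by all three hubs and so ends up with a higher authority score than "b", while "h1" and "h2" score higher as hubs than "h3".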
Thurs., Mar. 4:
Document similarity and clustering
Reading:
- Oren Zamir and Oren Etzioni,
Web document clustering: a feasibility demonstration
Proc. 21st Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '98), pg. 46-54.
- Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig,
Syntactic Clustering of the Web
Proceedings of The Sixth International WWW Conference, 1997.
- * G. Salton, A. Wong and C. S. Yang "A Vector Space Model for Automatic
Indexing," Communications of the ACM, 18 (1975), pp. 613-620.
Handed out originally for Feb. 9 class. We will look at the clustering
aspects now.
Also of interest:
- Chapter 16 "Clustering Algorithms" in Information Retrieval:
Data Structures and Algorithms, Frakes and Baeza-Yates, eds. (on reserve).
Tues., Mar. 9:
Finish "document clustering and similarity".
Class will end at 4:00pm so that people can attend
the Program in Science, Technology and Ethics seminar
Internet Privacy: A Right or a Contradiction, by Jason Catlett
of Junkbusters Corporation.
4:30 pm in Bowl 5, Robertson Hall
Thurs., Mar. 11: Semantic and feedback techniques.
Reading:
- Chakravarthy, A. S. and K. B. Haase.
"NetSerf: using semantic knowledge to find Internet information."
In: Proceedings of the 18th Annual ACM SIGIR Conference
on Research and Development in Information Retrieval, Seattle,
1995.
- * Chen, H., A. L. Houston, R. R. Sewell, and B. R. Schatz,
"Internet Browsing and Searching: User Evaluation of Category Map and Concept
Space Techniques,"
Journal of the American Society for Information Science,
Vol. 49:7 (1998) pgs. 582-603
(Special Issue on AI Techniques for Emerging Information Systems
Applications).
- * Larry Fitzpatrick and Mei Dent,
Automatic feedback using past queries: social searching?,
Proc. 20th Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '97), pg. 306-313.
- * Bienvenido Vélez, Ron Weiss, Mark A. Sheldon and David K. Gifford,
Fast and effective query refinement,
Proc. 20th Annual Intern. ACM SIGIR Conf. on Research and Development in
Information Retrieval (SIGIR '97), pg. 6-15.
Also of interest:
Tues., Mar. 16 and Thurs., Mar. 18: spring recess
Part 2: systems issues in delivering digital information
Tues., Mar. 23: Spiders
Reading:
- Junghoo Cho, Hector Garcia-Molina, and Lawrence Page,
Efficient Web Crawling through URL ordering
Proceedings of the 7th World Wide Web conference, April 1998.
- Robert C. Miller and Krishna Bharat
SPHINX: a framework for creating personal, site-specific Web crawlers
Proceedings of the 7th World Wide Web conference, April 1998.
- Scott Spetka
The TkWWW Robot: Beyond Browsing
Proceedings of the 2nd World Wide Web Conference: Mosaic and the Web,
October 1994
- (added late) Martijn Koster,
Robots in the Web: threat or treat?
ConneXions, Volume 9, No. 4, April 1995.
Also of interest:
Thurs., Mar. 25: More on "crawling" the web:
We will finish the papers
from Tuesday and discuss the Introna and Nissenbaum paper.
Class will end at 3:55pm so that people may attend the department colloquium.
Reading:
- * Lucas Introna and Helen Nissenbaum, The Politics of Search Engines,
manuscript.
Also of interest:
Tues., Mar. 30: Web caching
Reading:
Also of interest:
- Marc Abrams, Charles R. Standridge, Ghaleb Abdulla, Stephen Williams, and
Edward A. Fox,
Caching Proxies: Limitations and Potentials
WWW4, 1995.
- S. Glassman, A Caching Relay for the World-Wide Web (PostScript),
First International World-Wide Web Conference (WWW1), pages 69-76, May 1994;
also appeared in Computer Networks and ISDN Systems 27, No. 2, 1994.
Thurs., Apr. 1: Web prefetching
Reading:
Also of interest:
- Home page of Wcol, the prefetching proxy server of the
Nara Institute of Science and Technology, Japan, which is discussed in the
Chinen and Yamaguchi paper above.
- Carlos R. Cunha and Carlos F.B. Jaccoud, Determining WWW User's Next
Access and Its Application to Pre-fetching, Proceedings of
ISCC'97: The Second IEEE Symposium on Computers and Communications.
Alexandria, Egypt, 1-3 July 1997.
PostScript of extended version can be found at
C. Cunha's
BU home page.
- J. Gwertzman and M. Seltzer,
An Analysis of Geographical Push-Caching, technical report.
- Tong Sau Loon and Vaduvur Bharghavan,
Alleviating the Latency and Bandwidth Problems in WWW Browsing,
Usenix Symposium on Internet Technologies and Systems, 1997.
- Evangelos P. Markatos and Catherine E. Chronaki,
A Top-10 Approach to Prefetching the Web,
Proceedings of INET '98 (The Internet Summit),
Geneva, Switzerland, July 1998.
Tues., Apr. 6: meta-engines
Reading:
- Daniel Dreilinger and Adele Howe.
Experiences with Selecting Search Engines Using Meta-Search
ACM Transactions on Information Systems,
Vol. 15, No. 3 (July 1997), Pages 195-222.
(bibliography with link to PostScript version under Papers).
- Steve Lawrence and C. Lee Giles,
Inquirus, the NECI meta search engine,
Proceedings of the Seventh International World Wide Web Conference
p. 95-105, 1998.
- Steve Lawrence and C. Lee Giles,
Context and Page Analysis for Improved Web Search,
IEEE Internet Computing, Volume 2, Number 4, July/August 1998,
pp. 38-46.
- Chia-Hui Chang and Ching-Chi Hsu,
Customizable Multi-Engine Search Tool with Clustering
Sixth International World Wide Web Conference, 1997.
Also of interest:
- Selberg, Erik and Etzioni, Oren.
The MetaCrawler Architecture for Resource Aggregation on the Web
IEEE Expert, January/February 1997, Volume 12 No. 1, pp. 8-14.
- Selberg, Erik and Etzioni, Oren.
Multi-Service Search and Comparison using the MetaCrawler
Fourth International World Wide Web Conference, Dec. 1995.
- Ellen M. Voorhees and Richard M. Tong,
Multiple Search Engines in Database Merging,
Digital Libraries 97, 1997.
- James P. Callan, Zhihong Lu, and W. Bruce Croft,
Searching Distributed Collections With Inference Networks,
Proceedings of 18th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, 1995.
Thurs., Apr. 8: agents
Reading:
- J. Alfredo Sánchez, John J. Leggett and John L. Schnase
AGS: introducing agents as services provided by digital libraries,
Proceedings of the Second ACM
Digital Library Conference, Philadelphia, PA, USA, July 1997.
- Weinstein, P., Alloway, G.
Seed Ontologies: growing digital libraries as
distributed, intelligent systems. Proceedings of the Second ACM
Digital Library Conference, Philadelphia, PA, USA, July 1997.
- Wellman, M.P., Durfee, E.H., and Birmingham, W.P.
The Digital Library as Community of Information Agents, IEEE Expert
11(3):10-11, June 1996.
Also of interest:
- Peter Weinstein and William P. Birmingham,
Runtime Classification of Agent Services
Proceedings of the AAAI-97 Spring Symposium on Ontological Engineering
Palo Alto, CA, March 1997.
- Birmingham, W. P., Durfee, E. H., Mullen, T., and Wellman, M. P.
The Distributed Agent Architecture of the University of
Michigan Digital Library,
AAAI Spring Symposium on Information Gathering
in Heterogeneous, Distributed Environments, Stanford, CA, AAAI Press.
Un-zipped PostScript temporarily available
locally.
Tues., Apr. 13: more on interoperability
Following class is the Program in Science, Technology and Ethics seminar
by Peter Arge: The Embedded Internet (co-sponsored by Computer Science Dept.)
Reading:
- Andreas Paepcke, Michelle Baldonado, Chen-Chuan K. Chang, Steve Cousins,
and Hector Garcia-Molina,
Building the InfoBus: A Review of Technical Choices in the Stanford Digital Library Project, Stanford Digital Library Project report.
- Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke,
The Stanford Digital Library Metadata Architecture
International Journal of Digital Libraries, 1(2), 1997.
Thurs., Apr. 15: reliability and permanence
This class will end at 4pm due to another commitment that I need to keep.
Let's try to start on time.
Reading:
- David M. Levy,
Heroic measures: reflections on the possibility and purpose of digital preservation,
Proceedings of the Third ACM
Digital Library Conference, Pittsburgh, PA, USA, June 1998, pp. 152-161.
- William Y. Arms,
Key Concepts in the Architecture of the Digital Library,
D-Lib Magazine, July 1995
- Robert Kahn and Robert Wilensky,
A Framework for Distributed Digital Object Services, technical report
from the Corporation for National Research Initiatives (CNRI) (cnri.dlib/tn95-01), May 13, 1995.
Also of interest:
Tues., Apr. 20: Peter Yianilos: Archival Intermemory
Reading:
Also of interest:
Thurs., Apr. 22: Geliang Tong: The CMU video library project.
Reading:
- Christel, M., Winkler, D., and Taylor, R., Multimedia Abstractions
for a Digital Video Library,
(pdf format)
Proceedings of ACM Digital Libraries Conference,
Philadelphia, PA. July 1997.
- Smith, M., and Kanade, T., Video Skimming for Quick Browsing Based on
Audio and Image Characterization
(pdf format)
Technical Report CMU-CS-95, 1995.
Tues., Apr. 27: Jon Forsyth: audio retrieval
Reading:
- Foote, An Overview of Audio Information Retrieval,
PDF temporarily available locally.
- Wold et al., Content-Based Classification, Search
IEEE MultiMedia 1996.
HTML temporarily available locally.
- McNab et al., Tune Retrieval in the Multimedia Library
PostScript temporarily available locally.
Also of interest:
- Martin & Scheirer, Music Content Analysis Through Models of
Audition
PDF temporarily available locally.
- McNab, Signal Processing for Melody Transcription
PostScript temporarily available locally.
- Foote, Content-Based Retrieval of Music and Audio
PDF temporarily available locally.
Thurs., Apr. 29: Peter Mei: multivalent-documents
Reading:
Also of interest:
Mon., May 17: course projects due
Tues., May 18, 10:30-noon Final meeting:
presentation of class projects and discussion of economic issues.
Reading:
Also of interest:
- S. Cousins, S. Ketchpel, A. Paepcke, H. Garcia-Molina, S. Hassan, and M. Röscheisen,
InterPay: Managing Multiple Payment Mechanisms in Digital Libraries,
in Digital Libraries 95. PostScript and PDF available from the
public reports list
of the Stanford Digital Library Project.
* indicates handed out in class