Computer Science 598b
Advanced Topics in Computer Science
Information Access: Issues for the Web & Digital Libraries
Andrea LaPaugh
Spring 1998
General Information
Meeting time: Tues, Thurs 1:30--2:50 PM
Meeting place: Room 302 CS building.
First meeting: Tuesday, February 3, 1998.
Professor: Andrea LaPaugh,
304 CS Building, 258-4568,
aslp@cs.princeton.edu,
Office hours by appointment. Please send email.
Course secretary: Sandra Barbu, 323 CS building, 258-4562,
barbu@cs.princeton.edu
Course Description
In 1990, the World Wide Web was born. In seven years, it has become a
pervasive part of our society. (One needs no more evidence than that
every television advertisement gives a Web site.)
During the 1990s, the acquisition of digital material by
libraries has also expanded greatly. Both the Web and large digital library
collections present similar problems in the management and retrieval of
information. The Web poses a particular problem because anyone
can put information, good or bad, on it.
How do the providers of search and data organization
services give useful access to information? Some of the questions are
sociological; many are technical. Researchers in the computer science
community and related communities have very active programs exploring solutions
to these problems. In 1994 NSF, DARPA, and NASA jointly funded six major
research projects under the NSF/DARPA/NASA Digital Libraries Initiative.
Many other research efforts exist as well.
The Sixth International World Wide Web Conference
was held in the spring of 1997, presenting technical papers on information
access as well as other technical issues of the Web;
the 1997 IEEE International Forum on Research and Technology Advances in
Digital Libraries (ADL '97) was also held in the spring of 1997;
the Second ACM International Conference on Digital Libraries was held
in the summer of 1997. These conferences are representative of the
technical work being done in this area.
This graduate seminar will study current research on information access
for large digital collections. We will read a selection of papers
reporting results of recent research. We will develop necessary background
as we study individual topics. Our core topics will be data organization
and algorithms for getting useful information from digital collections of
textual data. Examples of other potential topics include audio and video
retrieval, architectures to enhance Web retrieval, new scholarship
through digital access, and extracting information from the link structure
of a set of documents. Students will participate in the development
of the reading list. Each student will present and lead the discussion
of several papers. Students will also develop projects related to the seminar
themes as a major component of the course. Details of project development
will be discussed at the first meeting.
Course Requirements for Credit
- Presentation of papers from the research literature.
- Short homework assignments.
- A project to be designed by the student in consultation
with the instructor.
Prerequisites
The seminar is intended for graduate students and advanced undergraduates.
Graduate students in areas other than computer science or computer engineering
and all undergraduates who wish to take this seminar should contact Professor
LaPaugh to discuss their background.
Schedule and Readings
- Tuesday, Feb. 3: organization and brief overview of topics
- Thursday, Feb 5: overview of Digital Library Initiative, focus on project
at Illinois if time.
Reading (I know time is short, so work on them in the order given as you have time):
- Information Retrieval in Digital Libraries: Bringing Search to the Net,
Bruce R. Schatz, Science, Vol. 275 (Jan. 17, 1997) pp. 327-334
(hardcopies outside my door).
- Building Large-Scale Digital Libraries, IEEE Computer
theme issue on the US Digital Library Initiative, May 1996.
- Also try to "surf" the Digital Library Initiative Project Web Sites,
accessible from NSF/DARPA/NASA Digital Libraries Initiative Projects,
and look over the other articles in the May 1996 issue of Computer. Below
are pointers to the ones I could find on-line for free. The rest can be
obtained by students who are members of IEEE Computer Society through
the online Computer site (anyone can get abstracts)
or can be copied from the hardcopy on the bulletin board
outside my door.
- Tuesday, Feb 10. Finish overview of Digital Library Initiative projects
(still to discuss: UCSB, CMU, Stanford, Michigan)
- Thursday, Feb 12. Comparisons of Web search engines.
Papers 7 and 8 under Papers of Interest. Also
see sites giving search engine comparisons in the list of WWW resources.
- Tuesday, Feb 17. Classic document ranking. I will give an overview
of text-book material on document retrieval by relevance ranking. See
the text by Salton on reserve in the Engineering Library.
- Thursday, Feb 19. Use of co-occurrence graphs and concept graphs.
Papers 4 and 6 (and 5, which is not yet available) under
Papers of Interest.
Presenter: Haiping Zhaou (haipingz@cs.princeton.edu).
- Tues, Feb 24 NO CLASS
- Thurs, Feb 26. Query disambiguation.
Papers 9, 10, and for background only, 11 under
Papers of Interest.
Presenter: Ned Locke.
- Tues, Mar. 3. Protocols -- the Stanford approach. Papers 12, 13, and 14
under Papers of Interest. Paper 15 gives details
presented in class but is optional reading for members of the class.
Presenter: Ken Yanovsky
- Thurs. Mar 5. Protocols via agents -- the Michigan approach.
Papers 16, 17, 18 and 19 under Papers of Interest.
(Papers 20, 21, and 22 are also relevant.)
Presenter: Andrea LaPaugh
There will be a meeting for those students doing projects immediately
following class.
- Tues. Mar. 10. Parallel processing. Papers 30, 31 and 32
under Papers of Interest.
Presenter: Minwen Ji.
- Thurs. Mar. 12. Meta-engines: searching using multiple search engines.
Papers 23, 24, and 25 under Papers of Interest.
(Papers 26 - 29 are also relevant).
Presenter: Yuanyuan Zhou
- March 17, March 19 SPRING BREAK
- Tues. Mar. 24. CANCELLED DUE TO ILLNESS OF PRESENTER. WILL BE
RESCHEDULED. Merging of responses from several (non-identical)
collections/databases. Papers 27 (Chang and Hsu), 33 (Callan, Lu and Croft),
and 34 (Voorhees and Tong) under Papers of Interest.
Presenter: Dongming Jiang.
- Thurs. Mar. 26. Detecting identical/similar documents. Papers
35 (Broder et al.) and 36 (van Noortwijk and DeMulder)
under Papers of Interest.
Presenter: Hongzhang Shan
Proposals for projects due.
- Tues. Mar. 31 Finding authoritative sources. Papers 37 (Kleinberg) and
38 (Chakrabarti et al.) under Papers of Interest.
Presenter: William Avery
- Thurs. Apr. 2 A presentation on the PRECISE project by guest
Thomas Knowles. (Dongming's presentation again postponed due to lack of time.)
- Tues. Apr. 7 The WordNet project and its use in information access.
Papers listed under Item 58: WordNet papers in
Papers of Interest, especially the
first two papers: by Fellbaum and by Leacock and Chodorow.
Presenter: Randee Tengi
- Thurs. Apr. 9 Merging of responses from several (non-identical)
collections/databases. Papers 27 (Chang and Hsu), 33 (Callan, Lu and Croft),
and 34 (Voorhees and Tong)
under Papers of Interest.
Presenter: Dongming Jiang.
Early project reports are due
- Tues. Apr. 14 Spiders. Paper 41 (Cho, Garcia-Molina, and Page)
under Papers of Interest. Papers
39 (Koster) and 40 (Eichmann) are also relevant.
Presenter: Trevor Sumner
- Thurs. Apr. 16
Guest lecture by Wayne Wolf on his video project.
Papers 42-44 (Wolf) and 45 (Boreczky) under
Papers of Interest.
- Tues. Apr. 21
Multi-dimensional indexing and smart indices.
Papers 46 (Kanth et al.), 47 (X. Cheng et al.) and 48 (S. Geffner et al.)
under Papers of Interest.
Presenter: Limin Wang.
- Thurs. Apr. 23
CMU video library project -- details of their audio +
video approach. Papers 49 (Hauptmann and Smith), 50 (Christel),
51-52 (Christel et al.) and 53 (Smith and Kanade)
under Papers of Interest.
Presenter: Joshua Toub
- Tues. Apr. 28 Guest lecture by Dannie Durand
- Thurs. Apr. 30 Multivalent documents (Berkeley approach).
Papers 54 and 55 (Phelps and Wilensky)
under Papers of Interest. Papers 56 and 57
are also relevant.
Presenter: Andrea LaPaugh
Intermediate project reports are due
- Friday May 15 FINAL PROJECT REPORTS AND DEMOS (if applicable) DUE
On Reserve at Engineering Library
- Salton, Gerard, Automatic text processing: the transformation,
analysis, and retrieval of information by computer. Reading, Mass.:
Addison-Wesley, 1988.
- SIGIR Conference Proceedings on Information Retrieval: 1994, 1995, 1997
(the library does not have 1996).
WWW Resources
A.S. LaPaugh
Wed Apr 29 14:27:27 EDT 1998