(HITS algorithm) Kleinberg, Jon, Authoritative sources in a hyperlinked environment, Journal of the ACM, Vol. 46, No. 5 (Sept. 1999), pp. 604-632. (Earlier versions appeared in Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998 and as IBM Research Report RJ 10076, May 1997.)
(PageRank algorithm) Page, Larry, Sergey Brin, R. Motwani, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Stanford Digital Library Technologies Project TR, Jan. 1998. (Early version: L. Page, PageRank: Bringing order to the web, Stanford Digital Libraries Working Paper 1997-0072, Stanford University, 1997.)
WEEK 3 Mon. Feb. 14: Evaluation of retrieval systems
MapReduce: simplified data processing on large clusters, Jeffrey Dean and Sanjay Ghemawat, Communications of the ACM, 51(1), Jan. 2008. (Special 50th Anniversary issue: Breakthrough research: a preview of things to come.)
An XQuery Sandbox example tool that uses XML marked-up Shakespeare plays can be found on the eXist Project Web site. The eXist Project is centered around eXist-db, which is (in their words) "an open source database management system entirely built on XML technology."
Take-home EXAM 1: DISTRIBUTED end of class Wednesday, March 9. DUE 3:00 PM Friday, March 11.
Spring break
WEEK 7 Mon. March 21: canceled
Wed. March 23: Search refinement; using user behavior
The Adaptive Web, P. Brusilovsky, A. Kobsa, W. Nejdl, eds., Lecture Notes in Computer Science book series, Vol. 4321, Springer, 2007. This book contains several relevant chapters. Chapter 6: Personalized Search on the World Wide Web, by A. Micarelli, F. Gasparetti, F. Sciarrone and S. Gauch, is of particular interest. The chapters are available as PDF files to members of the Princeton University community by accessing them from the princeton.edu domain.
Introduction to Information Retrieval, Chapter 16, Section 16.3 is recommended if you are going to read research papers on clustering. We will touch on external evaluation criteria only briefly.
Evaluating topic-driven web crawlers, Filippo Menczer, Gautam Pant, Padmini Srinivasan, Miguel E. Ruiz, Proc. Intern. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR Conf.), ACM, 2001, pages 241-249.
Wed. April 13: Focused crawling; Characteristics of the changing Web
Harnessing the Deep Web: Present and Future (pdf), Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, and Alon Halevy, 4th Biennial Conference on Innovative Data Systems Research (CIDR), Jan. 2009.
Searching the deep web, Alex Wright, Communications of the ACM, Vol. 51, No. 10 (Oct. 2008), pages 14-15.
Google's Deep-Web Crawl (pdf), Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, and Alon Y. Halevy, 34th Intern. Conf. on Very Large Data Bases, VLDB Endowment, Aug. 2008.
Accessing the deep web, Bin He, Mitesh Patel, Zhen Zhang, and Kevin Chen-Chuan Chang, Communications of the ACM, Vol. 50, No. 5 (May 2007), pages 94-101.
Searching for Hidden-Web Databases, Luciano Barbosa and Juliana Freire, Proceedings of the 8th ACM SIGMOD International Workshop on Web and Databases (WebDB), pp. 1-6, ACM, 2005. (A more recent, more complicated version of the crawler was described at the 2007 WWW conf.)
The Semantic Web, Tim Berners-Lee, James Hendler and Ora Lassila, Scientific American 284(5), May 2001, pp. 34-43. (Scientific American is available online through the Princeton University Library.)
In class we saw two illustrations of an OWL ontology; they are from the ezOWL project, a Semantic Web ontology editor from the Electronics and Telecommunications Research Institute of Korea.