Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2011
Information about the Course Project
Each student or pair of students will do a final project of his/her or
their choosing related to the material of the course.
Project requirements:
Proposal due 5:00pm Monday, Feb. 28, 2011:
Email a paragraph describing your proposed project to Prof. LaPaugh.
Include as much detail as possible. This will be the starting point of a
discussion with Professor LaPaugh to make sure the project is of the
appropriate scope for a class project.
UPDATED Progress report, between April 4 and April 15, 2011:
Meet with Professor LaPaugh to discuss your progress on your project.
Expect to spend about 15 minutes discussing your work to date. You will
not give a formal presentation, but you should prepare slides that
summarize any algorithms, system architecture, or experiments you are
developing for the project. Email these to Professor LaPaugh ahead of
your meeting time.
After spring break, you will be able to sign up for your appointment
using WASS, OIT's office hours scheduling system. Wait until the
availability of appointment blocks is announced (watch the course
Announcements page). To use WASS, log in and click the "Make an
Appointment" menu button. Search under the name "LaPaugh" or NetId
"aslp" for the calendar entitled "COS 435 calendar". Once the calendar
is found, click "Make Appointment". If you have conflicts with all
available times, email Professor LaPaugh. Caution: do not use the
calendar entitled "Advising calendar for Andrea LaPaugh".
Project Report due 5:00pm Dean's Date, Tuesday, May 10, 2011:
You are required to submit a final report that describes your project.
This must include the statement of the topic and the goals of the
project, your methodology, and the results. If it is an experimental
project, you need to describe what was implemented, the major
implementation decisions, how you designed the experiments, and the
experimental results. If you developed a system or tool, you may not
have experiments per se, but you must describe how you are evaluating
the project and the outcome. You should also relate your work to other
work on the problem. Your code should be in an appendix or posted on a
Web page with the URL provided (Web posting is preferred). If your
project is a theoretical study, you need to describe the problem, review
what was known about the problem before your analysis, and give the
details and the results of your theoretical analysis. If your project is
a literature-based project, you need to describe the major issues under
study, summarize the major techniques and the theoretical and/or
experimental results presented in the literature, and critically analyze
the results. For any type of project, be sure to include a bibliography
of all the sources you used.
Projects will be graded on thoroughness and depth of thought. Difficulty
will be taken into consideration. Keep in mind that evaluation is an
important part of any project. Be clear on the goals of your project and
how you demonstrate or measure success.
Project Demonstration
After submitting your final report, meet briefly with Professor LaPaugh
to discuss the results of your project. If you have implemented
something that lends itself to live demonstration, this is the time to
show it. All meetings must occur before 5pm Mon., May 16, 2011.
List of suggested projects:
These topics are fairly broad and need further refinement based on
students' particular interests. Students are encouraged to suggest other
project topics based on their own interests. Check back for updates and
additions.
- PageRank and/or HITS can be applied to any directed graph.
Explore the use of one or both of them in another application
domain. This is intended to be an experimental project, but the
literature for the application should be explored as well.
- Investigate the use of link analysis to determine the subject of
non-text pages. For example, if a Web page contains only an
image, not only the anchor text of links pointing to the page but the
subject matter of pages pointing to the page may allow one to decide
the general subject of the image. Can this be done without
informative anchor text? (An example of uninformative anchor text
is here.)
- Investigate the use of dependence among index terms (e.g.
co-occurrence) in the literature and by your own experiments. Latent
Semantic Indexing is one example of a technique that uses
co-occurrence.
- Propose and implement a visualization of the relationship between
some collection of objects (text documents, images, Web pages, etc.)
- Investigate search for handheld displays. What special things are
done now by companies providing service? How do search engines
perform? Are special ranking algorithms needed that do really well at
getting the top few (5? 7?) results right? Are there things that can
be done? Propose one and test it.
- Investigate probabilistic models for information retrieval.
For example, compare the performance of a probabilistic model to the
vector model.
- Do a literature search and analysis of the state of the art of
image retrieval by image properties, not text labels. You should
include an analysis of such retrieval systems available on the Web.
Any other non-text media can be substituted for images. We will
briefly discuss such non-text retrieval in class; your research
must be substantially more thorough.
- Several search engines keep personal user histories (versus aggregate
histories) to improve search quality and ad placement. Design and
execute an experiment to try to determine if search engine quality is
improved by using personal information.
- Investigate the use of clustering in some application. For
example, can snippets be clustered in a way that is helpful for search
results?
- Several search engines currently use clustering in their
presentation of search results. Find out what you can about the
clustering and visualization techniques used and assess their
effectiveness from a user perspective. Consider alternative techniques.
- Experiment with techniques for detecting duplicate documents.
- Investigate personalized or topic-directed crawling techniques
and their effectiveness.
- Do an in-depth investigation of cluster machine architectures for
indexing and query-processing on large collections. Find and compare
state-of-the-art alternatives. Some simulation may be a part of this
project. Recent publications should be the primary source of
information on the state of the art.
- Build an application that uses a customized information retrieval
system.
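For students considering the PageRank topic above, the following is a minimal sketch of PageRank by power iteration on an arbitrary directed graph. It is only an illustration, not course-provided code: the function name pagerank, the adjacency-list input format, the damping factor of 0.85, and the toy graph are all assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Illustrative PageRank by power iteration.

    graph: dict mapping each node to a list of nodes it links to.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    # Collect every node, including ones that appear only as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes an equal share of its rank along its out-links.
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share

        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph; in a project the edges would come from the
# application domain (citations, follower links, and so on).
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

Because the input is just an adjacency list, the same sketch can be pointed at any directed graph; a real project would likely need a sparse-matrix representation for large graphs.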
Resources:
Please see the Resources for COS 435 Projects Web page for a list of
available data sets and software. If you need something and can't find
it, ask for help!
last revised Tue Mar 29 15:17:12 EDT 2011
Copyright 2008-2011 Andrea S. LaPaugh