Princeton University
Computer Science Dept.

Computer Science 435
Information Retrieval, Discovery, and Delivery

Andrea LaPaugh

Spring 2011



Information about the Course Project


Each student or pair of students will do a final project of their own choosing, related to the material of the course.

Project requirements:

Proposal due 5:00pm Monday, Feb. 28, 2011:

Email a paragraph describing your proposed project to Prof. LaPaugh.  Include as much detail as possible.  This will be the starting point of a discussion with Professor LaPaugh to make sure the project is of the appropriate scope for a class project.


UPDATED  Progress report between April 4 and April 15, 2011:

Meet with Professor LaPaugh to discuss your progress on your project.    Expect to spend about 15 minutes discussing your work to date.   You will not give a formal presentation, but you should prepare slides that summarize any algorithms, system architecture, or experiments you are developing for the project.   Email these to Professor LaPaugh ahead of your meeting time. 

After spring break, you will be able to sign up for your appointment using WASS, OIT's office hours scheduling system. Wait until the availability of appointment blocks is announced (watch the course Announcements page). To use WASS, log in and click the "Make an Appointment" menu button. Search under the name "LaPaugh" or NetId "aslp" for the calendar entitled "COS 435 calendar". Once you have found the calendar, click "Make Appointment". If you have conflicts with all available times, email Professor LaPaugh. Caution: do not use the calendar entitled "Advising calendar for Andrea LaPaugh".

Project Report due 5:00 pm Dean's Date, Tuesday May 10, 2011: 

You are required to submit a final report that describes your project. This must include the statement of the topic and the goals of the project, your methodology and the results. If it is an experimental project, you need to describe what was implemented, the major implementation decisions,  how you designed the experiments, and the experimental results. If you developed a system or tool, you may not have experiments per se, but you must describe how you are evaluating the project and the outcome.  You should also relate your work to other work on the problem.  Your code should be in an appendix or posted on a Web page with the URL provided (Web posting is preferred).   If your project is a theoretical study, you need to describe the problem, review what was known about the problem before your analysis, and give the details and the results of your theoretical analysis. If your project is a literature-based project, you need to describe the major issues under study, summarize the major techniques and the theoretical and/or experimental results presented in the literature and critically analyze the results.  For any type of project, be sure to include a bibliography of all the sources you used.

Projects will be graded on thoroughness and depth of thought. Difficulty will be taken into consideration. Keep in mind that evaluation is an important part of any project. Be clear on the goals of your project and how you demonstrate or measure success.


Project Demonstration

After submitting your final report, meet briefly with Professor LaPaugh to discuss the results of your project. If you have implemented something that lends itself to live demonstration, this is the time to show it. All meetings must occur before 5pm Mon. May 16, 2011.


List of suggested projects:

These topics are fairly broad and need further refinement based on students' particular interests. Students are encouraged to suggest other project topics based on their own interests.  Check back for updates and additions.

  1. PageRank and/or HITS can be applied to any directed graph.  Explore the use of one or both of them in another application domain.  This is intended to be an experimental project, but the literature for the application should be explored as well.
  2. Investigate the use of link analysis to determine the subject of non-text pages.  For example, if a Web page contains only an image, not only the anchor text of links pointing to the page but the subject matter of pages pointing to the page may allow one to decide the general subject of the image.  Can this be done without informative anchor text?  (An example of uninformative anchor text is here.)
  3. Investigate the use of dependence among index terms (e.g. co-occurrence) in the literature and by your own experiments.  Latent Semantic Indexing is one example of a technique that uses co-occurrence.
  4. Propose and implement a visualization of the relationships among some collection of objects (text documents, images, Web pages, etc.).
  5. Investigate search on handheld displays.  What special things are done now by companies providing service?  How do search engines perform?  Are special ranking algorithms needed that do really well at getting the top few (5? 7?) results?  Are there things that can be done to improve search on small displays?  Propose one and test it.
  6. Investigate probabilistic models for information retrieval.  For example, compare the performance of a probabilistic model to the vector model.
  7. Do a literature search and analysis of the state of the art of image retrieval by image properties, not text labels.  You should include an analysis of such retrieval systems available on the Web.  Any other non-text media can be substituted for images.  We will briefly discuss such non-text retrieval in class; your research must be substantially more thorough.
  8. Several search engines keep personal user histories (versus aggregate histories) to improve search quality and ad placement.  Design and execute an experiment to try to determine if search engine quality is improved by using personal information.
  9. Investigate the use of clustering in some application.  For example, can snippets be clustered in a way that is helpful for search results? 
  10. Several search engines currently use clustering in their presentation of search results.   Find out what you can about the clustering and visualization techniques used and assess their effectiveness from a user perspective.  Consider alternative techniques.
  11. Experiment with techniques for detecting duplicate documents. 
  12. Investigate personalized or topic-directed crawling techniques and their effectiveness.
  13. Do an in-depth investigation of cluster machine architectures for indexing and query-processing on large collections.   Find and compare state-of-the-art alternatives.  Some simulation may be a part of this project.  Recent publications should be the primary source of information on the state of the art.
  14. Build an application that uses a customized information retrieval system.
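
As a starting point for topic 1, here is a minimal sketch of PageRank by power iteration on an arbitrary directed graph. It is only an illustration, not a required approach: the function name pagerank, the damping factor 0.85, the uniform treatment of nodes without out-links, and the tiny "citations" graph are all assumptions made for this example.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    graph maps each node to a list of the nodes it links to.
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = sorted(set(graph) | {v for succs in graph.values() for v in succs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Assumption: rank held by nodes with no out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}

        # Each node passes an equal share of its rank along each out-link.
        for u in nodes:
            succs = graph.get(u, [])
            for v in succs:
                new_rank[v] += damping * rank[u] / len(succs)

        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: an edge X -> Y means "X cites Y".
citations = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(citations)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In this toy graph, C is cited by three of the four nodes and ends up with the highest score, while D, which nothing cites, keeps only the baseline share. For a project, the interesting work is in choosing the application domain, deciding what an edge means there, and evaluating whether the resulting ranking is useful.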


Resources:

Please see this Resources for COS 435 Projects Web Page for a list of available data sets and software.  If you need something and can't find it, ask for help!



last revised Tue Mar 29 15:17:12 EDT 2011
Copyright © 2008-2011 Andrea S. LaPaugh