COS 435 Spr 15: PS 2

COS 435, Spring 2014 - Problem Set 2

Due at 5:00 PM, Wednesday, February 25, 2015

(note special time)

Collaboration and Reference Policy

You may discuss the general methods of solving the problems with other students in the class. However, each student must work out the details and write up his or her own solution to each problem independently.

Some problems have been used in previous offerings of COS 435. You are NOT allowed to use any solutions posted for previous offerings of COS 435 or any solutions produced by anyone else for the assigned problems. You may use other reference materials; you must give citations to all reference materials that you use.

Lateness Policy

A late penalty will be applied, unless there are extraordinary circumstances and/or prior arrangements:

Penalized 10% of the earned score if submitted by 111:59 PM Wednesday (2/25/15).

Penalized 25% of the earned score if submitted by 4:30pm Friday (2/27/15).

Penalized 50% if submitted later than 4:30pm Friday (2/27/15).

Problem

This problem is our class experiment with evaluating search engines. We will compare Google, to DuckDuckGo. While DuckDuckGo is not widely used, it makes some interesting claims. Let's see how it stands up to Google. This is only meant to be an exercise, so I do not expect we can do a thorough enough job to call the study valid. But it will have the components of a full evaluation and hopefully we will get something interesting.

Part A: Choose an information need. The information need should require gathering information about a subject from several Web sites with good information. An example of an activity that would provide an appropriate information need is doing a report for a course. You should choose an information need that that you think is neither too easy nor too difficult for a search engine. For example, one expects looking for information on meningitis to yield essentially 100% relevant pages - too easy; conversely, looking for information on the history of the LaPaugh family in Europe might (at best) yield one relevant result in 20 - too hard.

Write a description of your information need that can be used to judge whether any given Web search result is relevant or not. Use the style of the TREC topic specifications, using title, description, and narrative sections. (See the example of a TREC topic specifications in the class presentation on the evaluation of retrieval systems.) You will be distinguishing between "highly relevant" and "simply relevant", so you may wish to distinguish these in your narrative section, but it is fine to leave the distinction between "highly relevant" and "simply relevant" as a quality judgment. In either case, you should be demanding in your criteria for "highly relevant". Once you have your information need described, write one query that you will use on both search engines to capture the information need. The query should have the following properties:

The query should be constructed to avoid anticipated ambiguity. Use several search terms if necessary.

Do not use advanced search operations for either search engine. Exception: you may use quotes around phrases, e.g. "computer architecture".

Make sure you use the same query for both search engines.

Before proceeding to Part B, submit your description of information need and your query to Professor LaPaugh by email for approval. This is primarily to make sure no two people have the same information need or query.

Part B:
Run your query on each of Google and DuckDuckGo.   Run the queries while remaining as anonymous as possible to the search engines: without search engine toolbars active, with the "Suggested Sites" feature of Internet Explorer off, and logged off all search engine and social network accounts. Consider only the regular search results, not sponsored links. Ignore “image results”, “video results”, “news results” and any other special results - note that these may cause the Google first results page to have less than 10 regular results.   If you are having trouble with several results in languages other than English, you can go to the advanced search and choose English only, but then do this for both of the search engines. (In my trials, I did not get foreign-language results with a regular search, so this may not be an issue.)   You may turn on "safe-search", but be sure it is on for both search engines. To access these settings, click "settings" at the bottom right of the Google home page and the menu button (3 horizontal bars) at the top right of the DuckDuckGo home page. Note that safe search settings are on by default for DuckDuckGo and off by default for Google.
Record the first 30 results returned.

Pooling: To get a pool for hand assessment, take the first 20 results from each search engine. Remove duplicates, and visit each result to decide relevance. Score each result as "highly relevant" , "simply relevant" or irrelevant according to your description of Part A.    Record the size of the pool (number of unique results produced by the combined results 1 - 20 of each search engine). Also record the number of "highly relevant" and "simply relevant" results in the pool.

Scoring: After constructing the pool, go back and score each of the first 30 results returned by each search engine based on your scoring of the pool. If a result does not appear in the pool, it receives a score of irrelevant.   If a document appears twice under different URLs in the list for one search engine, count it only for its better ranking for that search engine and delete any additional appearances within the same list. In this case there will be less than 30 distinct results returned by the search engine. Do not go back to the search engine to get more results. Keep only what was returned in the first 30, with their original ranks. For each search engine, calculate the following measures. For all but discounted cumulative gain (measure 4), "simply relevant" and "highly relevant" should be lumped together as "relevant".

The reciprocal rank of the results.

Precision at rank 10.
The discounted cumulative gain at rank 10, DCG(10), using a gain of 0 for irrelevant documents, a gain of 1 for "simply relevant" documents and a gain of 5 for "highly relevant documents".

The average over the precisions at each point that a relevant document appears in the first 20 (yes 20, not 30). We the staff will compute the mean average precision for all the queries.
Rank at 100% recall: If all the relevant documents in the pool appear in the first 30 returned results, record the rank at which the last relevant result appears. This is the point of 100% recall of the relevant documents in the pool. If there is not 100% recall, record "N/A".
The recall at rank 30. If all the relevant documents in the pool appear in the first 30 returned results, this will be 1.

The first 4 measures are ways of capturing the quality of the first 20 results, which is about as far as most people look. The fifth and sixth measures gives credit to one search engine for finding relevant documents returned earlier by the other search engine.

For Part B, you must report:

the information need and query used
the pool size (after duplicate removal)
the number of relevant results in the pool
for each search engine, the relevance score of the result at each position, 1 through 30 - one per line
for each search engine, the scores described above.

What to hand in for Part B and how to submit:

Download the template ps2template.txt and fill it in with your findings of Part b. Use this sample filled-in template as a guide. In particular, note the use of "dup" for the relevance score at the position of a removed duplicate and the use of "N/A" for the rank at 100% recall when 100% recall is not achieved. Name your file ps2data.txt .

Submit your data file using the Computer Science Department DropBox submission system for COS435. If you have not used this facility before, consult these instructions. Note that you are automatically enrolled in CS DropBox using the registrar's COS435 enrollment list.

It is important that you strictly follow the template because we will use a script based on this template to calculate the average across the class for each of the pool size, number of relevant results in the pool, and 6 scores.

Part C:
What observations do you make about usability issues (user friendliness) of each search engine - separate from the quality of results you have been assessing in Part B?

What to hand in for Part C and how to submit: Record your observations in a text file named ps2observe.txt . Submit using CS DropBox as for Part B.

Enrichment: You may be interested in ComScore's January 2015 U.S. Search Engine Rankings. Take special note of the information on "Powered By" Reporting at the bottom of the page. Google and Bing power most of the search engines, including Bing contributing to DuckDuckgo results. An example of a different approach to search engine comparison is the equally to ours (more?) unscientific 2011 comparison of Google and Bing by Conrad Saam of Search Engine Land.