You may discuss the general methods of solving the problems with other students in the class. However, each student must work out the details and write up his or her own solution to each problem independently. Some problems have been used in previous offerings of COS 435. You are NOT allowed to use any solutions posted for previous offerings of COS 435 or any solutions produced by anyone else for the assigned problems. You may use other reference materials; you must give citations to all reference materials that you use.
This problem is our class experiment with evaluating search engines. We will compare Google to Microsoft's Bing. (You may be interested in comScore's January 2011 U.S. Search Engine Rankings. Take special note of the information on "Powered By" Reporting at the bottom of the page.) This is only meant to be an exercise, so I do not expect we can do a thorough enough job to call the study valid. But it will have the components of a full evaluation, and hopefully we will get something interesting. You may also be interested in the equally (more?) unscientific comparison of Google and Bing by Conrad Saam of Search Engine Land.
Part A: Choose an information need. The information need should require gathering information about a subject from several Web sites with good information. An example of an activity that would provide an appropriate information need is doing a report for a course. You should choose an information need that you think is neither too easy nor too difficult for a search engine. For example, one expects looking for information on the H1N1 flu to yield essentially 100% relevant pages - too easy; conversely, looking for information on the history of the LaPaugh family in Europe might (at best) yield one relevant result in 20 - too hard.
Write a description of your information need that can be used to judge whether any given Web search result is relevant or not. Use the style of the TREC topic specifications, with title, description, and narrative sections. (See the examples of TREC topic specifications in slides 28 and 29 of the class presentation on the evaluation of retrieval systems.) You will be distinguishing between "highly relevant" and "simply relevant", so you may wish to distinguish these in your narrative section, but it is fine to leave the distinction between "highly relevant" and "simply relevant" as a quality judgment. In either case, you should be demanding in your criteria for "highly relevant". Once you have your information need described, write one query that you will use on both search engines to capture the information need. The query should have the following properties:
Before proceeding to Part B, submit your description of the information need and your query to Professor LaPaugh by email for approval. This is primarily to make sure no two people have the same information need or query.
Part B: Run your query on each of Google and Bing. Run the queries while remaining as anonymous as possible to the search engines: without the Bing or Google toolbars active, with the "Suggested Sites" feature of Internet Explorer off, and logged out of your Google account. Consider only the regular search results, not sponsored links. Ignore "image results", "video results", "news results" and any other special results - these are not counted as part of the first 10 results on the first results page and may cause the first results page to have fewer than 10 regular results. If several of your results are in languages other than English, you can go to the advanced search and choose English only, but then do this for both search engines. (In my trials, I did not get foreign-language results with a regular search, so this may not be an issue.) Record the first 30 results returned.
Pooling: To get a pool for hand assessment, take the first 20 results from each search engine. Remove duplicates, and visit each result to decide relevance. Score each result as "highly relevant", "simply relevant", or irrelevant according to your description of Part A. Record the size of the pool (the number of unique results produced by combining results 1-20 of each search engine). Also record the number of "highly relevant" and "simply relevant" results in the pool.
Scoring: After constructing the pool, go back and score each of the first 30 results returned by each search engine based on your scoring of the pool. If a result does not appear in the pool, it receives a score of irrelevant. If a document appears twice under different URLs in the list for one search engine, count it only at its better rank for that search engine and delete any additional appearances within the same list. In this case there will be fewer than 30 distinct results returned by that search engine. Do not go back to the search engine to get more results; keep only what was returned in the first 30, with their original ranks.
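The deduplication and scoring step can be sketched as below, reusing the (assumed) judgments dictionary from the pooling sketch; results_30 is an assumed list of one engine's URLs in rank order. Note that a document appearing under two different URLs will not be caught by exact URL matching, so that case still has to be handled by hand.

    def score_engine(results_30, judgments):
        """Return (rank, url, grade) triples for one engine, keeping original ranks.
        Anything not in the pool is scored irrelevant; a URL repeated within the
        list is kept only at its better (earlier) rank."""
        scored = []
        seen = set()
        for rank, url in enumerate(results_30, start=1):
            if url in seen:
                continue  # drop additional appearances of the same URL
            seen.add(url)
            scored.append((rank, url, judgments.get(url, "irrelevant")))
        return scored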
For each search engine, calculate the following measures. For all but discounted cumulative gain (measure 4), "simply relevant" and "highly relevant" should be lumped together as "relevant". The first four measures are ways of capturing the quality of the first 20 results, which is about as far as most people look. The fourth measure gives credit to one search engine for finding relevant documents returned earlier by the other search engine.
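As an illustration of the calculations, here is one way discounted cumulative gain (and, purely for illustration, precision at a cutoff) could be computed over the output of score_engine above. The gain values of 2 / 1 / 0 and the log2(rank + 1) discount are assumptions of this sketch; use whatever gains, discount, and cutoffs are specified with the list of measures if they differ.

    import math

    GAIN = {"highly relevant": 2, "simply relevant": 1, "irrelevant": 0}  # assumed gains

    def dcg(scored, k):
        """Discounted cumulative gain over ranks 1..k (one common variant)."""
        return sum(GAIN[grade] / math.log2(rank + 1)
                   for rank, _url, grade in scored if rank <= k)

    def precision_at(scored, k):
        """Fraction of ranks 1..k that are relevant, lumping "simply" and "highly" together.
        Ranks removed as within-list duplicates count as non-relevant here."""
        relevant = sum(1 for rank, _url, grade in scored
                       if rank <= k and grade != "irrelevant")
        return relevant / k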
What to hand in for Part B: Email to Professor LaPaugh and Siyu Yang: These results will be averaged across the class, so please report each number on a separate line, clearly labeled as to what it is.
Part C: What observations do you make about usability issues (user friendliness) of each search engine - separate from the quality of results you have been assessing in Part B? You may email your observations with Part B, but write them after, and clearly separated from, the Part B results.