Computer Science 302
Introduction to Artificial Intelligence
Project 2
Fall 2001. Due Jan 15.
Solve SAT-style analogy problems.
Your program should take a file of analogy problems and return the list with
the answers filled in. You will run your program and send us the results.
Now that the project is over, the results are here.
Rules
- You must let me know if you make use of previously published
problems, so I can make sure my test set is disjoint from your
training data.
- Your program may use the web, WordNet, or any other online
resources.
- Your program should run with no human intervention
(no interacting with people by email, no hand labeling of question
types, etc.).
- In addition to the program, each group needs to submit a
report describing the approach and comparing it with the
three standard options (hand-tuned, labeled corpus, unlabeled
corpus). That way you need to do some thinking even if you come
up with a trick that happens to work well. And if you come
up with something interesting that doesn't work out, we
can still appreciate your cleverness.
- Groups of up to 4 people. Here are the group assignments.
- You are free to come up with your own approach.
- We will test you on a set of analogies from a real exam
(and others).
- Winners will be announced; it's a competition.
Input File Format
Input files will consist of a sequence of analogy problems in blocks
of nine lines. Abstractly, the format looks like:
description (free form)
question
choice a
choice b
choice c
choice d
choice e
correct answer letter
(blank line)
For example, consider the analogy below, which I typed in from
"One-on-one with the SATs" (Problem 1, easy section):
overcoat : warmth ::
(a) glove : hand
(b) jewelry : wealth
(c) slicker : moisture
(d) disguise : identification
(e) helmet : protection
The answer is "e", and the input format representation for this would
be
ML: "One-on-one with the SATs" (1) EASY
overcoat warmth
glove hand
jewelry wealth
slicker moisture
disguise identification
helmet protection
e
Examples are separated by a blank line. Words in the question and
choices are tab separated.
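To make the layout concrete, here is a minimal parsing sketch in Python. It is
only an illustration under the assumptions stated above (eight non-blank lines
per block, tab-separated words, blocks separated by a blank line); the function
and field names are placeholders, not part of the assignment.

def parse_analogies(path):
    # Each block: description, question, five choices, answer letter,
    # followed by a blank separator line (nine lines in all).
    problems, block = [], []
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    for line in lines + [""]:          # trailing "" flushes the last block
        if line.strip():
            block.append(line)
            continue
        if len(block) == 8:
            problems.append({
                "description": block[0],
                "question": block[1].split("\t"),
                "choices": [c.split("\t") for c in block[2:7]],
                "answer": block[7].strip(),
            })
        block = []
    return problems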
Input Files
The following files are available.
- 13 Trial Analogies: As sent by email.
- 233 Training Analogies: Encrypted.
Decrypt using "crypt password < train.crypt > train" (there is
a copy of crypt on bolle.cs.princeton.edu at /usr/bin/crypt). To get the
password, send me five made-up problems and any others you have found.
It's kind of a "co-op".
- Test Analogies: Now offline.
Output Format
In the test analogy file, answers will appear as the letter "x". Your
program needs to read the test analogies and replace each "x" with its
answer choice. Mail the resulting file to me for grading.
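A sketch of that answer-filling step, again only illustrative: fill_answers
and solve are hypothetical names, and solve stands in for whatever approach
your group implements.

def fill_answers(in_path, out_path, solve):
    # Copy the test file, replacing the "x" on each block's answer line
    # with the letter returned by solve(question_words, choice_word_lists).
    with open(in_path) as f:
        lines = [line.rstrip("\n") for line in f]
    out, block = [], []
    for line in lines + [""]:          # trailing "" flushes the last block
        if line.strip():
            block.append(line)
            continue
        if block:
            if len(block) == 8 and block[7].strip() == "x":
                block[7] = solve(block[1].split("\t"),
                                 [c.split("\t") for c in block[2:7]])
            out.extend(block + [""])   # keep the blank separator
            block = []
    with open(out_path, "w") as f:
        f.write("\n".join(out))

Everything except the answer line is copied unchanged, so the submitted file
keeps the original nine-line block format.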
Synonyms
We had two checkpoints in which the groups classified a set of word pairs,
deciding which pairs were synonyms and which weren't. In the set, the first 100 pairs are
clue-answer pairs that have appeared in crossword puzzles. They are
all synonyms (though some are iffy). The next 50 word pairs were made
by taking a random single-word crossword clue and pairing it with
another random single-word crossword clue. These are all
non-synonyms.
The first 150 should be somewhat easy to distinguish. We also
prepared a list of 50 distractor pairs by taking one-word crossword
clues and looking for related words on the LSA website. They are the final 50
non-synonym pairs in the list.
- Results.
- WordNet: available on the CS machines.
Thanks and Links
Thanks to Peter Turney for pointers to example analogies, to John Zedlewski for
suggesting the group project format and the common-type link below, and to
Mary Dunlop, who found the GRE site.