Computer Science 302
Introduction to Artificial Intelligence
Project 2
Fall 2001. Due Jan 15.
Solve SAT-style analogy problems.
Your program should take a file of analogy problems and return the list with
the answers filled in. You will run your program and send us the results.
Now that the project is over, the results are here.
Rules
- You must let me know if you make use of previously published
problems, so I can make sure my test set is disjoint from your
training data.
- Your program may use the web, WordNet, or any other online
resources.
- Your program should run with no human intervention
(no interacting with people by email, no hand labeling of question
types, etc.).
- In addition to the program, each group needs to submit a
report describing the approach and comparing it with the
three standard options (hand-tuned, labeled corpus, unlabeled
corpus). That way you need to do some thinking even if you come
up with a trick that happens to work well. And if you come
up with something interesting that doesn't work out, we
can still appreciate your cleverness.
- Groups of up to 4 people. Here are the group assignments.
- You are free to come up with your own approach.
- We will test you on a set of analogies from a real exam
(and others).
- Winners will be announced; it's a competition.
Input File Format
Input files will consist of a sequence of analogy problems in blocks
of nine lines. Abstractly, the format looks like:
description (free form)
question
choice a
choice b
choice c
choice d
choice e
correct answer letter
(blank line)
For example, consider the analogy below, which I typed in from
"One-on-one with the SATs" (Problem 1, easy section):
overcoat : warmth ::
(a) glove : hand
(b) jewelry : wealth
(c) slicker : moisture
(d) disguise : identification
(e) helmet : protection
The answer is "e", and the input format representation for this would
be
ML: "One-on-one with the SATs" (1) EASY
overcoat warmth
glove hand
jewelry wealth
slicker moisture
disguise identification
helmet protection
e
Examples are separated by a blank line. Words in the question and
choices are tab separated.
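To make the layout concrete, here is a minimal parsing sketch in Python. It is
only an illustration under the assumptions stated above (eight non-blank lines
per block, tab-separated words, blocks separated by a blank line); the function
and field names are placeholders, not part of the assignment.

def parse_analogies(path):
    # Each block: description, question, five choices, answer letter,
    # followed by a blank separator line (nine lines in all).
    problems, block = [], []
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    for line in lines + [""]:          # trailing "" flushes the last block
        if line.strip():
            block.append(line)
            continue
        if len(block) == 8:
            problems.append({
                "description": block[0],
                "question": block[1].split("\t"),
                "choices": [c.split("\t") for c in block[2:7]],
                "answer": block[7].strip(),
            })
        block = []
    return problems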
Input Files
The following files are available.
- 13 Trial Analogies: As sent by email.
- 233 Training Analogies: Encrypted.
Decrypt using "crypt password < train.crypt > train" (there is
a copy of crypt on bolle.cs.princeton.edu at /usr/bin/crypt). To get the
password, send me five made-up problems and any others you have found.
It's kind of a "co-op".
- Test Analogies: Now offline.
Output Format
In the test analogy file, answers will appear as the letter "x". Your
program needs to read the test analogies and replace each "x" with its
answer choice. Mail the resulting file to me for grading.
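A sketch of that answer-filling step, again only illustrative: fill_answers
and solve are hypothetical names, and solve stands in for whatever approach
your group implements.

def fill_answers(in_path, out_path, solve):
    # Copy the test file, replacing the "x" on each block's answer line
    # with the letter returned by solve(question_words, choice_word_lists).
    with open(in_path) as f:
        lines = [line.rstrip("\n") for line in f]
    out, block = [], []
    for line in lines + [""]:          # trailing "" flushes the last block
        if line.strip():
            block.append(line)
            continue
        if block:
            if len(block) == 8 and block[7].strip() == "x":
                block[7] = solve(block[1].split("\t"),
                                 [c.split("\t") for c in block[2:7]])
            out.extend(block + [""])   # keep the blank separator
            block = []
    with open(out_path, "w") as f:
        f.write("\n".join(out))

Everything except the answer line is copied unchanged, so the submitted file
keeps the original nine-line block format.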
Synonyms
We had two checkpoints in which the groups classified a set of word pairs,
deciding which pairs were synonyms and which weren't. In the set, the first 100 pairs are
clue-answer pairs that have appeared in crossword puzzles. They are
all synonyms (though some are iffy). The next 50 word pairs were made
by taking a random single-word crossword clue and pairing it with
another random single-word crossword clue. These are all
non-synonyms.
The first 150 should be somewhat easy to distinguish. We also
prepared a list of 50 distractor pairs by taking one-word crossword
clues and looking for related words on the LSA website. They are the final 50
non-synonym pairs in the list.
- Results.
- WordNet: available on the CS machines.
Thanks and Links
Thanks to Peter Turney for pointers to example analogies, to John Zedlewski for
suggesting the group project format and the common-type link below, and to
Mary Dunlop, who found the GRE site.