COS 402: Artificial Intelligence

Homework #3

Satisfaction

Fall 2007

Due: Wednesday, October 24


Special late policy:  For this homework only, there will be a special late policy to help reduce the pressure of midterms followed by break.  During midterms week, one late "day" will be equivalent to two consecutive calendar days, meaning in particular that the Thursday and Friday of midterms week will together count as a single late "day".  All other rules of the standard late policy will remain in effect (including weekends counting as a single late "day", and the limit of not turning in the assignment more than five "days" late).  To be clear, here is a table summarizing how many "days" late your assignment will be counted if turned in on the following dates:

Calendar date Number of late days charged
Wednesday, October 24 0
Friday, October 26 1
Sunday, October 28 2
Monday, October 29 3
Tuesday, October 30 4
Wednesday, October 31 5
Thursday, November 1 not accepted on or after this date

For this homework only, if you are out of the Princeton area over break, you may mail or email the written parts of the assignment to Joe Calandrino (keeping in mind that the written exercises should be prepared separately from the written part of the programming assignment so that they can be graded separately).  If mailed, your homework is considered submitted on the post mark date, and should be sent to this address:  Joe Calandrino, Princeton University, Department of Computer Science, 35 Olden Street, Princeton, NJ  08540.  (It would be wise to send him email at the same time you mail your assignment so that he can look out for it; also, save a photocopy of your work.)


Part I:  Written Exercises

Submit hard copies of these on or before the due date as described on the assignments page.  (Numbers in brackets indicate approximate point values.)

1.  [15]  Exercise 7.4a,b,c in R&N.  Be sure to prove these assertions; don't just wave your hands.

2.  [15]  (Adapted from Exercise 7.9 in R&N.)  Consider the following facts:

"If the unicorn is mythical, then it is immortal, but if it is not mythical, then it is a mortal mammal.  If the unicorn is either immortal or a mammal, then it is horned.  The unicorn is magical if it is horned."

  1. Translate the facts above directly into sentences of propositional logic using a suitable (and clearly defined) set of proposition symbols.
  2. Determine whether or not each of the following are entailed by the facts above, and justify your answers:
    1. "The unicorn is mythical."
    2. "The unicorn is not mythical."
    3. "The unicorn is magical."
  3. Simulate the resolution algorithm to show how it would determine whether or not the following statement is entailed by the facts above:  "The unicorn is both magical and horned."  You can resolve clause-pairs in whatever order you wish.  Be sure to state the final outcome of running the algorithm.

3.  [10]  Exercise 7.17 in R&N.

4.  [15]  Exercise 13.11 in R&N.  Be sure to justify your answers.


Part II:  Programming

In this assignment, you will have a chance to develop and experiment with your own ideas for solving CNF satisfiability problems.  The assignment has three parts, one of which is optional and for extra credit:

  1. Write the fastest program you can for finding satisfying assignments (models) for a given CNF sentence.
  2. Write a program for generating many "interesting" CNF sentences.
  3. (Optional) Run a systematic experiment using your solver and generator programs.

Each of these are described in detail below.  After the submission deadline, we will also conduct our own "implementation challenge" to see whose program is fastest and most effective.

A satisfiability solver

The first part of the assignment is to implement a satisfiability solver, or "sat-solver" for short.  Given a CNF sentence, your sat-solver should attempt to find a satisfying assignment as quickly as possible.  Your sat-solver does not need to be complete in the sense of returning "failure" when the CNF is unsatisfiable.  Rather, if no satisfying assignment can be found, it should return the "best" assignment it can find, i.e., one with as many clauses satisfied as possible.  Most likely, you will want to use a variant of WalkSat, but this is certainly not required, and you are welcome to try out other methods that we discussed in class, or your own ideas.  Be creative.  There are many ways in which even WalkSat can be tinkered with to improve performance.

Your sat-solver will be given a CNF in a representation described below.  It also will be given a time limit in the form of a "timer" object.  Your sat-solver must return with some answer (i.e., some assignment) by the time the timer reaches zero, or as soon thereafter as possible.

A generator

The second part of the assignment is to devise and implement a method of generating many CNF sentences corresponding to a natural class of problems.  For instance, in class, we discussed how graph-coloring problems can be converted into CNF satisfiability problems.  Thus, one way of generating interesting CNF sentences is to first generate random graph-coloring problems, and then convert them into CNF sentences whose satisfiability must be solved to find a coloring of the randomly generated graph.  Similarly, we saw in class how a blocks-world planning problem can be converted into a satisfiability problem.  A generator in this case could be designed by first generating random blocks-world problems (i.e., start and goal configurations) and then converting these into satisfiability problems.

For this part of the assignment, you should (1) choose a natural problem, (2) figure out how to reduce this problem to a satisfiability problem, (3) implement a generator for natural instances of the original problem, and (4) implement your reduction.  The end result will be a program that generates CNF sentences, but of a very particular form.  Since we discussed graph coloring and blocks-world planning problems in class, these would not be appropriate choices for this assignment.  However, here are some other examples:

Some of these are deliberately vague.  You can fill in the details as you wish.  You should not feel constrained to choose any of these.  On the contrary, you are strongly encouraged to be creative and come up with your own problem.

You need to pick a problem for which it is easy and natural to generate many CNF sentences.  For instance, it would not be appropriate to choose a problem such as n-queens since, for each n, there is only a single problem instance.

Once you have chosen a problem, you need to figure out how to generate random instances, and how to reduce these instances to satisfiability problems.  Be very careful to check that an assignment that satisfies the CNF also is a solution to the original problem instance, and vice versa.

For some problems, it is possible to design your generator to produce only satisfiable CNF sentences.  For instance, for the graph-coloring problem, you can start by 3-coloring a set of vertices and then add random edges, never adding an edge that would connect two like-colored vertices.  The resulting graphs (without the colors, of course) will clearly be 3-colorable by construction, so the generated CNF's will also be satisfiable.  If possible, you are encouraged (but not required) to produce a generator of this form since it should make the implementation challenge more interesting (obviously, no one will be able to solve an unsatisfiable CNF).

The final step is to implement all of the above.  The end product will be a method for generating many CNF sentences.  Your generator (and those of your classmates) will be used to generate the CNF sentences used for the class implementation challenge.

In case the process of creating a generator is still unclear, here is a clarifying note written by Matt Hibbs when he was a TA for this course in 2003.

A systematic experiment (optional)

The final part of this assignment, which is optional and for extra credit, is to conduct a systematic experiment of some sort exploring some aspect of your sat-solver.  An example of such an experiment is the one shown in Figure 7.18b of R&N.  In this experiment, the performance of WalkSat and DPLL were tested on randomly generated 3-CNF sentences with the number of symbols fixed, but with varying clause-to-symbol ratios.  Most likely, your generator will take some parameters, such as the number of edges and number of vertices in the graph that is to be 3-colored in the coloring problem.  You might try measuring run time as a function of one of the parameters (with the others held fixed).  For instance, in this example, you could measure run time as a function of the number of edges with the number of vertices held fixed.  Here is how you might conduct such an experiment:

  1. Fix the number of vertices to be, say, 50.  Let the number of edges vary taking values, say, of 20, 40, 60, ..., 200.
  2. For each of these settings, generate, say, 50 random graphs and convert them to CNF sentences.  If possible, generate only CNF sentences that are known to be satisfiable (see note above).
  3. Run your sat-solver on all of the generated CNF sentences, and record all the results, i.e., running time and the number of unsatisfied clauses.  You may need to limit the running time of each run to keep the experiment from running for too long.
  4. Make a plot with the number of edges on the x-axis, and the average (or median) running time on the y-axis.  (There are many programs available for making plots including excel, gnuplot and matlab.)

The numbers chosen here for number of vertices and number of edges were arbitrary.  In your own experiment, if possible, you should try to identify an "interesting" region for the parameter settings, as in R&N Figure 7.18b where the running time increases but then decreases as the clause-to-symbol ratio is increased.

This is just one possible experiment that could be conducted.  Myriad possibilities exist.  For instance, you might want to compare performance of two substantially different algorithms, or variants of the same algorithm, over a range of problems.  Or, if your algorithm has a parameter, you can see how the algorithm's performance varies as a function of this parameter.  Most likely, if you play around with your algorithm and your generator, you will encounter some aspect of one or the other that seems worth exploring systematically.  Moreover, such an experiment may help you to design a more effective sat-solver.  As with the other parts of this assignment, you are encouraged to think creatively.  It is sufficient to conduct a single experiment for this part of the assignment, although you are welcome to do more.

Be careful to plan your experiment carefully, possibly after running some preliminary test.  Because we are dealing with randomized algorithms and random instances, you will almost certainly need to run your algorithm many times, taking the average (or median) of the results.  This also means that the experiments can take quite a while to run.  This is why it is so important that you start early if you are planning to do this part of the assignment.  Ideally, you would run your algorithm hundreds of times to get reliable results.  Unfortunately, you probably will not have time to do this, and may be forced to run only ten or twenty times.  Plan carefully, and be sure your experiments will terminate in a reasonable period of time before beginning them.  For instance, the experiment outlined above will require generating 50 * 10 = 500 CNF sentences.  If we allow our sat-solver 30 seconds for each one, this gives a total (worst-case) running time of 500 * 30 = 15000 seconds, or about 4 hours, 10 minutes.

Obviously, running times may vary depending on who else is using the computer you are on.  If you are using a shared computer, this might lead to unreliable results.  In this case, you may instead wish to measure some quantity other than running time, such as the number of variable flips made by the algorithm.

The implementation challenge

Once you turn in your sat-solver and generator, we will run everyone's sat-solver on everyone's generator to determine whose sat-solver runs the fastest.  In particular, each submitted generator will be used to generate two CNF sentences.  We will then run every submitted sat-solver on the resulting collection of CNF sentences.  Each sat-solver will be given 15 seconds on a 2+ GHz PC to solve each CNF sentence.  Here is how the scoring will work:

So, in short, if your sat-solver was among the best on a particular CNF, then you will be charged the actual running time, but if yours was not the best, then you will be charged a minimum of 15 seconds.  We will then take the average (over all CNF's) of these running times to determine the fastest and most effective sat-solvers.  Thus, the idea is to return an assignment satisfying as many clauses as possible, and to do so as quickly as possible.

We also will run other statistics, such as the average number of unsatisfied clauses achieved by each sat-solver.

To make the implementation challenge interesting, please try to set up your generator to produce CNF's that are challenging but that at least your own sat-solver would have a reasonable chance of solving in 15 seconds (if run on a 2+ GHz machine).  Obviously, it won't be much fun if no one's sat-solver can solve anyone's generated CNF's, nor if all of the CNF's are trivially solvable almost instantly.

Your grade will not depend on how well you do in this implementation challenge relative to the rest of the class.  Although you are encouraged to do the best you can, this should not be regarded as anything more than a fun (I hope) communal experiment exploring methods for solving satisfiability problems.

Outside sources

It is okay to use outside sources (such as articles or websites) to come up with ideas for sat-solvers or generators.  However, this is certainly not required, nor even encouraged (it is even slightly discouraged since the emphasis of this assignment is in coming up with your own ideas).  To the extent that it is done in a manner consistent with the course collaboration policy, it is also okay to exchange ideas with your classmates.  In all cases, all outside sources must be acknowledged in the report that you turn in.

The code we are providing

We are providing the following classes and interfaces:

In addition to what is described above, each Generator and SatSolver must provide methods called author() and description() returning as Strings the author of the program, and a brief description of the generator or sat-solver, and how it works.  We will use these methods when publicly reporting the results of the implementation challenge.  The "author" that your program reports can be your real name or, if you prefer to remain anonymous, you can choose a pseudonym (and you do not need to use the same name for your generator and sat-solver).

As explained below, we are using two different representations of CNF sentences: one that you should find convenient when generating CNF's, and the other that provides greater efficiency and ease of use when writing your sat-solver.  We also are providing methods for converting between the representations.

When generating CNF sentences, you should use the Cnf class which has been designed for this purpose.  This class maintains a representation of the CNF in a form in which symbols are represented or named by Strings.  These strings are entirely arbitrary, and two strings are considered to represent the same symbol if and only if they are equal.  For instance, the strings "x", "N(43,12)", "P:18:4" and "qs73" would all be acceptable names for symbols.  A Cnf object will typically be built clause by clause and literal by literal using the methods addClause(), which adds a new (and empty) clause to the CNF, and addLiteral(), which adds a new literal (with the specified symbol and sign) to the specified clause (or to the last clause if no clause is provided).

For instance, here is code for generating a small CNF:

Cnf cnf = new Cnf();                // create new (empty) CNF
cnf.addClause();                    // add new (empty) clause
cnf.addLiteral("w5m9", true);       // add literals to clause
cnf.addLiteral("xy", false);
cnf.addLiteral("N(4,2)", true);
cnf.addClause();                    // add second clause
cnf.addLiteral("a9:51", false);     // add literals to new clause
cnf.addLiteral("N(4,2)", false);

The object cnf now represents the CNF:

(w5m9  v  ~xy  v  N(4,2))  ^  (~a9:51  v  ~N(4,2))

where ~ denotes negation, and the CNF is defined over symbols with names "w5m9", "xy", "N(4,2)" and "a9:51".  The CNF can be printed using the print() method producing this output:

Clause 0:
+ w5m9
- xy
+ N(4,2)

Clause 1:
- a9:51
- N(4,2)

where the + and - signs indicate whether or not each symbol has been negated as it appears in the clause.

Your sat-solver code should use a different representation that will be more efficient and more convenient for the purpose of searching for a satisfying assignment.  In particular, the solve() method of your SatSolver should accept a CNF represented as an array of arrays of Literals.  Here, each component array corresponds to a clause (disjunction of literals) in the natural way, and each clause in turn is represented as an array of  Literals with the natural interpretation.  Thus, such a CNF representation can be viewed equivalently as an array of clauses.  In this representation, unlike for Cnf objects, each Literal object represents symbols by an int rather than a String; this integer is an index (computed automatically by Cnf) that can be converted back to the corresponding String name of the symbol using the method Cnf.getSymbolName().

A Cnf object can be converted into such an array of arrays of Literals using the Cnf.toLiteralArray() method.  Thus, we can convert the CNF above to an array of arrays of Literals using the command:

Literal[][] laa = cnf.toLiteralArray();

For instance, laa[0][1] refers to literal#1 of clause#0 which, in the example above, would be a Literal whose sign field is equal to false, and whose symbol field is equal to 1 (since 1 is the index of cnf.getSymbolName(1) = "xy").

A model or assignment returned by SatSolver.solve() is represented as a boolean array in the natural way.  Thus, model[3] is the true/false assignment to symbol #3 (which, in the example above, is the index of  cnf.getSymbolName(3) = "a9:51").

We also are providing Generators called RandomCnfGenerator and RandomSatisfiedCnfGenerator which generate random CNF formulas of a fixed size with a fixed number of symbols and clauses.  RandomCnfGenerator generates CNF sentences that may or may not be satisfiable, while RandomSatisfiedCnfGenerator is guaranteed to only generate CNF's that are satisfiable.  You can use these in your experiments, or as examples for writing your own generator.

A large number of CNF sentences are available from the website of the DIMACS implementation challenge that was conducted several years ago.  You may download and try out your program on these, or use these in your experiments.  The contents of the files on this website are (very tersely) described here.  Further description, and even more benchmark problems can be found on the satlib site. We have provided a Generator called DimacsFileGenerator that reads in files of this sort (after they have been unpacked and uncompressed, if necessary).

Each of the classes RandomCnfGenerator, RandomSatisfiedCnfGenerator  and DimacsFileGenerator has its own main method which can be used for testing.  Moreover, you may wish to use these as examples of how to use SatSolvers and Generators.

We are also providing a sample SatSolver called RandomSearchSatSolver that tries out random models until it finds one satisfying the given CNF, or runs out of time.

All code can be downloaded from this directory, or all at once from this zip file.  Documentation is available here.

As a general reminder, you should not modify any of the code that we are providing, other than template files, or unless specifically allowed in the instructions. You also should not change the "signature" of any of the methods or constructors that you are required to write, i.e., the parameters of the method (or constructor), their types, and the type of the value returned by the method.

What to turn in

Using moodle, you should turn in the following:

The code that you turn in should not write anything to standard output (this will really mess up the tournament).

In addition to the program itself, you should write a brief description of:

Hard copy of this written work should be submitted in the manner explained on the assignments page.

What you will be graded on

You will be graded on completing all the different parts of this assignment.  You have a great deal of freedom in choosing how much work you want to put into this assignment, and your grade will in part reflect how great a challenge you decide to take on.  Creativity will certainly be an important component of your grade.

Although your standing in the implementation challenge will not affect your grade, your sat-solver should be at least as fast and effective as vanilla WalkSat.  However, even this requirement will be waived if you choose to implement some other method that is significantly different than WalkSat, provided that your implementation is correct, complete and efficient.

Needless to say, your code should not halt prematurely with an exception (unless given bad data).

As usual, you should follow good programming practices, including documenting your code.  Finally, the written part of this assignment should be clear and concise.

This programming assignment will be worth roughly 75 points, which will be divided between the write-up, the generator, the sat-solver, and overall creativity.  The extra credit "systematic experiment" portion will be worth 5-30 points (depending on what you do for this part).