COS 402: Artificial Intelligence

Programming Assignment P2

Satisfaction

Fall 2012

Due: Tuesday, October 23


Special late policy:  For this homework only, there will be a special late policy to help reduce the pressure of midterms followed by break.  During midterms week, and through the Monday of break, every two calendar days will only count as a single late "day".  All other rules of the standard late policy will remain in effect (including the limit of not turning in the assignment for credit more than five "days" late).  To be clear, here is a table summarizing how many "days" late your assignment will be counted if turned in on the following dates:

Calendar date          Number of late days charged
Tuesday, October 23    0
Thursday, October 25   1
Saturday, October 27   2
Monday, October 29     3
Tuesday, October 30    4
Wednesday, October 31  5
Thursday, November 1   not accepted for credit on or after this date

 


A satisfiability solver
A generator
A systematic experiment
The implementation challenge
Outside sources
The code we are providing
What to turn in
What you will be graded on


In this assignment, you will have a chance to develop and experiment with your own ideas for solving CNF satisfiability problems.  The assignment has three parts, one of which is optional (for honors optional points):

  1. Write the fastest program you can for finding satisfying assignments (models) for a given CNF sentence.
  2. Write a program for generating many "interesting" CNF sentences.
  3. (Optional) Run a systematic experiment using your solver and generator programs.

Each of these is described in detail below.  After the submission deadline, we will also conduct our own "implementation challenge" to see whose program is fastest and most effective.

A satisfiability solver

The first part of the assignment is to implement a satisfiability solver, or "sat-solver" for short.  Given a CNF sentence, your sat-solver should attempt to find a satisfying assignment as quickly as possible.  Your sat-solver does not need to be complete in the sense of returning "failure" when the CNF is unsatisfiable.  Rather, if no satisfying assignment can be found, it should return the "best" assignment it can find, i.e., one with as many clauses satisfied as possible.  Most likely, you will want to use a variant of WalkSat or DPLL, but this is certainly not required, and you are welcome to try out other methods that we discussed in class, or your own ideas.  Be creative.  There are many ways in which even WalkSat can be tinkered with to improve performance.

Your sat-solver will be given a CNF in a representation described below.  It also will be given a time limit in the form of a "timer" object.  Your sat-solver must return with some answer (i.e., some assignment) by the time the timer reaches zero, or as soon thereafter as possible.
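
To make this concrete, here is a minimal sketch of vanilla WalkSat written against the Literal[][] representation described below under "The code we are providing".  The solve() signature and the Timer method getTimeRemaining() are assumptions (check the provided documentation for the actual names), and a serious solver would also remember the best model seen so far and maintain clause counts incrementally instead of rescanning the whole CNF on every flip:

import java.util.Random;

public class WalkSatSketch {

    private static final double NOISE = 0.5;   // probability of a random-walk flip
    private final Random random = new Random();

    // Assumed signature; getTimeRemaining() is a guess at the Timer's
    // interface -- substitute whatever method the provided class exposes.
    public boolean[] solve(Literal[][] cnf, Timer timer) {
        int numSymbols = 0;                     // one more than the largest index used
        for (Literal[] clause : cnf)
            for (Literal lit : clause)
                numSymbols = Math.max(numSymbols, lit.symbol + 1);

        boolean[] model = new boolean[numSymbols];
        for (int i = 0; i < numSymbols; i++)    // random starting assignment
            model[i] = random.nextBoolean();

        while (timer.getTimeRemaining() > 0) {  // hypothetical Timer method
            int c = pickUnsatisfiedClause(cnf, model);
            if (c < 0)
                return model;                   // every clause is satisfied

            Literal[] clause = cnf[c];
            int flip;
            if (random.nextDouble() < NOISE)
                // random walk: flip a random symbol of the broken clause
                flip = clause[random.nextInt(clause.length)].symbol;
            else
                // greedy step: flip whichever symbol satisfies the most clauses
                flip = bestFlip(cnf, model, clause);
            model[flip] = !model[flip];
        }
        return model;   // out of time: return the current (best-effort) model
    }

    // Picks an unsatisfied clause uniformly at random by reservoir sampling;
    // returns -1 if the model satisfies the entire CNF.
    private int pickUnsatisfiedClause(Literal[][] cnf, boolean[] model) {
        int seen = 0, chosen = -1;
        for (int c = 0; c < cnf.length; c++)
            if (!satisfies(cnf[c], model) && random.nextInt(++seen) == 0)
                chosen = c;
        return chosen;
    }

    // A clause is satisfied if some literal's sign matches its symbol's value.
    private boolean satisfies(Literal[] clause, boolean[] model) {
        for (Literal lit : clause)
            if (model[lit.symbol] == lit.sign)
                return true;
        return false;
    }

    // Naive greedy choice: try each flip and rescan the whole CNF.  (This is
    // quadratic; a competitive solver would maintain counts incrementally.)
    private int bestFlip(Literal[][] cnf, boolean[] model, Literal[] clause) {
        int best = clause[0].symbol, bestCount = -1;
        for (Literal lit : clause) {
            model[lit.symbol] = !model[lit.symbol];
            int count = 0;
            for (Literal[] cl : cnf)
                if (satisfies(cl, model))
                    count++;
            model[lit.symbol] = !model[lit.symbol];
            if (count > bestCount) {
                bestCount = count;
                best = lit.symbol;
            }
        }
        return best;
    }
}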

Your code must be entirely sequential; in other words, you may not use multi-threaded code.

A generator

The second part of the assignment is to devise and implement a method of generating many CNF sentences corresponding to a "natural" class of problems.  For instance, in class, we discussed how graph-coloring problems can be converted into CNF satisfiability problems.  Thus, one way of generating interesting CNF sentences is to first generate random graph-coloring problems, and then convert them into CNF sentences that must be satisfied in order to find a coloring of the randomly generated graph.  Similarly, we saw in class how a blocks-world planning problem can be converted into a satisfiability problem.  A generator in this case could be designed by first generating random blocks-world problems (i.e., start and goal configurations) and then converting these into satisfiability problems.

For this part of the assignment, you should (1) choose a natural problem, (2) figure out how to reduce this problem to a satisfiability problem, (3) implement a generator for natural instances of the original problem, and (4) implement your reduction.  The end result will be a program that generates CNF sentences, but of a very particular form.  Since we discussed graph coloring and blocks-world planning problems in class, these would not be appropriate choices for this assignment.  However, here are some other examples:

Some of these descriptions are deliberately vague.  You can fill in the details as you wish, as long as you choose a formulation that is precise and sensible, and describe it fully in your report.  You should not feel constrained to choose any of these.  On the contrary, you are strongly encouraged to be creative and to come up with your own problem.

You need to pick a problem for which it is easy and natural to generate many CNF sentences.  For instance, it would not be appropriate to choose a problem such as n-queens since, for each n, there is only a single problem instance.

Once you have chosen a problem, you need to figure out how to generate random instances, and how to reduce these instances to satisfiability problems.  Be very careful to check that an assignment that satisfies the CNF also is a solution to the original problem instance, and vice versa.

For some problems, it is possible to design your generator to produce only satisfiable CNF sentences.  For instance, for the graph-coloring problem, you can start by 3-coloring a set of vertices and then add random edges, never adding an edge that would connect two like-colored vertices.  The resulting graphs (without the colors, of course) will clearly be 3-colorable by construction, so the generated CNF's will also be satisfiable.  If possible, you are encouraged (but not required) to produce a generator of this form since it should make the implementation challenge more interesting (obviously, no one will be able to solve an unsatisfiable CNF).
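
To illustrate, here is a sketch of such a generator built on the provided Cnf class.  (Recall that graph coloring itself is off-limits as your own choice of problem since we covered it in class; it appears here only because it is the running example.)  Only addClause() and addLiteral() come from the provided code; the C(vertex,color) symbol-naming scheme and everything else are illustrative:

import java.util.Random;

public class SatisfiableColoringSketch {

    // Generates a CNF encoding 3-colorability of a random graph that is
    // 3-colorable by construction, so the CNF is guaranteed satisfiable.
    public static Cnf generate(int numVertices, int numEdges, Random random) {
        int[] color = new int[numVertices];     // the hidden coloring
        for (int v = 0; v < numVertices; v++)
            color[v] = random.nextInt(3);

        Cnf cnf = new Cnf();

        for (int v = 0; v < numVertices; v++) {
            // each vertex gets at least one color...
            cnf.addClause();
            for (int c = 0; c < 3; c++)
                cnf.addLiteral("C(" + v + "," + c + ")", true);
            // ...and at most one color
            for (int c1 = 0; c1 < 3; c1++)
                for (int c2 = c1 + 1; c2 < 3; c2++) {
                    cnf.addClause();
                    cnf.addLiteral("C(" + v + "," + c1 + ")", false);
                    cnf.addLiteral("C(" + v + "," + c2 + ")", false);
                }
        }

        // Add random edges, never between two like-colored vertices.
        // (Duplicate edges are possible but harmless in this sketch.)
        for (int e = 0; e < numEdges; e++) {
            int u, v;
            do {
                u = random.nextInt(numVertices);
                v = random.nextInt(numVertices);
            } while (u == v || color[u] == color[v]);
            // the endpoints of an edge may not share any color
            for (int c = 0; c < 3; c++) {
                cnf.addClause();
                cnf.addLiteral("C(" + u + "," + c + ")", false);
                cnf.addLiteral("C(" + v + "," + c + ")", false);
            }
        }
        return cnf;
    }
}

Per the earlier caution about checking your reduction in both directions, it is also worth decoding any model your sat-solver returns back into a coloring and testing it directly against the original graph.  Here is a sketch of such a check, assuming your generator also records the edge list of the graph it produced:

public static boolean isProperColoring(Cnf cnf, boolean[] model,
                                       int[][] edges, int numVertices) {
    int[] color = new int[numVertices];
    java.util.Arrays.fill(color, -1);
    for (int i = 0; i < model.length; i++) {
        if (!model[i])
            continue;
        String name = cnf.getSymbolName(i);               // e.g. "C(12,2)"
        String[] parts = name.substring(2, name.length() - 1).split(",");
        int v = Integer.parseInt(parts[0]);
        int c = Integer.parseInt(parts[1]);
        if (color[v] != -1 && color[v] != c)
            return false;               // two colors assigned to one vertex
        color[v] = c;
    }
    for (int[] e : edges)
        if (color[e[0]] == -1 || color[e[1]] == -1 || color[e[0]] == color[e[1]])
            return false;               // uncolored endpoint or monochromatic edge
    return true;
}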

The final step is to implement all of the above.  The end product will be a method for generating many CNF sentences.  Your generator (and those of your classmates) will be used to generate the CNF sentences used for the class implementation challenge.

In case the process of creating a generator is still unclear, here is a clarifying note written by Matt Hibbs when he was a TA for this course in 2003.

A systematic experiment (optional)

The final part of this assignment, which is entirely optional and for honors optional points, is to conduct a systematic experiment exploring some aspect of your sat-solver.  An example of such an experiment is the one shown in Figure 7.19b of R&N.  In that experiment, the performance of WalkSat and DPLL was tested on randomly generated 3-CNF sentences with the number of symbols fixed, but with varying clause-to-symbol ratios.  Most likely, your generator will take some parameters, such as the number of edges and number of vertices in the graph that is to be 3-colored in the coloring problem.  You might try measuring run time as a function of one of the parameters (with the others held fixed).  For instance, in this example, you could measure run time as a function of the number of edges with the number of vertices held fixed.  Here is how you might conduct such an experiment:

  1. Fix the number of vertices to be, say, 50.  Let the number of edges vary taking values, say, of 20, 40, 60, ..., 200.
  2. For each of these settings, generate, say, 50 random graphs and convert them to CNF sentences.  If possible, generate only CNF sentences that are known to be satisfiable (see note above).
  3. Run your sat-solver on all of the generated CNF sentences, and record all the results, i.e., running time and the number of unsatisfied clauses.  You may need to limit the running time of each run to keep the experiment from running for too long.
  4. Make a plot with the number of edges on the x-axis, and the average (or median) running time on the y-axis.  (There are many programs available for making plots, including Excel, gnuplot, and MATLAB.)

The numbers chosen here for the number of vertices and the number of edges were arbitrary.  In your own experiment, if possible, you should try to identify an "interesting" region for the parameter settings, as in the R&N figure, where the running time increases but then decreases as the clause-to-symbol ratio is increased.

This is just one possible experiment that could be conducted.  Myriad possibilities exist.  For instance, you might want to compare performance of two substantially different algorithms, or variants of the same algorithm, over a range of problems.  Or, if your algorithm has a parameter, you can see how the algorithm's performance varies as a function of this parameter.  Most likely, if you play around with your algorithm and your generator, you will encounter some aspect of one or the other that seems worth exploring systematically.  Moreover, such an experiment may help you to design a more effective sat-solver.  As with the other parts of this assignment, you are encouraged to think creatively.  It is sufficient to conduct a single experiment for this part of the assignment, although you are welcome to do more.

Plan your experiment carefully, possibly after running some preliminary tests.  Because we are dealing with randomized algorithms and random instances, you will almost certainly need to run your algorithm many times, taking the average (or median) of the results.  This also means that the experiments can take quite a while to run, which is why it is so important to start early if you are planning to do this part of the assignment.  Ideally, you would run your algorithm hundreds of times to get reliable results.  Unfortunately, you probably will not have time to do this, and may be forced to run only ten or twenty times.  Be sure your experiments will terminate in a reasonable period of time before beginning them.  For instance, the experiment outlined above requires generating 50 * 10 = 500 CNF sentences.  If we allow our sat-solver 30 seconds for each one, this gives a total (worst-case) running time of 500 * 30 = 15000 seconds, or about 4 hours, 10 minutes.

If you are having trouble measuring running time (see discussion below of the provided timer object), you may instead wish to measure some quantity other than running time, such as the number of variable flips made by the algorithm.

Clearly, such an experiment involving hundreds or thousands of runs should not be conducted "by hand", i.e., by manually rerunning the program over and over again.  Not only would this be slow and painfully tedious, but it also would likely introduce numerous mistakes due to human error.  Rather, you should write a Java program (or a script in, say, Perl or Python, if you know these languages) that will automatically run all of the experiments from start to finish.  Such a program or script also provides a scientific record of exactly which experiments were conducted with what settings.  You can (and should) also use such automatic methods to compile all of the results into a table, figure, or other summary.
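
As a sketch of what such a driver might look like, the following reuses the illustrative generator and solver sketched in the earlier sections, and assumes a Timer constructor that takes a limit in seconds (check the provided documentation for the real interface):

import java.util.Random;

public class ExperimentDriver {

    public static void main(String[] args) {
        Random random = new Random(402);        // fixed seed for reproducibility
        int numVertices = 50;
        int trialsPerSetting = 50;
        WalkSatSketch solver = new WalkSatSketch();

        // CSV output is easy to load into Excel, gnuplot, or MATLAB.
        System.out.println("edges,trial,millis,unsatisfied");
        for (int numEdges = 20; numEdges <= 200; numEdges += 20) {
            for (int trial = 0; trial < trialsPerSetting; trial++) {
                Cnf cnf = SatisfiableColoringSketch.generate(numVertices, numEdges, random);
                Literal[][] laa = cnf.toLiteralArray();

                long start = System.currentTimeMillis();
                boolean[] model = solver.solve(laa, new Timer(30));  // assumed constructor
                long elapsed = System.currentTimeMillis() - start;

                System.out.println(numEdges + "," + trial + "," + elapsed
                                   + "," + countUnsatisfied(laa, model));
            }
        }
    }

    // Counts clauses the model fails to satisfy (0 means the CNF is solved).
    private static int countUnsatisfied(Literal[][] laa, boolean[] model) {
        int unsatisfied = 0;
        for (Literal[] clause : laa) {
            boolean sat = false;
            for (Literal lit : clause)
                if (model[lit.symbol] == lit.sign) {
                    sat = true;
                    break;
                }
            if (!sat)
                unsatisfied++;
        }
        return unsatisfied;
    }
}

(A driver like this is for your own experiments only; remember that the code you actually turn in must not write to standard output.)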

The implementation challenge

Once you turn in your sat-solver and generator, we will run everyone's sat-solver on everyone's generator to determine whose sat-solver runs the fastest.  In particular, each submitted generator will be used to generate one or two CNF sentences.  We will then run every submitted sat-solver on the resulting collection of CNF sentences.  Each sat-solver will be given 15 seconds on a reasonably fast computer to solve each CNF sentence.  We will provide at least 2GB of memory for the Java heap (but we do not guarantee more).

Here is how the scoring will work:

So, in short, if your sat-solver was among the best on a particular CNF, then you will be charged the actual running time, but if yours was not the best, then you will be charged a minimum of 15 seconds, up to a maximum charge of 150 seconds.  We will then take the average (over all CNF's) of these running times to determine the fastest and most effective sat-solvers.  Thus, the idea is to return an assignment satisfying as many clauses as possible, and to do so as quickly as possible.

We will also compute other statistics, such as the average number of unsatisfied clauses achieved by each sat-solver.

To make the implementation challenge interesting, please try to set up your generator to produce CNF's that are challenging but that at least your own sat-solver would have a reasonable chance of solving in 15 seconds.  Obviously, it won't be much fun if no one's sat-solver can solve anyone's generated CNF's, nor if all of the CNF's are trivially solvable almost instantly.

Your grade will not depend directly on your ranking in this implementation challenge relative to the rest of the class (but see below for more details on grading criteria).  Our main purpose is to conduct what I hope will be a fun communal experiment exploring methods for solving satisfiability problems, and to this end, you are especially encouraged to try out original techniques.

The results of the implementation challenge will be posted publicly on a website, including the author of each generator and sat-solver, a short description of how each works, and details of how each performed.  Note that the name listed as "author" will be a name that you provide.  So, if you wish to remain anonymous on the website, you can do so by using a made-up name of your choice, or even a random sequence of letters, but something that will allow you to identify your own entry.  (However, please refrain from using a name that might be considered offensive or inappropriate, or which might contain special characters or commands that will cause problems for our HTML.)

Outside sources

It is okay to use outside sources (such as articles or websites) to come up with ideas for sat-solvers or generators.  However, this is certainly not required, nor even encouraged (it is even slightly discouraged since the emphasis of this assignment is on coming up with your own ideas).  To the extent that it is done in a manner consistent with the course collaboration policy, it is also okay to exchange ideas with your classmates.  In all cases, all outside sources must be acknowledged in the report that you turn in.

The code we are providing

We are providing the following classes and interfaces:

In addition to what is described above, each Generator and SatSolver must provide methods called author() and description() returning, as Strings, the author of the program and a brief description of the generator or sat-solver and how it works.  We will use these methods when publicly reporting the results of the implementation challenge.  The "author" that your program reports can be your real name or, if you prefer to remain anonymous, a pseudonym, but please do not use a name that is offensive, inappropriate, unprintable, etc.  (And you do not need to use the same name for your generator and sat-solver.)

As explained below, we are using two different representations of CNF sentences: one that you should find convenient when generating CNF's, and the other that provides greater efficiency and ease of use when writing your sat-solver.  We also are providing methods for converting between the representations.

When generating CNF sentences, you should use the Cnf class which has been designed for this purpose.  This class maintains a representation of the CNF in a form in which symbols are represented or named by Strings.  These strings are entirely arbitrary, and two strings are considered to represent the same symbol if and only if they are equal.  For instance, the strings "x", "N(43,12)", "P:18:4" and "qs73" would all be acceptable names for symbols.  A Cnf object will typically be built clause by clause and literal by literal using the methods addClause(), which adds a new (and empty) clause to the CNF, and addLiteral(), which adds a new literal (with the specified symbol and sign) to the specified clause (or to the last clause if no clause is provided).

For instance, here is code for generating a small CNF:

Cnf cnf = new Cnf();                // create new (empty) CNF
cnf.addClause();                    // add new (empty) clause
cnf.addLiteral("w5m9", true);       // add literals to clause
cnf.addLiteral("xy", false);
cnf.addLiteral("N(4,2)", true);
cnf.addClause();                    // add second clause
cnf.addLiteral("a9:51", false);     // add literals to new clause
cnf.addLiteral("N(4,2)", false);

The object cnf now represents the CNF:

(w5m9  v  ~xy  v  N(4,2))  ^  (~a9:51  v  ~N(4,2))

where ~ denotes negation, and the CNF is defined over symbols with names "w5m9", "xy", "N(4,2)" and "a9:51".  The CNF can be printed using the print() method, producing this output:

Clause 0:
+ w5m9
- xy
+ N(4,2)

Clause 1:
- a9:51
- N(4,2)

where the + and - signs indicate whether or not each symbol has been negated as it appears in the clause.

Your sat-solver code should use a different representation that will be more efficient and more convenient for the purpose of searching for a satisfying assignment.  In particular, the solve() method of your SatSolver should accept a CNF represented as an array of arrays of Literals.  Here, each component array represents a clause (a disjunction of literals) in the natural way, so the CNF as a whole can be viewed as an array of clauses.  In this representation, unlike for Cnf objects, each Literal object represents a symbol by an int rather than a String; this integer is an index (computed automatically by Cnf) that can be converted back to the corresponding String name of the symbol using the method Cnf.getSymbolName().

A Cnf object can be converted into such an array of arrays of Literals using the Cnf.toLiteralArray() method.  Thus, we can convert the CNF above to an array of arrays of Literals using the command:

Literal[][] laa = cnf.toLiteralArray();

For instance, laa[0][1] refers to literal #1 of clause #0 which, in the example above, would be a Literal whose sign field is equal to false, and whose symbol field is equal to 1 (since cnf.getSymbolName(1) = "xy").
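
As a small illustration of moving between the two representations, the following loop walks a Literal[][] array, recovers each symbol's String name through the Cnf object, and reproduces the print() output shown earlier.  Only the sign and symbol fields and Cnf.getSymbolName() are taken from the provided code; the loop itself is illustrative, and is suitable only for your own debugging (your submitted code must not print to standard output):

public static void printLiteralArray(Cnf cnf, Literal[][] laa) {
    for (int c = 0; c < laa.length; c++) {
        System.out.println("Clause " + c + ":");
        for (Literal lit : laa[c]) {
            // sign == true: the symbol appears positively ("+");
            // sign == false: it appears negated ("-")
            char s = lit.sign ? '+' : '-';
            System.out.println(s + " " + cnf.getSymbolName(lit.symbol));
        }
        System.out.println();
    }
}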

A model or assignment returned by SatSolver.solve() is represented as a boolean array in the natural way.  Thus, model[3] is the true/false assignment to symbol #3 (which, in the example above, is the symbol named cnf.getSymbolName(3) = "a9:51").

We also are providing Generators called RandomCnfGenerator and RandomSatisfiedCnfGenerator, which generate random CNF formulas with a fixed number of symbols and clauses.  RandomCnfGenerator generates CNF sentences that may or may not be satisfiable, while RandomSatisfiedCnfGenerator is guaranteed to generate only CNF's that are satisfiable.  You can use these in your experiments, or as examples for writing your own generator.

A large number of CNF sentences are available from the website of the DIMACS implementation challenge that was conducted several years ago.  You may download these and try out your program on them, or use them in your experiments.  The contents of the files on this website are (very tersely) described here.  Further description, and even more benchmark problems, can be found on the SATLIB site.  We have provided a Generator called DimacsFileGenerator that reads in files of this sort (after they have been unpacked and uncompressed, if necessary).

Each of the classes RandomCnfGenerator, RandomSatisfiedCnfGenerator and DimacsFileGenerator has its own main method, which can be used for testing.  Moreover, you may wish to use these as examples of how to use SatSolvers and Generators.

We are also providing a sample SatSolver called RandomSearchSatSolver that tries out random models until it finds one satisfying the given CNF, or runs out of time.

The Timer class that we are providing will, on most machines, use a measure of elapsed CPU time, which should be fairly reliable, even on a shared machine.  However, because CPU time measurement is not supported on all machines, this object will on such machines instead measure the actual wall-clock time that has elapsed, which may be less reliable, particularly on a machine that is being shared with other users.

All of the code can be downloaded at once from this zip file.  Documentation is available here.

As a general reminder, you should not modify any of the code that we are providing, other than template files, or unless specifically allowed in the instructions. You also should not change the "signature" of any of the methods or constructors that you are required to write, i.e., the parameters of the method (or constructor), their types, and the type of the value returned by the method.

What to turn in

Using this DropBox link, you should turn in the following:

The code that you turn in should not write anything to standard output (this will really mess up the tournament).

In addition, also using DropBox, you should submit a single pdf file called report.pdf containing your program report describing all of the following:

What you will be graded on

You will be graded on completing all the different parts of this assignment.  You have a great deal of freedom in choosing how much work you want to put into this assignment, and your grade will in part reflect how great a challenge you decide to take on.  Creativity and ingenuity will certainly be important components of your grade.

Although your standing in the implementation challenge will not directly determine your grade, performance will be taken into account as a general indication of the effectiveness of your ideas.  In all cases, your implementation and design should be sensible, correct, complete, and efficient.  Certainly, your sat-solver should be at least as fast and effective as a reasonably efficient implementation of vanilla WalkSat; however, even this requirement may be disregarded if you choose to implement some other method that is significantly original.

Needless to say, your code should not halt prematurely with an exception (unless given bad data).

As usual, you should follow good programming practices, including documenting your code.  Finally, your program report for this assignment should be clear, precise, and complete but concise.

This programming assignment will be worth roughly 75 points, which will be based on your choice and design of sat-solver and generator, including creativity, ingenuity, and ambitiousness; your actual implementation, which should be correct, complete, efficient, well-designed, and in accordance with good programming practices; and the presentation of your program report.

The optional "systematic experiment" will be worth 5-10 honors optional points (depending on what you do for this part).