CS 402: HW#4, Bayes Nets and MCMC

COS 402: Artificial Intelligence

Homework #4

Bayes Nets and MCMC

Fall 2008

Due: Tuesday, November 11

Part I: Written Exercises

See instructions on the assignments page on how and when to turn these in. Approximate point values are given in brackets. Be sure to show your work and justify all of your answers.

1. [10] There are three condemned prisoners A, B and C. The governor has announced that one of the three, chosen at random, has been pardoned, but doesn't say which. Prisoner A, realizing that he only has a 1/3 chance of having been pardoned and anxious to learn his fate, reasons with the warden as follows: "Please tell me the name of one of the other prisoners B or C who will be executed. I already know that at least one of them will be executed so you will not be divulging any information." The warden thinks for a minute and then asks how he should choose between B or C in case both are to be executed. "In that case," A tells him, "simply flip a coin (when I'm not around) to choose randomly between the two." Reluctantly, the warden agrees and the next day tells A that B will be executed. On hearing this news, A smiles and thinks to himself, "What a fool this warden is to fall for my trick. Now my chances of having been pardoned have increased from 1/3 to 1/2" (since either A or C must have been pardoned).

Was A correct in his final assertion? Use Bayes rule to compute the probability that A was pardoned given that the guard tells him that B will be executed. (Note that this is not the same as the probability that A was pardoned given that B will be executed (i.e., was not pardoned). This is because the event that "the guard tells A that B was not pardoned" is different from the event that "B was not pardoned". This is a tricky problem so be careful to distinguish these two events.)
Suppose, in the event that the warden has to choose between B or C, that he chooses B with probability p (rather than 1/2 as above). Now what is the probability that A was pardoned (in terms of p) given, as before, that the guard tells him that B will be executed? (If you wish, you may answer part b first, and then answer part a by setting p=1/2.)

2. [15] Exercise 14.2 in R&N.

3. [10] Consider the burglar alarm example in R&N Figure 14.2 (and discussed extensively in class). In R&N (page 505), it is shown that the probability of a burglary, given that both John and Mary call, is roughly 28.4%. Suppose now that you hear on the radio that there was an earthquake in your area. Now what is the probability of a burglary (given that JohnCalls, MaryCalls and Earthquake are all equal to true)? Has the probability of a burglary increased or decreased as a result of this new information? Intuitively, why does this make sense? (This phenomenon is called "explaining away".)

4. [15] Let G be an undirected graph. Consider a random walk on such a graph in which we begin at a designated start vertex, and proceed, at each time step, to traverse to a randomly selected neighbor of the current vertex. In other words, at each time step, we traverse one of the edges (selected at random) that is incident to the vertex we currently occupy. Let d(v) be the degree of vertex v, i.e., the number of edges that are incident to vertex v. Let m be the total number of edges in the graph.

What is the transition probability function q(v --> v') for this random walk?
Let p(v) = d(v)/(2m). Show that p is a stationary distribution for this random walk, i.e., that it satisfies the stationarity equation (R&N Eq. (14.9)).
Consider a chess king taking a random walk on an ordinary chess board. At each time step, he can move one step in any direction, horizontally, vertically or diagonally (but of course, he has to stay on the board). What is the stationary distribution p in this case?

Part II: Programming

This assignment is about using the Markov Chain Monte Carlo technique (also known as Gibbs Sampling) for approximate inference in Bayes nets. The main components of the assignment are the following:

Implement the MCMC algorithm.
Test the MCMC algorithm on a number of Bayes nets, including one of your own creation.

Optionally, you can also implement and test other inference procedures discussed in class or the book.

Implementing MCMC

The first part of the assignment is to implement the MCMC algorithm. The purpose of this algorithm is to estimate the probability distribution of one variable in a Bayes net, given the settings of a subset of the other variables. As described below, we are providing code for reading in a Bayes net and a query to the Bayes net from files. Your code should then run MCMC some specified number of iterations to estimate the required probability.

Since MCMC is a randomized, iterative algorithm, the actual number of iterations needed to estimate probabilities is an important (and not fully understood) issue, and one that will be explored as part of this assignment.

Which variation of MCMC to use: In the version of MCMC given in R&N, the variables are updated systematically by cycling through them in order. In class, we instead looked at a version in which a single randomly selected non-evidence variable is updated on each iteration (where each non-evidence variable is chosen with equal probability). Please use the random-update version given in class. Also, as in class and in R&N, your estimated probability distribution should be based on all iterations of the algorithm (rather than some of the other variations discussed). For extra credit, you are welcome to experiment with systematic updates or other variations of MCMC.

Sampling from a probability distribution: For one part of the MCMC algorithm, you will need to sample randomly from an arbitrary distribution over a small set of possible values. Here is a simple way to do this, given access to the kind of (pseudo-)random numbers provided by Java. Suppose we want to choose among four values, A, B, C and D, and suppose further that we want to pick A with probability 0.22, B with probability 0.43, C with probability 0.05 and D with probability 0.30. In Java, it is possible to obtain (approximately) uniformly distributed random numbers in the range [0, 1), where [a, b) denotes the set of real numbers bigger than or equal to a, and (strictly) smaller than b. Let r be such a random number. Then, to sample according to the given distribution over A, B, C and D, we just need to pick A if r is in [0.00, 0.22), B if r is in [0.22, 0.65), C if r is in [0.65, 0.70), and D if r is in [0.70, 1.00). Because r is equally likely to be any real number in [0,1), the chance of choosing each of the four values is simply equal to the length of the interval that we associated with it (for instance, the chance of choosing B is the length of the interval [0.22, 0.65), which is 0.43). Thus, each of the four values will be selected with exactly the desired probability.

If implemented properly, this procedure, when selecting randomly among k values, can be made to run in O(k) time. For generating random numbers, see the documentation for the Random class which is provided as part of the Standard Java Platform.

Testing on Bayes nets

Once your MCMC implementation is working and has been tested for correctness, the second part of the assignment is to try it out on a number of Bayes nets and queries, some of which we are providing, and others that you are asked to create.

Specifically, we are asking that you try the following, and that you answer the listed questions.

Earthquakes and burglar alarms

As a "sanity check" to be sure your code is working, start by running MCMC on the "earthquakes and burglar alarms" Bayes net discussed in class and given in R&N Figure 14.2. This Bayes net is provided in the file alarm.bn. Specifically, answer the following:

In R&N (page 505), it is shown that the probability of a burglary, given that both John and Mary call, is roughly 0.284. This query (the probability of burglary given that both John and Mary call) is provided in the file alarm1.qry. Run MCMC on this query for enough iterations that you are convinced that it has converged (say, to a value that is correct to two significant digits of the true probability). For instance, you might print out the answer, say, every 1000 iterations, and watch until it seems that the estimated probabilities are not changing significantly. What answer did you get? How many iterations did you need?
Suppose now that you learn that there was an earthquake in the area. Create an appropriate new query file, called alarm2.qry, and run MCMC to estimate the probability of a burglary (given that John and Mary both called, and that there was an earthquake). Follow the same directions and answer the same questions as in the last question. You should be sure that you are getting the same answer as in written exercise 3 above.

Automobile insurance

The next Bayes net attempts to estimate the risk posed by an individual seeking car insurance. In other words, the network attempts to relate variables like the age and driving history of the individual to the cost and likelihood of property damage or bodily injury. Here is a graphical depiction of the network:

The 12 shaded variables are considered hidden or unobservable, while the other 15 are observable. The network has over 1400 parameters. An insurance company would be interested in predicting the bottom three "cost" variables, given all or a subset of the other observable variables. For instance, for an adolescent driving a car with about 50,000 miles and no anti-lock brakes, we can estimate the cost of property damage to have the following distribution:

Estimated Cost	Probability
$1,000	0.457
$10,000	0.343
$100,000	0.173
$1,000,000	0.028

The network has been provided in the file insurance.bn.

The query given above is provided in the file insurance1.qry. Run MCMC on it following the same directions as in question 1 above, and answer the same questions.
Try out at least one other query of your own invention using MCMC, and report the results. For instance, you might see what happens when it is learned that the driver above has an excellent or very poor driving record. Or you can try out entirely different queries. (Note that, unless you have a good reason for doing so, your queries should not involve hidden variables.) Call your query file(s) insurance2a.qry, insurance2b.qry, etc. Describe what you tried and what results you got. Comment on why your results did or did not make sense, and how they reflect realistic or unrealistic assumptions that have been built into the network.

Leader and followers

Next, we will explore the convergence properties of MCMC on a very simple family of Bayes nets. Theoretically, MCMC will converge to the right answer asymptotically, that is, as the number of iterations becomes very large. However, in practice, it is not always clear how many iterations actually suffice, as this example demonstrates.

Each network in this family has a single "leader" variable, and some number of "follower" variables. The idea is that the leader selects a bit at random (0 or 1 with equal probability), and then all of the followers choose their own bits, each one copying the leader's bit with 90% probability, or choosing the opposite of the leader's bit with 10% probability. (For instance, to be concrete, the "leader" might be President Bush, and the "followers" might be Republican members of congress; the bits then might represent voting for or against a particular bill. We are assuming that each of the follower Republicans follows W.'s lead in how they vote on this bill with 90% probability.)

Here is the topology of such a network with four followers:

Leader-follower networks with k followers have been provided in the files leader-follower-k.bn for k = 5, 10, 15, 20, 25, 30, 35, 50.

The query we will study in each case is rather trivial: We will simply ask what the (marginal) probability distribution is for the leader variable, in the absence of any evidence (i.e., none of the other variables have been set to specified values). This query is encoded in the file leader-follower.qry.

Here are things to try and questions to answer:

What is the correct answer for the query given above? (This should be easy to answer.)
Run MCMC on a network with 10 followers using the given query. Run for at least 100 million (10⁸) iterations, printing the partial results at least every 100,000 iterations. Make a plot of the estimated probability (as estimated by MCMC) of the leader choosing the bit 0 as a function of the number of iterations executed by MCMC. (In other words, the x-axis gives the number of iterations t, and the y-axis gives the probability of the leader bit being 0 as estimated after t iterations of MCMC.) Does MCMC seem to converge to the right answer? If so, roughly how long does it seem to take? (You can use any standard graph plotting software, such as gnuplot, excel or matlab.)
Now repeat the last question on networks with 25 followers and 50 followers. Run for at least a billion (10⁹) iterations, printing partial results at least every million (10⁶) iterations.
Can you explain what you observed? Why does MCMC have such a hard time with these networks? What do these experiments say about the difficulty of detecting convergence?

Your own Bayes net

The last part of the assignment is to devise a Bayes net and at least one query of your own creation. Try to think of a network that models something interesting and/or (semi)realistic, such as the insurance network or burglar alarm networks studied above. Your network does not have to be large and complicated -- even a quite small network can be interesting. Your network can model anything you wish. Be creative! If you have difficulty coming up with your own ideas, you can think about designing a network to model something discussed in the book or in class or in one of the written exercises (however, you should not take an idea for which a network has already been given in class or in the book, such as the "burglar alarm" network). In addition to the topology of the network, you will need to estimate all of the probabilities for all of the conditional probability tables defining the network. You can make these numbers up however you wish, but you should attempt to estimate these numbers based on your intuition or knowledge of the problem as best as you can.

Once you have designed your network, choose one or more queries to test on it. Try to choose a Bayes net and queries whose answers are non-obvious (even approximately), or which reveal something interesting about the nature of probabilistic reasoning (such as the example in written exercise 3 above).

Finally, you should answer the following:

Describe your network and the query or queries that you chose. Give motivation for the choices you made. What aspects of your design are realistic or unrealistic?
Run some experiments using MCMC, and discuss the results. Does MCMC seem to be converging to the right answer? Do the results seem consistent with your intuition about whatever it is that you are modeling? Why or why not?

Call the Bayes net and query files you create mynet1.bn and mynet1.qry. If you create others, name them in a similar fashion.

Other inference algorithms (optional)

For extra credit, you can choose optionally to implement one other algorithm for exact or approximate inference, including any of the algorithms discussed in R&N Sections 14.4 and 14.5. You also can try out variations of MCMC, for instance, with systematic rather than random updating of the variables. If you try any of these, you should experiment with them on a number of Bayes nets in a manner similar to that described above, and you should write up your results and observations. You might want to include comparisons to MCMC, and trade offs of running time versus accuracy of results.

The code we are providing

We are providing a class called BayesNet for reading in a description of a Bayes net from a data file. A description of the topology of the Bayes net is stored in public fields of the BayesNet object, while the conditional probability tables are accessed using a public method. Internally, variables are identified with the integers 0, 1, 2, ..., up to (but not including) the number of variables. The values that each variable can take are also given as integers in a similar manner. Here are the public fields of a BayesNet object: numVariables is the number of variables; varName[i] is the name (as a String) of variable i; numValues[i] is the number of values that variable i can take, and valueNames[i][j] gives the name of variable i's j-th value; numParents[i] is the number of parents of variable i; and parents[i][] is an array listing the parents of variable i.

Access to the conditional probabilities which define the Bayes net is via the public method getCondProb(). This method returns the conditional probability of a specified variable x taking a particular value, given the settings of variable x's parents. The first argument to this method is simply the variable x whose (conditional) probability is to be looked up. The second argument vals[] is an integer array of length equal to the total number of variables, where vals[i] is a provided value for variable i. Values must be provided in this array for x and for all of x's parents; all other values in this array will be ignored. The value returned by the method is the conditional probability of variable x taking the value vals[x], given that all of x's parents are set to the values specified in the vals[] array.

A BayesNet object can be constructed using a constructor which takes as argument the name of the file containing a description of the Bayes net. These files should have a format like this one, alarm.bn, which encodes the "burglar alarm" network (R&N Figure 14.2):

variable Burglary t f variable Earthquake t f variable Alarm t f variable JohnCalls t f variable MaryCalls t f probability Burglary 0.001 0.999 probability Earthquake 0.002 0.998 probability Alarm | Burglary Earthquake t t 0.95 0.05 t f 0.94 0.06 f t 0.29 0.71 f f 0.001 0.999 probability JohnCalls | Alarm t 0.90 0.10 f 0.05 0.95 probability MaryCalls | Alarm t 0.70 0.30 f 0.01 0.99

The first line specifies the name of a variable called "Burglary" which can take two values called "t" and "f". Throughout the program, the ordering of the values of this variable should follow the ordering given here in the .bn file; thus, in this case, "t" will be associated with value 0 for the Burglary variable, and "f" with value 1. The next several lines in the first block similarly define variables and their values.

The next block (beginning "probability Burglary") specifies the probability of each of Burglary's two values: t has probability 0.001 and f has probability 0.999. Note that, because Burglary has no parents, nothing follows the word "Burglary" in this block. Skipping ahead, the fourth block (beginning "probability Alarm") specifies that variable Alarm's parents are Burglary and Earthquake. The lines that follow within this block define Alarm's conditional probability table. For instance, the second line gives the probability distribution of Alarm when Burglary is set to value t and Earthquake is set to f; specifically, the probability that Alarm=t, given that Burglary=t and Earthquake=f is 0.94. If the numbers on such a line do not add up to one, they will automatically be renormalized (i.e., multiplied by a constant so that they do add up to one).

MCMC can have difficulties when the conditional probability tables include probabilities that are equal to zero since this can cause the variables to be initialized to values that would actually be impossible (i.e., have zero probability), which in turn can lead to, for instance, division by zero problems when updating the values of non-evidence variables. To avoid these difficulties, when zero probability values are read in from a file describing a Bayes net, the BayesNet code replaces all such values by a tiny positive constant, say, 1e-50. (In fact, any value smaller than this constant gets replaced.)

We also are providing a Query class for reading in queries from files or from standard input. The query itself is stored in public fields of the Query object. Queries are always of the form: what is the probability distribution of some particular variable, given the settings of some other variables (the so-called evidence variables)? Accordingly, Query objects have two public fields: queryVar is the variable whose value is being queried, and evidence[] is an array specifying the settings of a subset of the other variables. More specifically, evidence[i] gives the value that variable i is to be set to if it is one of the evidence variables for this query; otherwise, if variable i is not an evidence variable, then evidence[i] is set to -1.

Here is an example file, alarm1.qry, showing the format of query files:

Burglary JohnCalls t MaryCalls t

The first line specifies the query variable. Each of the subsequent lines specifies an evidence variable and the value to which it should be set. This particular query asks for the probability distribution of the Burglary variable given that JohnCalls=t and that MaryCalls=t.

The class RunMcmc provides a main that you can use for testing, or you may wish to write your own (although it is still required that your code work with the main that we provided). Once MCMC has been implemented, this main simply runs MCMC for some number of rounds, printing partial results with a specified frequency, on a given Bayes net and query. For instance, the following command might produce the following output (obviously, the output will be different each time that the algorithm is run since MCMC is a randomized algorithm):

> java RunMcmc alarm.bn alarm1.qry 100 20 20 0.85000000 0.15000000 40 0.47500000 0.52500000 60 0.31666667 0.68333333 80 0.37500000 0.62500000 100 0.37000000 0.63000000 >

The command-line arguments are: name of the file describing the Bayes net; name of the file describing the query; maximum number of iterations; frequency with which results should be printed. Here, the specified Bayes net and query correspond to the example files given above. The first line of the output means that, after 20 iterations, the estimated probability that Burglary equals t or f is 0.85 or 0.15, respectively. Similarly for the lines that follow.

The code and all data files are available here, or as a single zip file here. Data is included in the data subdirectory. Documentation on the provided Java classes is available here.

As a general reminder, you should not modify any of the code that we are providing, other than template files, or unless specifically allowed in the instructions. You also should not change the "signature" of any of the methods or constructors that you are required to write, i.e., the parameters of the method (or constructor), their types, and the type of the value returned by the method.

The code that you need to write

Your job is to fill in the MCMC algorithm in the template file Mcmc.java. You need to first fill in a constructor which will take as input a BayesNet and Query, and which will initialize all data structures you are using appropriately, but which will not actually run MCMC. You then need to fill in the method runMoreIterations(). This method takes an integer argument specifying how many additional iterations of MCMC are to be executed. So, calling the constructor does not run MCMC, but only sets everything up. Calling runMoreIterations(1000), for instance, would run MCMC for 1000 iterations. Calling runMoreIterations(1000) again would run MCMC for an additional 1000 iterations. It is therefore imperative that you save the state of the current run of MCMC between calls to runMoreIterations(); do not re-start your run of MCMC each time that this method is called.

Each call to runMoreIterations() should return a double array specifying the probability distribution over the values of the query variable.

What to turn in

Using moodle, you should turn in the following electronically:

The class Mcmc.java with the constructor and the runMoreIterations() method filled in.
Any Bayes net or query files that you created, including at a minimum: alarm2.qry, insurance2a.qry, mynet1.bn and mynet1.qry.
Any other Java files that you wrote and used, or that are needed by your code.
If you did the extra credit part of this assignment, whatever classes you wrote implementing alternative inference algorithms.
If appropriate, a readme.txt file explaining briefly how your code is organized, what data structures you are using, or anything else that will help the TA's understand how your code works and is organized.

In addition, you should turn in a program report in hard copy that includes all of the following:

Your answers to all of the questions above.
Any plots that you made (clearly annotated).
Optionally (but not for extra credit, just as an aid to the TA's), a transcript of a run of your program to go with the questions above.
If you did the extra credit part, a description of what you did and what results you got.

Your program report should be submitted following the instructions on the assignments page.

What you will be graded on

We will automatically test your code on Bayes nets and queries, possibly different from those that we provided. Your grade will depend largely on getting the right answers. In addition, your code should be efficient enough to run reasonably fast (easily executing MCMC at around a million iterations every couple seconds on a fast machine). Your code should not terminate prematurely with an exception (unless given bad data, of course); code that does otherwise risks getting no credit for the automatic testing portion of the grade. As always, you should follow good programming practices, including documenting your code enough for the TA to be able to read and understand it. Creativity, especially in the design of your own Bayes net and query, will be one component of your grade.

Your answers to the questions above should be clear, concise, complete, correct, thoughtful and perceptive.

This programming assignment will be worth about 75 points (divided roughly evenly between the implementation of MCMC and the experimental part of the assignment).

The optional part of this assignment will be worth 5-15 points, depending on what you do.

Acknowledgments

The (fictional) insurance network was extracted from the paper "Adaptive Probabilistic Networks with Hidden Variables" by J. Binder, D. Koller, S. Russell and K. Kanazawa (Machine Learning, 29, 213–244 (1997)).