COS 226 Programming Assignment: Online Bottleneck Paths

COS 226 Programming Assignment 1

Online Bottleneck Paths

Use a parent-link tree representation to maintain maximum bottleneck paths between pairs of vertices in a capacitated network. The bottleneck capacity of a path is the smallest capacity of any edge on that path. The maximum bottleneck capacity path between two vertices is the highest bottleneck value among all paths that connect the two vertices.

The maximum bottleneck path is of interest because it is the one that gives us a way to get as much (of whatever the network carries) as possible between two vertices. For example, in an Internet application, the vertices might be routers, the edges fiber-optic connections between routers, and the weights the bandwidth capacities of the fiber optic links. A maximum bottleneck path between two vertices would be one that allows transfer of information at as high a rate as possible between them.

Remarkably, this problem can be solved with the parent-link forest representation introduced in Sedgewick, Chapter 1, augmented to include capacities. Why is that remarkable? There are N^2 pairs of vertices, but we only need a table of size N to know the maximum bottleneck path between any pair of them. For a large number of vertices, we could not possibly store all of the paths, but your program will be able to print a maximum bottleneck path between any pair of them. Read Chapter 1 to learn about manipulating the parent-link representation. Do not use pointers or any tree representation involving linked allocation of memory for this assignment.

This assignment divides into three parts, each of which count for one-third of the grade.

Part 1. Modify the quick union algorithm (Sedgewick, Program 1.2) so that the forest it maintains is a spanning forest of the network. This means that every link represented must be one of the input edges. When p and q are the roots of their respective trees, this is the case, but when they are not, the quick-union algorithm adds an edge from the root of the tree containing p to the root of the tree containing q, and this edge is not one of the input edges. Your task is to remedy this defect, by reversing all of the edges from p to the root of its tree, then adding an edge from p to q.

To indicate that it represents a spanning tree, change the name of the array from id to st. Your code should take the number of vertices as a command-line argument and to dynamically allocate an array of that size. For example, given the sequence of pairs at left, your program should produce the following table (compare with Sedgewick, Figure 1.6). Note that the third input value, the capacity, is not used until Part 3 of the assignment.

% a.out 10 < input10.txt
 3  4  14   0 1 2 4 4 5 6 7 8 9 
 4  9   6   0 1 2 4 9 5 6 7 8 9 
 8  0   7   0 1 2 4 9 5 6 7 0 9 
 2  3  44   0 1 3 4 9 5 6 7 0 9 
 5  6  12   0 1 3 4 9 6 6 7 0 9 
 2  9   9   already connected
 5  9  11   0 1 3 4 9 9 5 7 0 9 
 7  3  33   0 1 3 4 9 9 5 3 0 9 
 4  8  13   0 1 3 4 8 9 5 3 0 4 
 5  6  17   already connected
 0  2  10   already connected
 6  1  22   8 1 3 4 9 6 1 3 4 5 
 5  8   4   already connected

All of the links in all of these trees are input pairs (unlike all of the tables in Sedgewick, Chapter 1, where maintaining the spanning tree is not a requirement).

Part 2. Write a new program based on your solution to Part 1 that, when it finds that two vertices are connected, prints a path between the two vertices on the output line associated with them (just as in Sedgewick, Figure 1.1). To do so, you will need to compute the least common ancestor between p and q: lca(p, q) is the common ancestor of p and q that is closest to p and q. You may use a vertex-indexed array of vertex marks to keep track of which vertices have been visited as you work up the tree. To print a path between p and q, follow the path up the tree from p to lca(p, q), then follow the path down the tree from lca(p, q) to q. The links only go up the tree, so use recursion to print the part of the path that goes down the tree.

When processing two vertices that are connected, your code must run in time proportional to the length of the path connecting them. In particular, you cannot waste time following pointers all the way to the root if there is an LCA that is much closer to the two vertices, and you cannot waste time re-initializing all the vertex marks. Figuring out how to do this is the trickiest part of this assignment.

For example, given the sequence of pairs at left, your program should produce the following table (compare with Sedgewick, Figure 1.6).

% a.out 10 < input10.txt
 3  4  14   inserted
 4  9   6   inserted
 8  0   7   inserted
 2  3  44   inserted
 5  6  12   inserted
 2  9   9   already connected by path 2-3-4-9
 5  9  11   inserted
 7  3  33   inserted
 4  8  13   inserted
 5  6  17   already connected by path 5-6
 0  2  10   already connected by path 0-8-4-3-2
 6  1  22   inserted
 5  8   4   already connected by path 5-9-4-8

Part 3. Write a new program based on your solution to Part 2 that accepts nonnegative capacities in the input and maintains a forest of trees with maximum bottleneck paths. Your program will do more work so that the spanning forest is such that the the unique path between any two vertices in the forest is a maximum bottleneck path.

The first (easy) thing that you have to do is to handle edge capacities. Define a second vertex-indexed array named wt, and make sure that whenever you set st[p] = q (to indicate that you are adding the edge p-q to the data structure) you also set wt[p] = w (where w is the capacity that was associated with p-q in the input.

Now, when processing a new edge p-q with capacity w, there are three possibilities:

Case 1: p and q are not already connected. In this case, insert the edge p-q into the parent-link data structure, exactly as in Part 1.

Case 2: p and q are already connected, and every edge on the unique path between them has capacity greater than or equal to w. In this case, the edge p-q will not help, so discard it. Print the path connecting p and q (and its bottleneck capacity), just as in Part 2.

Case 3: p and q are already connected, and there exists an edge on the unique path with capacity less than w. In this case, the edge p-q is useful, so add it to the parent-link data structure. Doing so would create a cycle (through the LCA), so to make room for the new edge, delete the edge on the path from p to q with the smallest capacity. Also print an indication that p-q replaced that edge, on the line associated with p-q in the output.

To delete that edge, you need to reverse some pointers, as you did in Part 1. You do not need much code to do this (and you may even be able to reuse the code from Part 1), but make sure that you handle all of the cases that can occur.

For example, given the sequence of triples at left, your program should produce the following table.

% a.out 10 < input10.txt
a.out 10 < input10.txt

 3  4 14  inserted
 4  9  6  inserted
 8  0  7  inserted
 2  3 44  inserted
 5  6 12  inserted
 2  9  9  already connected by path 2-3-4-9 of length 3 and bottleneck capacity 6
          inserted after deleting 4-9
 5  9 11  inserted
 7  3 33  inserted
 4  8 13  inserted
 5  6 17  already connected by path 5-6 of length 1 and bottleneck capacity 12
          inserted after deleting 6-5
 0  2 10  already connected by path 0-8-4-3-2 of length 4 and bottleneck capacity 7
          inserted after deleting 8-0
 6  1 22  inserted
 5  8  4  already connected by path 5-9-2-3-4-8 of length 5 and bottleneck capacity 9

 Average path length = 3.250000

Input and output. The input is a sequence of triples of integers: the first two integers represent the edge, the third represents the capacity. For Parts 1 and 2, you will ignore the capacities. You can test your programs on these sample input files.

Instrument your program for Part 3 so that it prints out a table like the one above when N is less than 20. For brevity, print out only the path length instead of the path itself for larger N. Also, print out the average of all of these path lengths.