Algorithm Development.
Developing a good algorithm is an iterative process. We create a model of the problem, develop
an algorithm, and refine the algorithm until its performance meets our needs.
Dynamic connectivity problem. The problem is defined on an undirected graph with N vertices and supports two operations: add an edge, and determine whether two vertices are connected by a path. Connectedness is an equivalence relation (it is reflexive, symmetric, and transitive), so we can partition the vertices into sets such that every vertex is in exactly one set and two vertices are connected if and only if they are in the same set. This problem motivates the union-find data type.
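As a point of comparison, the problem model can be sketched directly without union-find. In this minimal Python sketch (the class name NaiveConnectivity and the methods add_edge() and connected() are hypothetical, not from the text), the graph is stored as adjacency lists and each connectivity query runs a breadth-first search, so a single query may have to examine every vertex and edge.

```python
from collections import deque


class NaiveConnectivity:
    """Hypothetical baseline for dynamic connectivity on vertices 0..n-1.

    connected() runs a breadth-first search, so each query can take
    time proportional to the number of vertices plus edges.
    """

    def __init__(self, n: int) -> None:
        # One adjacency list per vertex.
        self.adj: list[list[int]] = [[] for _ in range(n)]

    def add_edge(self, v: int, w: int) -> None:
        # Undirected edge: record it in both adjacency lists.
        self.adj[v].append(w)
        self.adj[w].append(v)

    def connected(self, s: int, t: int) -> bool:
        # Breadth-first search from s; stop as soon as t is dequeued.
        seen = [False] * len(self.adj)
        seen[s] = True
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if v == t:
                return True
            for w in self.adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        return False
```

For example, on five vertices, after add_edge(0, 1) and add_edge(1, 2), connected(0, 2) is True and connected(0, 3) is False.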
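Union-Find. The goal is to develop a data type that supports the following two core operations on disjoint sets over the elements { 0, 1, 2, ..., N − 1 }:
union(p, q): merge the set containing p with the set containing q.
find(p): return the identifier of the set containing p.
Two elements p and q are in the same set if and only if find(p) == find(q).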
Quick find. This is the most natural solution: each element is given an explicit identifier that indicates which set it belongs to. We use an array id[] of length N, where id[i] is the identifier of element i (which is returned by find(i)). To union two elements p and q, we change the identifier of every element that has p's identifier to q's identifier. As a result, find() takes a single array access, but union() must scan the entire array.
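A minimal Python sketch of quick find follows. The class name QuickFind and the connected() helper are assumptions for illustration; the text only specifies the id[] array, find(), and union().

```python
class QuickFind:
    """Quick-find union-find over the elements 0..n-1 (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        # Initially every element is in its own set: id[i] == i.
        self.id = list(range(n))

    def find(self, p: int) -> int:
        # One array access: the set identifier is stored directly.
        return self.id[p]

    def connected(self, p: int, q: int) -> bool:
        return self.find(p) == self.find(q)

    def union(self, p: int, q: int) -> None:
        # Read both identifiers before the loop, because the loop overwrites id[p].
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return
        # Relabel every element that has p's identifier; this scans the whole array.
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid
```

Saving pid before the loop matters: reading self.id[p] inside the loop would see q's identifier once element p itself has been relabeled, and later elements in p's set would be missed.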
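Quadratic algorithms don't scale. In quick find, each union() scans all N entries of id[], so N union operations on N elements take about N² array accesses. For a quadratic algorithm, a problem k times larger on a computer k times faster takes k times as long to run.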
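The scaling claim follows from a one-line calculation. Assume, as a simplification, that the running time of a quadratic algorithm is T(N) = cN² for some constant c, and that a computer k times faster divides every running time by k. Solving a problem k times larger on the faster computer then takes

```latex
\frac{T(kN)}{k} = \frac{c\,(kN)^2}{k} = k \cdot c\,N^2 = k \cdot T(N)
```

that is, k times as long as the original problem took on the original computer.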
Quick union. We store the elements in a forest of trees, with each tree corresponding to a different set. We store the parent pointers in an array, where parent[i] is the parent of element i in its tree, and we use the root of each tree as the set identifier. By convention, the parent pointer of a root points to itself. The find() method follows parent pointers until it reaches the root (an element whose parent is itself). To union p and q, we set the root of p to point to the root of q. Union no longer scans the whole array, but the trees can become tall: in the worst case a tree is a single path of height N − 1, so find() takes linear time.
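A minimal Python sketch of quick union, assuming the class name QuickUnion and a connected() helper that the text does not name:

```python
class QuickUnion:
    """Quick-union union-find: a forest stored as parent pointers (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        # Every element starts as the root of its own one-node tree.
        self.parent = list(range(n))

    def find(self, p: int) -> int:
        # Follow parent pointers until reaching a root (parent[r] == r).
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def connected(self, p: int, q: int) -> bool:
        return self.find(p) == self.find(q)

    def union(self, p: int, q: int) -> None:
        # Make the root of p's tree point to the root of q's tree.
        root_p, root_q = self.find(p), self.find(q)
        if root_p != root_q:
            self.parent[root_p] = root_q
```

Calling union(i, i + 1) for i = 0, 1, ..., N − 2 builds a single path 0 → 1 → ... → N − 1, so find(0) then follows N − 1 links. This worst case is what motivates weighting.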
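Weighted quick union (union-by-size). Rather than having union(p, q) always make the root of p point to the root of q, we make the root of the smaller tree point to the root of the larger one, where the size of a tree is its number of nodes. With union-by-size, the height of every tree is at most lg N (you should understand this proof): the depth of a node increases by one only when its tree is linked under a tree at least as large, which at least doubles the size of the tree containing that node, and a tree with at most N nodes can double at most lg N times. (An alternative strategy, known as union-by-height, uses the height of the tree instead of the size.)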
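A minimal Python sketch of union-by-size follows. The class name WeightedQuickUnion, the size[] array, and the depth() helper (used only to inspect the lg N height bound) are assumptions for illustration. Ties are broken by linking q's root under p's root; the text does not specify a tie rule.

```python
class WeightedQuickUnion:
    """Quick union with union-by-size over the elements 0..n-1 (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        # size[r] counts the nodes in the tree rooted at r (only meaningful for roots).
        self.size = [1] * n

    def find(self, p: int) -> int:
        # Same root-climbing loop as plain quick union.
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def connected(self, p: int, q: int) -> bool:
        return self.find(p) == self.find(q)

    def union(self, p: int, q: int) -> None:
        root_p, root_q = self.find(p), self.find(q)
        if root_p == root_q:
            return
        # Ensure root_p is the root of the larger tree (ties keep p's root on top).
        if self.size[root_p] < self.size[root_q]:
            root_p, root_q = root_q, root_p
        # Link the smaller tree's root under the larger tree's root.
        self.parent[root_q] = root_p
        self.size[root_p] += self.size[root_q]

    def depth(self, p: int) -> int:
        # Hypothetical inspection helper: number of links from p up to its root.
        d = 0
        while self.parent[p] != p:
            p = self.parent[p]
            d += 1
        return d
```

Merging equal-size trees pairwise, as in union(0, 1), union(2, 3), union(4, 5), union(6, 7), then union(0, 2), union(4, 6), and finally union(0, 4), produces a tree on 8 nodes of height exactly lg 8 = 3, the worst case the bound allows. The sequence union(i, i + 1) that degrades plain quick union now yields a tree of height 1.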
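Weighted quick union with path compression. Just after find(p) locates the root of p's tree, we set the parent pointer of every node on the path from p to the root to point directly at the root. This keeps the trees nearly flat. Starting from an empty data structure, any sequence of M union and find operations on N objects makes at most c(N + M log*(N)) array accesses for some constant c, where log*(N), the iterated logarithm, is the number of times lg must be applied to N to bring it down to at most 1. For any conceivable value of N in this universe, log*(N) is at most 5.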
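A minimal Python sketch combining union-by-size with full (two-pass) path compression, plus a small function for the iterated logarithm. The names WeightedQuickUnionPC and lg_star are hypothetical. One-pass variants, such as path halving (pointing every other node on the path at its grandparent), are also common; this sketch uses the two-pass form.

```python
import math


class WeightedQuickUnionPC:
    """Union-by-size plus full path compression (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, p: int) -> int:
        # First pass: climb to the root.
        root = p
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: make every node on the path from p point directly at the root.
        while p != root:
            next_p = self.parent[p]
            self.parent[p] = root
            p = next_p
        return root

    def connected(self, p: int, q: int) -> bool:
        return self.find(p) == self.find(q)

    def union(self, p: int, q: int) -> None:
        root_p, root_q = self.find(p), self.find(q)
        if root_p == root_q:
            return
        # Link the root of the smaller tree under the root of the larger one.
        if self.size[root_p] < self.size[root_q]:
            root_p, root_q = root_q, root_p
        self.parent[root_q] = root_p
        self.size[root_p] += self.size[root_q]


def lg_star(n: float) -> int:
    """Iterated logarithm: how many times lg must be applied to n to reach a value <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Because Python's math.log2 accepts arbitrarily large integers, lg_star(2 ** 65536) evaluates to 5 (the chain is 2^65536 → 65536 → 16 → 4 → 2 → 1), matching the claim that log*(N) is at most 5 for any practical N.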