Algorithm Development.
Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and then revise the algorithm until its performance meets our needs.
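Union-Find. The ultimate goal is to develop a data type that supports the following operations on a fixed number N of objects, numbered 0 to N-1: union(p, q), which connects objects p and q; connected(p, q), which reports whether p and q are connected; and find(p), which returns an identifier for the component containing p.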
The find() method is defined so that find(p) == find(q) iff connected(p, q).
Key observation: connectedness is an equivalence relation (it is reflexive, symmetric, and transitive). Saying that two objects are connected is the same as saying they are in the same equivalence class. This is just fancy math talk for saying "every object is in exactly one bucket, and we want to know whether two objects are in the same bucket". When you union two objects, you're basically just pouring everything from one bucket into the other.
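Quick find. This is the most natural solution: each object is labeled explicitly with the number of its bucket. It uses an array id[] of length N, where id[i] is the bucket number of object i (which is what find(i) returns). To union two objects p and q, we set every object in p's bucket to have q's bucket number. As a result, find() takes constant time, but union() must scan the entire array, so each union costs on the order of N array accesses.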
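A minimal Java sketch of quick find as described above. The class name QuickFindUF is our own choice for illustration, and objects are assumed to be the integers 0 to N-1. Note that union() saves both bucket numbers before the loop; reading id[p] inside the loop would break once id[p] itself is overwritten.

```java
// Quick-find sketch: id[i] is the bucket (component) number of object i.
class QuickFindUF {
    private final int[] id;

    // Initially every object is alone in a bucket numbered by itself.
    QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    // Constant time: just read the bucket number.
    int find(int p) {
        return id[p];
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Linear time: relabel every object in p's bucket with q's bucket number.
    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        if (pid == qid) {
            return; // already in the same bucket
        }
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }
}
```

For example, after union(4, 3) and union(3, 8) on ten objects, connected(4, 8) is true and find(4) returns 8.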
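Quadratic algorithms don't scale. Processing N union operations on N objects with quick find takes on the order of N^2 array accesses. If computers become k times faster and we use them to solve problems k times larger, a quadratic algorithm takes k times as long to run.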
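A short calculation makes the scaling argument concrete. Assume quick find performs about N unions on N objects at about N array accesses each, so its running time is roughly c N^2 / s on a machine doing s operations per second, where c is a hypothetical constant. Scaling both the problem size and the machine speed by a factor k gives:

```latex
T(N) \approx \frac{c\,N^2}{s}
\qquad\Longrightarrow\qquad
T_{\text{new}} \approx \frac{c\,(kN)^2}{k\,s} = k \cdot \frac{c\,N^2}{s} \approx k\,T(N)
```

The faster machine does not keep up: the bigger problem takes k times as long.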
Quick union. id[i] is the parent object of object i. An object can be its own parent. The find() method climbs the ladder of parents until it reaches the root (an object whose parent is itself). To union p and q, we set the root of p to point to the root of q.
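A minimal Java sketch of quick union, assuming objects are the integers 0 to N-1; the class name QuickUnionUF is hypothetical. Here id[] stores parent links rather than bucket numbers, so union() changes a single entry, but find() may have to follow a long chain of parents.

```java
// Quick-union sketch: id[i] is the parent of object i; a root is its own parent.
class QuickUnionUF {
    private final int[] id;

    QuickUnionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every object starts as the root of its own tree
        }
    }

    // Climb parent links until reaching a root; cost is the depth of p.
    int find(int p) {
        while (p != id[p]) {
            p = id[p];
        }
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Point the root of p's tree at the root of q's tree.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        id[rootP] = rootQ;
    }
}
```

The weakness shows up with unions in the order union(0, 1), union(1, 2), ..., union(N-2, N-1): they build a single path, so find(0) then follows N-1 links.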
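Weighted quick union. Rather than having union(p, q) always make the root of p point to the root of q, we make the root of the smaller tree point to the root of the larger one. A tree's size is its number of nodes, not its height. This keeps the height of every tree at most lg N, and the proof is worth understanding: the depth of a node x increases by 1 only when x's tree is linked under the root of a tree at least as large, so the size of the tree containing x at least doubles each time, and a tree with at most N nodes can double at most lg N times.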
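A minimal Java sketch of weighted quick union, assuming objects are the integers 0 to N-1. It keeps a second array, size[], holding the node count of each tree (meaningful only at roots). The class name WeightedQuickUnionUF and the depth() helper, which exists only to inspect tree height, are our own additions for illustration.

```java
// Weighted quick-union sketch: link the smaller tree (by node count) under the larger.
class WeightedQuickUnionUF {
    private final int[] parent; // parent[i] = parent of i; roots point to themselves
    private final int[] size;   // size[r] = number of nodes in the tree rooted at r

    WeightedQuickUnionUF(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        // Smaller tree goes under the larger root; ties put q's root under p's.
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    // Illustration helper: number of links from p up to its root.
    int depth(int p) {
        int d = 0;
        while (p != parent[p]) {
            p = parent[p];
            d++;
        }
        return d;
    }
}
```

Repeatedly merging equal-sized trees, as in union(0, 1), union(2, 3), union(0, 2), and so on, is the pattern that reaches the bound: with N = 16 the deepest node ends up exactly 4 links below the root. The chain of unions that degrades plain quick union instead leaves every node one link from the root here.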
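Weighted quick union with path compression. Each time find(p) locates the root of p's tree, it also makes every node examined along the way point directly to that root, so the trees stay nearly flat. Starting from an empty data structure, any sequence of M union and find operations on N objects makes at most c(N + M lg* N) array accesses for some constant c, where lg* N (the iterated logarithm) is the number of times lg must be applied to N to bring it down to 1 or less. For any conceivable value of N in this universe, lg* N is at most 5.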
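A minimal Java sketch that adds two-pass path compression to weighted quick union, assuming objects are the integers 0 to N-1: find() first walks up to the root, then walks the same path again and points each node directly at the root. Other compression variants exist (for example, path halving, which makes every other node on the path point to its grandparent); this is just one choice. The class name PathCompressionUF and the non-compressing depth() helper are hypothetical, added for illustration.

```java
// Weighted quick union with two-pass path compression (sketch).
class PathCompressionUF {
    private final int[] parent; // parent[i] = parent of i; roots point to themselves
    private final int[] size;   // size[r] = node count of the tree rooted at r

    PathCompressionUF(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // First pass finds the root; second pass points every node on the path at it.
    int find(int p) {
        int root = p;
        while (root != parent[root]) {
            root = parent[root];
        }
        while (p != root) {
            int next = parent[p];
            parent[p] = root;
            p = next;
        }
        return root;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Same weighting rule: smaller tree (by node count) goes under the larger root.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    // Illustration helper: links from p to its root, WITHOUT compressing.
    int depth(int p) {
        int d = 0;
        while (p != parent[p]) {
            p = parent[p];
            d++;
        }
        return d;
    }
}
```

If 16 objects are merged by repeatedly pairing equal-sized trees, node 15 sits 4 links below the root; a single call to find(15) then leaves it, and every node on its old path, pointing directly at the root.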