GEOMETRIC APPLICATIONS OF BINARY SEARCH TREES
Binary search trees
Consider a binary search tree with
numerical keys (say doubles). Geometrically, we can view
the tree as recursively subdividing the interval (-∞,∞):
- Each node is associated with an interval (a,b)
that contains all points in the subtree below it
- The root is associated with the whole interval (-∞,∞)
- Each internal node divides its associated interval in two
sub-intervals at node's key. For example, if a node n
is associated with the interval (a,b), then
n.left is associated with the interval (a,n.key)
and n.right is associated with the interval (n.key,b).
kd-Trees
kd trees are a data structure for representing a collection
of points in k-dimensional space. The geometric
description of binary search trees above corresponds to the special
case of 1d trees (also called 1-dimensional kd trees).
kd trees recursively subdivide k-dimensional
space as follows:
- Each node is associated with a bounding box that contains all
points in the subtree below it
- The root is associated with the entire space
- If an internal node divides its associated bounding box in two
at the halfspace that passes through the point associated with
that node and is aligned with the ith axis,
where i is the level of the node mod k. For
example, in a 2d tree, then along any path from root to leaf,
the nodes alternate between splitting the bounding box
horizontally at node.key.x (even levels) and vertically
at node.key.y (odd levels).
See here
for a visual depiction of bounding boxes of kd trees.
Insertion and search run in worst-case Θ(log n) time in a balanced
in a kd tree, and worst-case Θ(n) time in an
unbalanced kd tree.
Nearest neighbor search
(see here)
finds the point in the set that is geometrically closest to a given
target point. Nearest neighbor search traverses the tree
recursively starting from the root, (potentially) exploring both the
left and right child of every node. Efficient nearest neighbor
search is enabled by the following heuristics:
- pruning rule: if the current champion is closer
than the distance to the target point than the bounding box of
the current node, then return -- the nearest neighbor is not in
the subtree rooted at the current node.
- optimistic ordering: after visiting a node, visit
the child that lies on the same side of the halfspace defined by
the node as does the query point.
Typical running time of nearest neighbor search is is Θ(log n);
worst-case is Θ(n).
Range search
(see here)
finds all the points in a kd tree that are contained within
a given (k-dimensional) bounding box. Efficient range
search is enabled by the following pruning heuristic: if
the target range does not intersect the bounding box of the node,
then return -- none of the points in the tree below the current node
may belong to the range. Typical running time of range search is
Θ(log n + m) where n is the number of points in the tree and m is the number of matches; worst-case running time (assuming a balanced tree) is
Θ(n(d-1)/d + m) (e.g., Θ(√n + m) for 2d trees) where d is the number of dimensions.
Recommended Problems
C level
- Suppose that a 3-d tree contains N nodes. What is its height of
the tree in the worst (largest) and best (smallest) case?
Answers
Worst case: Θ(N) (all nodes are arranged in a sequence, and the
right child of every node is null).
Best case: Θ(log2 N) (perfectly balanced).
Note that a kd tree is always a binary tree, regardless of
what k is.
B level
- Suppose that a set of points is organized into
a kd tree. Design an efficient algorithm for finding the nearest
neighbor of a target point that lies within a given bounding box
Answers
Augment a nearest neighbor search with the same pruning strategy as in range search.
-
Consider the set of points (0.1, 9.1), (2.0,1.4), (6.7,5.7), (0.2, 0.2), (4.3,2.7), (1.1,5.7), (5.1, 8.7). In what order should they be inserted into a 2d tree order to mimimize height?
Answers
Insert (2.0,1.4) first: half the points lie to the left
((0.1, 9.1),(0.2,0.2),(1.1,5.7)), and half to the right
((6.7,5.7),(4.3,2.7),(5.1, 8.7)). Next insert
(1.1,5.7) and (6.7,5.7) (half the points that lie
to the left of (2.0,1.4) lie above (1.1,5.7) and
half below; half the points that lie to the right
of (2.0,1.4) lie above (6.7,5.7) and half below). Then insert (0.1, 9.1),(0.2,0.2),(4.3,2.7),(5.1, 8.7) in any order.
Note: (2.0,1.4) should be inserted before
(0.1, 9.1), (6.7,5.7)
(0.2, 0.2), (4.3,2.7)
Augment a nearest neighbor search with the same pruning strategy as in range search.
- Given a sequence of N intervals (a1, b1), ..., (aN, bN), design an O(N log N) sweep-line
algorithm to find a value x that is contained within the maximum number of intervals. You
may assume that no two endpoints have the same value.
- What are the events?
- How do you implement the sweep line?
- What data structure stores the set of intervals that intersect the sweep line?
- How does your sweep-line algorithm work, i.e., how do you process each event?
Answers
For simplicity, we assume no two endpoints have the same value.
-
The 2N events are the left and right endpoints of each interval.
-
To implement the sweep line, sort the endpoints and process in ascending order, say
using mergesort.
-
Store the set of intervals intersecting the sweep line in a priority queue (say, a binary
heap), using the right endpoint as the key.
-
- Left endpoint: insert the interval onto the PQ. Check the number of elements on
the PQ, if it is the most so far, record the x value of the current left endpoint.
- Right endpoint: perform a delete the min on the PQ. This removes the corresponding
interval from the PQ.
Note that the PQ isn’t strictly needed, since we could just increment a counter when processing a left endpoint, and decrement it when processing a right endpoint.
A level
- Suppose that you are given a set S of N 2-dimensional points. Design an algorithm to construct a 2d tree of height Θ(log N) that contains S, in Θ(log N) time. Is it possible to construct a 2d tree of height ~log N?
Answers
Copy the points into two arrays, one sorted by x-coordinate and one
sorted by y-coordinate.
Then recursively construct the 2d tree as follows: if the current
level is an x-level, find a coordinate whose x
coordinate is the median among all points (in Θ(1) time using the
array sorted by x coordinate), and split each of the
x-sorted array and y-sorted arrays into two, based on the criterion
of whether the points x coordinate is lower or higher than the
selected coordinate (Θ(n) time). The recursively construct a kd
tree for the left and right. The case for y-levels is
symmetric.
This recursive procedure satisfies the recurrence T(n) = 4T(n/2) +
n, and therefore operatoes in Θ(nlog n) time.
~log N is not achievable. Consider a set of points that all share
the same x coordinate. Regardless of the way that the tree is
constructed, it has height ~2log N, since one brach of every tree
that is split at an x coordinate is always empty.