RedblackImplementation and Proof of Red-Black Trees
Required Reading
- Section 3.3 of Algorithms, Fourth Edition, by Sedgewick and Wayne, Addison Wesley 2011; or
- Chapter 13 of Introduction to Algorithms, 3rd Edition, by Cormen, Leiserson, and Rivest, MIT Press 2009
- or Wikipedia.
Implementation
From VFA Require Import Perm.
From VFA Require Import Extract.
From Coq Require Import Logic.FunctionalExtensionality.
Open Scope Z_scope.
Definition key := int.
Inductive color := Red | Black.
Section TREES.
Variable V : Type.
Variable default: V.
Inductive tree : Type :=
| E : tree
| T: color → tree → key → V → tree → tree.
Definition empty_tree := E.
lookup is exactly as in our (unbalanced) search-tree algorithm in
Extract, except that the T constructor carries a color component,
which we can ignore here.
Fixpoint lookup (x: key) (t : tree) : V :=
match t with
| E ⇒ default
| T _ tl k v tr ⇒ if ltb x k then lookup x tl
else if ltb k x then lookup x tr
else v
end.
The balance function is copied directly from Okasaki's paper.
Now, the nice thing about machine-checked proof in Coq is that you
can prove this correct without actually understanding it!
So, do read Okasaki's paper, but don't worry too much about the details
of this balance function.
In contrast, Sedgewick has proposed left-leaning red-black trees,
which have a simpler balance function (but a more complicated invariant).
He does this in order to make the proof of correctness easier: there
are fewer cases in the balance function, and therefore fewer cases
in the case-analysis of the proof of correctness of balance. But as you
will see, our proofs about balance will have automated case analyses,
so we don't care how many cases there are!
Definition balance rb t1 k vk t2 :=
match rb with Red ⇒ T Red t1 k vk t2
| _ ⇒
match t1 with
| T Red (T Red a x vx b) y vy c ⇒
T Red (T Black a x vx b) y vy (T Black c k vk t2)
| T Red a x vx (T Red b y vy c) ⇒
T Red (T Black a x vx b) y vy (T Black c k vk t2)
| a ⇒ match t2 with
| T Red (T Red b y vy c) z vz d ⇒
T Red (T Black t1 k vk b) y vy (T Black c z vz d)
| T Red b y vy (T Red c z vz d) ⇒
T Red (T Black t1 k vk b) y vy (T Black c z vz d)
| _ ⇒ T Black t1 k vk t2
end
end
end.
(In contrast, Sedgewick has proposed left-leaning red-black
trees, which have a simpler balance function (but a more
complicated invariant). He does this in order to make the proof of
correctness easier: there are fewer cases in the balance
function, and therefore fewer cases in the case-analysis of the
proof of correctness of balance. But as you will see, our proofs
about balance will have automated case analyses, so we don't
care how many cases there are!)
The insert operation uses a helper function ins to do
the insertion, and another function makeBlack to color
the resulting tree's root black.
Definition makeBlack t :=
match t with
| E ⇒ E
| T _ a x vx b ⇒ T Black a x vx b
end.
Fixpoint ins x vx s :=
match s with
| E ⇒ T Red E x vx E
| T c a y vy b ⇒ if ltb x y then balance c (ins x vx a) y vy b
else if ltb y x then balance c a y vy (ins x vx b)
else T c a x vx b
end.
Definition insert x vx s := makeBlack (ins x vx s).
The elements function does not change except to ignore the color component of T.
Fixpoint elements' (s: tree) (base: list (key×V)) : list (key × V) :=
match s with
| E ⇒ base
| T c a k v b ⇒ elements' a ((k,v) :: elements' b base)
end.
Definition elements (s: tree) : list (key × V) := elements' s nil.
Now that the program has been defined, it's time to prove its properties.
A red-black tree has two kinds of invariants:
First, we'll look at some automation methodology.
Let's start by proving a seemingly obvious fact about ins, namely that the
result of any ins operation is a non-empty tree. (We don't actually need
this fact, but it makes a good example.) The proof turns out to be surprisingly
complicated, because of the need for a big case analysis over all the clauses
of the balance function. These proofs are very tedious to do "by hand,"
but are easy to automate.
Several of the proofs for red-black trees require a big case
analysis over all the clauses of the balance function. These
proofs are very tedious to do by hand, but are easy to automate.
Let's get some practice by automating a proof that insertion
produces a non-empty tree.
- BST: the keys in each left subtree are all less than the node's key, and the keys in each right subtree are greater
- Balanced: there is the same number of black nodes on any path from the root to each leaf; and there are never two red nodes in a row.
Proof Automation for Case-Analysis Proofs.
Proof Automation for Case-Analysis Proofs.
Lemma T_neq_E:
∀ c l k v r, T c l k v r ≠ E.
Lemma ins_not_E: ∀ x vx s, ins x vx s ≠ E.
Proof.
intros. destruct s; simpl.
discriminate. (* This form of the discriminate tactic is shorthand
for intro X; discriminate X. *)
remember (ins x vx s1) as a1.
unfold balance.
Let's destruct on the topmost case, ltb x k. We can use
destruct instead of bdestruct because we don't need to
remember whether x<k or x≥k.
destruct (ltb x k).
(* The topmost test is match c with..., so just destruct c *)
destruct c.
(* This one is easy. *)
discriminate.
(* The topmost test is match a1 with..., so just destruct a1 *)
destruct a1.
(* The topmost test is match s2 with..., so just destruct s2 *)
destruct s2.
(* This one is easy. *)
discriminate.
How long will this go on? A long time! But it will terminate.
Just keep typing. Better yet, let's automate.
The following tactic applies whenever the current goal looks like
match ?c with Red ⇒ _ | Black ⇒ _ end ≠ _
and what it does in that case is destruct c
The following tactic applies whenever the current goal looks like
match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end ≠ _
and what it does in that case is destruct s
Let's apply that tactic again, and then try it on the subgoals,
recursively. Recall that the repeat tactical keeps trying the
same tactic on subgoals.
repeat match goal with
| ⊢ match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end ≠ _⇒destruct s
end.
match goal with
| ⊢ T _ _ _ _ _ ≠ E ⇒ discriminate
end.
This seems promising. Let's start the proof all over again and incorporate this automation.
Abort.
Lemma ins_not_E: ∀ x vx s, ins x vx s ≠ E.
Proof.
intros. destruct s; simpl.
discriminate.
remember (ins x vx s1) as a1.
unfold balance.
This is the beginning of the big case analysis. This time,
let's combine several tactics together:
repeat match goal with
| ⊢ (if ?x then _ else _) ≠ _ ⇒ destruct x
| ⊢ match ?c with Red ⇒ _ | Black ⇒ _ end ≠ _⇒ destruct c
| ⊢ match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end ≠ _⇒destruct s
end.
| ⊢ (if ?x then _ else _) ≠ _ ⇒ destruct x
| ⊢ match ?c with Red ⇒ _ | Black ⇒ _ end ≠ _⇒ destruct c
| ⊢ match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end ≠ _⇒destruct s
end.
What we have left is 117 cases, every one of which can be proved
the same way:
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
Only 111 cases to go...
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
discriminate.
Only 107 cases to go...
There has to be a better way, and there is:
all: discriminate.
The all: goal selector applies a tactic to all remaining subgoals.
Alternatively, we can restart the proof yet again, and add one
more clause to the match goal tactic...
Abort.
Let's start over again, adding that apply to the case analysis.
Lemma ins_not_E: ∀ x vx s, ins x vx s ≠ E.
Proof.
intros. destruct s; simpl.
discriminate.
remember (ins x vx s1) as a1.
unfold balance.
repeat match goal with
| ⊢ (if ?x then _ else _) ≠ _ ⇒ destruct x
| ⊢ match ?c with Red ⇒ _ | Black ⇒ _ end ≠ _⇒ destruct c
| ⊢ match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end ≠ _⇒destruct s
| ⊢ T _ _ _ _ _ ≠ E ⇒ discriminate
end.
Qed.
The BST Property
Fixpoint ForallT (P: int → V → Prop) (t: tree) : Prop :=
match t with
| E ⇒ True
| T c l k v r ⇒ P k v ∧ ForallT P l ∧ ForallT P r
end.
Inductive BST : tree → Prop :=
| ST_E : BST E
| ST_T : ∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (T c l k v r).
Lemma balance_BST:
∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (balance c l k v r).
Proof.
intros c l k v r PL PR BL BR.
unfold balance.
∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (balance c l k v r).
Proof.
intros c l k v r PL PR BL BR.
unfold balance.
Use proof automation for this case analysis.
repeat match goal with
| ⊢ BST (match ?c with Red ⇒ _ | Black ⇒ _ end) ⇒ destruct c
| ⊢ BST (match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end) ⇒ destruct s
end.
58 cases to consider!
- constructor. auto. auto. auto. auto.
- constructor; auto.
- constructor; auto.
- constructor; auto.
To prove this one, we need to destruct a hypothesis above the line.
+ simpl. repeat split.
destruct PR as [? [? ?]].
omega.
And for this one, we first need to both invert and destruct above the line.
+ repeat split.
× inv BR.
destruct H5 as [? [? ?]].
omega.
× inv BR.
destruct H5 as [? [? ?]].
auto.
× inv BR.
destruct H5 as [? [? ?]].
auto.
+ constructor; auto.
× inv BR.
destruct H5 as [? [? ?]].
omega.
× inv BR.
destruct H5 as [? [? ?]].
auto.
× inv BR.
destruct H5 as [? [? ?]].
auto.
+ constructor; auto.
This time, let's do the inversion and destruction before deconstructing.
+ inv BR.
inv H7.
constructor; auto.
- constructor; auto.
inv H7.
constructor; auto.
- constructor; auto.
Hmm. Another complicated case. Rather than working through it, let's
try the proof again with stronger automation based on what we learned above.
Whenever we have a hypothesis above the line that looks like
we should invert or destruct it, and if we have a goal below the line of the form
ForallT _ (T _ _ _ _ _)
we should split it up. Otherwise we try
try constructor; auto; try omega
- H: BST (T _ _ _ _ _)
- H: ForallT _ (T _ _ _ _ _)
Abort.
Lemma balance_BST:
∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (balance c l k v r).
Proof.
intros.
unfold balance.
Use heavier-duty proof automation for this case analysis.
repeat (match goal with
| ⊢ BST (match ?c with Red ⇒ _ | Black ⇒ _ end) ⇒ destruct c
| ⊢ BST (match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end) ⇒ destruct s
| ⊢ ForallT _ (T _ _ _ _ _) ⇒ repeat split
| H: ForallT _ (T _ _ _ _ _) ⊢ _ ⇒ destruct H as [? [? ?] ]
| H: BST (T _ _ _ _ _) ⊢ _ ⇒ inv H
end;
(try constructor; auto; try omega)).
(Only) 41 cases to consider now! A little disappointing that we didn't clear
more cases, but let's look at why are we stuck.
The first goal follows from the hypotheses
[ ForallT (fun (k' : int) (_ : V) ⇒ int2Z k' > int2Z k0) r2 ]
and
[ int2Z k1 < int2Z k0 ]
but we need a lemma about >, to be proved inductively over r2.
The other goals look similar.
Abort.
Lemma ForallT_imp : ∀ (P Q : int → V → Prop) t,
ForallT P t →
(∀ k v, P k v → Q k v) →
ForallT Q t.
Proof.
induction t; intros.
- auto.
- destruct H as [? [? ?]]. repeat split; auto.
Qed.
Lemma ForallT_greater : ∀ t k k0,
ForallT (fun k' _ ⇒ int2Z k' > int2Z k) t →
int2Z k > int2Z k0 →
ForallT (fun k' _ ⇒ int2Z k' > int2Z k0) t.
Proof.
intros; eapply ForallT_imp; eauto.
intros. simpl in H1.
eapply Zgt_trans; eauto.
Qed.
We can predict that we'll need a matching lemma for <
Lemma ForallT_less : ∀ t k k0,
ForallT (fun k' _ ⇒ int2Z k' < int2Z k) t →
int2Z k < int2Z k0 →
ForallT (fun k' _ ⇒ int2Z k' < int2Z k0) t.
Proof.
intros; eapply ForallT_imp; eauto.
intros. simpl in H1.
eapply Z.lt_trans; eauto.
Qed.
Lemma balance_BST:
∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (balance c l k v r).
Proof.
intros.
unfold balance.
ForallT (fun k' _ ⇒ int2Z k' < int2Z k) t →
int2Z k < int2Z k0 →
ForallT (fun k' _ ⇒ int2Z k' < int2Z k0) t.
Proof.
intros; eapply ForallT_imp; eauto.
intros. simpl in H1.
eapply Z.lt_trans; eauto.
Qed.
Lemma balance_BST:
∀ c l k v r,
ForallT (fun k' _ ⇒ (int2Z k') < (int2Z k)) l →
ForallT (fun k' _ ⇒ (int2Z k') > (int2Z k)) r →
BST l →
BST r →
BST (balance c l k v r).
Proof.
intros.
unfold balance.
Use same heavier-duty proof automation as before.
repeat (match goal with
| ⊢ BST (match ?c with Red ⇒ _ | Black ⇒ _ end) ⇒ destruct c
| ⊢ BST (match ?s with E ⇒ _ | T _ _ _ _ _ ⇒ _ end) ⇒ destruct s
| ⊢ ForallT _ (T _ _ _ _ _) ⇒ repeat split
| H: ForallT _ (T _ _ _ _ _) ⊢ _ ⇒ destruct H as [? [? ?] ]
| H: BST (T _ _ _ _ _) ⊢ _ ⇒ inv H
end;
(try constructor; auto; try omega)).
Use our lemmas on everything that is left.
With that fact about balance established, let's take a moment to prove that
insertion preserves universal properties of tree nodes. As before, the balance
pattern matching will generate lots of cases, so automation is essential.
Exercise: 2 stars, standard (balanceP)
Lemma balanceP : ∀ P l c r k v,
ForallT P l →
ForallT P r →
P k v →
ForallT P (balance c l k v r).
Proof.
intros.
unfold balance.
(* Hint: Use proof automation with match goal and/or all: to handle this. *)
(* FILL IN HERE *) Admitted.
☐
ForallT P l →
ForallT P r →
P k v →
ForallT P (balance c l k v r).
Proof.
intros.
unfold balance.
(* Hint: Use proof automation with match goal and/or all: to handle this. *)
(* FILL IN HERE *) Admitted.
☐
Lemma insP : ∀ P t k v,
ForallT P t →
P k v →
ForallT P (ins k v t).
Proof.
(* Hint: Use the last lemma. There should be no need for automated case analysis. *)
(* FILL IN HERE *) Admitted.
☐
ForallT P t →
P k v →
ForallT P (ins k v t).
Proof.
(* Hint: Use the last lemma. There should be no need for automated case analysis. *)
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (ins_BST)
Proving fundamental laws relating insert and lookup.
As above, the hard work will be in handling balance. The following intermediate
lemma is useful to characterize the interaction of lookup and balance, and is
not too hard to prove with automation.
Exercise: 3 stars, standard (balance_lookup)
Lemma balance_lookup: ∀ k c k0 v0 l r,
BST l →
BST r →
ForallT (fun k' _ ⇒ int2Z k' < int2Z k0) l →
ForallT (fun k' _ ⇒ int2Z k' > int2Z k0) r →
lookup k (balance c l k0 v0 r) =
if (int2Z) k <? (int2Z k0) then
lookup k l
else if (int2Z k0) <? (int2Z k) then
lookup k r
else v0.
Proof.
intros.
unfold balance.
(* FILL IN HERE *) Admitted.
☐
BST l →
BST r →
ForallT (fun k' _ ⇒ int2Z k' < int2Z k0) l →
ForallT (fun k' _ ⇒ int2Z k' > int2Z k0) r →
lookup k (balance c l k0 v0 r) =
if (int2Z) k <? (int2Z k0) then
lookup k l
else if (int2Z k0) <? (int2Z k) then
lookup k r
else v0.
Proof.
intros.
unfold balance.
(* FILL IN HERE *) Admitted.
☐
Exercise: 2 stars, standard (lookup_ins_eq)
Lemma lookup_ins_eq: ∀ t k v,
BST t →
lookup k (ins k v t) = v.
Proof.
(* FILL IN HERE *) Admitted.
☐
BST t →
lookup k (ins k v t) = v.
Proof.
(* FILL IN HERE *) Admitted.
☐
Theorem lookup_ins_neq: ∀ t k v k0,
BST t →
k ≠ k0 →
lookup k0 (ins k v t) = lookup k0 t.
Proof.
(* FILL IN HERE *) Admitted.
☐
BST t →
k ≠ k0 →
lookup k0 (ins k v t) = lookup k0 t.
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (lookup_insert)
Theorem lookup_insert_eq : ∀ t k v,
BST t →
lookup k (insert k v t) = v.
Proof.
(* FILL IN HERE *) Admitted.
Theorem lookup_insert_neq: ∀ t k v k0,
BST t →
k ≠ k0 →
lookup k0 (insert k v t) = lookup k0 t.
Proof.
(* FILL IN HERE *) Admitted.
☐
Check empty_tree_BST.
Check insert_BST.
Check lookup_empty.
Check lookup_insert_eq.
Check lookup_insert_neq.
We could now proceed to reprove all the facts about elements
that we developed in SearchTrees. Since elements does
not not pay attention to colors, and does not rebalance the tree,
these proofs should be a simple copy-paste from that chapter,
with only minor edits. This would not be a very interesting exercise,
so we don't pursue it here.
Red-black trees are supposed to be more efficient than ordinary search trees,
because they stay balanced. In a perfectly balanced tree, any two leaves
have exactly the same depth, or the difference in depth is at most 1.
In an approximately balanced tree, no leaf is more than twice as deep as
another leaf. Red-black trees are approximately balanced.
Consequently, no node is more then 2logN deep, and the run time for
insert or lookup is bounded by a constant times 2logN.
We can't prove anything directly about the run time, because we don't
have a cost model for Coq functions. But we can prove that the trees
stay approximately balanced; this tells us important information about
their efficiency.
Proving Efficiency
Exercise: 4 stars, standard (is_redblack_properties)
The relation is_redblack ensures that there are exactly n black nodes in every path from the root to a leaf, and that there are never two red nodes in a row.Inductive is_redblack : tree → color → nat → Prop :=
| IsRB_leaf: ∀ c, is_redblack E c 0
| IsRB_r: ∀ tl k kv tr n,
is_redblack tl Red n →
is_redblack tr Red n →
is_redblack (T Red tl k kv tr) Black n
| IsRB_b: ∀ c tl k kv tr n,
is_redblack tl Black n →
is_redblack tr Black n →
is_redblack (T Black tl k kv tr) c (S n).
Lemma is_redblack_toblack:
∀ s n, is_redblack s Red n → is_redblack s Black n.
Proof.
(* FILL IN HERE *) Admitted.
Lemma makeblack_fiddle:
∀ s n, is_redblack s Black n →
∃ n, is_redblack (makeBlack s) Red n.
Proof.
(* FILL IN HERE *) Admitted.
nearly_redblack expresses, "the tree is a red-black tree, except that
it's nonempty and it is permitted to have two red nodes in a row at
the very root (only)."
Inductive nearly_redblack : tree → nat → Prop :=
| nrRB_r: ∀ tl k kv tr n,
is_redblack tl Black n →
is_redblack tr Black n →
nearly_redblack (T Red tl k kv tr) n
| nrRB_b: ∀ tl k kv tr n,
is_redblack tl Black n →
is_redblack tr Black n →
nearly_redblack (T Black tl k kv tr) (S n).
Lemma ins_is_redblack:
∀ x vx s n,
(is_redblack s Black n → nearly_redblack (ins x vx s) n) ∧
(is_redblack s Red n → is_redblack (ins x vx s) Black n).
Proof.
induction s; intro n; simpl; split; intros; inv H; repeat constructor; auto.
× destruct (IHs1 n); clear IHs1.
destruct (IHs2 n); clear IHs2.
specialize (H0 H6).
specialize (H2 H7).
clear H H1.
unfold balance.
You will need proof automation, in a similar style to
the proofs of ins_not_E and balance_lookup.
(* FILL IN HERE *) Admitted.
Corollary ins_red :
∀ t k v n,
(is_redblack t Red n → is_redblack (ins k v t) Black n).
Proof.
intros. apply ins_is_redblack. assumption.
Qed.
Lemma insert_is_redblack:
∀ x xv s n, is_redblack s Red n →
∃ n', is_redblack (insert x xv s) Red n'.
Proof.
(* Just apply a couple of lemmas: ins_is_redblack and makeblack_fiddle *)
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard, optional (redblack_bound)
Fixpoint height (t:tree) : nat
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Fixpoint mindepth (t:tree) : nat
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Lemma redblack_balanced : ∀ t c n,
is_redblack t c n →
(height t ≤ 2 × mindepth t + 1)%nat.
Proof.
(* Hints:
(1) Prove two auxiliary lemmas, one about height and the other about mindepth,
and then combine them to get the result.
The lemma about height will need a slightly complicated induction
hypothesis for the proof to go through.
(2) Depending on how you defined height and mindepth, the tactic zify
(defined in the standard library Coq.omega.PreOmega) may be useful as a
preliminary to using omega when proving these lemmas. *)
(* FILL IN HERE *) Admitted.
☐
Extraction "redblack.ml" empty_tree insert lookup elements.
You can run this inside the OCaml top level by:
On my machine, in the byte-code interpreter this prints,
You can compile and run this with the ocaml native-code compiler by:
On my machine this prints,
#use "redblack.ml";; #use "test_searchtree.ml";;
Insert and lookup 1000000 random integers in 0.889 seconds. Insert and lookup 20000 random integers in 0.016 seconds. Insert and lookup 20000 consecutive integers in 0.015 seconds.
$ ocamlopt -c redblack.mli redblack.ml $ ocamlopt redblack.cmx -open Redblack test_searchtree.ml -o test_redblack $ ./test_redblack
Insert and lookup 1000000 random integers in 0.436 seconds. Insert and lookup 20000 random integers in 0. seconds. Insert and lookup 20000 consecutive integers in 0. seconds.
Success!
The benchmark measurements above (and in Extract.v) demonstrate that:- On random insertions, red-black trees are slightly faster than ordinary BSTs (red-black 0.436 seconds, vs ordinary 0.468 seconds)
- On consecutive insertions, red-black trees are much faster than ordinary BSTs (red-black 0. seconds, vs ordinary 0.374 seconds)
(* Tue Jan 28 13:58:41 EST 2020 *)