SearchTreeBinary Search Trees

Maps (also called dictionaries, lookup tables, associative arrays, etc.) are a fundamental data structure in computing; they maintain an association between keys and values. At a minimum, a map implementation should support operations to create a new empty map, to look up the value corresponding to a key, and to insert a new key-to-value association into the map. Map implementations might also support other operations, such as returning a list of all key-to-value associations in the map, removing a key, etc.
Binary search trees are an efficient data structure for implementing maps. We have already seen other map representations, including lists (the type partial_map from the Lists chapter) and functions (the types total_map and partial_map in the Maps chapter). Both of these are inefficient implementations: if you add N items to your map, then looking them up takes N comparisons in the worst case, and N/2 comparisons in the average case.
In contrast, if your key type is a total order -- that is, if it has a less-than comparison that's transitive and antisymmetric a<b ~(b<a) then you can implement maps using binary search trees (BSTs). Insert and lookup operations on BSTs take time proportional to the height of the tree, which will be proportional to logN if the trees are roughly balanced. We will assume you know how BSTs work; you can learn this from:
  • Section 3.2 of Algorithms, Fourth Edition, by Sedgewick and Wayne, Addison Wesley 2011; or
  • Chapter 12 of Introduction to Algorithms, 3rd Edition, by Cormen, Leiserson, and Rivest, MIT Press 2009.
Our focus here is to prove the correctness of an implementation of maps using binary search trees. For this purpose, we will consider partial maps containing a finite number of keys, and the lookup operation will take an extra argument giving a default value to be returned if the desired key is not present. (This behavior is a bit different from that of the partial maps defined in Maps, as we will discuss further below.) Finally, we will fix our keys to be of type nat, which is certainly a total order.

Set Warnings "-notation-overridden,-parsing".
From Coq Require Import Strings.String. (* for manual grading *)
From Coq Require Import Logic.FunctionalExtensionality.
From VFA Require Import Perm.
From VFA Require Import Maps.
From VFA Require Import Sort.

Sections

We will use Coq's Section feature to structure this development, so first a brief introduction to Sections. We'll use as example a fragment of a list-based implementation of maps.
Module SectionExample1.
  Definition mymap (V: Type) := list (nat×V).
  Definition empty (V: Type) : mymap V := nil.
  Fixpoint lookup (V: Type) (default: V) (x: nat) (m: mymap V) : V :=
    match m with
    | (a,v)::alif x =? a then v else lookup V default x al
    | nildefault
    end.
  Theorem lookup_empty (V: Type) (default: V):
        x, lookup V default x (empty V) = default.
   Proof. reflexivity. Qed.
  (* A full implementation would have more definitions and theorems... *)
End SectionExample1.
It sure is tedious to repeat the V and default parameters in every definition and every theorem. The Section feature allows us to declare them as parameters to every definition and theorem in the entire section:

Module SectionExample2.
  Section LISTS.
  Variable V : Type.
  Variable default: V.

  Definition mymap := list (nat×V).
  Definition empty : mymap := nil.
  Fixpoint lookup (x: nat) (m: mymap) : V :=
    match m with
    | (a,v)::alif x =? a then v else lookup x al
    | nildefault
    end.
  Theorem lookup_empty:
        x, lookup x empty = default.
   Proof. reflexivity. Qed.
  End LISTS.
End SectionExample2.
At the close of the section, this produces exactly the same result: the functions that "need" to be parametrized by V or default are given extra parameters:
==>
   SectionExample2.mymap = fun V : Typelist (nat × V)
    : TypeType
==>
SectionExample2.empty
     : V : Type, SectionExample2.mymap V
==>
SectionExample2.lookup
     : V : Type, VnatSectionExample2.mymap VV

Program for Binary Search Trees

Now we introduce a tree-based representation for maps, with nat keys and and values drawn from some arbitrary type V, and these operations:
  • empty_tree returns a new empty tree
  • lookup d k t returns the value corresponding to key k in tree t, or returns default value d if k is not in the tree.
  • insert k v t returns a new tree formed by inserting a mapping from k to v into tree t. Note that tree t itself is not changed.
  • elements t returns a list of (key,value) mappings contained in t.
Our assumption is that our binary search trees will only be constructed or inspected using these operations; in other words, we want to treat them as an abstract data type for maps, whose internal representation is not visible to clients making use of maps. (This assumption is not enforced by the Coq code in this chapter; we will see how it can be in a future chapter.)

Section TREES.
Variable V : Type.
We could also add a section Variable default, but it is clearer to keep it as an explicit parameter where needed.

Definition key := nat.

Inductive tree : Type :=
 | E : tree
 | T : tree key V tree tree.

Definition empty_tree : tree := E.

Fixpoint lookup (d:V) (x: key) (t : tree) : V :=
  match t with
  | Ed
  | T tl k v trif x <? k then lookup d x tl
                         else if k <? x then lookup d x tr
                         else v
  end.

Fixpoint insert (x: key) (v: V) (s: tree) : tree :=
 match s with
 | ET E x v E
 | T a y v' bif x <? y then T (insert x v a) y v' b
                        else if y <? x then T a y v' (insert x v b)
                        else T a x v b
 end.
We expect these trees to be Binary Search Trees (BST)'s, i.e.: for any non-empty node with key k, all the values of the left subtree are less than k and all the values of the right subtree are greater than k. However, this fact is not encoded in the tree inductive definition. And as we'll see shortly, it is not actually essential for making lookup and insert work correctly!

Fixpoint elements' (s: tree) (base: list (key×V)) : list (key × V) :=
 match s with
 | Ebase
 | T a k v belements' a ((k,v) :: elements' b base)
 end.

Definition elements (s: tree) : list (key × V) := elements' s nil.
If you're wondering why we didn't implement elements more simply with ++, we'll return to that question below when we discuss a function named slow_elements; feel free to peek ahead now if you're curious.

Search Tree Examples


Section EXAMPLES.
Variables default v2 v4 v5 : V.
Eval compute in insert 5 v5 (insert 2 v2 (insert 4 v5 empty_tree)).
  (*      = T (T E 2 v2 E) 4 v5 (T E 5 v5 E) *)
Eval compute in lookup default 5 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = v5 *)
Eval compute in lookup default 3 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = default *)
Compute elements (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = (2, v2); (4, v5); (5, v5) *)
End EXAMPLES.

Properties of lookup and insert

Now that we have our definitions, what should we formally prove about them? How can we even go about determining what a satisfactory specification looks like?
There are several well-established approaches to specifying abstract data types. They share the goal of describing the behavior of the operations without reference to the internal representation of the type (in this case, as a tree).
In algebraic specification, we write down equations relating the results of different operations. More specifically, we divide the operations into two classes: constructors create new instances of the type and observers extract information from existing instances. We then write down equations describing the behavior of each observer on each possible combination of constructors.
In the present case, our constructors are empty_tree and insert, and our observers are lookup and elements. Let's first focus on lookup, and how it should behave on different combinations of constructors. It is easy to see what needs to be true for empty_tree: looking up any value at all in the empty tree should fail and return the default value:
d k, lookup d k empty_tree = d
What about non-empty trees? The essential observation here is that the only way to build a non-empty tree is by applying insert k v t to an existing tree t. So it suffices to describe the behavior of lookup on the result of an arbitrary insert operation. There are two cases. If we look up the same key that was just inserted, we should get the value that was inserted with it:
d k v t, lookup d k (insert k v t) = v
On the hand, if we look up a different key than was just inserted, the insert should not affect the answer, which should be the same as if we did the lookup in the original dictionary before the insert occured:
d k k' v t, k k' lookup d k' (insert k v t) = lookup d k' t
It turns out that these three rules are all that is necessary to fully characterize a map-like type with the functions empty, insert, and lookup. They have the nice feature of explaining the behavior of the "type of interest" (maps) in terms of simple equality statements on the type V of values, which we presumably already understand.
Let's prove these three basic laws!

Theorem lookup_empty : d k, lookup d k empty_tree = d.
Proof.
  auto.
Qed.

Theorem lookup_insert_eq: t d k v,
  lookup d k (insert k v t) = v.
Proof.
induction t; intros.
- simpl. bdestruct (k <? k); try omega; auto.
- simpl. bdestruct (k <? k0); bdestruct (k0 <? k); try omega.
  + simpl. bdestruct (k <? k0); bdestruct (k0 <? k); try omega; auto.
  + simpl. bdestruct (k <? k0); bdestruct (k0 <? k); try omega; auto.
  + simpl. bdestruct (k0 <? k0); try omega; auto.
Qed.
It seems like the basic method of this proof is simply to bdestruct everything in sight. This calls for automation!.

Ltac bd1 :=
  match goal with
    (* matches if any subterm of the goal looks like "if e1 =? e2 then e3 else e4";
       if so, execute the tactic on the right-hand side of the "==>" *)

  | ⊢ context [ if ?X =? ?Y then _ else _] ⇒ bdestruct (X =? Y)
    (* and similarly for <=? or <? *)
  | ⊢ context [ if ?X <=? ?Y then _ else _] ⇒ bdestruct (X <=? Y)
  | ⊢ context [ if ?X <? ?Y then _ else _] ⇒ bdestruct (X <? Y)
  end.

Ltac bdall := simpl; repeat (bd1; try omega; auto).

Theorem lookup_insert_eq': d t k v,
  lookup d k (insert k v t) = v.
Proof.
induction t; intros.
repeat bdall.
repeat bdall.
Qed.

Exercise: 2 stars, standard (lookup_insert_neq)

First, prove this without bdall, just using induction, simpl, bd1, auto, omega, and auto. In the process, try to understand the case analysis used in this proof.
Theorem lookup_insert_neq: d t k v k',
   k k' lookup d k' (insert k v t) = lookup d k' t.
Proof.
(* FILL IN HERE *) Admitted.
Now, make a more automated proof using bdall.

Theorem lookup_insert_neq': d t k v k',
   k k' lookup d k' (insert k v t) = lookup d k' t.
Proof.
(* FILL IN HERE *) Admitted.
Perhaps surprisingly, the proofs of these results do not depend on knowing that the tree t actually has the BST property! That is, even if we start with a tree that has values "in the wrong order" the above theorems hold. Intuitively, that's because lookup and insert follow the same path through the tree, so even if that path is "wrong" it will be consistently "wrong".

Derived Properties

The three rules proved here may look familiar, because we developed and proved very similar rules in Maps for maps defined as functions, where tree insert corresponds to functional map update and tree lookup corresponds to applying the functional map to an argument. The corresponding rules we defined there for partial maps were:

Check update_eq.
(*  : forall (A : Type) (m : partial_map A) (x : nat) (v : A),
       update m x v x = Some v   *)

Check update_neq.
(* : forall (X : Type) (v : X) (x1 x2 : nat) (m : partial_map X),
       x1 <> x2 -> update m x1 v x2 = m x2    *)

Check apply_empty.
(* : forall (A : Type) (x : nat),
       empty x = None *)

The similarity of these laws is no accident! Since we claimed that our three laws were fundamental to map-like structures, they should certainly be true of functional maps.
In Maps, we also proved several other useful lemmas about the behavior of functional maps on various combinations of updates and lookups:

Check update_shadow.
(* : forall (A : Type) (m : partial_map A) (v1 v2 : A) (x : nat),
       update (update m x v1) x v2 = update m x v2    *)

Check update_same.
(* : forall (X : Type) (x : nat) (m : partial_map X),
        m x = Some v -> update m x v = m    *)

Check update_permute.
(* forall (X : Type) (v1 v2 : X) (x1 x2 : natd) (m : partial_map X),
       x2 <> x1 ->
       update (update m x2 v2) x1 v1 =
         update (update m x1 v1) x2 v2    *)

To provide some evidence for our claim that the three laws we have already proved on search trees are sufficient to characterize all operations on empty_tree, insert, and lookup, let's prove analogs to these three lemmas for search trees, using only the three basic rules that we have already proven. That is, in the following proofs, you should not use the definition of empty_tree, insert, or lookup, only the three lemmas lookup_empty (not actually needed), lookup_insert_eq, and lookup_insert_neq. Our ability to prove these lemmas without reference to the underlying map implementation (trees) demonstrates they hold for any map implementation for which the three basic rules hold.

Exercise: 2 stars, standard (lookup_insert_shadow)

Lemma lookup_insert_shadow: d t k' k v v',
    lookup d k' (insert k v (insert k v' t)) = lookup d k' (insert k v t).
Proof.
  intros.
  bdestruct (k =? k').
  (* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard (lookup_insert_same)

Lemma lookup_insert_same: d t k' k,
    lookup d k' (insert k (lookup d k t) t) = lookup d k' t.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard (lookup_insert_permute)

Lemma lookup_insert_permute: d t k1 k2 k' v1 v2,
    k1 k2
    lookup d k' (insert k1 v1 (insert k2 v2 t)) = lookup d k' (insert k2 v2 (insert k1 v1 t)).
Proof.
  (* FILL IN HERE *) Admitted.
You may have noticed that each of these lemmas was phrased as an equality between the results of looking up an arbitrary key k' in two maps, rather than as a direct equality between the maps themselves, as was the case for the function-based map lemmas update_shadow, etc. There's a good reason for this: the more direct equalities on search trees are not all true! Can you figure out which ones are false?
It is quite common for the concrete realization of an abstract data type to allow multiple concrete values to correspond to a single abstract value. In such situations, there are quite likely to be equalities between abstract values that are not necessarily true on (all) the corresponding concrete ones. The way to work around this situation is to phrase specifications in terms of an observer function, like lookup in this case.

Exercise: 3 stars, standard, optional (direct_equalities)

Determine which of the following properties are actually true. For the ones that are not, explain what goes wrong.

Definition insert_shadow: Prop :=
    t k v v',
     insert k v (insert k v' t) = insert k v t.

Definition insert_same: Prop :=
   d t k,
    insert k (lookup d k t) t = t.

Definition insert_permute: Prop :=
   v1 v2 k1 k2 t,
    k1 k2
    insert k1 v1 (insert k2 v2 t) = insert k2 v2 (insert k1 v1 t).

(* FILL IN HERE *)

Properties of elements

Now let's consider how to specify and prove correctness of elements. Actually, before we do that, let's consider a simpler version.

Fixpoint slow_elements (s: tree) : list (key × V) :=
 match s with
 | Enil
 | T a k v bslow_elements a ++ [(k,v)] ++ slow_elements b
 end.
This one is easier to understand than the elements function, because it doesn't carry the base list around in its recursion. Unfortunately, its running time is NlogN even if the tree is roughly balanced, because it does a linear number of list concatentation steps for each level of the tree. The original elements function takes linear time overall (even if the tree is unbalanced), which is a more efficient.
To prove correctness of elements, it's actually easier to first prove that it's equivalent to slow_elements, then prove the correctness of slow_elements. We don't care that slow_elements is a bit slow, because we're never going to really run it; it's just there to support the proof.

Exercise: 3 stars, standard, optional (elements_slow_elements)

Lemma elements_slow_elements: elements = slow_elements.
Proof.
  extensionality s.
  unfold elements.
  assert ( base, elements' s base = slow_elements s ++ base).
  (* FILL IN HERE *) Admitted.
Now, what should we prove about elements (or slow_elements) ?
The algrebraic specification approach would suggest that we look for equations of the form
elements empty_tree = ...
and
elements (insert k v t) = ...(elements t)...
The first of these is easy; we can trivially prove

Lemma elements_empty : elements empty_tree = [].
Proof.
  unfold elements. auto.
Qed.
For the second, we have to express the result of inserting (k,v) into the elements list for t, accounting for ordering and the possibility that t may already contain a pair (k,v') which must be replaced. I claim that the following function will do the trick:

Fixpoint kvs_insert (k:key) (v:V) (kvs: list (key × V)) :=
  match kvs with
    (k',v')::kvs'if k <? k' then
                      (k,v)::kvs
                    else if k >? k' then
                           (k',v')::kvs_insert k v kvs'
                         else (k,v)::kvs'
  | [][(k,v)]
  end.
However, this is not terribly satisfactory. First of all, proving that
elements (insert k v t) = kvs_insert k v (elements t)
is not trivial, although you'll be given an opportunity to do so later on. But more importantly, the definition of kvs_insert is sufficiently complex that it is hard to see exactly what it does without thinking a bit. Moreover, this equation doesn't tell us anything directly about the overall properties of elements t for a given tree t. So for now, let's take a more ad hoc approach to coming up with suitable properties.
Surely one requirement is that if looking up a value in the search tree produces a value, then that value should be in the result list of elements. Let's try to prove that:

Print In. (* from the standard library *)
==>

fix In (a : A) (l : list A) {struct l} : Prop :=
  match l with
  | [] ⇒ False
  | b :: mb = aIn a m
  end
     : A : Type, Alist AProp

Theorem lookup_In_elements_first_try: d t k v,
    lookup d k t = v In (k,v) (elements t).
Proof.
  rewrite elements_slow_elements. (* change to the nicer version *)
  induction t; intros.
  - simpl in ×.
    (* oops! stuck already! *)
Abort.
We forgot that lookup might return the default value d for some k, which typically will not be in the elements list (although it might be!). Let's try again:

Theorem lookup_In_elements_second_try: d t k v,
    lookup d k t = v In (k,v) (elements t) v = d.
Proof.
  intros d t. rewrite elements_slow_elements. (* change to the nicer version *)
  induction t; intros.
  - right. auto.
  - simpl in H. simpl.
    bdestruct (k0 <? k); bdestruct (k <? k0); try omega; auto.
    + destruct (IHt1 _ _ H) as [P | P].
Check in_or_app.
      left. apply in_or_app. left. auto.
      right. auto.
    + destruct (IHt2 _ _ H) as [P | P].
      left. apply in_or_app. simpl. right. right. auto.
      right. auto.
    + assert (k = k0) by omega. subst.
      left. apply in_or_app. right. (* note: no simpl needed! *) left; auto.
Qed.
We were successful this time, but this theorem is just a little weaker than we might like in practice. We'll see a way to strengthen it further below.
Meanwhile, how about a converse property: if a (key,value) pair appears in elements then looking up that key in the search tree should produce that value.

Theorem In_elements_lookup_first_try : d t, k v,
    In (k,v) (elements t) lookup d k t = v.
Proof.
  rewrite elements_slow_elements. (* change to the nicer version *)
  induction t; intros.
  - inv H.
  - unfold slow_elements in H; fold slow_elements in H. (* a hacky trick that sometimes works *)
Check in_app_or.
  destruct (in_app_or _ _ _ H) as [P | P]; [| destruct P as [P|P]].
  + simpl.
    bdestruct (k0 <? k).
    × apply IHt1. apply P.
    × bdestruct (k <? k0); try omega; auto.
      (* STUCK! We have no induction hypothesis that lets us lookup k0 in t2
         unless we know it is in the elements of t2 -- but we don't! *)

Abort.
The problem is that we need to know we are working with a properly formed binary search tree! Otherwise, we don't know that the lookup path will sucessfully find every key in the tree.
Let's recall the definition of BSTs over natural numbers:
  • An empty tree is a binary search tree.
  • A nonempty tree is a binary search tree if the root element is greater than every element in the left sub-tree and less than every element in the right sub-tree, and the left and right subtrees are themselves binary search trees.
Here is one way to formalize this property in Coq:
ForallT P t holds if P k v holds for every (k,v) node of the tree t.
Fixpoint ForallT (P: key V Prop) (t: tree) : Prop :=
 match t with
 | ETrue
 | T l k v rP k v ForallT P l ForallT P r
 end.
BST t holds if t is a BST.
Inductive BST : tree Prop :=
| ST_E : BST E
| ST_T : l k v r,
    ForallT (fun k' _k' < k) l
    ForallT (fun k' _k' > k) r
    BST l
    BST r
    BST (T l k v r).

Exercise: 1 star, standard (is_bst)

Example is_bst : (v2 v4 v5 : V),
   BST (T (T E 2 v2 E) 4 v4 (T E 5 v5 E)).
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 1 star, standard (is_not_bst)

Example is_not_bst: (v2 v4 v5 : V),
   ¬ BST (T (T E 4 v4 E) 2 v2 (T E 5 v5 E)).
Proof.
  (* FILL IN HERE *) Admitted.
How do we know that all the trees we will encounter (particularly, that the elements function will encounter), have the BST property? Well, the empty tree is a BST; and if you insert into a tree that's a BST, then the result is a BST; and these are the only ways that you're supposed to build trees. So we need to prove those two theorems.

Exercise: 1 star, standard (empty_tree_BST)

Theorem empty_tree_BST: BST empty_tree.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (insert_BST)

You may want to start with an auxiliary lemma relating ForallT and insert

Theorem insert_BST :
   k v t,
   BST t BST (insert k v t).
Proof.
(* FILL IN HERE *) Admitted.
Now that we're equipped with BSTs, let us return to proving In_elements_lookup, i.e. that if a (key,value) pair appears in elements then looking up that key in the search tree should produce that value.
We will find it very useful to relate the ForallT predicate on trees with the standard Forall predicate on the lists produced by slow_elements.

About Forall.
Search Forall.
This lemma is apparently missing from the standard library.
Lemma Forall_app : (A: Type) (P : A Prop) (l1 l2 : list A),
    Forall P l1 Forall P l2 Forall P (l1 ++ l2).
Proof.
induction l1; intros.
- auto.
- simpl. inv H. constructor; auto.
Qed.
Here's the key connection between trees and their elements. (The tick mark in the lambda-expression fun '(k,v) P k v is needed to allow pattern matching against a pair.)
Lemma slow_elements_prop: P t, ForallT P t
                                      Forall (fun '(k,v)P k v)
                                             (slow_elements t).
induction t; intros.
- simpl. constructor.
- simpl. inv H. inv H1. apply Forall_app. auto.
  constructor; auto.
Qed.

Check Forall_forall. (* very useful below! *)

Exercise: 3 stars, standard (In_elements_lookup)

Finally, we're in a position to prove our theorem. This is easiest done by induction on evidence that t is a BST.
Theorem In_elements_lookup: d t, BST t k v,
    In (k,v) (elements t) lookup d k t = v.
Proof.
  rewrite elements_slow_elements. (* change to the nicer version *)
  intros d t H.
  induction H; intros.
  (* FILL IN HERE *) Admitted.
Now let's return briefly to our lookup_In_elements theorem.
==>

lookup_In_elements_second_try
     : (d: V) (t : tree) (k : key) (v : V),
       lookup d k t = vIn (k, v) (elements t) ∨ v = d
This result is not quite strong enough to characterize elements the way we will need later on, because in the case lookup d k t returns d, it says nothing about whether elements t has an entry with key k or not! In particular, elements t might contain an entry (k,v') for some v' different from d, without violating this theorem. What we really want is to say that if v = d then elements t either contains (k,d) or no entry for with key k at all.
It turns out that a theorem along these lines is also provable, but only if we know that t is a BST. (That's why we didn't try to prove something like this earlier.) In fact, we can prove it from the lemmas we already have, without reference to the definitions of lookup and elements. The proof uses a trick we haven't seen before: since membership in a list of nats is a decidable proposition, we can do a case split on the two possibilities.

Lemma in_fst_exists: {X Y} (k:X) (l:list (X×Y)),
     In k (map fst l) v, In (k,v) l.
Proof.
 split.
 - induction l; intros.
   + inv H.
   + inv H.
     × subst. destruct a. y. left; auto.
     × destruct (IHl H0) as [y P]. y. right; auto.
 - intros [v P]. generalize dependent l.
   induction l; intros.
   + inv P.
   + simpl. inv P.
     × left; auto.
     × right; auto.
Qed.

Theorem lookup_In_elements: d t k v,
    BST t
    lookup d k t = v
    In (k,v) (elements t) (¬ In k (map fst (elements t)) v = d).
Proof.
  intros d t k v B H.
  pose proof (lookup_In_elements_second_try _ _ _ _ H).
  destruct H0.
  - left; auto.
  - subst v.
    (* Here is the case split. In_dec is in the standard List library. *)
    destruct (In_dec eq_nat_dec k (map fst (elements t))) as [I | I].
    + left.
      rewrite in_fst_exists in I.
      destruct I as [v P].
      (* If (k,v) is in elements, it must have the value given by lookup! *)
      pose proof (In_elements_lookup d _ B _ _ P); auto.
      subst v. auto.
    + right; split; auto.
Qed.
Another important fact about elements is that it contains no duplicate keys. This is something we surely want to be true for any implementation of maps.

Print NoDup.
Search NoDup.

Definition disjoint {X:Type} (l1 l2: list X) :=
   (x:X), In x l1 ¬ In x l2.

Exercise: 3 stars, standard, optional (NoDup_append)

Lemma NoDup_append : (X:Type) (l1 l2: list X),
  NoDup l1 NoDup l2 disjoint l1 l2
  NoDup (l1 ++ l2).
Proof.
  (* Hint: You may have already proved this theorem, or one like it, in 
           IndProp; if so, just copy-paste your solution here. *)

  (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, optional (elements_keys_distinct)

Lemma elements_keys_distinct: t,
    BST t
    NoDup (map fst (elements t)).
Proof.
  rewrite elements_slow_elements.
(* FILL IN HERE *) Admitted.
There is at least one more simple and interesting fact about our implementation of elements, which we might perhaps want to include in its specification, but is not implied by the specification we have so far. The next exercise encourages you to think about this.

Exercise: 4 stars, advanced, optional (elements_spec)

State and prove a nontrivial fact about elements that is not implied by In_elements_lookup, lookup_In_elements and elements_keys_distinct.

(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_elements_spec : option (nat×string) := None.
Finally, let's return to our equation connecting slow_elements, insert, and kvs_insert. The auxiliary ideas and lemmas developed above should be helpful in proving this. Note that it is only true for trees having the BST property. The suggested approach is to first prove an auxiliary lemma about the behavior of kvs_insert; with this in hand, the main lemma is straightforward. Hint: remember the bdall tactic!

Exercise: 3 stars, standard, optional (kvs_insert_split)

Lemma kvs_insert_split: k v e1 e2 k0 v0,
  Forall (fun '(k',_)k' < k0) e1
  Forall (fun '(k',_)k' > k0) e2
  kvs_insert k v (e1 ++ (k0,v0):: e2) =
  if k <? k0 then
    (kvs_insert k v e1) ++ (k0,v0)::e2
  else if k >? k0 then
         e1 ++ (k0,v0)::(kvs_insert k v e2)
       else
         e1 ++ (k,v)::e2.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, optional (kvs_insert_elements)


Lemma kvs_insert_elements : t, BST t
     k v, elements (insert k v t) = kvs_insert k v (elements t).
Proof.
  (* FILL IN HERE *) Admitted.

Efficiency of Search Trees

All the theory we've developed so far has been about correctness. But the reason we use binary search trees is that they are efficient. That is, if there are N elements in a (reasonably well balanced) BST, each insertion or lookup takes about logN time.
What could go wrong?
1. The search tree might not be balanced. In that case, each insertion or lookup will take as much as linear time.
  • SOLUTION: use an algorithm, such as "red-black trees", that ensures the trees stay balanced. We'll do that in Chapter RedBlack.
2. Our keys are natural numbers, and Coq's nat type takes linear time per comparison. That is, computing (j <? k) takes time proportional to the value of k-j.
  • SOLUTION: represent keys by a data type that has a more efficient comparison operator. We just use nat in this chapter because it's something you're already familiar with.
3. There's no notion of "run time" in Coq. That is, we can't say what it means that a Coq function "takes N steps to evaluate." Therefore, we can't prove that binary search trees are efficient.
  • SOLUTION 1: Don't prove (in Coq) that they're efficient; just prove that they are correct. Prove things about their efficiency the old-fashioned way, on pencil and paper.
  • SOLUTION 2: Prove in Coq some facts about the height of the trees, which have direct bearing on their efficiency. We'll explore that in later chapters.
  • SOLUTION 3: Apply bleeding-edge frameworks for reasoning about run-time of programs represented in Coq.
4. Our functions in Coq aren't real implementations; they are just pretend models of real implementations. What if there are bugs in the correspondence between the Coq function and the real implementation?
  • SOLUTION: Use Coq's extraction feature to derive the real implementation (in Ocaml or Haskell) automatically from the Coq function. Or, use Coq's Compute or Eval native_compute feature to compile and run the programs efficiently inside Coq. We'll explore extraction in a later chapter.

Model-based specifications

Another approach to proving correctness of search trees might be to relate them to our existing implementation of functional partial maps, as developed in Maps. To prove the correctness of a search-tree algorithm, we can prove:
  • Any search tree corresponds to some functional partial map, using a function or relation that we write down.
  • The lookup operation on trees gives the same result as the find operation on the corresponding map.
  • Given a tree and corresponding map, if we insert on the tree and update the map with the same key and value, the resulting tree and map are in correspondence.
  • Functional maps have the properties we actually wanted. It would do no good to prove that searchtrees correspond to some abstract type X, if X didn't have useful properties!
This approach is sometimes called model-based specification: we show that our implementation of a data type corresponds to a more more abstract model type that we already understand. To reason about programs that use the implementation, it suffices to reason about the behavior of the abstract type, which may be significantly easier. For example, we can take advantage of laws that we proved for the abstract type, like update_eq for functional maps, without having to prove them again for the concrete tree type.
We also need to be careful here, because the type of functional maps as defined in Maps do not actually behave quite like our tree-based maps. For one thing, functional maps can be defined on an infinite number of keys, and there is no mechanism for enumerating over the key set. To maintain correspondence with our finite trees, we need to make sure that we consider only functional maps built by finitely many applications of constructor functions (empty and update). Also, thanks to functional extensionality, functional maps obey stronger equality laws than our trees do (as we investigated in the direct_equalities exercise), so we should not be misled into thinking that every fact we can prove about abstract maps necessarily holds for concrete ones.
Compared to the algebraic specification approach described earlier in this chapter, the model based approach can save some proof effort, especially if we already have a well-developed theory for the abstract model type. On the other hand, we have to give an explicit abstraction relation between trees and maps, and show that it is maintained by all operations. In the end, about the same amount of work is needed to show correctness, though the work shows up in different places depending on how the abstraction relation is defined.
For search trees, it is not clear that model-based specification is worth the effort. The simple algebraic rules for lookup and the indirect characterization of elements by its properties are simpler and adequate to support reasoning by clients of the type. However, we will find the model-based approach very useful in later chapters where we deal with data types that do not enjoy simple algebraic laws. So, in order to prepare for that, we will give a model-based specification for trees in terms of functional partial maps. It is based on a simple abstraction relation that builds a functional map element by element:

Fixpoint list2map (el: list (key×V)) : partial_map V :=
 match el with
 | nilempty
 | (i,v)::el'update (list2map el') i v
 end.
The following utility functions will prove useful. They are simple to prove by induction on el, using the basic laws on functional partial maps.

Exercise: 2 stars, standard (list2map_yes)

Lemma list2map_yes : (el: list (key×V)),
    NoDup (map fst el)
     k v,
      In (k,v) el (list2map el) k = Some v.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard (list2map_no)

Lemma list2map_no : (el: list (key×V)),
     k,
      ¬ In k (map fst el) (list2map el) k = None.
Proof.
(* FILL IN HERE *) Admitted.
In general, model-based specifications may use an abstraction relation, allowing each concrete value to be related to multiple abstract values. But often, as in this case, a simple abstraction function will do, assigning a unique abstract value to each concrete one.

Definition Abs (t: tree) : partial_map V := list2map (elements t).
One small difference between trees and functional maps is that applying the latter returns an option V which might be None, whereas our lookup operation returns a default value if the lookup fails. We can easily provide a function on functional partial maps having the latter behavior.

Definition find (d: V) (m:partial_map V) (k:nat) : V :=
  match m k with
  | Noned
  | Some vv
  end.
We now proceed to prove that each operation preserves (or establishes) the abstraction relationship in an appropriate way:

    concrete abstract
    -------- --------
    empty_tree empty
    lookup find
    insert update
    elements (see below)
Since functional partial maps lack a way to extract or iterate over their elements, we cannot give an analogous abstract operation for elements, but the abstraction function itself gives a strong relationship that we can take as a (trivially satisfiable) part of the specification.
Thanks to various lemmas we proved earlier about elements, the following results are quite straightforward. Note that they do require the trees to be BSTs. It is quite typical that a model-based specification only works when we restrict the concrete reprsentation to obey some invariant.

Lemma empty_relate : Abs empty_tree = empty.
Proof.
  reflexivity.
Qed.

Exercise: 3 stars, standard (lookup_relate)

Lemma lookup_relate : d t k,
    BST t lookup d k t = find d (Abs t) k.
Proof.
  unfold Abs.
  intros.
  remember (lookup d k t) as v. symmetry in Heqv.
  (* Hint: use lookup_In_elements and list2map_find/nofind. *)
  (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (insert_relate)

Lemma insert_relate : t k v,
  BST t Abs (insert k v t) = update (Abs t) k v.
Proof.
  unfold Abs.
  intros.
  (* use the fact we laboriously proved about elements(insert ...) *)
  rewrite kvs_insert_elements; auto.
  remember (elements t) as l.
  clear -l.
  (* Hint: from here all is easy using the tools you know *)
(* FILL IN HERE *) Admitted.

Lemma elements_relate : t,
  BST t
  list2map (elements t) = Abs t.
Proof.
  unfold Abs. auto.
Qed.

An alternative abstraction relation (Optional, Advanced)

There is often more than one way to specify a suitable Abstraction relation between given concrete and abstract datatypes. The following exercises explore another way to relate search trees to functional partial maps without using elements as an intermediate step.
We extend our definition of functional partial maps by adding a new primitive for combining two partial maps, which we call union. Our intention is that it only be used to combine maps with disjoint key sets; to keep the operation symmetric, we make the result be undefined on any key they have in common.

Definition union {X} (m1 m2: partial_map X) : partial_map X :=
  fun k
    match (m1 k,m2 k) with
    | (None,None)None
    | (None,Some v)Some v
    | (Some v,None)Some v
    | (Some _,Some _)None
    end.
We can prove some simple properties of lookup and update on unions, which will prove useful later.

Exercise: 2 stars, standard, optional (union_collapse)

Lemma union_left : {X} (m1 m2: partial_map X) k,
    m2 k = None union m1 m2 k = m1 k.
Proof.
(* FILL IN HERE *) Admitted.

Lemma union_right : {X} (m1 m2: partial_map X) k,
    m1 k = None
    union m1 m2 k = m2 k.
Proof.
(* FILL IN HERE *) Admitted.

Lemma union_both : {X} (m1 m2 : partial_map X) k v1 v2,
    m1 k = Some v1
    m2 k = Some v2
    union m1 m2 k = None.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, optional (union_update)

Lemma union_update_right : {X} (m1 m2: partial_map X) k v,
    m1 k = None
    update (union m1 m2) k v = union m1 (update m2 k v).
Proof.
(* FILL IN HERE *) Admitted.

Lemma union_update_left : {X} (m1 m2: partial_map X) k v,
    m2 k = None
    update (union m1 m2) k v = union (update m1 k v) m2.
Proof.
(* FILL IN HERE *) Admitted.
We can now write a direct conversion function from trees to maps based on the structure of the tree, and prove a basic property preservation result.

Fixpoint tree2map (t: tree) : partial_map V :=
  match t with
  | Eempty
  | T l k v rupdate (union (tree2map l) (tree2map r)) k v
  end.

Exercise: 3 stars, advanced, optional (tree2map_prop)

Lemma tree2map_prop : P t, ForallT P t
                                    k v, (tree2map t) k = Some v
                                               P k v.
Proof.
  (* FILL IN HERE *) Admitted.
Finally, we define our new abstraction function, and prove the same commutation lemmas as before.

Definition Abs' (t: tree) : partial_map V := tree2map t.

Lemma empty_relate' : Abs' empty_tree = empty.
Proof.
  reflexivity.
Qed.

Exercise: 3 stars, advanced, optional (lookup_relate')

Lemma lookup_relate' : d t k,
    BST t lookup d k t = find d (Abs' t) k.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 4 stars, advanced, optional (insert_relate')

Lemma insert_relate' : k v t,
   BST t Abs' (insert k v t) = update (Abs' t) k v.
Proof.
  (* FILL IN HERE *) Admitted.
The elements_relate lemma, which was trivial for our previous Abs function, is considerably harder this time. We suggest starting with an auxiliary lemma.

Exercise: 3 stars, advanced, optional (list2map_app)

Lemma list2map_app : (el1 el2: list (key × V)),
   disjoint (map fst el1) (map fst el2)
   list2map (el1 ++ el2) = union (list2map el1) (list2map el2).
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 4 stars, advanced, optional (elements_relate')

Lemma elements_relate' : t,
  BST t
  list2map (elements t) = Abs' t.
Proof.
  (* FILL IN HERE *) Admitted.

End TREES.

(* Tue Jan 28 13:58:40 EST 2020 *)