Recursive datatypes

What are recursive datatypes?

Any datatype, the definition of which somehow involves a reference to itself, is considered a recursive datatype.

Most commonly, however, (in C, at least) such references to itelf are pointers to elements of the same type. You have already seen the most common example of recursive types: linked lists.

struct elem {
  int content;
  struct elem *next;  /* <----- reference to struct elem */
};

More generally, recursive types are used to represent arbitrary graphs.

Recursive types vs. recursive functions

Algorithmic recursion is often the mirror image of recursion in datatypes. You will find it to be very convenient to process your recursive data structures recursively.

Binary trees

Binary trees are a neat data structure. Unless a tree is empty (i.e., a leaf) it consists of two sub-trees of the same type. Sort of a self-similar structure (parts of it are "similar" to the whole.)

There are two major alternative ways of designing binary trees:

all tree nodes carry values
only leaf nodes carry values

If all nodes carry values, then it isn't really necessary to distinguish between leaf nodes and other nodes. Also, we can have something like the ``empty'' tree. If both of the sub-trees of some node are empty, then this node is a leaf. Let's assume the empty tree is represented by the NULL pointer. A simple binary tree can then be defined this way:

struct tnode {
  int content;
  struct tnode *left, *right;
};

The version which carries values at leaf nodes only is perhaps a bit more cumbersome to define cleanly in C. Here is an example, involving unions:
struct tnode { int is_leaf; union { struct { struct tnode *left, *right; } s; int content; } u; };

Let's go back to the simple case (which I would recommend in all cases!).

A recursive procedure that allows you to visit all the nodes in preorder is now very easy to write:

void tpreorder (struct tnode *tree, void visitor (int))
{
  if (tree) {
    visitor (tree->content);
    tpreorder (tree->left);
    tpreorder (tree->right);
  }
}

Can you guess how you would do inorder or postorder traversals?

A variant of the traversal could be used to sum the values stored in a tree:

int tsum (struct tnode *)
{
  return t ? t->content + tsum (t->left) + tsum (t->right) : 0;
}

Binary search trees

Binary search trees are just like binary trees. However, we will maintain another invariant:

The content of any node in a binary search tree it greater than the content of any other node in its left subtree and less than the content of any other node in its right subtree.

Here is how you search for a node with a given content:

struct tnode *find (struct tnode *t, int content)
{
  if (t && t->content != content)
    return find ((t->content < content) ? t->right : t->left, content);
  return t;
}

Notice how the structure of the algorithm follows the structure of the the datatype definition (just like in the case of the traversal procedures above).

On closer inspection you will find that the recursive call to find occurs right next to the word return. This indicates that your procedure is tail-recursive and can easily be rewritten using a loop.
struct tnode *lfind (struct tnode *t, int content) { while (t && t->content != content) t = (t->content < content) ? t->right : t->left; return t; }

Advise: When dealing with recursive datatypes always write your code in the natural, that is: recursive way first! Only if you discover that this approach constitutes a performance bottleneck, then you should start worrying about translating it into an iterative solution where possible.

Insertion

Inserting a node into a binary search tree is almost as easy as finding one:

void insert (struct tnode **t, int content)
{
  if (*t)
    if ((*t)-> content == content)
      return;
    else if ((*t)->content < content)
      insert (&(*t)->right, content);
    else
      insert (&(*t)->left, content);
  else {
    struct tnode *n = malloc (sizeof (struct tnode));
    if (n == NULL) {
      fputs ("Out of memory\n", stderr);
      exit (1);
    } else {
      n->left = n->right = NULL;
      n->content = content;
      *t = n;
    }
  }
}

Exercise: Write this function iteratively.

Deleting a node

If you think carefully about the insertion algorithm above, then you will see that it always adds nodes as leaf nodes. This makes things particularly easy, because we don't have to deal with the messy details of splicing a node into a middle of an existing mesh of pointers.

With deletion we won't be so lucky, because not every node we ever may want to delete is necessarily a leaf node. So, how do we deal with this? (Hint: If the node n to be deleted is not a leaf itself, then try selecting an appropriate leaf node from the rest of the tree, which you then can use to replace n!)

An advanced example

What does this do?

static struct elem *flat (struct tnode *t, struct elem *l)
{
  if (t) {
    struct elem *n = malloc (sizeof (struct elem));
    if (n == NULL) {
      fputs ("Out of memory\n", stderr);
      exit (1);
    }
    n->next = flat (t->right, l);
    n->content = t->content;
    return flat (t->right, n);
  } else
    return l;
}

struct elem *flatten (struct tnode *tree)
{
  return flat (tree, NULL);
}

Matthias Blume, CS Department, Princeton University