Abstract datatypes

Abstact datatypes (ADTs) are one way of separating parts of a larger programming task from the rest of the program. This allows for better structuring, and it enables multiple programmers to work on the same project. When done wisely it can be used to build libraries with code that can be incorporated into many different projects.

Motivation

Think of ADTs as contracts between the people (or programs!) who write the implementations and their clients.

Such contacts should include:

The contracts should not include:

You ask: Why?

Answer: The client of an ADT must know:

As long as the client doesn't take advantage of certain details of an ADT's implementation it is possible to change those details without breaking the client's code. This allows for ease of modification.

Enforcing the rules:

ADTs in C

A familiar example

There is one ADT that everybody has seen already. Its name is FILE, its specification is part of the C language definition. The definitions relevant to clients of FILE are in a system header file called <stdio.h>. Operations over FILE include fopen, fprintf, fgets, etc. However, a quick inspection of /usr/include/stdio.h (which is the physical location of <stdio.h> on many systems) reveals the following definition for FILE:
typedef struct {
	int	_cnt;
	unsigned char	*_ptr;
	unsigned char	*_base;
	int	_bufsiz;
	short	_flag;
	short	_file;

	char	*__newbase;
	void	*_lock;
	unsigned char	*_bufendp;
} FILE;
This means that even though we are supposed to treat FILE as if it were an ADT, it is possible without too much trouble to breach the contract and to directly access elements of a FILE structure without going through the official interface of the type.

A better approach

A better way of defining ADTs in C is to use incomplete structure definitions. Any such definition only introduces the name of a structure type, but nothing else. In C it is permitted to declare pointers to elements of such an incomplete type, but since the type is incomplete compilers will reject all attempts to dereference the pointer (via * or even ->).
Since C is not a type-safe language it will always be possible for the experienced hacker to get around such ``limitations''. The resulting programs frequently suffer from all the disadvantages of not playing by the rules (and then some), and the programmer deserves what he or she gets. Of course, passengers in airplanes controlled by such software might object to that...

You might ask: What's the point of having pointers which can't be dereferenced?

That's a good question. But the answer is simple: Only the implementor of the ADT is permitted to dereference the pointer, because only he or she has the complete structure definition. Since the client's code cannot do anything useful with pointers to an incomplete structure except for passing them around, it is forced to use functions provided by the implementor. This way abstraction can be enforced.

Headers, clients, and implementations

Let's consider the example of an ADT that implements a stack of integers. There is a function to create a new empty stack, as well as functions to push and pop elements onto and from the stack, respectively.

The public header file istack.h contains the following definitions:


# ifndef ISTACK_H
# define ISTACK_H

typedef struct istack *istack;

/*
 * create a new empty stack of integers
 *   RETURN VALUE: NULL if allocation fails
 */
istack istack_new (void);

/*
 * push an integer onto an existing istack
 *  RETURN VALUE: 0 upon success
 *               -1 upon failure (to allocate enough memory)
 */
int istack_push (istack, int);

/*
 * Query the number of elements on a given istack.
 * This number is the total number of istack_push operations performed
 * minus the total number of istack_pop operations performed.
 */
int istack_height (istack);

/*
 * pop the topmost integer from an istack;
 *   caveat:
 *             istack_pop (s)
 *      can only be called if
 *             istack_height (s) > 0
 */
int istack_pop (istack);

/* Free all memory associated with this stack. */
void istack_delete (istack);

# endif

With this ADT a client can use an integer stack to reverse the list of integers in its input. Here is sample code for reverse.c:
# include <stdio.h>
# include "istack.h"

int main (void)
{
  istack s = istack_new ();
  int i;

  if (s == NULL) {
    fputs ("Couldn't allocate stack\n", stderr);
    exit (1);
  }
  while (scanf ("%d", &i) == 1)
    if (istack_push (s, i) < 0) {
     fputs ("Out of memory\n", stderr);
     exit (1);
    }
  while (istack_height (s) > 0)
    printf ("%d\n", istack_pop (s));
  istack_delete (s);
  return 0;
}

Note, that until now we haven't said anything about the actual implementation of integer stacks. And still, we had no problem formulating the code using such stacks!

For completeness, here is one possible realization of the istack ADT. File istack.c:


# include <stdlib.h>
# include <stdio.h>
# include "istack.h"

# define INCREMENT 64

struct istack {
  size_t total;
  size_t height;
  int *array;
};

istack istack_new (void)
{
  istack s = malloc (sizeof (struct istack));
  if (s == NULL)
    return NULL;
  s->total = 0;
  s->height = 0;
  s->array = NULL;
  return s;
}

/*
 * This is how realloc SHOULD work according to ANSI and ISO
 * standards for C.  Unfortunately, Sun's C libraries are not quite
 * up to that... :(
 */
static void *safe_realloc (void *x, size_t s)
{
  if (x == NULL)
    if (s == 0)
      return NULL;
    else
      return malloc (s);
  else if (s == 0) {
    free (x);
    return NULL;
  } else
    return realloc (x, s);
}

int istack_push (istack s, int i)
{
  if (s->height >= s->total) {
    int *a = safe_realloc (s->array, (s->total + INCREMENT) * sizeof (int));
    if (a == NULL)
      return -1;     /* couldn't allocate more space */
    s->array = a;
    s->total += INCREMENT;
  }
  s->array [s->height++] = i;
  return 0;          /* worked out ok */
}

int istack_height (istack s)
{
  return s->height;
}

int istack_pop (istack s)
{
  if (s->height > 0)
    return s->array [--s->height];
  else {
    fputs ("Fell off the end of the stack.\n", stderr);
    exit (2);
  }
}

void istack_delete (istack s)
{
  if (s->array != NULL)
    free (s->array);
  free (s);
}

A Makefile helps keeping track of what needs to be recompiled:
CC = lcc
CFLAGS = -g

OBJECTS = reverse.o istack.o

reverse: $(OBJECTS)
	$(CC) -o reverse $(OBJECTS)

reverse.o: reverse.c istack.h
istack.o:  istack.c istack.h

Watch it!

Unfortunately, C doesn't give you the powers as an ADT designer/implementor to prevent clients from breaking your code. There is nothing that prevents a client program to set out like this:
# include "istack.h"

struct istack {
  size_t total;
  size_t height;
  int *array;
};

int main (void) {
  char buf [37];
  istack s = istack_new ();
  s->total = -23;
  s->height = 31415;
  s->array = buf;
  ...
}

Or, (hard to imagine) even worse:
# include "istack.h"

struct istack {
  float let;
  double us;
  struct istack *_break;
  int this;
  char code;
};

...

Specifications

The specification of an ADT is everything the client is permitted to know about it. It is what's in the contract. (Remember? ADTs are contracts.) The ADT's implementor must make sure that his or her implementation matches the specification precisely.

Specifications can be divided into two parts: signatures and axioms.

Signatures

An ADT's signature summarizes: Signatures can be written down formally in most programming languages that support ADTs. In our example, file istack.h is the definition of the ADT's signature.

Axioms

Axioms are used to specify the behavior of the operations. Some programming languages make it easier to formally write down such axioms; even fewer of them ask you to do so and check them at compile- or run-time. In our example we have informally written down some axioms - as C comments. Look at istack.h and see for yourself.

Axioms specify:

(A bad) Example

If you studied the code for istack's implementation carefully, then you must have noticed that we used an auxiliary function safe_realloc. This function behaves like realloc itself should have, according to the ANSI and ISO specifications for C, behaved in the first place. Unfortunately, some popular C libraried come with implementation that do not pay close attention to those document's demands.
For those who are interested: The ANSI standard for C requires realloc to subsume the functionality of both malloc and free. In particular, when it is called as
        realloc (NULL, n)
then it must behave like
        malloc (n)
If you look at our code for istack_new and istack_push, then you will discover that we are taking advantage of this feature. However, since Sun's default realloc doesn't behave as it should, we had to ``roll our own''.

Testing

It is important to make sure an ADT's implementation adheres to its specification. The programmer who implements the ADT is responsible for convincingly showing that his or her implementation is correct. In some cases it will be possible to use formal proof methods to do so, but usually this won't work out so well. Formal proofs require a complete and formal specification. A formal specification for ADTs in C is very hard to come up with (albeit not impossible).

A more practical (but less convincing) way of ``proving'' one's implementation correct is to thoroughly test it. But beware! Testing itself is a non-trivial task. A good approach: Write a test program, the purpose of which is to exercise the ADT's implementation! Stress-test it!

Most people's code, if it breaks at all, breaks at boundary cases. For example, try the program with empty input and with very large input, with empty lines, and with very long lines. If it works for those cases (and for at least a few ``normal'' cases as well), then the odds are it was implemented correctly. Good test programs will work very hard not to miss any of the boundary cases.

Interactive testing

Arguably, the best approach is to provide a framework for interactive testing.

In the case of istack we could use something like this:


# include <stdio.h>
# include "istack.h"

static istack s = NULL;

static void no_stack (void) {
  fputs ("no stack\n", stdout);
}

static void delete (void)
{
  if (s) {
    istack_delete (s);
    s = NULL;
    fputs ("deleted\n", stdout);
  } else
    no_stack ();
}

static void new (void)
{
  if (s) {
    istack_delete (s);
    fputs ("old stack deleted\n", stdout);
  }
  s = istack_new ();
  if (s == NULL)
    fputs ("allocation failed\n", stdout);
  else
    fputs ("new stack\n", stdout);
}

static void height (void)
{
  if (s == NULL)
    no_stack ();
  else
    printf ("height: %d\n", istack_height (s));
}

static void huh (void)
{
  fputs ("huh?!\n", stdout);
}

static void add (char *num)
{
  int i;
  if (s == NULL)
    no_stack ();
  else if (sscanf (num, "%d", &i) != 1)
    huh ();
  else if (istack_push (s, i) < 0)
    fputs ("Couldn't push -- out of memory\n", stdout);
  else
    printf ("pushed %d\n", i);
  }
}

static void pop (void)
{
  if (s == NULL)
    no_stack ();
  else if (istack_height (s) <= 0)
    fputs ("empty stack\n", stdout);
  else
    printf ("popped: %d\n", istack_pop (s));
}

static void Pop (void)
{
  if (s == NULL)
    no_stack ();
  else
    printf ("popped: %d\n", istack_pop (s));
}

static void test (char *command)
{
  switch (*command) {
  case 'n': new (); break;            /* new stack */
  case 'd': delete (); break;         /* get rid of it */
  case 'h': height (); break;         /* print height */
  case 'a': add (command + 1); break; /* add (push) element */
  case 'p': pop (); break;            /* (safely) pop element */
  case 'P': Pop (); break;            /* (forcedly) pop element */
  default: huh (); break;             /* huh?! */
  }
}

# define LEN 32

int main (int argc, char **argv)
{
  char buf [LEN];
  int c;
  if (argc == 1) {
    while (fgets (buf, LEN, stdin) != NULL) {
      test (buf);
      if (buf [strlen (buf) - 1] != '\n')
        while ((c = getchar ()) != EOF && c != '\n')
          ;
    }
  } else {
    while (--argc)
      test (*++argv);
  }
  return 0;
}

We can add rules to our Makefile to automate the task of building the test driver:
CC = lcc
CFLAGS = -g

OBJECTS = reverse.o istack.o
TESTOBJ = istack-test.o istack.o

all: reverse istack-test

reverse: $(OBJECTS)
	$(CC) -o reverse $(OBJECTS)

istack-test: $(TESTOBJ)
	$(CC) -o istack-test $(TESTOBJ)

reverse.o: reverse.c istack.h
istack.o:  istack.c istack.h
istack-test.o: istack-test.c istack.h

This is a very comfortable test driver. If you haben't figured it out already - here is what it does: It implements a tiny language of one-letter commands, which correspond to operations on integer stacks.
n
new - make a brand new stack
d
delete - delete the current stack
h
height - show the height of the current stack
anum
add - push num onto the current stack
p
pop - check for empty stack and pop an element from the stack
P
Pop - don't check for empty stack and pop an element
Commands can be given as command-line arguments to the test driver. If there aren't any (argc == 1), then the program reads commands from standard input.

Considering the simplicity of the ADT all this is probably overkill. But it gives you a flavor of what can be done, and what should be done in more interesting cases.


Matthias Blume, CS Department, Princeton University