
Type-directed Programming

One of the principal advantages of programming in a strongly typed programming language like OCaml is that the types of function arguments and results can help guide the way you construct programs.

In this class, whenever you write a function in OCaml, you should do so by following the type-directed programming methodology. This programming methodology has the following steps:

  1. Write down the name of the function and the types of its arguments and results.
  2. Write a comment that explains its purpose and any preconditions.
  3. Write several examples of what your function does.
  4. Write the body of the function. (This is the hard part!)
  5. Turn your examples into tests.

Types help most with the hard part, writing the body of the function, because function bodies involve two conceptual activities:

  • Deconstruct (i.e., tear apart or analyze) the input values.
  • Construct (i.e., build) the output values.

Types help because each type comes with a clearly defined set of values. When we know the values associated with a type, we know exactly what inputs we have to analyze. There is little chance we will miss a case, and indeed, if you program using good style, the OCaml type checker will often warn you if you accidentally miss one. We also know the range of possibilities available when building a function's outputs. In both situations, types help us search for and identify complete solutions to the programming problem at hand.

For example, the type bool comes with exactly two values, true and false. When writing a function with an argument of type bool, one must consider what to do when supplied with the input true and one must consider what to do when supplied with the input false -- there are never any other possibilities to consider. Analogously, when writing a function with a result type bool, there are only two possible results you can construct -- the value true and the value false.
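As a small sketch of both directions at once, here is a function whose input and output both have type bool. The name negate is our own illustration, not part of the lecture's code. The match handles exactly the two possible inputs, and each branch produces one of the only two possible outputs.

```ocaml
(* negate: a hypothetical example whose argument and result both have type bool *)
let negate (b:bool) : bool =
  match b with
  | true  -> false   (* one case per input value...          *)
  | false -> true    (* ...and bool has exactly two of them  *)

let _ =
  assert (negate true = false);
  assert (negate false = true)
```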

In the following, we consider a number of the built-in OCaml types. For each type, we'll discuss the set of values for that type as well as how to deconstruct those values (for inputs) and construct those values (for outputs). An ml file associated with this lecture may be found here.

Booleans

You already know the set of values that make up the boolean type: true and false, and that's it. Given an input of type bool, we may determine which of the two values we have using a match expression. In general, match expressions have the following form.

match expression with
| pattern1 -> result1
| pattern2 -> result2
...
| patternk -> resultk 

The code above evaluates the expression and then checks to see whether the resulting value matches one of the patterns. The patterns are checked against the computed value in order, and the result associated with the first pattern that matches is evaluated. The kinds of patterns available depend upon the type of the expression. When the expression has type bool, there are two patterns, true and false, because, of course, those are the only two values that can have type bool. Hence, a boolean pattern matching expression looks like this:

match expression with
| true -> result1
| false -> result2
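Because patterns are tried in order, it can matter which pattern comes first. A minimal sketch using integer constant patterns and the wildcard pattern "_" (which matches anything and is discussed with tuples below); the function name describe is hypothetical:

```ocaml
(* describe: a hypothetical function illustrating first-match semantics.
 * Patterns are tried top to bottom; the first one that matches wins. *)
let describe (n:int) : string =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
  | _ -> "many"   (* reached only when n is neither 0 nor 1 *)

let _ =
  assert (describe 0 = "zero");
  assert (describe 1 = "one");
  assert (describe 42 = "many")
```

If the "_" case were listed first, it would match every input, and the compiler would warn that the remaining cases are unused.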

Example. Let's define a function that converts a boolean into an integer. According to our methodology, we will first define the function name and types, write a comment to explain what it does, and then write down some examples:

(* convert a boolean to an integer:
 *   bool_to_int true = 1
 *   bool_to_int false = 0             *)
let bool_to_int (b:bool) : int = ...

Next, we fill in the body of the function by deconstructing the input and reconstructing a result.

(* convert a boolean to an integer:
 *   bool_to_int true = 1
 *   bool_to_int false = 0             *)
let bool_to_int (b:bool) : int = 
  match b with
    | true  -> 1
    | false -> 0

Finally, we will convert our examples into tests. Below, I've created a series of tests from our examples using assertions. An assertion is simply an expression with boolean type wrapped in the keyword assert. Assertions have the benefit that they may be turned off in production code by using the compiler option -noassert. Hence, you can put them in your code for testing purposes but suffer no performance penalty when deploying your final product. See the OCaml manual for more. Here are our tests.

let _ =
  assert (bool_to_int true = 1);
  assert (bool_to_int false = 0)

Notice that we deleted the portion of the comment with the examples. The examples were a good intermediate step for our own thinking process, but the final code is almost identical to the comment so the comment is redundant. The code itself is simple and clear enough that additional comments just get in the way. It is better style to omit them and "let the code do the talking" in this case. (As an aside, notice that any function application "f arg" has higher precedence than an operator such as "=" so you do not need parens around bool_to_int true.)

If you compile and run your file (using ocamlbuild, for instance) and an assertion fails, the program raises the Assert_failure exception. Its argument records the source file name along with the line and column of the failing assertion.
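A minimal sketch of what that exception carries: in the standard library, Assert_failure's argument is a (file, line, column) triple, which we can catch and inspect. The functions check_positive and describe_failure are hypothetical, and the assertion only fires if the file is compiled without -noassert.

```ocaml
(* check_positive: a hypothetical function guarded by an assertion *)
let check_positive (n:int) : int =
  assert (n > 0);
  n

(* Catch the exception to report where the failing assertion is located. *)
let describe_failure (n:int) : string =
  try
    string_of_int (check_positive n)
  with Assert_failure (file, line, col) ->
    Printf.sprintf "assertion failed in %s at line %d, column %d" file line col

let _ =
  print_endline (describe_failure 5);
  print_endline (describe_failure (-3))
```

In ordinary testing you would not catch Assert_failure at all; letting it escape stops the program and reports the location, which is exactly what you want when a test fails.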

By the way, a synonym for a match expression on booleans like the one above is an if-then-else expression. Hence, a completely equivalent piece of code is as follows. Notice, of course, that our tests do not change when we change how we write our function. Creating durable tests for a function helps keep code correct as it evolves.

(* convert a boolean to an integer *)
let bool_to_int (b:bool) : int = 
  if b then 1 else 0

let _ =
  assert (bool_to_int true = 1);
  assert (bool_to_int false = 0)

Most programmers will write functions on a single boolean using an if expression. We introduced the idea of analyzing a boolean value using pattern matching because pattern matching is the general paradigm that programmers use to deconstruct data. An if expression is a special case that exists largely for historical reasons and because programmers coming from other kinds of languages feel comfortable with it.

Ok, that's booleans. By the way, if you were thinking, "Oh my god, I can't believe he spent so much time on such a simple function," well, you are right. It was pretty easy: a proficient OCaml programmer can write that function in 5 seconds. Onwards and upwards!

Tuples

The type t1 * t2 represents pairs of values where the first component of the pair has type t1 and the second component has type t2. The type t1 * t2 * t3 represents triples of values where the first component of the triple has type t1, the second component has type t2 and the third component has type t3. An n-tuple has type t1 * t2 * ... * tn and has n such components.

We create pair, triple, or n-tuple values by writing down a series of expressions separated by commas and enclosed in parentheses. For example:

let name_and_age1 : string * string * int = ("David", "Walker", 25)
let name_and_age2 : string * string * int = ("Brian", "Kernighan", 15)
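For pairs specifically, the standard library provides fst and snd to project out the two components. They work only on pairs, which is one reason pattern matching is the general tool for tuples of any size. A short sketch (the binding names are our own):

```ocaml
let pair : int * string = (3, "three")

(* fst : 'a * 'b -> 'a and snd : 'a * 'b -> 'b are Stdlib functions *)
let first : int = fst pair      (* 3 *)
let second : string = snd pair  (* "three" *)

(* fst and snd do not apply to a triple; destructure it with a pattern instead. *)
let name_and_age1 : string * string * int = ("David", "Walker", 25)
let age : int =
  let (_, _, a) = name_and_age1 in
  a
```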

Example. To analyze a pair or any other kind of tuple, we may again use pattern matching. Let's write a function to extract the string components of a triple like the one above and return a string. According to our methodology, we write the name and types first along with a comment. Then we add some examples.

(* create a string from the first two components, separated by a space *)
let full_name (name_and_age : string * string * int) : string = ...

let _ =
  assert(full_name name_and_age1 = "David Walker");
  assert(full_name name_and_age2 = "Brian Kernighan")

To fill in the body of the function, we use pattern matching to extract the content we need from the triple. Recall that the "^" operator concatenates two strings.

(* create a string from the first two components, separated by a space *)
let full_name (name_and_age : string * string * int) : string =
  match name_and_age with
    | (first, last, _) -> first ^ " " ^ last

let _ =
  assert(full_name name_and_age1 = "David Walker");
  assert(full_name name_and_age2 = "Brian Kernighan")

Above, the pattern (first, last, _) matches any triple; the variable first is bound to the first value in the triple and the variable last is bound to the second value in the triple. The "_" is a pattern that matches any value. It informs the reader of the code that "I don't care about this value." In this function, we don't use the contents of the age component, so the underscore pattern is appropriate.

Whenever a match expression contains just one pattern, a programmer may replace it with a let expression. For instance, the following is a bit more compact.

(* create a string from the first two components, separated by a space *)
let full_name (name_and_age : string * string * int) : string =
  let (first, last, _) = name_and_age in
  first ^ " " ^ last

Even more compact, we can place the pattern match in the function argument position:

(* create a string from the first two components, separated by a space *)
let full_name ((first, last, _) : string * string * int) : string =
  first ^ " " ^ last

While these latter two examples are more compact, it is important to understand that pair patterns are just like boolean patterns or integer patterns or any other kind of pattern in that they may be used within a match expression. In more complicated examples, we may use several different patterns in conjunction to analyze and extract information from an input.
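A small sketch of combining patterns: here a tuple pattern contains a boolean constant together with a variable or a wildcard, so a single match both tests one component and binds another. The function bonus is a hypothetical illustration.

```ocaml
(* bonus: hypothetical. If the flag is true, return the amount; otherwise 0.
 * Each tuple pattern mixes a boolean constant with a variable or wildcard. *)
let bonus (p : bool * int) : int =
  match p with
  | (true, amount) -> amount
  | (false, _) -> 0

let _ =
  assert (bonus (true, 10) = 10);
  assert (bonus (false, 10) = 0)
```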

Example. Define a function that computes the disjunction of a pair of booleans.

(* compute disjunction *)
let or_pair (p:bool*bool) : bool = ...

let _ =
  assert(or_pair (true,true) = true);
  assert(or_pair (true,false) = true);
  assert(or_pair (false,true) = true);
  assert(or_pair (false,false) = false)

Next, we fill in the body of the function by deconstructing the input and reconstructing a result.

(* compute disjunction *)
let or_pair (p:bool*bool) : bool =
  match p with
    | (true,true) -> true
    | (true,false) -> true
    | (false,true) -> true
    | (false,false) -> false

let _ =
  assert(or_pair (true,true) = true);
  assert(or_pair (true,false) = true);
  assert(or_pair (false,true) = true);
  assert(or_pair (false,false) = false)

Now, since the input contains a pair of booleans and each boolean contains two different values, true and false, it is natural that we would start out writing our function using 4 cases (2*2 = 4). However, we might now observe that these 4 cases may be written as two using a wildcard pattern:

(* compute disjunction *)
let or_pair (p:bool*bool) : bool =
  match p with
    | (false,false) -> false
    | _ -> true

let _ =
  assert(or_pair (true,true) = true);
  assert(or_pair (true,false) = true);
  assert(or_pair (false,true) = true);
  assert(or_pair (false,false) = false)
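For comparison, here are two further sketches of the same function; these are alternatives we introduce, not part of the development above. The first uses an or-pattern (p1 | p2), which lets one branch cover several shapes of input. The second destructures the pair in the argument position and uses OCaml's built-in || operator. The same four tests apply to both.

```ocaml
(* or_pair_v2: the branch (true, _) | (_, true) covers every case
 * in which at least one component is true. *)
let or_pair_v2 (p:bool*bool) : bool =
  match p with
  | (true, _) | (_, true) -> true
  | (false, false) -> false

(* or_pair_v3: use the built-in short-circuiting disjunction *)
let or_pair_v3 ((a, b) : bool*bool) : bool = a || b

let _ =
  assert (or_pair_v2 (true,false) = or_pair_v3 (true,false));
  assert (or_pair_v2 (false,true) = or_pair_v3 (false,true));
  assert (or_pair_v2 (false,false) = false);
  assert (or_pair_v3 (true,true) = true)
```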

Example. Write a function that counts the number of true values in a 5-tuple.

(* count the number of occurrences of "true" in the input *)
let count5 (p:bool*bool*bool*bool*bool) : int = ...

let _ =
  assert(count5 (true,true,true,true,true) = 5);
  assert(count5 (false,false,true,false,false) = 1);
  assert(count5 (false,false,false,false,false) = 0)

Perhaps your first instinct when writing the function is to once again break down the tuple of booleans into cases as follows.

(* count the number of occurrences of "true" in the input *)
let count5 (p:bool*bool*bool*bool*bool) : int = 
  match p with
    | (false, false, false, false, false) -> 0
    | (true, false, false, false, false) -> 1
    | (false, true, false, false, false) -> 1
    ...

... but that is clearly going to get way out of hand, and that many cases will be hard to read and verify. How can we reduce the number of cases? Well, "counting" a single boolean is easy. It involves just two cases (as all functions on single booleans do!): if the boolean is true, it returns 1, and if it is false, it returns 0. We'll just use that function 5 times and sum the results. Moreover, we've already written a function to "count" a single boolean: it is called bool_to_int! How lucky is that? (Even if we hadn't written it already, writing it now would take 5 seconds and be far easier and clearer than writing the 2^5 = 32 patterns we would have had to write if we followed the naive approach.) Here is the code.

(* count the number of occurrences of "true" in the input *)
let count5 (p:bool*bool*bool*bool*bool) : int = 
  let (b1, b2, b3, b4, b5) = p in
  bool_to_int b1 + 
    bool_to_int b2 + 
    bool_to_int b3 + 
    bool_to_int b4 + 
    bool_to_int b5

let _ =
  assert(count5 (true,true,true,true,true) = 5);
  assert(count5 (false,false,true,false,false) = 1);
  assert(count5 (false,false,false,false,false) = 0)
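If you are using OCaml 4.08 or later, the standard library's Bool module provides Bool.to_int, which behaves like our bool_to_int. A sketch of count5 written with it, destructuring the 5-tuple directly in the argument position:

```ocaml
(* Assumes OCaml >= 4.08, where Stdlib.Bool.to_int : bool -> int exists. *)
let count5 ((b1, b2, b3, b4, b5) : bool*bool*bool*bool*bool) : int =
  Bool.to_int b1 + Bool.to_int b2 + Bool.to_int b3
  + Bool.to_int b4 + Bool.to_int b5

let _ =
  assert(count5 (true,true,true,true,true) = 5);
  assert(count5 (false,false,true,false,false) = 1);
  assert(count5 (false,false,false,false,false) = 0)
```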

Unit Type

We have talked about pairs, which are tuples with 2 fields. We have talked about triples, which are tuples with 3 fields. We have talked about quintuples, which are tuples with 5 fields. Ever consider what a tuple with 0 fields looks like? It looks like this: (). In OCaml, this value is referred to colloquially as "unit" and its type is also called unit.

Surprisingly, even though the unit value has no information content, it is quite heavily used! Whenever an expression has an effect on the outside world but returns no interesting data, unit is its type. For instance, expressions that do nothing but print data to stdout will typically have type unit. The following is an example of an expression with type unit:

print_string "hello world\n\n"

Assertions are also expressions with type unit. Why? Because when an assertion succeeds, it does nothing, returning the unit value.
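One way to confirm these claims is to annotate bindings with type unit; the compiler rejects the annotation if the expression has any other type. A minimal sketch (the binding names are our own):

```ocaml
(* Each right-hand side must have type unit for the annotation to type-check. *)
let printed : unit = print_string "hello world\n\n"
let checked : unit = assert (1 + 1 = 2)

(* () is the only value of type unit, so this comparison always holds. *)
let _ = assert (checked = ())
```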

It is also possible for a function to have no interesting input -- such functions already contain all the data they need to execute. In such cases, unit is a reasonable argument type. Like other types, one can pattern match on expressions with unit type -- the pattern is ().

Example (Poor Style). Here is a function that prints hello world:

let hello_world (x:unit) : unit =
  match x with
   | () -> print_string "hello world\n"

However, as with other tuples, since there is only one branch of the match expression, we can (and should) shift the pattern into the argument position as follows (note that we omit the type of the argument in this revision since the argument type unit is fully determined by the pattern ()).

Example (Better Style).

let hello_world () : unit =
  print_string "hello world\n"

Example. Sometimes, we need to execute several unit-valued expressions in a row. We could use successive pattern matching, but that is overly verbose. Instead, use a semicolon to separate one unit-valued expression from the next.

let hello_world () : unit =
  print_string "hello";
  print_string " ";
  print_endline "world"
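To connect this with pattern matching: for unit-valued expressions, e1; e2 behaves like matching the result of e1 against the pattern () and then evaluating e2. A sketch of the same function written both ways; hello_world_v2 is our own name for the verbose version.

```ocaml
(* Sequencing with semicolons *)
let hello_world () : unit =
  print_string "hello";
  print_string " ";
  print_endline "world"

(* The same computation, matching each unit result against () explicitly *)
let hello_world_v2 () : unit =
  let () = print_string "hello" in
  let () = print_string " " in
  print_endline "world"

let _ =
  hello_world ();
  hello_world_v2 ()
```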

Example. Now that we know that assertions are just expressions with unit type, we can use them inside functions to check necessary conditions of our inputs and verify the correctness of our outputs.

(* precondition: 0 <= n < String.length s
 * returns the nth character of s *)
let nth (s:string) (n:int) : char =
  assert (0 <= n && n < String.length s);
  String.get s n
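The nth example checks a precondition on its inputs. A sketch of checking postconditions on the output as well, using a hypothetical function max_of_pair that asserts two properties of its result before returning it:

```ocaml
(* max_of_pair: hypothetical example.
 * postconditions: the result is at least as large as both inputs,
 * and it is equal to one of them. *)
let max_of_pair ((x, y) : int * int) : int =
  let result = if x >= y then x else y in
  assert (result >= x && result >= y);
  assert (result = x || result = y);
  result

let _ =
  assert (max_of_pair (3, 7) = 7);
  assert (max_of_pair (-2, -5) = -2)
```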

Example. We can also use assertions within functions as part of our testing apparatus. An effective way to test your functions is often to compute the same answer in two different ways. For instance, we know that disjunction should be symmetric. If we find it isn't, we must have an error. Here is some simple code to test for symmetry.

let test_symmetry (x:bool) (y:bool) : unit =
  assert(or_pair (x,y) = or_pair (y,x))

let _ =
  test_symmetry true false;
  test_symmetry false false;
  test_symmetry true true

Option Types

An option type, written t option, contains two sorts of values, the value None and the value Some v where v is a value with type t.
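A minimal sketch of constructing and deconstructing option values, using a hypothetical safe division function that returns None instead of raising Division_by_zero, and a hypothetical consumer that handles both kinds of value:

```ocaml
(* safe_div: hypothetical. None when the divisor is zero, Some quotient otherwise. *)
let safe_div (num:int) (den:int) : int option =
  if den = 0 then None else Some (num / den)

(* Deconstruct an int option with one branch per kind of value. *)
let to_string_or_default (x:int option) : string =
  match x with
  | None -> "undefined"
  | Some n -> string_of_int n

let _ =
  assert (safe_div 7 2 = Some 3);   (* integer division truncates *)
  assert (safe_div 1 0 = None);
  assert (to_string_or_default (safe_div 9 3) = "3");
  assert (to_string_or_default None = "undefined")
```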

Example. A point is a pair of integer coordinates. Write a function that finds the slope of a line between two points. Return None if the line is vertical and the slope is undefined. Return Some slope if the slope is defined (that is, if the line is not vertical).

To start out, it is useful to define a type abbreviation for points:

type point = int * int

When defining a type abbreviation, the new type name (ie: point) is in every way identical to its definition (ie: int * int). Hence we may now use point and int * int interchangeably in our code. However, using point (where the data in question does in fact represent a point) makes the code easier to read. It is good documentation and good style. Now, on to computing the slope of a line between two points.
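A short sketch showing that the abbreviation and its definition are interchangeable; origin, as_pair and x_coord are our own illustrative names:

```ocaml
type point = int * int

(* A point can be used wherever int * int is expected, and vice versa,
 * because point is only an abbreviation. *)
let origin : point = (0, 0)
let as_pair : int * int = origin

(* Pattern matching on a point works exactly as on any pair. *)
let x_coord ((x, _) : point) : int = x

let _ =
  assert (x_coord (3, 4) = 3);
  assert (as_pair = (0, 0))
```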

(* slope of a line: 
 *   slope (0,0) (2,1) = Some 0.5
 *   slope (2,-1) (2,17) = None
 *)
let slope (p1:point) (p2:point) : float option = ...

Since computations with floating point values may be imprecise, we'll start with our examples in comments and create some tests from them afterwards. On to the body of the function:

(* slope of a line: 
 *   slope (0,0) (12,0) = Some 0.0    -- horizontal line
 *   slope (2,-1) (2,17) = None       -- vertical line
 *   slope (0,0) (1,1) = Some 1.0     -- 45 degree angle
 *)
let slope (p1:point) (p2:point) : float option = 
  let (x1,y1) = p1 in 
  let (x2,y2) = p2 in 
  let xd = x2 - x1 in 
  let yd = y2 - y1 in 
  if xd <> 0 then 
    Some ( float_of_int yd /. float_of_int xd ) 
  else 
    None

Notice that we used pattern matching on points to extract their components. This is perfectly legal since a point is a pair and pattern matching on pairs is legal. Before dividing yd by xd, we tested xd for zero. If it is not zero, we divide and return a Some. If it is zero, we return None.
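Callers of slope must handle both possibilities before they can use the number. A sketch of such a caller, written against a float option directly so that it stands alone; describe_slope is a hypothetical name, and Printf.sprintf's "%.2f" format prints two decimal places.

```ocaml
(* describe_slope: hypothetical consumer of a result like slope's *)
let describe_slope (s : float option) : string =
  match s with
  | None -> "vertical (undefined slope)"
  | Some m -> Printf.sprintf "slope %.2f" m

let _ =
  assert (describe_slope None = "vertical (undefined slope)");
  assert (describe_slope (Some 0.5) = "slope 0.50")
```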

When testing floating point results, because floating point arithmetic is imprecise, we would like to test that the results fall within an acceptable range rather than being exactly equal to some constant we write down in our file. Hence, to facilitate testing, we will write another function, in_range, to help us. Of course, whenever we write code to help us test our program, it is possible the testing code is incorrect. Nevertheless, it usually helps us detect errors in our work, because writing a computation two different ways typically helps weed out errors. (Sometimes we'll find an error in our test when the function being tested is correct. That's ok; we can quickly fix the test.) Here's a testing function:

(* in_range:
 *   true if x is (Some f) and f is between low and high, inclusive
 *   false otherwise *)
let in_range (x: float option) (low:float) (high:float) : bool =
  match x with
    | None -> false
    | Some f -> low <= f && f <= high

let _ =
  assert(in_range (Some 0.0) (-1.0) (2.0));
  assert(in_range (Some 0.0) (0.0) (0.0));
  assert(not (in_range (Some 3.0) (5.0) (-2.0)));
  assert(not (in_range (None) (-100.0) (200.0)))

Now that we have in_range, testing slope is made easier.

let _ =
  assert(slope (0,0) (0,0) = None);
  assert(slope (1,17) (1,-15) = None);
  assert(in_range (slope (0,0) (1,0)) (-0.0001) 0.0001);
  assert(in_range (slope (0,0) (1,1)) (0.9999) 1.0001)

Summary

Strong, precise type systems help guide the construction of functions. Typically, we analyze the inputs to our functions according to their types and build outputs for our functions, again, according to their types. The following table summarizes the types we have looked at, the shape of the patterns for analyzing values of those types, and the common way to deconstruct values of each type.

Type            Pattern(s)          Common Deconstruction
--------------  ------------------  ------------------------------------------
bool            true; false         if e then ... else ...
t1 * t2         (pat,pat)           let (x,y) = e in ...
t1 * t2 * t3    (pat,pat,pat)       let (x,y,z) = e in ...
t1 * ... * tn   (pat,...,pat)       let (x1,...,xn) = e in ...
unit            ()                  e; ...
t option        None; Some pat      match e with None -> ... | Some x -> ...