Type-directed Programming
One of the principal advantages of programming in a strongly typed programming language like OCaml is that the types of function arguments and results can help guide the way you construct programs.
In this class, whenever you write a function in OCaml, you should do so by following the type-directed programming methodology. This programming methodology has the following steps:
- Write down the name of the function and the types of its arguments and results.
- Write a comment that explains its purpose and any preconditions.
- Write several examples of what your function does.
- Write the body of the function. (This is the hard part!)
- Turn your examples into tests.
Types help most in the hard part, writing the body of the function, because function bodies involve two conceptual activities:
- Deconstruct (i.e., tear apart or analyze) the input values.
- Construct (i.e., build) the output values.
For example, the type bool comes with exactly two values, true and false. When writing a function with an argument of type bool, one must consider what to do when supplied with the input true and one must consider what to do when supplied with the input false -- there are never any other possibilities to consider. Analogously, when writing a function with a result type bool, there are only two possible results you can construct -- the value true and the value false.
In the following, we consider a number of the built-in OCaml types. For each type, we'll discuss the set of values for that type as well as how to deconstruct those values (for inputs) and construct those values (for outputs). An ml file associated with this lecture may be found here.
Booleans
You already know the set of values that make up the boolean type: true and false -- and that's it.
Given an input of type bool, we may determine which of the two values we have using a match statement. In general, match statements have the following form.
    match expression with
    | pattern1 -> result1
    | pattern2 -> result2
    ...
    | patternk -> resultk
The code above evaluates the expression and then checks to see whether the resulting value matches one of the patterns. The patterns are checked against the computed value in order, and the result associated with the first pattern that matches is evaluated. The kinds of patterns available depend upon the type of the expression. When the expression has type bool, there are two patterns, true and false, because, of course, those are the only two values that can have type bool. Hence, a boolean pattern matching expression looks like this:
    match expression with
    | true -> result1
    | false -> result2
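The general match form is not limited to booleans, and the rule that patterns are tried in order can be seen in a small sketch on integers. The function name describe is hypothetical and chosen only for this illustration; the final pattern "_" matches any value, so it is reached only when the earlier patterns fail.

```ocaml
(* describe is a hypothetical name used only for this sketch *)
let describe (n:int) : string =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
  | _ -> "many"   (* reached only when the first two patterns fail *)
;;

assert (describe 0 = "zero");;
assert (describe 1 = "one");;
assert (describe 7 = "many");;
```

If the "_" case were listed first, it would match every input and the other two cases would never be used.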
Example. Let's define a function that converts a boolean into an integer. According to our methodology, we will first define the function name and types, write a comment to explain what it does and then write down some examples:
    (* convert a boolean to an integer:
     *   bool_to_int true = 1
     *   bool_to_int false = 0
     *)
    let bool_to_int (b:bool) : int =
      ...
    ;;
Next, we fill in the body of the function by deconstructing the input and reconstructing a result.
    (* convert a boolean to an integer:
     *   bool_to_int true = 1
     *   bool_to_int false = 0
     *)
    let bool_to_int (b:bool) : int =
      match b with
      | true -> 1
      | false -> 0
    ;;
Finally, we will convert our examples into tests. The first step in doing so is to write down expressions that implement your tests. You can type them into your file and then load the file into the OCaml toplevel loop. Or you can type them into the OCaml toplevel loop directly. Here, I type them into the toplevel loop (after loading the definitions above):
    # bool_to_int true;;
    - : int = 1
    # bool_to_int false;;
    - : int = 0
    #
Excellent, the expression results correspond to my expectations. Next, it is wise to save my work by recording the test infrastructure in a file. If we want to modify the code later, perhaps for performance reasons, we will want to rerun the test suite we have developed to ensure we did not make a mistake. There are several ways to develop and record test infrastructure. One way to do so is to use OCaml assertions. An assertion is simply an expression with boolean type wrapped in the keyword assert. Assertions have the benefit that they may be turned off in production code by using the compiler option -noassert. Hence, you can put them in your code for testing purposes but suffer no performance penalty when deploying your final product. See the OCaml manual for more.
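To make the behavior of a failing assertion concrete: when the boolean expression evaluates to false, OCaml raises the exception Assert_failure, which carries the source location of the assertion. The sketch below uses a hypothetical helper, assertion_fails, that catches the exception and reports whether it occurred. It assumes the code is compiled without -noassert; with that option the assertion is not checked.

```ocaml
(* assertion_fails is a hypothetical helper used only for this sketch:
   it reports whether asserting the given boolean raises Assert_failure *)
let assertion_fails (b:bool) : bool =
  try
    assert b;
    false
  with Assert_failure _ -> true
;;

assert (assertion_fails true = false);;
assert (assertion_fails false = true);;
```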
Here is our final code with our tests.
    (* convert a boolean to an integer *)
    let bool_to_int (b:bool) : int =
      match b with
      | true -> 1
      | false -> 0
    ;;

    assert (bool_to_int true = 1);;
    assert (bool_to_int false = 0);;
Notice that we deleted the portion of the comment with the examples.
The examples were a good intermediate step for our own thinking process,
but the final code is almost identical to the comment so the comment is
redundant. The code itself is simple and clear enough that
additional comments just get in the way. It is better style
to omit them and "let the code do the talking" in this case.
As for the assertions, (bool_to_int true = 1) is a boolean expression. It calls the function bool_to_int with argument true and then tests to see if the result of the call is equal to 1. (Notice that any function application "f arg" has higher precedence than an operator such as "=" so you do not need parens around bool_to_int true.)
By the way, a synonym for a match statement on booleans like the one above is an if-then-else statement. Hence, a completely equivalent piece of code is as follows. Notice, of course, that our tests do not change when we change how we write our function. Creating durable tests for a function helps keep code correct as it evolves.
    (* convert a boolean to an integer *)
    let bool_to_int (b:bool) : int =
      if b then 1 else 0
    ;;

    assert (bool_to_int true = 1);;
    assert (bool_to_int false = 0);;
Most programmers will write functions on a single boolean using an if statement. We introduced the idea of analyzing a boolean value using pattern matching because pattern matching is the general paradigm that programmers use to deconstruct data. An if statement is a special case that only really exists for historical reasons and because programmers coming from other kinds of languages feel comfortable with it.
Ok, that's booleans, and, by the way, if you were thinking "oh my god, I can't believe he spent so much time on such a simple function." Well, you are right. It was pretty easy -- a proficient OCaml programmer can write that function in 5 seconds. Onwards and upwards!
Tuples
The type t1 * t2 represents pairs of values where the first component of the pair has type t1 and the second component has type t2. The type t1 * t2 * t3 represents triples of values where the first component of the triple has type t1, the second component has type t2 and the third component has type t3. An n-tuple has type t1 * t2 * ... * tn and has n such components.
We create a pair, triple, or n-tuple value by writing down a series of expressions separated by commas and enclosed by parentheses. For example:
    let name_and_age1 : string * string * int = ("David", "Walker", 25);;
    let name_and_age2 : string * string * int = ("Brian", "Kernighan", 15);;
Example. To analyze a pair or any other kind of tuple, we may again use pattern matching. Let's write a function to extract the string components of a triple like the one above and return a string. According to our methodology, we write the name and types first along with a comment. Then we add some examples.
    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      ...
    ;;

    assert(full_name name_and_age1 = "David Walker");;
    assert(full_name name_and_age2 = "Brian Kernighan");;
To fill in the body of the function, we use pattern matching to extract the content we need from the triple. Recall that the "^" operator concatenates two strings.
    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      match name_and_age with
      | (first, last, _) -> first ^ " " ^ last
    ;;

    assert(full_name name_and_age1 = "David Walker");;
    assert(full_name name_and_age2 = "Brian Kernighan");;
Above, the pattern (first, last, _) matches any triple; the variable first is bound to the first value in the triple and the variable last is bound to the second value in the triple. The "_" is a pattern that matches any value. It informs the reader of the code that "I don't care about this value." In this function, we don't use the contents of the age component, so the underscore pattern is appropriate.
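The underscore may appear in any position of a tuple pattern, and more than once. As a small sketch of the opposite case, where only the age matters, the hypothetical function age_of below ignores both name components and keeps the third.

```ocaml
(* age_of is a hypothetical name used only for this sketch *)
let age_of (name_and_age : string * string * int) : int =
  match name_and_age with
  | (_, _, age) -> age    (* ignore both names, keep the age *)
;;

assert (age_of ("David", "Walker", 25) = 25);;
assert (age_of ("Brian", "Kernighan", 15) = 15);;
```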
Whenever a match statement contains just one pattern, a programmer may replace the match statement with a let statement. For instance, the following is a bit more compact.
    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      let (first, last, _) = name_and_age in
      first ^ " " ^ last
    ;;
Even more compact, we can place the pattern match in the function argument position:
    (* create a string from the first two components, separated by a space *)
    let full_name ((first, last, _) : string * string * int) : string =
      first ^ " " ^ last
    ;;
While these latter two examples are more compact, it is important to understand that pair patterns are just like boolean patterns or integer patterns or any other kind of pattern in that they may be used within a match expression. In more complicated examples, we may use several different patterns in conjunction to analyze and extract information from an input.
Example. Define a function that computes the disjunction of a pair of booleans.
    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      ...
    ;;

    assert(or_pair (true,true) = true);;
    assert(or_pair (true,false) = true);;
    assert(or_pair (false,true) = true);;
    assert(or_pair (false,false) = false);;
Next, we fill in the body of the function by deconstructing the input and reconstructing a result.
    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      match p with
      | (true,true) -> true
      | (true,false) -> true
      | (false,true) -> true
      | (false,false) -> false
    ;;

    assert(or_pair (true,true) = true);;
    assert(or_pair (true,false) = true);;
    assert(or_pair (false,true) = true);;
    assert(or_pair (false,false) = false);;
Now, since the input contains a pair of booleans and each boolean contains two different values, true and false, it is natural that we would start out writing our function using 4 cases (2*2 = 4). However, we might now observe that these 4 cases may be written as two using a wildcard pattern:
    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      match p with
      | (false,false) -> false
      | _ -> true
    ;;

    assert(or_pair (true,true) = true);;
    assert(or_pair (true,false) = true);;
    assert(or_pair (false,true) = true);;
    assert(or_pair (false,false) = false);;
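The same wildcard trick works for conjunction, with the roles reversed: only one of the four cases yields true, so that case is listed explicitly and the wildcard covers the rest. The name and_pair is hypothetical, introduced here as a companion to or_pair.

```ocaml
(* compute conjunction; and_pair is a hypothetical companion to or_pair *)
let and_pair (p:bool*bool) : bool =
  match p with
  | (true,true) -> true
  | _ -> false
;;

assert(and_pair (true,true) = true);;
assert(and_pair (true,false) = false);;
assert(and_pair (false,true) = false);;
assert(and_pair (false,false) = false);;
```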
Example. Write a function that counts the number of true values in a 5-tuple.
    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      ...
    ;;

    assert(count5 (true,true,true,true,true) = 5);;
    assert(count5 (false,false,true,false,false) = 1);;
    assert(count5 (false,false,false,false,false) = 0);;
Perhaps your first instinct when writing the function is to once again break down the tuple of booleans into cases as follows.
    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      match p with
      | (false, false, false, false, false) -> 0
      | (true, false, false, false, false) -> 1
      | (false, true, false, false, false) -> 1
      ...
    ;;
... but that is clearly going to get way out of hand and so many cases are going to be hard to read and verify. How can we reduce the number of cases? Well, "counting" a single boolean is easy -- it involves just two cases (as all functions on single booleans do!) -- if the boolean is true, it returns 1 and if it is false, it returns 0. We'll just use that function 5 times and sum the results. Moreover, we've already written a function to "count" a single boolean -- it is called bool_to_int! How lucky is that? (Even if we hadn't written it already, writing it now would take 5 seconds and be far easier and clearer than writing the 2^5 patterns we would have had to write if we followed the naive approach.) Here is the code.
    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      let (b1, b2, b3, b4, b5) = p in
      bool_to_int b1 + bool_to_int b2 + bool_to_int b3 +
      bool_to_int b4 + bool_to_int b5
    ;;

    assert(count5 (true,true,true,true,true) = 5);;
    assert(count5 (false,false,true,false,false) = 1);;
    assert(count5 (false,false,false,false,false) = 0);;
Unit Type
We have talked about pairs, which are tuples with 2 fields. We have talked about triples, which are tuples with 3 fields. We have talked about quintuples, which are tuples with 5 fields. Ever consider what a tuple with 0 fields looks like? It looks like this: (). In OCaml, this value is referred to colloquially as "unit" and its type is also called unit.
Surprisingly, even though the unit value has no information content, it is quite heavily used! Whenever an expression has an effect on the outside world, but returns no interesting data, unit is its type. For instance, expressions that do nothing but print data to stdout will typically have type unit. The following is an example of an expression with type unit:
print_string "hello world\n\n";;
Assertions are also expressions with type unit. Why? Because when an assertion succeeds, it does nothing, returning the unit value.
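One way to convince yourself of this is to give an assertion an explicit type annotation and let the type checker confirm it. In the sketch below, the name checked is hypothetical; the definition is accepted only because "assert e" has type unit, and since unit has exactly one value, the result must be ().

```ocaml
(* the annotation is accepted because "assert e" has type unit *)
let checked : unit = assert (1 + 1 = 2);;

(* unit has exactly one value, so checked must be () *)
assert (checked = ());;
```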
It is also possible for a function to have no interesting input -- such functions already contain all the data they need to execute. In such cases, unit is a reasonable argument type. Like other types, one can pattern match on expressions with unit type -- the pattern is ().
Example (Poor Style). Here is a function that prints hello world:
    let hello_world (x:unit) : unit =
      match x with
      | () -> print_string "hello world\n"
    ;;
However, as with other kinds of tuples, since there is only one branch of the match expression, we can (and should) shift the pattern into the argument position as follows (note that we omit the type of the argument in this revision since the argument type unit is fully determined by the pattern ()).
Example (Better Style).
    let hello_world () : unit =
      print_string "hello world\n"
    ;;
Example. Sometimes, we need to execute several unit-valued expressions in a row. We could use successive pattern matching, but that is overly verbose. Instead, use a semicolon to separate one unit-valued expression from the next.
    let hello_world () : unit =
      print_string "hello";
      print_string " ";
      print_endline "world"
    ;;
Example. Now that we know that assertions are just expressions with unit type, we can use them inside functions to check necessary conditions of our inputs and verify the correctness of our outputs.
    (* precondition: 0 <= n < String.length s
     * returns the nth character of s
     *)
    let nth (s:string) (n:int) : char =
      assert (0 <= n && n < String.length s);
      String.get s n
    ;;
Example (Unit-valued functions). We can also use assertions within functions as part of our testing apparatus. An effective way to test your functions is often to compute the same answer in two different ways. For instance, we know that disjunction should be symmetric. If we find it isn't, we must have an error. Here is some simple code to test for symmetry.
    let test_symmetry (x:bool) (y:bool) : unit =
      assert(or_pair (x,y) = or_pair (y,x))
    ;;

    test_symmetry true false;
    test_symmetry false false;
    test_symmetry true true
    ;;
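Another instance of "compute the same answer in two different ways" is to compare a hand-written function against a built-in operator that should agree with it. The sketch below checks or_pair against OCaml's built-in || on all four inputs. The name test_against_builtin is hypothetical, and or_pair is repeated so the sketch stands on its own.

```ocaml
(* compute disjunction, repeated here so the sketch is self-contained *)
let or_pair (p:bool*bool) : bool =
  match p with
  | (false,false) -> false
  | _ -> true
;;

(* test_against_builtin is a hypothetical name used only for this sketch *)
let test_against_builtin (x:bool) (y:bool) : unit =
  assert (or_pair (x,y) = (x || y))
;;

test_against_builtin true true;
test_against_builtin true false;
test_against_builtin false true;
test_against_builtin false false
;;
```

Because a pair of booleans has only four possible values, these four calls cover every input.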
Option Types
An option type, written t option, contains two sorts of values, the value None and the value Some v where v is a value with type t.
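As a minimal sketch of both directions, the first function below constructs an int option and the second deconstructs one with a match that has one branch per sort of value. The names safe_div and or_zero are hypothetical and chosen only for this illustration.

```ocaml
(* safe_div and or_zero are hypothetical names used only for this sketch *)

(* construct an option: None when dividing by zero, Some quotient otherwise *)
let safe_div (n:int) (d:int) : int option =
  if d = 0 then None else Some (n / d)
;;

(* deconstruct an option: one branch per sort of value *)
let or_zero (x:int option) : int =
  match x with
  | None -> 0
  | Some v -> v
;;

assert (safe_div 7 2 = Some 3);;
assert (safe_div 7 0 = None);;
assert (or_zero (safe_div 7 2) = 3);;
assert (or_zero (safe_div 7 0) = 0);;
```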
Example. A point is a pair of integer coordinates. Write a function that finds the slope of a line between two points. Return None if the line is vertical and the slope is undefined. Return Some slope otherwise.
To start out, it is useful to define a type abbreviation for points:
type point = int * int;;
When defining a type abbreviation, the new type name (i.e., point) is in every way identical to its definition (i.e., int * int). Hence we may now use point and int * int interchangeably in our code. However, using point (where the data in question does in fact represent a point) makes the code easier to read. It is good documentation and good style. Now, on to computing the slope of a line between two points.
    (* slope of a line:
     *   slope (0,0) (2,1) = Some 0.5
     *   slope (2,-1) (2,17) = None
     *)
    let slope (p1:point) (p2:point) : float option =
      ...
    ;;
Since computations with floating point values may be imprecise, we'll start with our examples in comments and create some tests from them afterwards. On to the body of the function:
    (* slope of a line:
     *   slope (0,0) (12,0) = Some 0.0   -- horizontal line
     *   slope (2,-1) (2,17) = None      -- vertical line
     *   slope (0,0) (1,1) = Some 1.0    -- 45 degree angle
     *)
    let slope (p1:point) (p2:point) : float option =
      let (x1,y1) = p1 in
      let (x2,y2) = p2 in
      let xd = x2 - x1 in
      let yd = y2 - y1 in
      if xd <> 0 then
        Some (float_of_int yd /. float_of_int xd)
      else
        None
    ;;
Notice that we used pattern matching on points to extract their components. This is perfectly legal since a point is a pair and pattern matching on pairs is legal. Before dividing yd by xd, we tested xd for zero. If it is not zero, we divide and return a Some. If it is zero, we return None.
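The interchangeability of point and int * int can be checked directly. In the sketch below, x_coord is a hypothetical function declared to take a point, and raw is a value annotated with the unabbreviated type; the type checker accepts the call because the two types are the same.

```ocaml
type point = int * int;;

(* x_coord is a hypothetical name used only for this sketch *)
let x_coord (p:point) : int =
  let (x, _) = p in
  x
;;

(* a value annotated as int * int is accepted where a point is expected *)
let raw : int * int = (3, 4);;
assert (x_coord raw = 3);;
```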
When testing floating point results, we would like to test that the results are within an acceptable range as opposed to being exactly equal to some constant we write down in our file because of the imprecision of floating point arithmetic. Hence, to facilitate testing, we will write another function, in_range, to help us. Of course, whenever we write code to help us test our program functionality, it is possible the testing code is incorrect. Nevertheless, it usually helps us detect errors in our work because writing a computation two different ways typically helps weed out errors. (Sometimes we'll find an error in our test when the function being tested is correct. That's ok, we can quickly fix the test.) Here's a testing function:
    (* in_range:
     *   true if x is (Some f) and f is between low and high, inclusive
     *   false otherwise
     *)
    let in_range (x: float option) (low:float) (high:float) : bool =
      match x with
      | None -> false
      | Some f -> (low <= f && f <= high)
    ;;

    assert(in_range (Some 0.0) (-1.0) (2.0));;
    assert(in_range (Some 0.0) (0.0) (0.0));;
    assert(not (in_range (Some 3.0) (5.0) (-2.0)));;
    assert(not (in_range (None) (-100.0) (200.0)));;
Now that we have in_range, testing slope is made easier.
    assert(slope (0,0) (0,0) = None);;
    assert(slope (1,17) (1,-15) = None);;
    assert(in_range (slope (0,0) (1,0)) (-0.0001) 0.0001);;
    assert(in_range (slope (0,0) (1,1)) (0.9999) 1.0001);;
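The imprecision that motivates range-based tests is easy to observe: in floating point arithmetic, 0.1 +. 0.2 is not exactly equal to 0.3. The sketch below shows this and then compares the two values with a tolerance instead. The helper close_to is hypothetical and is only one possible way to express such a check.

```ocaml
(* 0.1 +. 0.2 is not exactly 0.3 in floating point arithmetic *)
assert (0.1 +. 0.2 <> 0.3);;

(* close_to is a hypothetical helper: true if a and b differ by at most eps *)
let close_to (a:float) (b:float) (eps:float) : bool =
  abs_float (a -. b) <= eps
;;

assert (close_to (0.1 +. 0.2) 0.3 0.0001);;
```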
Summary
Strong, precise type systems help guide the construction of functions. Typically, we analyze the inputs to our functions according to their type and build outputs for our functions, again, according to their type. The following table summarizes the types we have looked at, the shape of the patterns for analyzing values of those types, and the common deconstruction idiom for each type.
Type T | Pattern(s) | Common Deconstruction |
---|---|---|
bool | true; false | if e then ... else ... |
t1 * t2 | (pat,pat) | let (x,y) = e in ... |
t1 * t2 * t3 | (pat,pat,pat) | let (x,y,z) = e in ... |
t1 * ... * tn | (pat,...,pat) | let (x1,...,xn) = e in ... |
unit | () | e; ... |
t option | None; Some pat | match e with None -> ... \| Some x -> ... |
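
The patterns in the table can be nested inside one another, since each "pat" position accepts any pattern of the appropriate type. As a closing sketch, the hypothetical function describe_reading below analyzes a pair of a boolean and an integer option with a single match that combines tuple, boolean, and option patterns.

```ocaml
(* describe_reading is a hypothetical name used only for this sketch *)
let describe_reading (r : bool * int option) : string =
  match r with
  | (false, _) -> "invalid"             (* bool pattern plus wildcard *)
  | (true, None) -> "missing"           (* option pattern None *)
  | (true, Some n) -> string_of_int n   (* option pattern Some pat *)
;;

assert (describe_reading (false, Some 3) = "invalid");;
assert (describe_reading (true, None) = "missing");;
assert (describe_reading (true, Some 42) = "42");;
```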