This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project.
util/assert.ml(i) - the assertion framework util/platform.ml - platform-specific compilation support util/range.ml(i) - range datatype for error messages ll/ll.ml - the abstract syntax for LLVMlite ll/lllib.ml - name generation and pretty-printing for LLVMlite ll/lllexer.mll - lexer for LLVMlite syntax ll/llparser.mly - parser generator for LLVMlite syntax ll/llinterp.ml - reference interpreter for the LLVMlite semantics /x86/x86.ml - the X86lite language used as a target README - help about the main test harness Makefile - basic make support for invoking ocamlbuild atprograms/*.oat - example hw4 programs used in testing hw5programs/*.oat - example hw5 programs used in testing main.ml - main test harness driver.ml - utilities for invoking the compiler backend.ml - sample solution to HW3 astlib.ml - pretty printing lexer.mll - oat lexer parser.mly - oat parser runtime.c - oat runtime library studenttests.ml - where your own test cases should go gradedtests.ml - graded test cases that we provide Most important for this project: ---------------------------------- ast.ml - oat abstract syntax *frontend.ml - oat frontend [including most of a solution to HW4] *typechecker.ml - oat typechecker tctxt.ml - typechecking context data structure ----------------------------------
struct Color { int red; int green; int blue; (Color) -> Color f } Color rot(Color c1) { var c2 = new Color{ red = c1.green; green = c1.blue; blue = c1.red; f = c1.f }; return c2; } global c = new Color { red = 10; green = 20; blue = 30 ; f = rot}; int program (int argc, string[] argv) { return c.f(c).red; }and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:
./a.out 20
This version of Oat supports all of the features from HW4 but, in addition, support structs and function pointers, and a type system that makes a distinction between "possibly null" and "definitely not null" references.
Oat supports multiple base-types of data: int, bool, and string, as well as arrays of such data. The Oat language is large enough that it is simpler to give the specification of its type system using inference rules than to use English prose. The Oat language specification contains a definition of the language syntax and a collection of inference rules that define Oat type checking.
See the file ast.ml for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The astlib module defines some helper functions for printing Oat programs and abstract syntax.
Oat struct types are declared by using the struct keyword at the top level. For example the following program declares a new struct type named Color with three fields. (Note: there is no trailing semicolon on the last struct field.)
struct Color { int red; int green; int blue } int program (int argc, string[] argv) { var garr = new Color { green = 4; red = 3; blue = 5 }; garr.red = 17; return garr.red + garr.green; }
Struct values can be create by using the new keyword followed by the name of the struct type and a record of field initializers. The order of the fields in the initializers does not have to match the order of the fields as declared, but all of the fields must be present and have the correct type.
Struct values are represented internally as pointers to heap-allocated blocks of memory. This means that structs, like strings and arrays, are reference values for the purposes of compilation.
Struct fields are also mutable. As shown in the sample program above, you can update the value of a struct.
Function Pointers:
This version of Oat supports function pointers as first-class values. This means that the name of a top-level declared function can itself be used as a value. For example, the following program declares a top-level function inc of type (int) -> int and passes it as an argument to another function named call:
int call((int) -> int f, int arg) { return f(arg); } int inc(int x) { return x + 1; } int program(int argc, string[] argv) { return call(inc, 3); }
Function types are written (t1, .., tn) -> tret and, as illustrated above, function identifiers act as values of the corresponding type. Note that such function identifiers, unlike global variables do not denote storage space, and so cannot be used as the left-hand side of any assignment statement. These function pointers are not true closures, since they cannot capture variables from a local scope.
Built-in Functions:
The built-in functions, whose types are given below, can also be
passed as function-pointers:
Possibly null vs. Definitely Not Null References:
The Oat type system makes a distinction between possibly null reference types (which are not statically known to be different from null) and definitely not-null reference types. These features are illustrated in the following code from the ifq3.oat file:
int sum(int[]? arr) { var z = 0; if?(int[] a = arr) { for(var i = 0; i<length(a); i = i + 1;) { z = z + a[i]; } } return z; } int program (int argc, string[] argv) { var x = 0; x = x + sum(new int[]{1,2,3}); x = x + sum(int[] null); return x; }
Here, the sum function takes a possibly null array reference. Possibly null arrays cannot directly be treated as arrays. Instead, the programmer has to insert the appropriate null check using the if? statement, which performs the null check and, if it is successful, creates an alias to the checked value for use as a definitely not null pointer. The rule for typechecking if? works for any possibly null reference types.
Array Initializers:
Once we decide to have definitely not-null types, we need a convenient way to
initialize arrays of reference types (so that we can ensure that the entries
are not null). We thus add support for built-in initializers, that work as shown
below:
var matrix = new int[][3]{i -> new int[3]{j -> i * j}}
This code declares a 3x3 matrix represented as an array of int arrays.
The entry matrix[i][j] is initialized to be i * j. The initializer array syntax is of the general form:
new t[e1]{id -> e2}, where e1 is an integer determining
the size of the array, id names the index, and the initializer expression e2
computes the initial value at each index. This initializer code is semantically
equivalent to allocating an array followed by immediatelly initializing each
element:
Note that e2 can mention the loop index id.
See the typechecking rules for the details about scoping and typechecking.
var a = new int[e1];
for(var id = 0; id < length(a); id = id + 1;) {
a[x] = e2;
}
The main task of this project is to implement a typechecker for the Oat language, which will put enough restrictions on its code to ensure type safety.
The typing rules describing the intended behavior of the typechecker are given in the accompanying Oat v.2 Language Specification. Use that, and the notes in typechecker.ml to get started. We suggest that you tackle this part of the project in this order:
We have provided a mostly complete sample implementation of the frontend of the compiler (corresponding to our solution to HW4). Your task for this part of the project is to add support for structures and function pointers.
Arrays: There are only a few places where the code must be modified, each of which is marked as a ARRAY TASK in the comments. You'll need to add code to handle the length expression (for which you might want to recall the array represenation from the HW4 instructions). You'll also have to implement the array initializers.
Structs: There are only a few places where the code must be modified, each of which is marked as a STRUCT TASK in the comments. Struct compilation is (broadly) similar to how we handle arrayes. The main difference is that you need to use the structure information to look up the field index for the struct representation (rather than computing an integer directly). Follow the STRUCT TASK breadcrumbs left in the frontend and the comments there.
Checked Cast: To implement the if? statement, you'll need to generate code that does a null-pointer check. Since this construct introduces a variable into scope (for the 'notnull' branch), you'll have to handle the context and allocate storage space...
The test harness provided by main.ml gives several ways to assess your code. See the README file for a full description of the flags.
As part of this project, you must post three test cases for the compiler to the course Piazza site.
Two of your test cases must be small "unit tests" for a specific inference rule used in the type system. Here's how it works: Choose a typechecking judgment and write one small, self-contained "positive" test case that succeeds because the inference rules for that judgment permit a particular derivation. Then, write a second, small, self-contained "negative" test case that succeeds because the inference rules for that judgment do not permit a particular derivation.
For example, I could pick the "subtype" inference rule and have the positive test case be testing that indeed, H |- string? <: string? and the negative test case could assert that it is not possible to derive H |- string? <: string. The OCaml code for these two tests would be given by:
let unit_tests = [ "subtype_stringQ_stringQ", (fun () -> if Typechecker.subtype Tctxt.empty (TNullRef RString) (TNullRef RString) then () else failwith "should not fail") ; ("no_subtype_stringQ_stringQ", (fun () -> if Typechecker.subtype Tctxt.empty (TNullRef RString) (TRef RString) then failwith "should not succeed" else ()) ) ]
You might find the typecheck_error and typecheck_correct functions in the gradedtests.ml file useful for writing these unit tests.
Your third test case must take the form of a .oat file along with expected input arguments and outputs (as found in the hard tests of gradedtests.ml). Your test should be an Oat program about the size of those in the hard test cases categories. This case must exercies some of the new features of this version of Oat: structs, function pointers, non-null vs. null references, or, ideally, some combination of these. Oat is now quite a fully-fledged language, so this can be pretty fun. Ideas include: linked-list algorithms (will definitely need null), object-encodings like the color example above, trees, etc. How handy is it to have notation for array initialization? How intrusive do you think the not-null pointer requirement is?
We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fairs against the test cases submitted by the other groups in the class.
Your team's grade for this project will be based on: