This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project.
util/assert.ml(i) - the assertion framework util/platform.ml - platform-specific compilation support util/range.ml(i) - range datatype for error messages ll/ll.ml - the abstract syntax for LLVMlite ll/llutil.ml - name generation and pretty-printing for LLVMlite ll/lllexer.mll - lexer for LLVMlite syntax ll/llparser.mly - parser generator for LLVMlite syntax ll/llinterp.ml - reference interpreter for the LLVMlite semantics /x86/x86.ml - the X86lite language used as a target README - help about the main test harness Makefile - basic make support for invoking ocamlbuild atprograms/*.oat - example .oat v.1 programs used in testing main.ml - main test harness driver.ml - utilities for invoking the compiler ast.ml - oat abstract syntax astlib.ml - pretty printing backend.ml - sample solution to HW3 *lexer.mll - oat lexer *parser.mly - oat parser *frontend.ml - oat frontend runtime.c - oat runtime library *studenttests.ml - where your own test cases should go gradedtests.ml - graded test cases that we provide progasts.ml - helper ast representations for parser tests
int fact(int x) { var acc = 1; while (x > 0) { acc = acc * x; x = x - 1; } return acc; } int program(int argc, string[] argv) { print_string(string_of_int(fact(5))); return 0; }and will produce an executable (by default named a.out) that, when linked against runtime.c and then executed produces the resulting output:
./a.out 120
See the file ast.ml for the OCaml representation of the abstract syntax -- the type typ of types is defined there, along with representations of expressions, statements, blocks, function declarations, etc. You should familiarize yourself with the correspondence between the OCaml representation and the notation used in the specification. The astlib module defines some helper functions for printing Oat programs and abstract syntax.
This version of Oat will not be safe — it is possible to access an array out of bounds or to call a function with the incorrectly typed arguments. The next version of Oat will address these issues and add some other missing features. In particular, although the grammar gives a syntax for function types, this version of Oat does not need to support function pointers; these are included in anticipation of the next project.
A complete Oat program contains a function called program with type (int, string[]) -> int, which takes command-line arguments like the C main function and is the entry-point of the executable. The file runtime.c defines the Oat standard library, which provides a few universally available functions, mostly for doing I/O and working with strings.
Global values:
Oat supports globally-declard variables with a
limited set of initializers, including just base values (integer and boolean
constants and null) and array literals. Unlike LLVM, Oat global initializers
can't contain identifiers that refer to other global values.
Expression forms:
Oat supports several forms of expressions,
including all the usual binary and unary operations. Boolean values are true and false. Integer values have
type int are they are 64-bits. Oat uses different
syntax to distinguish boolean logical operations b1 &
b2 (and) and b1 | b2 (or) from bit-wise int
operations i1 [&] i2 (bitwise or) and i1 [|] i2 (bitwise or). (This difference from a C-like
language is necessitated by the the lack of casts and overloading.)
Arrays:
Oat supports arrays whose elements are of any type, including nested arrays. The expression
new typ [len] creates a new
zero-initialized array of length
Literal arrays are written using braces, and can appear as expressions inside a function:
var vec = new int[]{1, -2, 3+1, f(4)};or as global initializers:
global arr = int[]{1, 2, 3, 4};
Arrays are mutable, and they can be updated using assignment notation: vec[0] = 17. Array indices start at 0 and go through len - 1, where len is the length of the array. Oat arrays (unlike C arrays) also store a length field to support array-bounds checks, which we will add in a future project. For this project, you do not have to implement bounds checking.
Arrays in Oat are represented at the LL IR level by objects of the LL type {i64, [0 x t]}*, that is, an array is a pointer to a struct whose first element is an i64 value that is the array's length, and whose second component is a array of elements of type t. See the translation of Oat types into LL types via the cmp_ty function.
This array representation is similar to that used in OCaml or Java, which do not allow "inlined" multidimensional arrays as in C. We choose this representation to facilitate array-bounds checking (which we will inplement in HW) The length information is located before the 0th element of the array. For example, the following array would be represented as a pointer to memory as shown below:
int[]{1,2,3,4}; arr --+ | v [4][1][2][3][4]
We will exploit this array representation that includes length data in the next assignment, when we use a type system to make it a safe language.
Left-Hand-Side Expressions: As usual in imperative languages, local and global identifiers denote mutable locations, they can also appear to the left of an assignment operation. In the example below, the identifier x appears in both ways:
x = x + 1;On the right-hand-side of the assignment, x is implicitly dereferenced to obtain its value, whereas on the left hand side, it stands for the location where the value of x is stored. For example, in our LLVMlite IR, each Oat local identifier will correspond to an alloca'd value on the stack, accessed through a pointer. Similarly, the ith array index denotes both the value stored in the array and the corresponding location in memory.
myarr[i] = myarr[i] + 1;In this case, myarr can be an arbitrary expression that evaluates to an array, including function calls or an index into an array of arrays. For example the code below shows that it is legal to index off of a function call expression, as long as the function returns an array.
int[] f(int[] x, int[] y, bool b) { if ( b ) { return x; } else { return y; } } global x = int[]{1, 2, 3}; global y = int[]{4, 5, 6}; int program (int argc, string[] argv) { f(x, y, true)[0] = 17; /* non-trivial lhs */ var z = f(x, y, true)[0] + f(y, x, false)[0]; return z; /* returns the value 34 */ }
Strings:
Oat supports C-style immutable strings, written "in quotes". For
the moment, the string operations are very limited and mostly
provided by built-in functions provided by the runtime system.
Built-in Functions:
We now have enough infrastructure to support interesting built-in
operations, including:
The first step in implementing the frontend of the compiler is to get the lexer and parser working. We have provided you with a partial implementation in the lexer.mll and parser.mly files, but the provided grammar is both ambiguous and missing syntax for several key language constructs. Complete this implementation so that your frontend can parse all of the example atrprograms/*.oat files.
The full grammar is given in the Oat v.1 Language Specification.
You need to:
./main.native -S --print-oat file.oat
To a first approximation, there is one compilation function for each type of inference rule. The inputs to these functions are the static context and the piece of syntax (and its type) to be compiled. The output of such a function depends on which part of the program you are compiling: expressions evaluate to values, so their compilation function returns the code computing an operand; statements do not evaluate to values, but they do introduce new local variables, so their compilation function returns a new context and an instruction stream.
The test harness provided by main.ml gives several ways to assess your code. See the README file for a full description of the flags.
As part of this project, you must post an interesting test case for the compiler to the course Piazza site. This test case must take the form of a .oat file along with expected input arguments and outputs (as found in the hard tests of gradedtests.ml).
The test case you submit to Piazza will not count if it is too similar to previously-posted tests! Your test should be distinct from prior test cases. (Note that this policy encourages you to submit test cases early!)
We will validate these tests against our own implementation of the compiler (and clang). A second component of your grade will be determined by how your compiler fairs against the test cases submitted by the other groups in the class.
Your team's grade for this project will be based on: