COS 320: Compiling Techniques, Spring 2019
HW3: LLVMlite compilation
Due: Thursday, March 28 at 11:59pm
Overview
In this project you will implement a non-optimizing compiler for
subset of the LLVM IR language. The source language consists of a
64-bit, simplified subset of the LLVM IR that we call LLVMlite. The
target language is
X86lite as defined in HW2.
Note: You may work alone or in pairs for
this project.
Getting Started
To get started, download the project source files. The files included
in
hw03.zip are briefly described below. Those marked with
* are the only ones you should need to modify while
completing this assignment.
README - help about the main test harness
Makefile - basic make support for invoking ocamlbuild
util/assert.ml(i) - the assertion framework
util/platform.ml - OS platform-specific compilation support
/x86/x86.ml(i) - the X86lite instruction represenation from HW2
ll/ll.ml - the abstract syntax for LLVMlite
ll/lllexer.mll - lexer for LLVMlite syntax
ll/llparser.mly - parser generator for LLVMlite syntax
ll/llinterp.ml - reference interpreter for the LLVMlite semantics
main.ml - command-line interface
driver.ml - invoking the compiler pipeline
llprograms/*.ll - example .ll programs used in testing
cinterop.c - c code for testing interoperability
gradedtests.ml - graded test cases that we provide
studenttests.ml - where your additional test cases should go
*backend.ml - where you implement the LLVMlite to X86 compiler
Note: You'll need to have
Menhir
intalled in order to build the LLVMlite parser. The easiest way to do this is to use
OPAM
via the command
opam install menhir.
Note: You'll also need to add the
nums, str, and unix libraries to compile this project. With
OCamlBuild, you should be able to compile the project from the command line by running
ocamlbuild -Is ll,x86,util -libs nums,unix,str -use-menhir main.native
Preliminary Steps
- Skim through the rest of this web page to get a sense of what it contains.
- Familiarize yourself with the information in the
README, which explains the ways that you can run your compiler for
testing purposes.
- Then take a look at driver.ml, particularly the code related to
process_ll_file to see how the backend code fits into the overall
compilation pipeline.
- Then start working through backend.ml, following the
instructions below.
WARNING: This project is potentially very
difficult to debug and may take you a while to understand. GET STARTED EARLY!
LLVM Lite Specification
The source language for this 'backend' part of the full compiler is
a subset of the LLVM IR called LLVM Lite. You may find the
LLVM Language
Reference to be a useful resource for this project, though we
are concerned with only a small portion of the full LLVM feature
set.
The LLVM Lite Specification
describes the behavior of LLVM programs in terms of an abstract
semantics that is not target specific. This semantics is intended
to be faithful to the LLVM Language Reference.
Implementing the Compiler
The code we provide in
backend.ml is a
minimal skeleton of the basic structure of the compiler. To a first
approximation, for each part
foo of the
abstract syntax (such as
prog or
fdecl), there is a corresponding
compile_foo function (i.e.
compile_prog or
compile_fdecl). Most of these defintions have
been left unimplemented (and a few have been left out). Your job is
to complete this translation. Our reference solution is well under
350 lines of documented code, so if your implementation is
significantly longer than this, you may wish to seek help.
The file backend.ml contains additional
hints an explanations about the compilation strategy that we suggest
you use.
We suggest that you stage the development of your compiler like
this:
- First get a minimal implementation of compile_fdecl working so that you can compile
functions with empty bodies but varying numbers of input
parameters. To do this, you'll need to understand the System V
AMD64 ABI calling conventions (see Wikipedia
for one explanation), then understand the notion of a layout and complete the arg_loc function. At this point, the X86 code
you generate won't be able to run because the code for the
compiled function does not exit propertly. (But you can still look
at the generated assembly code to see whether it looks reasonable.)
- Next implement enough of the compile_terminator
function to handle (void) functions that return no results.
Similarly, implement enough of compile_block to handle blocks with no
instructions. At this point, your compiler should be able to
generate working code for an LLVM function like that found in
returnvoid.ll:
define void @main(i64 %argc, i8** %argv) {
ret void
}
(Note, this isn't part of the test suite, since the value
"returned" to the shell when this program runs isn't well defined.)
- Understand the notion of the ctxt
type and develop a strategy for storing uid locals. See the comments in the backend.ml file. Implement the compile_operand function.
- Implement the Binop case for
compile_insn (which, if you follow the
suggested method of compiling locals, will use compile_operand).
- At this point, you probably want to revisit compile_fdecl and compile_block to adjust them to deal properly
with contexts and non-empty control-flow graphs / blocks.
- Next go back and implement the rest of the cases for
compile_terminator. At this point, your
compiler should be able to handle functions that return int64
values and that contain simple arithmetic and direct jumps.
- Implement the translation of Icmp in
compile_insn, followed by
Alloca, Load, and
Store.
- Next tackle the Call instruction.
The code you generate must properly handle the System V AMD64 ABI
calling conventions, but note that we care only about 64-bit values.
After successfully completing this step, your compiler should be
able to handle the recursive factorial function definition.
- Breathe a sigh of relief at how easy it is to implement
Bitcast, because the target language is
untyped.
- Finally, gather your courage, and implement the Gep (getelementptr)
instruction.
Testing and Debugging Strategies
Testing and debugging a compiler is quite difficult. There are many
correct potential translations of a given source program, and
there are many incidental changes (such as the choice of label
names) that do not affect the semantics of the generated code. It
is also difficult to test parts of the translation independently,
since simple inputs may depend on almost all of the compilation
pipeline.
The test harness provided by main.ml
gives several ways to assess your code. See the README file for a full description of the flags.
We have provided a (minimally-featured) parser for LLVMlite
code. It is sufficiently complete to parse the examples in the
llprograms directory, and we expect you to
create additional test cases yourself. For examples of how to use
the test driver infrastructure, see the gradedtests.ml file.
You may find it helpful to run the LLVMlite code using our
reference interpreter (with the --interpret-ll flag).
You may also find it helpful to run the LLVMlite code by
compiling it via clang (with the --clang
flag).
Note that it is not very useful to directly compare the .s files
prodcued by your compiler to those produced by clang, but the behavior
of the two versions for the same inputs should be the same.
Graded Test Cases
As part of this project, you must post an interesting test case for
the compiler to the course Piazza site. This test case might take
the form of a .ll file along with expected
outputs (as in our automated tests), or it might start from
hand-generated LLVMlite abstract syntax.
The test case you submit to Piazza will not count if it is
too similar to previously-posted tests! Your test should be
distinct from prior test cases. (Note that this policy encourages
you to submit test cases early!) Tests that stress parts of
the language that aren't well exercised by the provided tests are
particularly encouraged.
Your submitted test should be easy to drop in to
the testing harness: ideally, it's a small amount of OCaml code,
plus a single LLVMlite file. If your test case requires supporting C
code (as some of our larger tests do), you can post that to Piazza
along with your test.
We will validate these tests against our own implementation of
the compiler (and clang). A second component of your grade will be
determined by how your compiler fairs against the test cases
submitted by the other groups in the class.
Grading
Projects that do not compile will receive no credit!
Your team's grade for this project will be based on:
-
90 Points: the various automated tests that we provide. (Some
reserved for online grading.)
-
5 Points for posting an interesting test case to Piazza. (Graded manually.)
-
5 Points divided among the test cases created by other
groups. (Graded manually.)