This is a group project. Teams of size 1 or 2 are acceptable. If you are working in a team of two, only one student needs to submit the project.
Many of the files in this project are taken from the earlier projects. The new files (only) and their uses are listed below. Only those preceeded with an asterisk will need to be modified for this assignment.
llprograms/*.ll - example .ll programs used in testing atprograms/*.oat - example .oat programs used in testing datastructures.ml - set and map modules (enhanced with printing) cfg.ml - "view" of LL control-flow graphs as dataflow graphs analysis.ml - helper functions for propagating dataflow facts *solver.ml - the general-purpose iterative dataflow analysis solver *alias.ml - alias analysis *dce.ml - dead code elimination optimization *constprop.ml - constant propagation analysis & optimization liveness.ml - provided liveness analysis code analysistests.ml - test cases (for liveness, constprop, alias) opt.ml - optimizer that runs dce and constprop *backend.ml - you will implement register allocation heuristics here registers.ml - collects statistics about register usage ---------------------------------------------------------------------------- (used in developing our tests, but perhaps useful to you:) printanalysis.ml - a standalone program to print the results of an analysis
-O1 : runs two iterations of (constprop followed by dce) --liveness {trivial|dataflow} : select which liveness analysis to use for register allocation --regalloc {none|greedy|better} : select which register allocator to use --print-regs : print a histogram of the registers used
The OAT compiler we have developed so far produces fairly inefficient code, since it performs no optimizations at any stage of the compilation pipeline. In this project, you will implement several simple dataflow analyses and some optimizations at the level of our LLVMlite intermediate representation in order to improve code size and speed.
The provided code makes extensive use of modules, module signatures, and functors. These aid in code reuse and abstraction. If you need a refresher on OCaml functors, we recommend reading through Chapter 9 of Real World OCaml.
In datastructures.ml, we provide you with a number of useful modules, module signatures, and functors for the assignment, including:
Your first task is to implement a version of the worklist algorithm for solving dataflow flow equations presented in lecture. Since we plan to implement several analyses, we'd like to reuse as much code as possible between each one. In lecture, we saw that each analysis differs only in the choice of the lattice, the transfer (or "flow") function, and the direction of the analysis. We can take advantage of this by writing a generic solver as an OCaml functor and instantiating it with these parameters.
Assuming only that we have a directed graph where each node is labeled with a dataflow fact and a flow function, we can compute a fixpoint of the flow on the graph as follows:
let w = new set with all nodes repeat until w is empty let n = w.pop() old_out = out[n] let in = combine(preds[n]) out[n] := flow[n](in) if (!equal old_out out[n]), for all m in succs[n], w.add(m) end
Here equal, combine and flow are abstract operations that will be instantiated with lattice equality, meets or joins and the function computed by the gen and kill sets of the analysis, respectively. Similarly, preds and succs are the graph predecessors and successors in the flow graph, and do not correspond to the control flow of the program. They can be instantiated appropriately to create a forwards or backwards analysis.
Be sure to review the comments in the DFA_GRAPH and FACT module signatures in solver.ml, which define the parameters of the solver. Make sure you understand what each declaration the signature does -- your solver will need to use each one (other than the printing functions)! It will also be helpful for you to understand the way that cfg.ml connects to the solver. Read the commentary there for more information.
Your first task is to fill in the solve function in the Solver.Make functor in solver.ml. The input to the function is a flow graph labeled with the initial facts. It should compute the fixpoint and return a graph with the corresponding labeling. You will find the set datatype from datastructures.ml useful for manipulating sets of nodes.
To test your solver, we have provided a full implementation of a liveness analysis in liveness.ml. Once you've completed the solver, the liveness tests in the test suite should all be passing. These tests compare the output of your solver on a number of programs with pre-computed solutions in analysistest.ml. Each entry in this file describes the set of uids that are live-in at a label in a program from ./llprograms. To debug, you can compare these with the output of the Graph.to_string function on the flow graphs you will be manipulating.
The goal of this task is to implement a simple dead code elimination optimization that can also remove store instructions when we can prove that they have no effect on the result of the program. Though we already have a liveness analysis, it doesn't give us enough information to eliminate store instructions: even if we know the UID of the destination pointer is dead after a store and is not used in a load in the rest of the program, we can not remove a store instruction.
The problem is that there may be different UIDs that name the same stack slot -- in other words it may have aliases. There are a number of ways this can happen after a pointer is returned by alloca:
Some pointers are never aliased. For example, the code generated by the Oat frontend for local variables. We can find such uses of alloca using a simple version of an alias analysis.
We have provided some code to get you started in alias.ml. You will have to fill in the flow function and lattice operations. The type of lattice elements, fact is a map from UIDs to symbolic pointers of type SymPtr.t. Your analysis should compute, at every program point, the set of UIDs of pointer type that are in scope and, additionally, whether that pointer is the unique name for a stack slot according to the rules above. See the comments in alias.ml for details.
Now we can use our liveness an alias analyses to implement a dead code elimination pass. We will simply compute the results of the analysis at each program point, then iterate over the blocks of the CFG removing any instructions that do not contribute to the output of the program.
In dce.ml, you will only need to fill out the dce_block function that implements these rules.
Compilers don't often directly emit dead instructions like the ones found by the analysis in the previous section. Instead, dead code is often produced as a result of other optimizations that execute parts of the original program at compile time, for instance constant propagation. In this section you'll implement a simple constant propagation analysis and constant folding optimization.
Start by reading through the constprop.ml. Constant propagation is similar to the alias analysis from the previous section. Dataflow facts will be maps from UIDs to the type SymConst.t, which corresponds to the lattice from the lecture slides. Your analysis will compute the set of UIDs in scope at each program point, and the integer value of any UID that is computed as a result of a series of binop and icmp instructions on constant operands. More specifically:
The backend implementation that we have given you provides two basic register allocation stragies:
For this task, you will implement a better register allocation strategy that makes use of the liveness information that you compute in Task I. Most of the instructions for this part of the assignment are found in backend.ml, where we have modified the code generation strategy to be able to make use of liveness information. The task is to implement a single function better_layout that beats our example "greedy" register allocation strategy. We recommend familiarizing yourself with the way that the simple strategies work before attempting to write your own allocator.
The compiler now also supports several additional command-line switches that can be used to select among different analysis and code generation options for testing purposes:
--print-regs prints the register usage statistics for x86 code --liveness {trivial|dataflow} use the specified liveness analysis --regalloc {none|greedy|better} use the specified register allocator
For testing purposes, you can run the compiler with the -v verbose flag and/or use the --print-regs flag to get more information about how your algorithm is performing. It is also useful to sprinkle your own verbose output into the backend.
The goal for this part of the homework is to create a strategy such that code generated with the --regalloc better --liveness dataflow flags is "better" than code generated using the simple settings, which are --regalloc greedy --liveness dataflow. See the discussion about how we compare register allocation strategies in backend.ml. The "quality" test cases report the results of these comparisons.
Of course your register allocation strategy should produce correct code, so we still perform all of the correctness tests that we have used in previous version of the compiler. Your allocation strategy should not break any of these tests -- and you cannot earn points for the "quality" tests unless all of the correctness tests also pass.
Of course we want to understand how much of an impact your register allocation strategy has on actual execution time. For the final task, you will create a new OAT program that highlights the difference. There are two parts to this task.
Post an OAT program to piazza. This program should exhibit significantly different performance when compiled using the "greedy" register allocation strategy vs. using your "better" register allocation strategy with dataflow information. See the file atprograms/regalloctest.oat for an uninspired example of such a program. Yours should be more interesting.
Test your compiler on at least these three programs:
Report the processor and OS version that you use to test. For best results, use a "lightly loaded" machine (close all other applications) and average the timing over several trial runs.
The example below shows one interaction used to test the
~/cis341/hw/hw.regalloc/soln> ./main.native --liveness trivial --regalloc none llprograms/matmul.ll ~/cis341/hw/hw.regalloc/soln> time ./a.out real 0m1.647s user 0m1.639s sys 0m0.002s ~/cis341/hw/hw.regalloc/soln> ./main.native --liveness dataflow --regalloc greedy llprograms/matmul.ll ~/cis341/hw/hw.regalloc/soln> time ./a.out real 0m1.127s user 0m1.123s sys 0m0.002s ~/cis341/hw/hw6/soln> ./main.native --liveness dataflow --regalloc better llprograms/matmul.ll ~/cis341/hw/hw6/soln> time ./a.out real 0m0.500s user 0m0.496s sys 0m0.002s ~/cis341/hw/hw.regalloc/soln> ./main.native --clang llprograms/matmul.ll ~/cis341/hw/hw.regalloc/soln> time ./a.out real 0m0.061s user 0m0.053s sys 0m0.004s
Your team's grade for this project will be based on: