The purpose of this assignment is to help you learn about computer architecture, assembly language programming, and testing strategies. It also will give you the opportunity to learn more about the GNU/Unix programming tools, especially bash
, emacs
, gcc217
, and gdb
for assembly language programs.
The assignment consists of two parts, each of which has subparts. We encourage you to complete Part 1 by the end of the first week of the assignment period.
Part 2e, as defined below, is the "on your own" part of this assignment. That part is worth 12% of this assignment.
The Unix operating system has a command named wc
(word count). In its simplest form, wc
reads characters from stdin until end-of-file, and writes to stdout a count of how many lines, words, and characters it has read. A word is a sequence of characters that is delimited by one or more white space characters.
Consider some examples. In the following, a space is shown as s
and a newline character as n
.
If the file named proverb
contains these characters:
Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sChinesesproverbn
then the command:
$ wc < proverb
writes this line to stdout:
5 12 82
If the file proverb2
contains these characters:
Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sssChinesesproverb
(note that the last "line" does not end with a newline character) then the command:
$ wc < proverb2
writes this line to stdout:
4 12 83
The file mywc.c
in the /u/cos217/Assignment4
directory contains a C program that implements the subset of the wc
command described above. Translate that program into assembly language, thus creating a file named mywc.s
. It is acceptable to use global (i.e. bss section or data section) variables in mywc.s
, but we encourage you to use local (i.e. stack) variables instead. Your assembly language program should have exactly the same behavior (i.e. should write exactly the same characters to stdout) as the given C program.
Design a test plan for your mywc
program. Your test plan should include tests in three categories: (1) boundary testing, (2) statement testing, and (3) stress testing.
Create text files to test your programs. Name each such file such that its prefix is mywc
and its suffix is .txt
. The command ls mywc*.txt
should display the names of all mywc
test files, and only those files.
Describe your mywc
test plan in your readme file. Your description should have this structure:
mywc
boundary tests:
mywcXXX.txt
: Description of the characteristics of that file, and how it tests boundary conditions of your mywc
program.
mywcYYY.txt
: Description of the characteristics of that file, and how it tests boundary conditions of your mywc
program.
...
mywc
statement tests:
mywcXXX.txt
: Description of the characteristics of that file, and which statements of your mywc
program it tests. Refer to the statements using the line numbers of the given mywc.c program.
mywcYYY.txt
: Description of the characteristics of that file, and which lines of your mywc
program it tests. Refer to the statements using the line numbers of the given mywc.c program.
...
Your descriptions of the test files should be of the form "This file contains such-and-such characteristics, and so tests lines such-and-such of the program." Please identify the lines of code tested by line numbers. The line numbers should refer to the given C code.
mywc
stress tests:
mywcXXX.txt
: Description of the characteristics of that file, and how it stress tests your mywc
program.
mywcYYY.txt
: Description of the characteristics of that file, and how it stress tests your mywc
program.
...
Submit no more than three large test files. Make sure that each large test file contains no more than 50000 characters. Submitting files containing more than 50000 characters could exhaust the course's allotted disk space on hats, and so could prohibit other students from submitting any files at all. Also make sure that each large test file contains no more than 1000 lines. It would be difficult for your grader to scroll through a file that contains more than 1000 lines.
In a more realistic context, your test files should contain non-ASCII characters. However, in the context of this assignment, submit test files that contain only printable ASCII characters. Specifically, make sure your computer-generated test files contain only characters having ASCII codes (in hexadecimal) 09, 0A, and 20 through 7E. It would be difficult for your grader to examine files that contain other characters.
Finally, create a Bash shell script named testmywc
to automate your mywc
test plan. A Bash shell script is simply a text file that contains commands, and that has been made executable via the chmod
command, for example, chmod 700 testmywc
.
The testmywc
script should build a mywcc
program from the given C code, build a mywcs
program from your assembly language code, execute both programs, and compare the output.
It is acceptable for your testmywc
script to call other scripts that you create. Each such script should have a name that is prefixed with testmywc
. The command ls testmywc*
should display the names of all mywc
test scripts, and only those files.
Feel free to use the testdecomment
and testdecommentdiff
Bash shell scripts from Assignment 1 as models. But note that those scripts assume that the programs to be tested have already been built; in contrast your script(s) should build the programs to be tested.
Many programming environments contain modules to handle high-precision integer arithmetic. For example, the Java Development Kit (JDK) contains the BigDecimal
and BigInteger
classes.
The Fibonacci numbers are used often in computer science. See http://en.wikipedia.org/wiki/Fibonacci_numbers for some background information. Note in particular that Fibonacci numbers can be very large integers.
This part of the assignment asks you to use assembly language to write a minimal but fast high-precision integer arithmetic module, and use it to compute large Fibonacci numbers.
BigInt
Objects Using C CodeSuppose you must compute Fibonacci number 500000, that is, fib(500000)...
The /u/cos217/Assignment4
directory contains a C program that computes Fibonacci numbers. It consists of two modules: a client module and a BigInt
ADT.
The client consists of the file fib.c
. The client accepts an integer x as a command-line argument, where x must be a non-negative integer. The client performs some boundary condition and stress tests, writing the results to stdout. Then it computes and writes fib(x) to stdout as a hexadecimal number. It writes to stderr the amount of CPU time consumed while performing the computation. The client module delegates most of the work to BigInt
objects.
The BigInt
ADT performs high precision integer arithmetic. It is a minimal ADT; essentially it implements only an "add" operation. The BigInt
ADT consists of four files:
bigint.h
is the interface. Note that the ADT makes seven functions available to clients: BigInt_new
, BigInt_free
, BigInt_largest
, BigInt_random
, BigInt_writeHex
, BigInt_writeHexAbbrev
, and BigInt_add
.
bigint.c
contains implementations of the BigInt_new
, BigInt_free
, BigInt_largest
, BigInt_random
, BigInt_writeHex
, and BigInt_writeHexAbbrev
functions.
bigintadd.c
contains an implementation of the BigInt_add
function.
bigintprivate.h
is a "private header file" -- private in the sense that clients never use it. It allows code sharing between the two implementation files, bigint.c
and bigintadd.c
.
Study the given code. Then build a fib
program consisting of the files fib.c
, bigint.c
, and bigintadd.c
, with no optimizations. Run the program to compute fib(500000). In your readme
file note the amount of CPU time consumed.
BigInt
Objects Using C Code Built with Compiler OptimizationSuppose you decide that the amount of CPU time consumed is unacceptably large. You decide to command the compiler to optimize the code that it produces...
Build the fib
program using optimization. Specifically, specify the -D NDEBUG
option so the preprocessor disables the assert
macro, and the -O3
option so the compiler generates optimized code. Run the resulting program to compute fib(500000). In your readme file note the amount of CPU time consumed.
BigInt
Objects Using Assembly Language CodeSuppose, in an attempt to gain speed, you decide to code the BigInt_add
function in assembly language...
Manually translate the C code in the bigintadd.c
file into assembly language, thus creating the file bigintadd.s
. You should not translate the code in other files into assembly language.
Your assembly language code should store all local variables defined in the BigInt_larger
and BigInt_add
functions in memory, on the stack.
Note that assert
is a parameterized macro, not a function. (See Section 14.3 of the King book for a description of parameterized macros.) So assembly language code cannot call assert
. When translating bigintadd.c
to assembly language, simply pretend that the calls of assert
are not in the C code.
Build a fib
program consisting of the files fib.c
, bigint.c
, and bigintadd.s
using the -D NDEBUG
and -O3
options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output to stdout as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.
BigInt
Objects Using Optimized Assembly Language CodeSuppose, to your horror, you discover that you have taken a step backward: the CPU time consumed by your assembly language code is approximately the same as that of the non-optimized compiler-generated code! So you decide to optimize your assembly language code...
Manually optimize your assembly language code in bigintadd.s, thus creating the file bigintaddopt.s. Specifically, perform these optimizations:
Store local variables (but not parameters) in registers instead of in memory. Remember that EBX, ESI, and EDI are callee-save registers, and that EAX, ECX, and EDX are caller-save registers.
"Inline" the call of the BigInt_larger function. That is, eliminate the BigInt_larger
function, placing its code within the BigInt_add
function.
Build a fib
program consisting of the files fib.c
, bigint.c
, and bigintaddopt.s
using the -D NDEBUG
and -O3
options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output to stdout as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.
Can you write assembly language code that is approximately as fast as the optimized code that the compiler generates? That is, can you approximately tie the compiler?
BigInt
Objects Using Highly Optimized Assembly Language CodeFinally, suppose you decide to optimize your assembly language code even further, moving away from a statement-by-statement translation of C code into assembly language...
Further optimize your assembly language code in bigintaddopt.s
, thus creating the file bigintaddoptopt.s
. Specifically, perform these optimizations:
Factor as much code as possible out of the loop.
Use the assembly language loop patterns described in Section 3.6.5 of the Bryant & O'Hallaron book instead of the simpler but less efficient patterns described in precepts.
Instead of using a uiCarry
variable (stored in memory or in a register) to keep track of carries during addition, use the "carry" bit in the EFLAGS register. The carry bit is examined and set by the adcl
("add with carry long") instruction.
Feel free to implement any additional optimizations you want. However, your BigInt_add
function must be a general-purpose function for adding two given BigInt
objects; the function cannot be specific to the task of adding two Fibonacci numbers to generate a third Fibonacci number. In other words, your function must work with the given fib.c
client.
This part is difficult; we will not think unkindly of you if you decide not to do it. To do it properly you will need to learn about the adcl
instruction, and about which instructions affect and do not affect the C ("carry") flag in the EFLAGS register.
Hint: When writing bigintaddoptopt.s
, the problem is this: How can you preserve the value of the C flag between executions of the adcl
instruction? One solution is to save and then restore the value of the EFLAGS register. Another solution is to express the logic such that only instructions that do not affect the C flag are executed between each execution of the adcl
instruction.
Build a fib
program consisting of the files fib.c
, bigint.c
, and bigintaddoptopt.s
using the -D NDEBUG
and -O3
options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output to stdout as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.
Can you beat the compiler?
Develop on hats. Use emacs
to create source code. Use gdb
to debug.
Do not use a C compiler to produce any of your assembly language code. Doing so would be considered an instance of academic dishonesty. Instead produce your assembly language code manually.
We encourage you to develop "flattened" C code (as described in precepts) to bridge the gap between the given "normal" C code and your assembly language code. Using flattened C code as a bridge can eliminate logic errors from your assembly language code, leaving only the possibility of translation errors.
We also encourage you to use your flattened C code as comments in your assembly language code. Such comments can clarify your assembly language code substantially.
You should submit:
Your mywc.s
file.
Your testmywc
script, and any other scripts that it calls.
The data files that test your mywc
program.
Your bigintadd.s
, bigintaddopt.s
and bigintaddoptopt.s
files.
A readme
file.
Your readme
file should contain:
Your name.
A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" Web page.
Your mywc
test plan in the format specified above.
The times consumed by the fib
programs, as specified above.
(Optionally) An indication of how much time you spent doing the assignment.
(Optionally) Your assessment of the assignment.
(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.
Submit your work electronically using the commands:
submit 4 mywc.s submit 4 testmywc submit 4 anyBashScriptsCalledByTestmywc submit 4 mywc*.txt submit 4 bigintadd.s submit 4 bigintaddopt.s submit 4 bigintaddoptopt.s submit 4 readme
Please do not submit emacs
backup files, that is, files that end with '~'.
We will grade your code on quality from the user's and programmer's points of view. From the user's point of view, your code has quality if it behaves as it should. The correct behavior of your code is defined by the previous sections of this assignment specification.
From the programmer's point of view, your code has quality if it is well-styled and thereby easy to maintain. Comments in your assembly language code are especially important. Each assembly language function -- especially the main
function -- should have a comment that describes what the function does. Local comments within your assembly language functions are equally important. Comments copied from corresponding "flattened" C code are particularly helpful.
Your bigintaddopt.s
should contain a "register map," that is, comments near the beginning of the BigInt_add
function that indicate which local variables are stored in which registers. Your bigintaddoptopt.s
also should contain a register map.
Your assembly language code should use .equ
directives to avoid "magic numbers." In particular, you should use .equ
directives to give meaningful names to:
Enumerated constants, for example: .equ TRUE, 1
.
Parameter stack offsets, for example: .equ OADDEND1, 8
.
Local variable stack offsets, for example: .equ UICARRY, -4
.
Structure field offsets, for example: .equ ILENGTH, 0
.
Stack offsets at which callee-save registers are stored, for example: .equ EBXOFFSET, -4
.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
Testing is a substantial aspect of the assignment. Approximately 10% of the grade will be based upon your mywc
test plan as described in your readme file, and as implemented by your test scripts and data files.
Your grade for the "on your own" part will be based upon raw performance (How quickly does your code compute fib(500000)?) and style (Does your code contain optimizations that logically should improve performance?).
This assignment was written by Robert M. Dondero, Jr.,
based in part upon suggestions from Andrew Appel,
and with contributions by other faculty members.