Princeton University
COS 217: Introduction to Programming Systems
Assignment 5: A SPARC Assembler
Purpose
The purpose of this assignment is to help you understand SPARC assembly
language, SPARC machine language, and the assembly process.
Background
A SPARC assembler reads a source code (.s) file containing SPARC assembly
language code and writes an object (.o) file containing SPARC machine language.
The object file is expressed in the UNIX Executable and Linking Format (ELF). An
ELF file consists of sections; the most common are data, bss, and text.
An assembler makes two passes over the source code. During pass1 the
assembler creates a symbol table. Each binding of the table relates a label to
an offset within a particular section. During pass2 the assembler uses that
symbol table to generate the sections that comprise the object code.
The SPARC assembler that underlies the gcc command is named "as".
Your Task
Your task in this assignment is to use the C programming language to create
an assembler that can process a substantial subset of SPARC assembly language. More precisely...
You are given code that reads a source code file and creates an in-memory
representation of the assembly language program. Your job is to create
code that accepts the in-memory representation, performs two passes over it (as
described above), and thereby produces an in-memory representation of the
program's data, bss, and text sections. Finally, you are given code that accepts
those in-memory sections and writes the object file.
Important note: You must compare the performance of your assembler with that
of the "as" assembler whose full pathname
is /usr/ccs/bin/as. You can verify that you are using that assembler by typing
the command "which as"; the reply should be /usr/ccs/bin/as. If not,
you should change the value of your PATH environment variable accordingly.
The Assembly Language Subset
Your assembler should be able to process the subset of SPARC assembly
language that is described in A Subset of SPARC Assembly
Language for the Assembler Assignment.
Note in particular these features of the subset:
- Only data, bss, and text sections are allowed. In particular, the
rodata
section is excluded.
- Assembly-time expressions must be simple decimal literals. For example, these instructions contain assembly-time
expressions (highlighted) that are excluded:
.skip 4 * 100
.word (18 + 5) * 3
save %sp, (-92 - 8) & -8, %sp
sethi %hi(label), %o0
or %o0, %lo(label), %o0
cmp %l0, 'a'
add %l0, 0x20, %l1
- The first argument of a "set"
instruction must be a label. For example, these instructions are excluded:
set 12345, %l0
set 0, %l0
- Labels can be referenced only within set, branch, and call instructions,
and must be unadorned. For example, these instructions are excluded:
set label + 2, %l0
bg label + 4
call label - 8
.word label
The Modules
You are given modules that accomplish the task of reading the assembly
language program and creating an in-memory representation:
- Lexer (abstract object). The Lexer object is a lexical analyzer. It reads the sequence of
characters that comprises an assembly language source code program, and converts it into a
sequence of tokens.
- Parser (abstract object). The Parser object is a syntactic analyzer. It accepts the sequence of
tokens generated by the Lexer object, and verifies that the sequence is a
syntactically valid assembly language program. While doing so
it calls functions of other modules to translate the assembly language
source code into an in-memory representation.
You also are given modules that comprise the in-memory representation:
- Program (abstract object). Program is the in-memory representation
of the assembly language program. It contains Instr objects.
- Instr (abstract data type). An Instr object is an in-memory
representation of an assembly language instruction. An Instr object has a
name (e.g. add, .global), a type (directive, mnemonic, or label), and Arg
objects.
- Arg (abstract data type). An Arg object is an in-memory
representation of an assembly language argument. An Arg object is either a
register (e.g. %i4, %g0), a literal (e.g. 0, 1, 555), a symbol (alias label)
(e.g.
label1), or a string (e.g. "hello", ".text").
Finally, you are given these modules:
- LabelInfo (abstract data type). A LabelInfo object contains
information about a label: its section, its offset within its section, its
type (local or global), and its label sequence number. The SymTable object
that your pass1 code creates should contain values that are LabelInfo
objects.
- Section (abstract data type). A Section object contains an
in-memory representation of a section. Your pass1 and pass 2 code should
create three Section objects: one for the data section, one for the bss
section, and one for the text section. A Section object contains the byte
length of the section and the code that comprises the section. The text
Section object also may contain RelRecord objects.
- RelRecord (abstract data type). A RelRecord object is a relocation
record through which your assembler can tell the linker about
"patches" that the linker must perform. A RelRecord object
contains the offset of the patch, and a label sequence number that
identifies a label, and the type of the patch. The type is either a 22-bit
displacement patch (appropriate for patching a branch instruction), a 30-bit
displacement patch (appropriate for patching a "call"
instruction), a high-order 22-bit patch (appropriate for patching a "sethi"
instruction that is generated from a "set" instruction), or a
low-order 10-bit patch (appropriate for patching an "or"
instruction that is generated from a "set" instruction).
- ElfGen (abstract object). The ElfGen object accepts the SymTable
object and the three
Section objects, and writes the object file.
- Assembler. The Assembler module contains a main function that
orchestrates the assembly process by calling functions of the other modules.
You should also use your SymTable ADT from previous assignments. If your SymTable ADT is not working, we will provide you with working symtable.h
and symtable.o files at your
request.
You should create modules named Pass1 and Pass2, as described
below.
The Pass1 Module
Pass1 should be an abstract object. The Assembler object will give the Pass1
object (1) a SymTable object, (2) three Section objects: one for the data
section, one for the bss section, and one for the text section, and (3) the
address of a "label sequence number" integer.
The Pass1 module should:
- Fetch data
from the Program, Instr, and Arg objects to create bindings in the given
SymTable object. The Pass1 object should create one binding for each label
that is defined in the program.
Each key should be a label. Each value should be a LabelInfo object.
- Process .global directives, thus affecting the given SymTable object.
- Set the size of each Section object.
- Tell the Assembler object the
next label sequence number to be used. It should do that via the "label
sequence number" address that the Assembler object passes to it. The Assembler object will pass that
integer to your Pass2 object -- in case it needs to add additional bindings to
the SymTable object.
Your Pass1 object may assume that the given assembly language program is semantically valid.
Your Pass1 object should contain no memory leaks. Your Pass1 must free all memory that
it has dynamically allocated. More precisely, the
given modules take care of freeing memory contained within the Program, Instr,
Arg, Section, and SymTable objects. Your Pass1 object must free any
other memory that it dynamically allocated. It is likely that there will be no
such memory.
Testing the Pass1 Module
You can test your Pass1 module using the given example assembly language
programs (01p1data.s, 02p1all.s, and 03p1global.s), the standard UNIX elfdump command, and this procedure:
- Temporarily comment-out all calls to Pass2 functions in the assembler.c
file.
- Run the standard UNIX assembler via the command "as -o one.o
01p1data.s", thus producing the file
one.o.
- Run your assembler via the command "assembler -o two.o 01p1data.s", thus producing the file two.o.
- Execute the command "elfdump -s one.o" to print the symbol table of
one.o to standard output.
- Execute the command "elfdump -s two.o" to print the symbol table
of two.o to standard output.
- Compare the symbol tables. For each
label, the section index (shndx), offset (value), type (bind), and name
fields should be identical.
- Repeat steps 2 to 6 for each of the three assembly language programs
Don't forget to comment-in the calls to Pass2 functions in the assembler.c
file before proceeding with the development and testing of your Pass2 module.
The Pass2 Module
Pass2 should be an abstract object. The Assembler object will give the Pass2
object a SymTable object, three Section objects, and an integer indicating the
next label sequence number.
The primary job of the Pass2 object is to fetch data from the SymTable,
Program, Instr, and Arg objects to add code to the three Section objects. It
should dynamically allocate a block of memory of the appropriate size for the data
section, fill that memory with the appropriate data, and install it into the data
Section object. Similarly it should dynamically allocate a block of memory of the
appropriate size for the text section, fill that memory with the appropriate
data, and install it into the text Section object. The Pass2 object should not
allocate a block of memory for the bss section, and should not install a block
of memory into the
bss Section object.
Another job of the Pass2 object is to add a new binding to the SymTable
object whenever it encounters a label that is referenced but not defined. It should
use the given next label sequence number to do that.
The final task of the Pass2 object is to add relocation records to the
text Section object. Each relocation record should be a RelRecord object.
Your Pass2 object may assume that the given assembly language program is semantically valid.
Your Pass2 object should contain no memory leaks. Your Pass2
must free all memory that it has dynamically allocated. More precisely, the
given modules take care of freeing memory contained within the Program, Instr,
Arg, Section, and SymTable objects. Your Pass2 object must free any
other memory that it dynamically allocated. It is likely that there will be no
such memory.
Testing the Pass2 Module
You can test your Pass2 module using the given example assembly language
programs (04p2directives.s, 05p2arith.s, 06p2log.s, 07p2shift.s, 08p2load.s, 09p2store.s,
10p2control.s, 11p2branch.s, 12p2call.s, 13p2syntheticarith.s, 14p2syntheticlog.s,
15p2syntheticcontrol.s, 16p2relbranch.s, 17p2relcall.s, 18p2relset.s, 19fibonacci.s,
and 20bubblesort.s), the standard UNIX elfdump command, the given objdump program,
and this procedure:
- Execute the command "as -o one.o 04p2directives.s", thus producing the file
one.o.
- Execute the command "assembler -o two.o 04p2directives.s", thus producing the file two.o.
- Execute the command "elfdump -s one.o" to display the symbol
table of one.o to standard output.
- Execute the command "elfdump -s two.o" to display the symbol
table of two.o to standard output.
- Compare the symbol tables. For each
label, the "shndx," "value," "bind," and
"name" fields should be identical.
- Execute the command "objdump -d one.o" to display the
contents of the data section of one.o to standard output.
- Execute the command "objdump -d two.o" to display the
contents of the data section of two.o to standard output.
- Compare the data sections. They should be identical.
- Execute the command "objdump -t one.o" to display the
contents of the text section of one.o to standard output.
- Execute the command "objdump -t two.o" to display the
contents of the text section of two.o to standard output.
- Compare the text sections. Then should be identical.
- Execute the command "elfdump -r one.o" to display the relocation
records of one.o to standard output.
- Execute the command "elfdump -r two.o" to display the relocation
records of two.o to standard output.
- Compare the relocation records. All of the "type" and
"offset" fields should be identical. Many of the "with
respect to" fields should be identical.
- Repeat steps 1 to 14 for each example assembly language program.
- Use you assembler to assemble the 19fibonacci.s and 20bubblesort.s
assembly language programs. Then use gcc to link them, thus producing
executable files. Finally, execute those files. They should execute properly
You will find the given UNIX shell script objdiff useful. Given an assembly
language program, it assembles the program using as and using your assembler,
and then uses the standard UNIX
diff command to compare the symbol table, data section, text section, and
relocation records of the two resulting object files. Thus it implements steps
1-14 as described above.
You will also find the given UNIX shell script grade5 useful. It executes
objdiff for each example assembly language program. It also uses your assembler
to assemble 19fibonacci.s and 20bubblesort.s, uses gcc to link them, and
executes them.
Logistics
You should develop on arizona. Use Emacs to create
source code. Use gdb to debug.
The directory /u/cs217/Assignment5 contains files that are pertinent to the
assignment:
- The source code files for all given modules.
- A makefile. You must add the dependency rules for your Pass1 and Pass2
modules, and for any additional modules that you create. (It is likely
that you will create no additional modules.)
- An executable file for the objdump program.
- The objdiff UNIX shell script.
- Assembly language programs that you can use to test your assembler.
- A UNIX shell script (grade5) that automates the
process of testing your assembler.
- A partial makefile that you can use as the basis for defining your
complete makefile.
You should submit:
- The source code files for your Pass1 and Pass2 modules (pass1.h, pass1.c,
pass2.h, pass2.c).
- The source code files for your SymTable module (symtable.h, symtable.c).
- The source code files for any additional modules that you have created.
- A makefile. The first dependency rule should build your entire program,
compiling with the -Wall, -ansi, and -pedantic options.
- A readme file.
Your readme file should contain:
- Your name.
- A description of whatever help (if any) you received from others while
doing the assignment, and the names of any individuals with whom you
collaborated, as prescribed by the course "Policies" web page.
- If you are submitting your work late, an indication of how many
"free late days" you wish to apply to this assignment.
- Optionally, an indication of how much time you spent doing the
assignment.
- Optionally, your assessment of the assignment.
- Optionally, any information that will help us to grade your work in the
most favorable light. In particular you should describe all known bugs.
You should not submit the given source code files.
Submit your work electronically via the command:
/u/cs217/bin/submit 5 pass1.h pass1.c pass2.h pass2.c
symtable.h symtable.c othersourcecodefiles makefile readme
Grading
As always, we will grade your work on functionality and design, and will consider
understandability to be an important aspect of good design. Defining each
function so it does a single small well-defined task is critical to
understandability, as are function-level
comments in both .h and .c files. To encourage good coding practices,
we will take off points based on warning messages during compilation.
Your grade for the assignment will be based upon these criteria:
Criterion |
Points |
readme file |
3 |
makefile |
3 |
Properly translating the 01p1data.s program |
3 |
Properly translating the 02p1all.s program |
4 |
Properly translating the 03p1global.s program |
4 |
Properly translating the 04p2directives.s program |
4 |
Properly translating the 05p2arith.s program |
3 |
Properly translating the 06p2log.s program |
3 |
Properly translating the 07p2shift.s program |
3 |
Properly translating the 08p2load.s program |
4 |
Properly translating the 09p2store.s program |
4 |
Properly translating the 10p2control.s program |
4 |
Properly translating the 11p2branch.s program |
4 |
Properly translating the 12p2call.s program |
4 |
Properly translating the 13p2syntheticarith.s program |
3 |
Properly translating the 14p2syntheticlog.s program |
3 |
Properly translating the 15p2syntheticcontrol.s program |
3 |
Properly translating the 16p2relbranch.s program |
4 |
Properly translating the 17p2relcall.s program |
4 |
Properly translating the 18p2relset.s program |
3 |
Properly translating the 19fibonacci.s program |
5 |
Properly translating the 20bubblesort.s program |
5 |
Design and understandability |
20 |
TOTAL |
100 |
Note that it will not be possible to give partial credit for the
"Properly translating..." criteria.