The purpose of this assignment is to help you learn (1) how programs are represented in ARMv8 machine language, (2) how ARMv8 stack frames are structured in memory, and (3) how ARMv8 programs can be vulnerable to buffer overrun attacks.
Since the switch to the ARMv8 architecture and assembly language, students have reported taking, on average, 12 hours to complete this assignment.
You may work with one partner on this assignment. You need not work with a partner, but we prefer that you do. Your preceptor will help you with your partner search, if you so desire.
The "A+" attack (as defined below) is the challenge part of this assignment. While doing the challenge part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the Intro Lab TAs, other current students via Ed, or any other people while working on the challenge part of an assignment, except for clarification of requirements. However you may consult with your partner, with no limitations.
As noted below, the challenge part is worth 10 percent of this assignment. So if you don't do the challenge part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 90 percent.
We will provide a "grader" program, both source code (grader.c
) and executable binary code (grader
). The file grader
was produced from grader.c
using this command:
gcc217 -O -fomit-frame-pointer grader.c -o grader
In short, the -O
and -fomit-frame-pointer
options instruct the compiler to generate assembly code that, in this case, opens the door to some of the attacks you will construct.
The program asks you for your name, and writes something like this (where the user input
and program output
are indicated by font style):
$ ./grader What is your name? Bob D is your grade. Thank you, Bob.
However, the author of the program inexplicably forgot to do bounds-checking on the array into which the program reads the input, and so the program is vulnerable to a buffer overrun (alias buffer overflow) attack.
Your task is to attack the given program by exploiting its buffer overrun vulnerability. More specifically, you will provide input data to the program so that it writes something more like this:
$ ./grader < dataB What is your name? B is your grade. Thank you, Bob.
As you can see from reading the program, it is designed to give the B grade if the user's name is Andrew Appel, but not if the user's name is anything else. However, it is programmed sloppily: it reads the input into a buffer, but forgets to check whether the input fits. This means that a too-long input can overwrite other important memory, and you can trick the program into giving you a B even though your name is not Andrew Appel.
Here's another example:
$ ./grader < dataA What is your name? A is your grade. Thank you, Bob.
As you can see from reading the program, it is designed not to give anyone an A under any circumstances. However, again, it is programmed sloppily. A too-long input can overwrite other important memory, and you can trick the program into giving you an A.
In fact, because of the program's buffer overrun vulnerability, you even can trick the program into giving you an A+:
$ ./grader < dataAplus What is your name? A+ is your grade. Thank you, Bob.
Develop your code in whatever development environment and on whatever computer you see fit, but be careful to design your code to be run on armlab, and do all testing and debugging there.
As with the other assignments, you will create your own repository, separate from ours, that has the same contents as ours. As a reminder, you can complete Setup Step 5 from the Git and GitHub Primer or follow these procedures to import your own version of our contents into your GitHub account.
The repository that you will import into your own
is:
https://github.com/COS217/Overrun.
This repository contains the files that you will need. Subsequent parts of this document describe them. Once you have your working copy, complete the parts of the assignment given below, in the order of their appearance.
Study the givenMakefile
. Better yet, actually use it throughout your work on this assignment! Using it could save you lots of typing. It also has a safeguard against generating a subtly different executable that would have different addresses throughout, which, if used as the basis for your development, would result in your attacks not working on our originalgrader
.
Copy these sentences to your readme
file, and fill in the blanks so the sentences are correct:
According to 18 U.S. Code 1030, if you were to use a buffer overrun attack to commit fraud or related activity in connection with computers, but did not attempt to cause death and did not knowingly or recklessly cause death, then you could receive a maximum penalty of _____ in prison.
According to 18 U.S. Code 1030, if you were to use a buffer overrun attack to commit fraud or related activity in connection with computers, and attempted to cause death or knowingly or recklessly caused death, then you could receive a maximum penalty of _____ in prison.
It's fine to do a web search to complete Step 1 of the assignment.
Create a text file named memorymap
. Begin your memorymap
file with your name(s).
Take the grader
executable binary file that we have provided you, and use gdb
on armlab to analyze its sections.
Analyze the text section by issuing this x
command:
$ gdb grader (gdb) x/67i readString
Then copy the resulting 67 lines of assembly language code into your memorymap
file. (Be careful: gdb
displays the lines one windowfull at a time, so you must press the <Enter> key to see all 67 lines.)
emacs
on armlab will often yield
strange indentation. Instead, you should paste into a text
editor on your local machine or into the pico
editor
on armlab. To use pico
, issue the following command:
$ pico memorymapTo save your pasted contents and exit
pico
, use
<Ctrl-x>
then at the prompts enter
Y
and confirm the filename.
Next annotate the lines to explain them. Do not annotate every line of assembly language code. Instead, cluster the lines of assembly language code into "paragraphs," and annotate each paragraph. Your analysis must have this format:
Annotation Line of assembly language code Line of assembly language code ... <blank line> Annotation Line of assembly language code Line of assembly language code ... <blank line> ...
Use these 7 annotations (and only these 7 annotations, exactly as they appear below) in the readString
function:
Prolog First loop setup First loop buf[i] = '\0' Second loop setup Second loop Epilog and return
Use these 4 annotations (and only these 4 annotations, exactly as they appear below) in the getName
function:
Prolog printf("What is your name?\n"); readString(); Epilog and return
Use these 8 annotations (and only these 8 annotations, exactly as they appear below) in the main
function:
Prolog mprotect(...); getName(); if (strcmp(name, "Andrew Appel") != 0) skip assignment to grade grade = 'B'; printf("%c is your grade.\n", grade); printf("Thank you, %s.\n", name); Epilog and return 0
The gcc
-generated assembly language you must explain will have some unfamiliar features and instructions that our hand-coded examples have not included:
!
appearing after a dereference operand in a load or store), combines two operations: computing an address, like we have seen with immediate offset memory operands, but updating the value of the register to be that computed address before completing the load or store. It is introduced in in section 3.3 of your Pyeatt with Ughetta textbook.,
and another immediate operand appearing after a dereference operand in a load or store), again combines two operations, but as its name suggests, in the opposite order: loading or storing a value from an address held in a register, but then updating the value of the register after the load or store is completed. It is also introduced in in section 3.3 of your Pyeatt with Ughetta textbook.ldp
, stp
, ldurb
, and sturb
instructions are variations on loading and storing. They are introduced in section 3.4 of your Pyeatt with Ughetta textbook.
adrp
instruction, when combined with an addition, does the same thing as adr
, but in a more universally applicable manner. This is detailed in section 3.5 of your Pyeatt with Ughetta textbook.
ccmn
instruction is a conditional variation on comparison. This is detailed in section 4.2 of your Pyeatt with Ughetta textbook.
Analyze the data section by issuing these gdb
commands:
$ gdb grader (gdb) break main (gdb) run (gdb) print &grade (gdb) x/xb &grade
Place a table in your memorymap
file showing the layout of the data section. The table must have three columns: Address (in hex), Content (in hex), and Description. The table must contain one row for each byte in the data section. Since the data section contains exactly one byte, the table must contain exactly one row.
Analyze the bss section by issuing this gdb
command, which will give you the address of the first byte of your name
array:
$ gdb grader (gdb) print &name
Place a table in your memorymap
file showing the layout of the bss section. The table must have two columns: Address (in hex) and Description. The table must contain one row for each byte in the bss section, that is, one row for each byte of the name
array.
At the start of program execution, the content of the name
array will be zeros. Later during program execution, the name
array will contain more interesting data.
Compose your memory map of the bss section before you implement your "A" attack (as described below). The table in your memory map must describe the content of the name
array as you wish it to be during your "A" attack. Thus your memory map of the bss section will help you to compose your "A" attack.
For your sake, it's fine to add another column to your memory map describing the content of the name
array as you wish it to be during your "A+" attack. Thus your memory map of the bss section will help you to compose your "A+" attack. But you are not required to add that column.
Using your analysis of the text section, compose an analysis of the stack section. Place a table in your memorymap
file showing the layout of the stack. The table must have two columns: Offset and Description. Each offset must be expressed as a non-negative offset relative to the SP register when it is set to the top of the readString
function's stackframe, and each row must represent 8 bytes. Thus, the first row must have offset 0 (i.e., where SP points into readString
's stackframe), the second row must have offset 8, the third row must have offset 16, and so forth. The table must show the content of the stack from the top value (whatever readString
places at sp
) through the saved value of the X30
register that was pushed onto the stack by the getName
function.
You'll discover that the stack begins with the content of some registers that have been saved to the stack; each row must have a description with the name of the register whose content is stored at that spot in the stack. Then the stack contains the buf
array; each row that comprises the buf
array (there will be 6, in total, since buf is 48 bytes long and each row is 8 bytes) must have the description "buf[i]-buf[j]", where i and j represent the indices into buf stored in that location on the stack. Finally the stack contains the content of the X30
register that was stored by the getName
function; that row must have the description "getName's X30".
Compose a C program named createdataB.c
that produces a file named dataB
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "B". You can see by reading the program that, if your name is Andrew Appel, that's very easy to do. But that's not easy to do if your name isn't Andrew Appel! To receive full credit the dataB
file must cause the grader
program to generate output whose format is indistinguishable from normal output.
The createdataB.c
program must write to the dataB
file; it must not write to stdout
, nor should it read from stdin
Your createdataB.c
file must contain these comments:
A file comment, that is, a comment at the beginning of the file providing the file's purpose, i.e., to produce a file that will earn a B, with a brief description of the principles of operation for how the produced file's contents will do so. Here is an example, which you may use verbatim: "Produces a file called dataB with the student name, a nullbyte, padding to overrun the stack, and the address of the instruction in main to get a B, the latter of which will overwrite getName's stored x30."
A function comment describing the "I/O behavior" of the main
function. That is, the function comment must describe what the main
function accepts as command-line arguments, reads from stdin
or any other stream, writes to stdout
, stderr
, or any other stream, and returns.
Local comments, that is, comments throughout your program that thoroughly describe the bytes that the program writes to the dataB
file.
It is acceptable to use "magic numbers" throughout yourcreatedataB.c
file.
For the "B" attack it's OK to truncate your name(s) if necessary.
After creating yourdataB
file, issue the commandxxd dataB
to confirm that the file contains exactly the bytes that you want it to contain.
The given miniassembler.h
interface file declares
four functions that generate machine language
instructions. Comments in the file describe the functions. The
given miniassembler.c
implementation file defines one
of those functions, but not the static
helper
function MiniAssembler_setField
that it calls. You
will implement the helper function to accord to its supplied
function comment, as well as the other three functions from the
interface, thus completing the miniassembler.c
implementation file.
The given testminiassembler.c
file tests
the MiniAssembler
module. Use the MiniAssembler
module and testminiassembler.c
to build a program
named testminiassembler
. Run the program, and compare
its output to the given testminiassembler.ref
file to
make sure the function definitions in your MiniAssembler module
are correct.
Compose a C program named createdataA.c
that produces a file named dataA
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "A". To receive full credit the dataA
file must cause the grader
program to generate output whose format is indistinguishable from normal output. Your program must call the four functions that are declared in miniassembler.h
.
The createdataA.c
program must write to the dataA
file; it must not write to stdout
, nor should it read from stdin
Your createdataA.c
file must contain these comments:
A file comment, that is, a comment at the beginning of the file providing the file's purpose, i.e., to produce a file that will earn an A, with a brief description of the principles of operation for how the produced file's contents will do so.
A function comment describing the "I/O behavior" of the main
function. That is, the function comment must describe what the main
function accepts as command-line arguments, reads from stdin
or any other stream, writes to stdout
, stderr
, or any other stream, and returns.
Local comments, that is, comments throughout your program that thoroughly describe the bytes that the program writes to the dataA
file.
It is acceptable to use "magic numbers" throughout yourcreatedataA.c
file.
For the "A" attack it's OK to truncate your name(s) if necessary.
After creating yourdataA
file, issue the commandxxd dataA
to confirm that the file contains exactly the bytes that you want it to contain, and that none of those bytes is the value0x0A
, which is the newline character and will cause thegrader
program to stop reading input. (If you do have a newline character in your data file, adjust your placement of padding bytes or the number of padding bytes and the length of the name you're using for your attack.)
Compose a C program named createdataAplus.c
that produces a file named dataAplus
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "A+". To receive credit the dataAplus
file must cause the grader
program to generate output whose format is indistinguishable from normal output.
The createdataAplus.c
program must write to the dataAplus
file; it must not write to stdout
.
Your createdataAplus.c
file must contain:
A file comment, that is, a comment at the beginning of the file providing the file's purpose, i.e., to produce a file that will earn an A+, with a brief description of the principles of operation for how the produced file's contents will do so.
A function comment describing the "I/O behavior" of the main
function. That is, the function comment must describe what the main
function accepts as command-line arguments, reads from stdin
or any other stream, writes to stdout
, stderr
, or any other stream, and returns.
Local comments, that is, comments throughout your program that thoroughly describe the bytes that the program writes to the dataAplus
file.
It is acceptable to use "magic numbers" throughout yourcreatedataAplus.c
file.
For the "A+" attack it's OK to truncate your name(s) if necessary.
For the "A+" attack it's OK to declare additional functions in theminiassembler.h
file and define additional functions in theminiassembler.c
file.
Edit your copy of the given readme
file by answering each question that is expressed therein.
One of the sections of the readme
file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme
file, copy the list of authorized sources given in the Policies web page to that section, and edit it as appropriate.
Provide the instructors with your feedback on the assignment. To do that, issue this command:
FeedbackCOS217.py 6
and answer the questions that it asks. (If you worked with a partner, then when answering the numeric questions, please enter the average of the responses that you and your partner would provide individually.) That command stores its questions and your answers in a file named feedback
in your working directory.
Submit your work electronically on armlab. If you worked with a partner, then one of the partners must submit all of your team's files, and the other partner must submit none of your team's files. Your readme
file, your memorymap
file, and your source code files must contain both your name and your partner's name. Use these commands to submit:
submit 6 memorymap readme feedback submit 6 createdataB.c submit 6 miniassembler.h miniassembler.c submit 6 createdataA.c submit 6 createdataAplus.c
memorymap
something different than you were supposed to (typically memorymap.txt
) and copied+pasted the top line above, your file will not be submitted! The submit
command prints an error message, but invariably multiple students per term ignore this. Don't be one of them: name your file correctly and, at the very least, submit the filename that you actually have.
touch netid.partner submit 6 netid.partner
It is considered good practice to leave each working copy of your repository with a clean working tree up to date with the main branch. This helps reduce any potential confusion about the state of the codebase, and also helps you ensure that the final version you have submitted is indeed the final version.
On some versions of Linux, every time the program is executed the initial stack pointer is in a different place. This makes it difficult to make an attack in which the return address points into the same data that was just read into the buffer on the stack. (Indeed, that is the purpose of varying the initial stack pointer!) However, you will note that the grader program copies data from buf
(which is in the stack section) into name
(which is in the bss section). You'll find that name
is reliably in the same place every time you (or we) run the grader program.
On some versions of Linux, executing instructions from the bss section causes a segmentation fault. The purpose of this is to defend against buffer overrun attacks! The mprotect
call in the grader program disables that protection. You're not required to understand or explain how that line works. Note, however, that this mechanism (even if we didn't disable it) would not defend against the "B" attack.
If you work hard, you could create input that exploits the buffer overrun to take over the grader's Linux process and do all sorts of damage. DON'T DO THAT! Any deliberate attempt of that sort is a violation of the University's disciplinary code, and also is a violation of the Computer Fraud and Abuse Act (see Step 1 above).
In part, good program style is defined by the splint
and critTer
tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specifications also apply, as do the rules given in previous sections of this assignment specification.
Minimal requirement to receive credit for creatdataB.c: the createdataB
program must build.
Minimal requirement to receive credit for miniassembler.c: the testminiassembler
program must build.
Minimal requirement to receive credit for createdataA.c: the createdataA
program must build.
Minimal requirement to receive credit for createdataAplus.c: the createdataAplus
program must build.
When we grade this assignment, we will take the recommendation of the grader
program into account. But that will not be the only criterion. In particular, see the grade percentages noted above.
To receive the full 10 percent credit for your "A+" attack, your "A+" attack must work perfectly. If it doesn't work perfectly, then we will award up to 6 percent partial credit, but only if your createdataAplus.c
program contains a very thorough comment explaining the principles of operation of your attack.
We will grade your work on two kinds of quality:
Quality from the user's point of view. From the user's point of view, your code has quality if it successfully implements the attacks described above.
Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous sections of this assignment specification.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
While debugging your attacks you might find it useful to use gdb
to step through the execution of the grader
program at the machine language level. These commands are appropriate for doing that:
display/i $pc
gdb
maintains a display list. You can issue the display
command to place items on the display list. Typically you place variables on the display list. At each pause in execution, gdb
displays the values of all those variables. Thus the display list is a handy way to track the values of variables throughout the execution of your program.
That particular display
command tells gdb
to place the pc (program counter) register on the display list. Thus at each pause in execution gdb
displays the content of the memory to which pc points. That is, at each pause in execution gdb
displays the instruction that is about to be executed.
Moreover, that particular display
command tells gdb
to use the i
(instruction) format when displaying. Thus at each pause in execution gdb
interprets the content of the memory to which pc points as a machine language instruction, and displays that instruction in assembly language.
In short, that command tells gdb
to display the instruction that is about to be executed.
stepi
As you know, in gdb
the step
command (abbreviated s
) executes the next line of C code. Since the grader
executable binary file was built without the -g
option, gdb
has no knowledge of how the machine language instructions of the grader
executable binary file correspond to lines of C code. So the step
command is useless when analyzing the grader
executable binary file.
Instead you can use the lower-level stepi
command. The stepi
command (abbreviated si
) tells gdb
to execute the next machine language instruction.
This assignment was written by Andrew Appel and Robert M. Dondero, Jr.