The purpose of this assignment is to help you learn (1) how programs are represented in machine language, (2) how stack frames are structured in memory, and (3) how programs can be vulnerable to buffer overrun attacks.
Students from past semesters reported taking, on average, 10 hours (each) to complete this assignment.
You may work with one partner on this assignment. You need not work with a partner, but we prefer that you do. Your preceptor will help you with your partner search, if you so desire.
Your partner must be from your precept. Sorry, no exceptions. So make sure you find a partner as soon as you can. After all, if your precept has an odd number of students, then someone must work alone; don't let that person be you!
"Part A+" (as defined below) is the "extra challenge" part of this assignment. While doing the "extra challenge" part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements. However you may consult with your partner, with no limitations.
As noted below, the "extra challenge" part is worth 10 percent of this assignment. So if you don't do any of the "extra challenge" part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 90 percent.
We will provide a program, both source code (grader.c
) and executable binary code (grader
). The file grader
was produced from grader.c
using the gcc217
command with the -O
and -static
options. The -static
option commands the linker to do static linking rather than the default dynamic linking. Chapter 7 of Computer Systems: A Programmer's Perspective (Bryant & O'Hallaron) describes static vs. dynamic linking. Static linking could make your analysis of the grader
program simpler, and ensures that each function of the program resides at the same address on both CourseLab computers.
The program asks you your name, and writes something like this (where the user input
and program output
are indicated by font style):
$ ./grader What is your name? Bob D is your grade. Thank you, Bob.
However, the author of the program inexplicably forgot to do bounds-checking on the array into which the program reads the input, and therefore the program is vulnerable to attack.
Precepts will explain such buffer overrun (alias buffer overflow) attacks. The paper entitled Detection and Prevention of Stack Buffer Overflow Attacks by Kuperman et al. also does so. That paper is available through Blackboard in the Course Materials section.
Your task is to attack the given program by exploiting its buffer overrun vulnerability. More specifically, your job is to provide input data to the program so that it writes something more like this:
$ ./grader < data What is your name? A+ is your grade. Thank you, Bob.
As you can see from reading the program, it is designed not to give anyone an A+ under any circumstances. However, it is programmed sloppily: it reads the input into a buffer, but forgets to check whether the input fits. This means that a too-long input can overwrite other important memory, and you can trick the program into giving you an A+.
Develop on CourseLab. Use emacs
to create source code. Use gdb
to debug.
The CourseLab /u/cos217/Assignment5
directory contains files that you will find useful. Subsequent parts of this document describe them. Create a project directory, and copy all files from the /u/cos217/Assignment5
directory to your project directory.
Then complete the parts of the assignment given below, in the order of their appearance.
The document entitled A Linux File Sharing Trick describes a mechanism that you can use to share Assignment 5 files with your partner.
Copy this sentence to your readme
file, and fill in the blanks so the sentence is correct:
"If you were to use a buffer overrun attack to knowingly gain unauthorized access or to cause damage to other people's computers, the Computer Fraud and Abuse Act provides a maximum penalty of ______ years in prison for a first offense. However, the creator of the Melissa virus plea-bargained down to ______ months in prison."
It's fine to do a web search to complete Part F of the assignment.
Create a text file named memorymap
. Begin your memorymap
file with your name and your partner's name.
Take the grader
executable binary file that we have provided you, and use gdb
to analyze its sections:
Analyze the text section by issuing this x
command:
$ gdb grader (gdb) x/59i readStringThen copy the resulting 59 lines of text into your
memorymap
file. (Be careful: gdb
displays the lines one windowfull at a time, so you must press the <Enter> key to see all 59 lines.) Then annotate the code to explain it.
You must not annotate every line of assembly language code. Instead, cluster the lines of assembly language code into "paragraphs," and annotate each paragraph. Your analysis must have this format:
Annotation Line of assembly language code Line of assembly language code ... <blank line> Annotation Line of assembly language code Line of assembly language code ... <blank line> ...
Use these 6 annotations (and only these 6 annotations) in the readString
function:
Function setup First loop buf[i] = '\0' Second loop setup Second loop Cleanup and return
Use these 9 annotations (and only these 9 annotations) in the main
function:
Function setup mprotect(...) printf("What is your name?\n") readString(name) if strcmp(name, "Andrew Appel") != 0 skip over the assignment to grade grade = 'B' printf("%c is your grade.\n", grade) printf("Thank you, %s.\n", name) Cleanup and return 0
Analyze the data section by issuing these print
commands:
$ gdb grader (gdb) print &grade (gdb) print/x grade
Place a table in your memorymap
file showing the layout of the data section. The table must have three columns: Address (in hex), Contents (in hex), and Description. The table must contain only one row.
print
command:
$ gdb grader (gdb) print &name
Place a table in your memorymap
file showing the layout of the bss section. The table must have three columns: Address (in hex), Contents (in hex), and Description. The table must contain one row for each element of the name
array.
At the start of program execution, the contents of the name
array will be zeros. Later during program execution, the name
array will contain more interesting data.
Compose your memory map of the bss section before you implement Part A (as described below), that is, before you implement your "A attack." The table in your memory map must show the contents of the name
array as you wish it to appear during your A attack. Thus your memory map of the bss section will serve as a plan for your A attack. Then while you're debugging your A attack, you should make sure that the bss section indeed contains that data that it should contain, as indicated by your memory map.
For your sake, it's fine to add another column to your memory map showing the contents of the name
array as you wish it to appear near the end of program execution during your A+ attack (as described below). But the assignment doesn't require you to do that.
Using your analysis of the text section, compose an analysis of the stack section. Place a table in your memorymap
file showing the layout of the stack. The table must have two columns: Address and Description. Each address must be expressed as a positive offset relative to the RSP register (RSP, RSP+1, RSP+2, etc.). The table must contain one row for each byte in the stack from the first byte pointed to by RSP through the last byte of the readString
function's return address. The table must describe the stack when the readString
function has read "normal" data.
You'll discover that the stack begins with the buf
array; in your memorymap
file each byte that comprises the buf
array must have the description "buf". The part of the stack that you must describe ends with the readString
function's return address; in your memorymap
file each byte that comprises the return address must have the description "return address". In your memorymap
file each byte between the end of the buf
array and the beginning of the return address must have a description indicating what data is stored in that byte.
Compose a C program named createdataC.c
that produces a file named dataC
, as short and simple as possible, that causes the grader
program to generate a segmentation fault. The createdataC.c
program must write to the dataC
file; it must not write to stdout
.
Compose a C program named createdataB.c
that produces a file named dataB
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "B". You can see by reading the program that, if your name is Andrew Appel, this is very easy to do. But your name isn't Andrew Appel! To receive full credit the dataB
file must cause the grader
program to generate output whose format is indistinguishable from normal output. The createdataB.c
program must write to the dataB
file; it must not write to stdout
.
For Part B feel free to truncate your name(s) if necessary.
Compose a C program named createdataA.c
that produces a file named dataA
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "A". To receive full credit the dataA
file must cause the grader
program to generate output whose format is indistinguishable from normal output. The createdataA.c
program must write to the dataA
file; it must not write to stdout
.
For Part A feel free to truncate your name(s) if necessary.
Compose a C program named createdataAplus.c
that produces a file named dataAplus
, as short and simple as possible, that causes the grader
program to write your name and recommend a grade of "A+". To receive full credit the dataAplus
file must cause the grader
program to generate output whose format is indistinguishable from normal output. The createdataAplus.c
program must write to the dataAplus
file; it must not write to stdout
.
For Part A+ feel free to truncate your name(s) if necessary.
Edit your copy of the given readme
file by answering each question that is expressed therein.
One of the sections of the readme
file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme
file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.
Provide the instructors with your feedback on the assignment. To do that, issue this command:
FeedbackCOS217.py 5
and answer the questions that it asks. (When answering the numeric questions, please enter the average of the responses that you and your partner would provide individually.) That command stores its questions and your answers in a file named feedback
in your working directory.
Submit your work electronically on CourseLab. If you worked with a partner, then one of the partners must submit all of your team's files, and the other partner must submit none of your team's files. Your readme
file, your memorymap
file, and your source code files must contain both your name and your partner's name. Use these commands to submit:
submit 5 createdataC.c createdataB.c createdataA.c createdataAplus.c submit 5 memorymap readme feedback
On some versions of Linux, every time the program is executed the initial stack pointer is in a different place. This makes it difficult to make an attack in which the return address points into the same data that was just read into the buffer on the stack. (Indeed, that is the purpose of varying the initial stack pointer!) However, you will note that the data is copied from buf
into name
. You'll find that name
is reliably in the same place every time you (or we) run the program.
On some versions of Linux, executing instructions from the bss section causes a segmentation fault. The purpose of this is to defend against buffer overrun attacks! The mprotect
call in our sample program is to disable this protection. You're not required to understand or explain how this line works. Note, however, that this mechanism (even if we didn't disable it) would not defend against the "C" or "B" attacks.
If you work hard, you could create a data input that will exploit the buffer overrun to take over the grader's Linux process and do all sorts of damage. DON'T DO THAT! Any deliberate attempt of that sort is a violation of the University's disciplinary code, and also is a violation of the Computer Fraud and Abuse Act (see part F above).
In part, good program style is defined by the splint
and critTer
tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specifications also apply, as do the following. Each C program must contain:
A file comment, that is, a comment at the beginning of the file providing the file name, your name, and your partner's name.
(As usual) a function comment, that is, a comment near the beginning of the main
function describing what the function does. Remember that a function's comment should describe what the function reads from stdin
or any other stream, writes to stdout
or any other stream, and returns.
Comments describing the program's principles of operation, that is, an explanation of how the generated byte stream causes the grader program to write the desired output. The comment must be near the beginning of the main
function; it's not sufficient to scatter that explanation throughout the body of the function.
Local comments, that is, comments throughout the body of the main
function describing the components of the byte stream that the function writes.
For this assignment (only) it is acceptable to use "magic numbers" in your C programs as long as they are well commented.
Minimal requirement to receive credit for Part C: the createdataC.c
program must build.
Minimal requirement to receive credit for Part B: the createdataB.c
program must build.
Minimal requirement to receive credit for Part A: the createdataA.c
program must build.
Minimal requirement to receive credit for Part A+: the createdataAplus.c
program must build.
When we grade this assignment, we will take the recommendation of the grader
program into account. But that will not be the only criterion. In particular, see the grade percentages noted above.
We will grade your work on two kinds of quality:
Quality from the user's point of view. From the the user's point of view, your code has quality if it successfully implements the attacks described above.
Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous section of this assignment specification.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
While debugging your attacks you might find it useful to use gdb
to step through the execution of the grader
program at the machine language level. These commands are appropriate for doing that:
display/i $rip
gdb
maintains a "display list." You can issue the display
command to place items on the display list. Typically you place variables on the display list. At each pause in execution, gdb
displays the values of all those variables. Thus the display list is a handy way to track the values of variables throughout the execution of your program.
That particular display
command tells gdb
to place the RIP (instruction pointer) register on the display list. Thus at each pause in execution gdb
will display the contents of the memory to which RIP points. That is, at each pause in execution gdb
will display the instruction that is about to be executed.
Moreover, that particular display
command tells gdb
to use the i
(instruction) format when displaying. Thus at each pause in execution gdb
will interpret the contents of the memory to which RIP points as a machine language instruction, and will display that instruction in assembly language.
In short, that command tells gdb
to display the instruction that is about to be executed.
stepi
As you know, in gdb
the step
command (abbreviated s
) executes the next line of C code. Since the grader
executable binary file was built without the -g
option, gdb
has no knowledge of how the machine language instructions of the grader
executable binary file correspond to lines of C code. So the step
command is useless when analyzing the grader
executable binary file.
Instead you can use the lower-level stepi
command. The stepi
command (abbreviated si
) tells gdb
to execute the next machine language instruction.
This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.