Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: Buffer Overrun


Purpose

The purpose of this assignment is to help you learn (1) how programs are represented in machine code, (2) how stack frames are structured in memory, and (3) how programs can be vulnerable to buffer overrun attacks.


Ground Rules

You may work with one partner on this assignment. You need not work with a partner, but we prefer that you do. If you work with a partner, then only one of the partners should submit work. The readme file, the memorymap file, and the source code files should contain your name and your partner's name.

Your partner should be from your precept. Your preceptor will help you find a partner if you cannot find one yourself. You may work with a partner from another precept only if you cannot find a partner from your precept, your preceptor cannot find a partner from your precept for you, and your preceptor explicitly allows you to do so.

"Part A+" (as defined below) is the "on your own" part of this assignment. As noted below, that part is worth 10% of this assignment.


Background

We will provide a program, both source code (grader.c) and executable binary code (grader). The file grader was produced from grader.c using the gcc217 command with the -O option.

The program asks you your name, and prints out something like this (where the user input and program output are indicated by font style):

$ grader
What is your name?
Bob
D is your grade.
Thank you, Bob.

However, the author of the program inexplicably has forgotten to do bounds-checking on the array into which it reads the input, and therefore it is vulnerable to attack.

Precepts will explain such buffer overrun (alias buffer overflow) attacks. The paper entitled Detection and Prevention of Stack Buffer Overflow Attacks by Kuperman et al. also does so. That paper is on electronic reserve at Princeton's library; you can access it through Princeton's Blackboard system by selecting the COS 217 course and clicking on E-Reserves.


Your Task

Your task is to attack the given program by exploiting its buffer overrun vulnerability. More specifically, your job is to provide input data to the program so that it prints something more like this:

$ grader < data
What is your name?
A+ is your grade.
Thank you, Bob.

As you can see from reading the program, it is designed not to give anyone an A+ under any circumstances. However, it is programmed sloppily: it reads the input into a buffer, but forgets to check whether the input fits. This means that a too-long input can overwrite other important memory, and you can trick the program into giving you an A+.

This assignment has five parts:

Part F: Fill in the blanks (6% of the grade).

Copy this sentence to your readme file, and fill in the blanks so the sentence is correct:

"If you were to use a buffer overrun attack to knowingly gain unauthorized access or to cause damage to other people's computers, the Computer Fraud and Abuse Act provides a maximum penalty of _______ years in prison for a first offense. However, the creator of the Melissa virus plea-bargained down to ______ months in prison."

Part D: Analyze the program (34% of the grade).

Take the grader executable binary file that we have provided you, and use gdb to analyze its sections:

$ gdb grader
(gdb) x/72i readString

Copy the resulting 72 lines of text into a text file named memorymap, and then annotate the code to explain what's going on. You should use the source code in grader.c as a reference, and indeed your annotation should consist of showing how the machine code corresponds to the C code. You don't need an annotation for every line of machine code, but every line of C code (or equivalent "flattened" C code) should be present as an annotation. Your analysis of the text section should be of the form:

Line of (flattened) C code
Line of corresponding assembly language code
Line of corresponding assembly language code
...
<blank line>
Line of (flattened) C code
Line of corresponding assembly language code
Line of corresponding assembly language code
...

Your analysis of the text section should begin with a few lines that note explicitly where the readString function's parameter and local variables are stored.

$ gdb grader
(gdb) print &grade
(gdb) print/x grade

Place a table in your memorymap file showing the layout of the data section. The table should have three columns: Address (in hex), Contents (in hex), and Description. The table should contain only one row.

$ gdb grader
(gdb) print &name

Place a table in your memorymap file showing the layout of the bss section. The table should have three columns: Address (in hex), Contents (in hex), and Description. The table should contain one row for each element of the name array.

At the start of program execution, the contents of the name array will be zeros. Later during program execution, the name array will contain more interesting data. The table in your memorymap file should show the contents of the name array near the end of program execution using data from Part A (as described below).

$ gdb grader
(gdb) break *readString+77
(gdb) run
Type a name
(gdb) print $esp
(gdb) print $ebp
(gdb) x/??b $esp  (where ?? is the appropriate number of bytes)

Place a table in your memorymap file showing the layout of the stack frame. The table should have three columns: Address, Contents (in hex), and Description. Each address should be expressed as a positive offset relative to the ESP register. The table should contain one row for each byte in the readString function's stack frame, from the first byte pointed to by ESP through the last byte of the return address. The table should show the contents of the readString function's stack frame when the function has read "normal" data.

Part C: Get the program to crash (12% of the grade).

Write a C program named createdataC.c that produces a file named dataC, as short and simple as possible, that causes the grader program to generate a segmentation fault. Explain its principles of operation in one sentence as a comment within your createdataC.c program.

The createdataC.c program should write to the dataC file; it should not write to stdout.

Part B: Get the program to print "B" (16% of the grade).

Write a C program named createdataB.c that produces a file named dataB, as short and simple as possible, that causes the grader program to print your name and recommend a grade of "B". You can see by reading the program that, if your name is Andrew Appel, this is very easy to do. But probably your name isn't Andrew Appel! Explain its principles of operation in one sentence as a comment within your createdataB.c program.

The createdataB.c program should write to the dataB file; it should not write to stdout.

Part A: Get the program to print "A" (22% of the grade).

Write a C program named createdataA.c that produces a file named dataA, as short and simple as possible, that causes the grader program to print your name and recommend a grade of "A". Explain its principles of operation in a few sentences as a comment within your createdataA.c program.

The createdataA.c program should write to the dataA file; it should not write to stdout.

Part A+: Get the program to print "A+" (10% of the grade).

Write a C program named createdataAplus.c that produces a file named dataAplus, as short and simple as possible, that causes the grader program to print your name and recommend a grade of "A+". Explain its principles of operation in a few sentences as a comment within your createdataAplus.c program.

The createdataAplus.c program should write to the dataAplus file; it should not write to stdout.


Notes

For parts B, A, and A+ feel free to truncate your name(s) if necessary.

On some versions of Linux, every time the program is executed the initial stack pointer is in a different place. This makes it difficult to make an attack in which the return address points into the same data that was just read into the buffer on the stack. (Indeed, that is the purpose of varying the initial stack pointer!) However, you will note that the data is copied from buf into name. You'll find that name is reliably in the same place every time you (or we) run the program.

On some versions of Linux, executing instructions from the bss section causes a segmentation fault. The purpose of this is to defend against buffer overrun attacks! The mprotect call in our sample program is to disable this protection. You're not required to understand or explain how this line works. Note, however, that this mechanism (even if we didn't disable it) would not defend against the "C" or "B" attacks.

If you work hard, you could create a data input that will exploit the buffer overrun to take over the grader's Linux process and do all sorts of damage. DO NOT DO THAT! Any deliberate attempt of that sort is a violation of the University's disciplinary code, and also is a violation of the Computer Fraud and Abuse Act (see part F above).


Logistics

Create your programs on hats using bash, emacs, gcc217, and gdb.

The directory /u/cos217/Assignment5 contains the grader.c and grader files. It also contains a simple makefile that you might find useful during development.

Create a readme text file that contains:

Submit your work electronically on hats via the commands:

submit 5 createdataC.c createdataB.c createdataA.c createdataAplus.c
submit 5 memorymap readme

Grading

When we grade this assignment, we will take the recommendation of the grader program into account. But that will not be the only criterion. In particular, see the grade percentages noted above.

As always, we will grade your work on quality from both the user's and programmer's points of view. Each C program should contain function-level and local comments as appropriate, as well as an explanation of the program's principles of operation. For this assignment (only) it is acceptable to use "magic numbers" in your C programs as long as they are well commented. To encourage good coding practices, we will deduct points if gcc217 generates warning messages.

This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.