Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: Buffer Overrun


Purpose

The purpose of this assignment is to help you learn (1) how programs are represented in machine language, (2) how stack frames are structured in memory, and (3) how programs can be vulnerable to buffer overrun attacks.


Rules

You may work with one partner on this assignment. You need not work with a partner, but we prefer that you do. Your preceptor will help you with your partner search, if you so desire.

Your partner must be from your precept. Sorry, no exceptions. So make sure you find a partner as soon as you can. After all, if your precept has an odd number of students, then someone must work alone; don't let that person be you!

If you work with a partner, then exactly one of the partners must submit files. Your readme file, your memorymap file, and your source code files must contain both your name and your partner's name.

"Part A+" (as defined below) is the "extra challenge" part of this assignment. While doing the "extra challenge" part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements. However you may consult with your partner, with no limitations.

As noted below, the "extra challenge" part is worth 10 percent of this assignment. So if you don't do any of the "extra challenge" part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 90 percent.


Background

We will provide a program, both source code (grader.c) and executable binary code (grader). The file grader was produced from grader.c using the gcc217 command with the -O and -static options. The -static option commands the linker to do static linking rather than the default dynamic linking. Chapter 7 of Computer Systems: A Programmer's Perspective (Bryant & O'Hallaron) describes static vs. dynamic linking. Static linking could make your analysis of the grader program simpler, and ensures that each function of the program resides at the same address on both CourseLab computers.

The program asks you your name, and writes something like this (where the user input and program output are indicated by font style):

$ grader
What is your name?
Bob
D is your grade.
Thank you, Bob.

However, the author of the program inexplicably forgot to do bounds-checking on the array into which the program reads the input, and therefore the program is vulnerable to attack.

Precepts will explain such buffer overrun (alias buffer overflow) attacks. The paper entitled Detection and Prevention of Stack Buffer Overflow Attacks by Kuperman et al. also does so. That paper is available through Blackboard in the Course Materials section.


Your Task

Your task is to attack the given program by exploiting its buffer overrun vulnerability. More specifically, your job is to provide input data to the program so that it writes something more like this:

$ grader < data
What is your name?
A+ is your grade.
Thank you, Bob.

As you can see from reading the program, it is designed not to give anyone an A+ under any circumstances. However, it is programmed sloppily: it reads the input into a buffer, but forgets to check whether the input fits. This means that a too-long input can overwrite other important memory, and you can trick the program into giving you an A+.

This assignment has five parts:

Part F: Fill in the blanks (5% of the grade).

Copy this sentence to your readme file, and fill in the blanks so the sentence is correct:

"If you were to use a buffer overrun attack to knowingly gain unauthorized access or to cause damage to other people's computers, the Computer Fraud and Abuse Act provides a maximum penalty of ______ years in prison for a first offense. However, the creator of the Melissa virus plea-bargained down to ______ months in prison."

It's fine to do a web search to complete Part F of the assignment.

Part D: Analyze the program (33% of the grade).

Take the grader executable binary file that we have provided you, and use gdb to analyze its sections:

Part C: Get the program to crash (12% of the grade).

Compose a C program named createdataC.c that produces a file named dataC, as short and simple as possible, that causes the grader program to generate a segmentation fault. The createdataC.c program must write to the dataC file; it must not write to stdout.

Part B: Get the program to write "B" (16% of the grade).

Compose a C program named createdataB.c that produces a file named dataB, as short and simple as possible, that causes the grader program to write your name and recommend a grade of "B". You can see by reading the program that, if your name is Andrew Appel, this is very easy to do. But your name isn't Andrew Appel! To receive full credit the dataB file must cause the grader program to generate output whose format is indistinguishable from normal output. The createdataB.c program must write to the dataB file; it must not write to stdout.

Part A: Get the program to write "A" (24% of the grade).

Compose a C program named createdataA.c that produces a file named dataA, as short and simple as possible, that causes the grader program to write your name and recommend a grade of "A". To receive full credit the dataA file must cause the grader program to generate output whose format is indistinguishable from normal output. The createdataA.c program must write to the dataA file; it must not write to stdout.

Part A+: Get the program to write "A+" (10% of the grade).

Compose a C program named createdataAplus.c that produces a file named dataAplus, as short and simple as possible, that causes the grader program to write your name and recommend a grade of "A+". To receive full credit the dataAplus file must cause the grader program to generate output whose format is indistinguishable from normal output. The createdataAplus.c program must write to the dataAplus file; it must not write to stdout.


Notes

For parts B, A, and A+ feel free to truncate your name(s) if necessary.

On some versions of Linux, every time the program is executed the initial stack pointer is in a different place. This makes it difficult to make an attack in which the return address points into the same data that was just read into the buffer on the stack. (Indeed, that is the purpose of varying the initial stack pointer!) However, you will note that the data is copied from buf into name. You'll find that name is reliably in the same place every time you (or we) run the program.

On some versions of Linux, executing instructions from the bss section causes a segmentation fault. The purpose of this is to defend against buffer overrun attacks! The mprotect call in our sample program is to disable this protection. You're not required to understand or explain how this line works. Note, however, that this mechanism (even if we didn't disable it) would not defend against the "C" or "B" attacks.

If you work hard, you could create a data input that will exploit the buffer overrun to take over the grader's Linux process and do all sorts of damage. DON'T DO THAT! Any deliberate attempt of that sort is a violation of the University's disciplinary code, and also is a violation of the Computer Fraud and Abuse Act (see part F above).


Logistics

Create your programs on CourseLab using bash, emacs, gcc217, and gdb.

The directory /u/cos217/Assignment5 contains the grader.c and grader files. It also contains a simple Makefile that you might find useful during development.

Create a readme file by copying the file /u/cos217/Assignment5/readme to your project directory, and editing the copy by replacing each area marked "<Complete this section.>" as appropriate.

One of the sections of the readme file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.

Submit your work electronically on CourseLab using these commands:

submit 5 createdataC.c createdataB.c createdataA.c createdataAplus.c
submit 5 memorymap readme

Grading

Minimal requirement to receive credit for Part C: the createdataC.c program must build.

Minimal requirement to receive credit for Part B: the createdataB.c program must build.

Minimal requirement to receive credit for Part A: the createdataA.c program must build.

Minimal requirement to receive credit for Part A+: the createdataAplus.c program must build.

When we grade this assignment, we will take the recommendation of the grader program into account. But that will not be the only criterion. In particular, see the grade percentages noted above.

We will grade your code on quality from the user's and programmer's points of view. From the the user's point of view, your code has quality if it successfully implements the attacks described above.

From the programmer's point of view, your code has quality if it is well styled. In part, good style is defined by the splint and critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.

Each C program must contain:

For this assignment (only) it is acceptable to use "magic numbers" in your C programs as long as they are well commented. To encourage good coding practices, we will deduct points if gcc217 generates warning messages.


Debugging Tips

While debugging your attacks you might find it useful to use gdb to step through the execution of the grader program at the machine language level. These commands are appropriate for doing that:


This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.