Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: UNIX Commands in IA-32 Assembly Language

Purpose

The purpose of this assignment is to help you learn about IA-32 architecture and assembly language programming. It also will give you the opportunity to review dynamic memory management in C, and to learn more about the GNU/UNIX programming tools, especially bash, xemacs, gcc, and gdb for assembly language programs.

Background: wc

The UNIX operating system has a command named wc (word count). In its simplest form, wc reads characters from stdin until end-of-file, and prints to stdout a count of how many lines, words, and characters it has read. A word is a sequence of characters that is delimited by one or more whitespace characters. The program prints the three counts on the same line -- the line count in a field of width 7, the word count in a field of width 8, and the character count in a field of width 8.

Consider some examples. In the following, a space is shown as "_s" and a newline character as "_n".

If the file named proverb contains these characters:

Learning_sis_sa_n
treasure_swhich_n
accompanies_sits_n
owner_severywhere._n
--_sChinese_sproverb_n

then the command:

$ wc < proverb

prints this line to standard output:

_ssssss5_ssssss12_ssssss82_n

If the file proverb2 contains these characters:

Learning_sis_sa_n
treasure_swhich_n
accompanies_sits_n
owner_severywhere._n
--_sssChinese_sproverb

(note that the last "line" does not end with a newline character) then the command:

$ wc < proverb2

prints this line to standard output:

_ssssss4_ssssss12_ssssss83_n

Background: sort

Another commonly used UNIX command is sort. In its simplest form, it reads lines from stdin, sorts them into ascending (i.e. alphabetical, i.e. lexicographic) order, and prints them to stdout. For example, if the file proverb contains these lines:

Learning is a
treasure which
accompanies its
owner everywhere.
-- Chinese proverb

then the command

$ sort < proverb

prints these lines to standard output:

-- Chinese proverb
Learning is a
accompanies its
owner everywhere.
treasure which

Note that the special character '-' has an ASCII code that is less than the ASCII codes of all alphabetic characters, and so the line that begins with '-' appears first. Also note that the uppercase characters have ASCII codes that are less than the ASCII codes of the lower case characters, and so the line that begins with 'L' appears before the line that begins with 'a'.

One small complication... For the Linux sort program to work in the traditional way (as described above), you must set the environment variable named "LC_ALL" such that its value is "C". Using the bash shell, you can accomplish that by issuing this command:

export LC_ALL=C

You will find it convenient to add that command to the end of the .bashrc file in your home directory. Having done so, bash will execute the command each time you login. If you have copied the file /u/cos217/.bashrc file to your HOME directory, then your .bashrc file contains that export command.

Your Task

Your task is to create your own versions of the wc and sort commands in C and in IA-32 assembly language, as specified below.

mywc

Create a program named mywc. The program should implement the subset of wc described above. (The program need not process command-line arguments as wc does.) More precisely, create a C version of the program in file mywc.c and an assembly language version in file mywc.s.

Your program should call the C standard function isspace to determine whether a character is a whitespace character.

It is acceptable to use global (i.e. bss section and data section) variables in mywc.s.

Hint: A printf format string can contain "field widths." For example, the printf format string "%8d" indicates that the given integer should be printed in a field of width 8, padded with leading spaces.

mysort

Create a program named mysort. The functionality of the mysort program should be a subset of the functionality of the UNIX standard sort program. The mysort program should read lines from stdin, sort them in ascending order, and print them to stdout. It need not process command-line arguments as the UNIX sort program does.

You should not make any assumptions about how many lines are in stdin. (Hint: You may find the sort4 program discussed in precepts useful as a model.) However, you may assume that no line of stdin contains more than 1023 characters. The newline character is included in that count. The terminating null character is not included in that count. Your program need not sort lines properly when a line of stdin contains more than 1023 characters, but it should not corrupt memory. (Hint: You will find the fgets function useful.)

A small complication... Since the ASCII code for the tab character ('\t') is less than the ASCII code for the newline character ('\n'), your mysort program might consider a line that begins with tab to be less than a line that contains no characters at all (that is, a line that contains only the newline character). It is acceptable for your mysort program to do that. In other words, you may assume that the file to be sorted contains no lines that begin with the tab character.

Another small complication... If stdin does not end with a newline character, the UNIX sort program automatically adds one. Your mysort program need not do that. In other words, you may assume that the file to be sorted ends with a newline character.

The program's primary data structure should be an array of pointers. Each pointer should point to the zero'th character of a null-terminated array of characters (that is, a string). Each string should contain the characters of one line of stdin. The program should dynamically allocate the memory occupied by each string. The amount of memory allocated for each string should depend upon the length of the string.

There should be no memory leaks in the program. For each call of the malloc or calloc function, there should be a matching call of the free function.

The program should use these functions to sort the array of strings:

/*------------------------------------------------------------------*/

void swap(char *ppcArray[], int iOne, int iTwo)

/* Swap ppcArray[iOne] and ppcArray[iTwo]. */
 
{
   char *pcTemp;
   pcTemp = ppcArray[iOne];
   ppcArray[iOne] = ppcArray[iTwo];
   ppcArray[iTwo] = pcTemp;
}

/*------------------------------------------------------------------*/

int partition(char *ppcArray[], int iLeft, int iRight)

/* Divide ppcArray[iLeft...iRight] into two partitions so elements 
   in the first partition are <= elements in the second partition.
   Return the index of the element that marks the partition 
   boundary. */

{
   int iFirst = iLeft-1;
   int iLast = iRight;

   while (1)
   {
      while (strcmp(ppcArray[++iFirst], ppcArray[iRight]) < 0)
         ;
      while (strcmp(ppcArray[iRight], ppcArray[--iLast]) < 0)
         if (iLast == iLeft)
            break;
      if (iFirst >= iLast)
         break;
      swap(ppcArray, iFirst, iLast);
   }
   swap(ppcArray, iFirst, iRight);
   return iFirst;
}

/*------------------------------------------------------------------*/

void quicksort(char *ppcArray[], int iLeft, int iRight)

/* Sort ppcArray[iLeft...iRight] in ascending order. */

{
   if (iRight > iLeft)
   {
      int iMid = partition(ppcArray, iLeft, iRight);
      quicksort(ppcArray, iLeft, iMid - 1);
      quicksort(ppcArray, iMid + 1, iRight);
   }
}

/*------------------------------------------------------------------*/

Those functions implement the quicksort algorithm as presented in the Sedgewick textbook.

More precisely, create these files:

quicksort.c (containing the C language definition of the quicksort function)
quicksort.s (containing the assembly language definition of the quicksort function)
partition.c (containing the C language definition of the partition function)
partition.s (containing the assembly language definition of the partition function)
swap.c (containing the C language definition of the swap function)
swap.s (containing the assembly language definition of the swap function)
mysort.c (containing the C language definitions of all other functions)

You need not create a mysort.s file.

Somewhat unrealistically, you need not create C header (.h) files. Instead, it is sufficient to place explicit function declarations in each .c file as necessary.

Your source code files should be such that you can build the mysort program using mysort.c along with either quicksort.c or quicksort.s, either partition.c or partition.s, and either swap.c or swap.s.

Testing

For the mysort program you should build and test programs using at least these combinations:

mysort.c, quicksort.c, partition.c, and swap.c
mysort.c, quicksort.c, partition.c, and swap.s
mysort.c, quicksort.c, partition.s, and swap.c
mysort.c, quicksort.s, partition.c, and swap.c
mysort.c, quicksort.s, partition.s, and swap.s

Describe your testing strategy in your readme file. Your descriptions should be structured using these major headings:

Boundary condition testing
Logical path testing
Stress testing
Memory management testing
Producing output that is known to be right/wrong
Test automation

Develop a UNIX shell script named grade5 and some associated data files to automate your testing strategy. A UNIX shell script is simply a text file that contains UNIX commands, and that has been made executable via the chmod command:

chmod 700 grade5

The grade5 script should build and execute your programs. It is acceptable for your grade5 script to call other scripts that you create.

Logistics

You should develop on hats. Use xemacs to create source code. Use gdb to debug.

You should not use a C compiler to produce your assembly language programs. Doing so would be considered an instance of academic dishonesty. Rather you should produce your assembly language programs manually.

We encourage you to develop "flattened" C code (as described in precepts) to bridge the gap between your "normal" C code and your assembly language code. Using flattened C code as a bridge can eliminate logic errors from your assembly language code, leaving only the possibility of translation errors.

We also encourage you to use your flattened C code as comments in your assembly language code. Such comments can clarify your assembly language code substantially.

You should submit:

Your .c and .s files. Your .c files should contain "normal" C code, not "flattened" C code.
Your grade5 shell script, and any other shell scripts that it calls.
The data files that your shell script uses to test your programs.
A readme file.

Your readme file should contain:

Your name.
A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course Policies web page.
A description of your testing strategy.
(Optionally) An indication of how much time you spent doing the assignment.
(Optionally) Your assessment of the assignment.
(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

Submit your work electronically via the commands:

/u/cos217/bin/i686/submit 5 mywc.c mywc.s
/u/cos217/bin/i686/submit 5 mysort.c quicksort.c quicksort.s partition.c partition.s swap.c swap.s 
/u/cos217/bin/i686/submit 5 grade5 yourothershellscripts yourdatafiles
/u/cos217/bin/i686/submit 5 readme

Grading

We will grade your work on functionality and design. We will consider understandability to be an important aspect of good design.

Comments in your assembly language programs are especially important. Each assembly language function (especially the main function) should have a comment that describes what the function does. Local comments within your assembly language functions are equally important. Comments copied from corresponding "flattened" C code are particularly helpful.

Testing is a substantial aspect of the assignment. Approximately 15% of the grade will be based upon the description of your testing strategy, and the extent to which you have automated the strategy via scripts and data files.

To encourage good coding practices, we will take off points based on warning messages during preparation of your programs.

Princeton University COS 217: Introduction to Programming Systems