Princeton University
COS 217: Introduction to Programming Systems

Assignment 4: UNIX Commands in SPARC Assembly Language

Purpose

The purpose of this assignment is to help you learn about SPARC architecture and assembly language programming. It will also give you the opportunity to learn more about the GNU/UNIX programming tools, especially bash, Emacs, gcc, gdb (for assembly language programs), and make.

Background: echo

The UNIX operating system has a command named echo. echo prints its command-line arguments to standard output. It prints a single space between each argument, and prints a newline character at the end of the argument sequence. If there are no arguments, echo prints only a newline character. 

Consider some examples. If we show a space as "s" and a newline character as "n", then the command:

--> echosonestwosthreesfoursfiven
prints this line to standard output:
onestwosthreesfoursfiven
The command:
--> echosones"twosssthree"sfoursssfiven
prints this line to standard output:
onestwosssthreesfoursfiven
The command:
--> echos%sn
prints this line to standard output:
%sn

Background: wc

Another UNIX command is wc (word count). In its simplest form, wc reads characters from standard input until end-of-file, and prints to standard output a count of how many lines, words, and characters it has read. It prints the three counts on the same line, each in a field of width 8. A "word" is a sequence of characters that is delimited by one or more whitespace characters, as defined by the C standard function isspace

For example, if the file named "proverb" contains these characters:

Learningsissan
treasureswhichn
accompaniessitsn
ownerseverywhere.n
--sChinesesproverbn

then the command:

--> wc < proverb

prints this line to standard output:

sssssss5ssssss12ssssss82n
If the file "proverb2" contains these characters:
Learningsissan
treasureswhichn
accompaniessitsn
ownerseverywhere.n
--sssChinesesproverb

(note that the last "line" does not end with a newline character) then the command:

--> wc < proverb2

prints this line to standard output:

sssssss4ssssss12ssssss83n

Background: sort

Yet another commonly used UNIX command is sort. In its simplest form, it reads lines from standard input, sorts them into ascending (i.e. alphabetical, i.e. lexicographic) order, and prints them to standard output. For example, if the file proverb contains these lines:

Learning is a
treasure which
accompanies its
owner everywhere.
-- Chinese proverb

then the command

--> sort < proverb

prints these lines to standard output:

-- Chinese proverb
Learning is a
accompanies its
owner everywhere.
treasure which

Your Task

This assignment asks you to create your own versions of the echo, wc, and sort commands in C and in SPARC assembly language, as specified below.

echo1

Create a C program in a file named echo1.c, from which gcc can produce an executable file named echo1. echo1 should have the same functionality as echo.

echo2

Create a SPARC assembly language program in a file named echo2.S, from which gcc can produce an executable file named echo2. echo2 should have the same functionality as echo.

wc1

Create a C program in a file named wc1.c, from which gcc can produce an executable file named wc1. wc1 should implement the subset of wc described above. (wc1 need not process command-line arguments as wc does.)

Hint: The program should read one character at a time from standard input. You will find it most convenient to call the getchar function to do that.

Hint: A printf format string can contain "field widths."  For example, the printf format string "%8d" indicates that the given integer should be printed in a field of width 8, padded with leading spaces.

wc2

Create a SPARC assembly language program in a file named wc2.S, from which gcc can produce an executable file named wc2. Like wc1, wc2 should implement the subset of wc described above.

sort1

Create a C program in a file named sort1.c, from which gcc can produce an executable file named sort1. The functionality of the sort1 program should be a subset of the functionality of the UNIX standard sort program. The sort1 program should read lines from standard input, sort them in ascending order, and print them to standard output. It need not process command-line arguments as the UNIX sort program does. It may assume that the final line of standard input ends with a newline character.

You should not make any assumptions about how many lines are in standard input. Hint: You may find the sort programs discussed in early precepts and the standard realloc function helpful.

However, you may assume that no line of standard input contains more than 1023 characters. (The terminating newline character is included in that count.) Your program need not work properly when a line of standard input contains more than 1023 characters, but it should not corrupt memory. Hint: You will find the fgets and fputs functions useful.

The program's primary data structure should be an array of pointers. Each pointer should point to the zero'th character of a null-terminated array of characters (i.e., a string). Each string should contain the characters of one line of standard input. The program should dynamically allocate the memory occupied by each string. The amount of memory allocated for each string should be less than or equal to 1024 bytes, depending upon the length of the string.

The program should use these functions to sort the array of strings:

int mystrcmp(const char *pcFirst, const char *pcSecond)

/* Compare pcFirst to pcSecond.  Return <0, >0, or 0 if pcFirst is less
   than, greater than, or equal to pcSecond (respectively). */

{
   while ((*pcFirst != '\0') && (*pcFirst == *pcSecond))
   {
      pcFirst++;
      pcSecond++;
   }
   return *pcFirst - *pcSecond;
}

void swap(char *ppcArray[], int iOne, int iTwo)

/* Swap ppcArray[iOne] and ppcArray[iTwo]. */
 
{
   char *pcTemp;
   pcTemp = ppcArray[iOne];
   ppcArray[iOne] = ppcArray[iTwo];
   ppcArray[iTwo] = pcTemp;
}

int partition(char *ppcArray[], int iLeft, int iRight)

/* Divide ppcArray[iLeft...iRight] into two partitions so elements 
   in the first partition are <= elements in the second partition.
   Return the index of the element that marks the partition 
   boundary. */

{
   int iFirst = iLeft-1;
   int iLast = iRight;

   while (1)
   {
      while (mystrcmp(ppcArray[++iFirst], ppcArray[iRight]) < 0)
         ;
      while (mystrcmp(ppcArray[iRight], ppcArray[--iLast]) < 0)
         if (iLast == iLeft)
            break;
      if (iFirst >= iLast)
         break;
      swap(ppcArray, iFirst, iLast);
   }
   swap(ppcArray, iFirst, iRight);
   return iFirst;
}

void quicksort(char *ppcArray[], int iLeft, int iRight)

/* Sort ppcArray[iLeft...iRight] in ascending order. */

{
   if (iRight > iLeft)
   {
      int iMid = partition(ppcArray, iLeft, iRight);
      quicksort(ppcArray, iLeft, iMid - 1);
      quicksort(ppcArray, iMid + 1, iRight);
   }
}
Those functions implement the quicksort algorithm as presented in Princeton's COS 126 course.

sort2

Suppose your sort1 program is unacceptably slow. Further suppose that a performance analysis indicates that sort1 spends most of its time executing the quicksort, partition, swap, and mystrcmp functions...

Create a file named sort2.c that is the same as sort1.c except that the definitions of the quicksort, partition, swap, and mystrcmp functions have been removed. Then create a file named sort2.S containing SPARC assembly language versions of quicksort, partition, swap, and mystrcmp. When linked together, sort2.c and sort2.S thus create sort2 -- a (presumably) faster version of your sort1 program. The functionality of sort2 should be the same as that of sort1.

You should implement the swap and mystrcmp functions as leaf subroutines, as defined in your Paul textbook in Section 7.9. Note that, from the calling subroutine's point of view, a properly designed leaf subroutine is indistinguishable from a non-leaf subroutine.

Suggestion: Create sort2.S by translating functions from C to assembly language one at a time. Here is how you might proceed:

  1. Copy file sort1.c to sort2.c.
  2. Remove a function from sort2.c.
  3. Add an assembly language version of that function to sort2.S.
  4. Test the hybrid C/assembly language program by executing the commands:
    gcc -Wall -ansi -pedantic -o sort2 sort2.c sort2.S
    sort2 < inputfile1
    sort2 < inputfile2
    ...
  5. Repeat steps 2 through 4 for each of the four functions.

Logistics

You should develop on arizona. Use Emacs to create source code. Use the "make" tool to automate the build process. Use gdb to debug.

In accord with the purpose of the assignment, you should not use a C compiler to produce your assembly language programs. Rather you should produce them manually.

We recommend that you not use the m4 preprocessor. We recommend that you use the C preprocessor to define symbolic constants.

You need not optimize your assembly language programs, but we encourage you to do so. In particular, we encourage you to use registers instead of memory whenever possible. We will give you extra credit -- up to 5% -- if you minimize the use of "nop" instructions and optimize "if," "while," and "for" constructs using the standard control structure optimization patterns.

We encourage you to develop "flattened C" code (as described in precepts) to bridge the gap between your "normal" C code and your assembly language code. Using flattened C code as a bridge can essentially eliminate logic errors from your assembly language code, leaving only the possibility of translation errors. We also encourage you to use your flattened C code as comments in your assembly language code. Such comments can dramatically enhance the understandability of your assembly language code.

You should submit:

Your readme file should contain:

Submit your work electronically via the command:

/u/cs217/bin/submit 4 echo1.c echo2.S wc1.c wc2.S sort1.c sort2.c sort2.S makefile readme

Grading

We will grade your work on functionality and design. We will consider understandability to be an important aspect of good design. 

Comments in your assembly language programs are especially important. If an assembly language program fails a test, we will attempt to find the error(s) by relating each code fragment to the comments that describe it. (Comments copied from a flattened C program are particularly helpful in that regard.) But if your program is poorly documented, it is unlikely that we would be able to find errors in a reasonable amount of time. Thus an erroneous program that is poorly documented would receive little partial credit.

To encourage good coding practices, we will take off points based on warning messages during compilation of your C programs.