Princeton University
COS 217: Introduction to Programming Systems

Assignment 2: A String Module


Purpose

The purpose of this assignment is to help you learn/review (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the "design by contract" style of programming, and (4) how to use the GNU/Unix programming tools, especially Bash, Emacs, gcc217, and GDB.


Background

As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in header files. One of those header files is string.h; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Appendix D of the King textbook and the UNIX "man" pages describe the string functions.


Your Task

Your task in this assignment is to use C to create a "Str" module that contains versions of the most commonly used standard string functions. Specifically, design your Str module so it contains these functions, each of which behaves the same as a corresponding standard C function:

Str Function Standard C Function
Str_getLength() strlen()
Str_copy() strcpy()
Str_ncopy() strncpy()
Str_concat() strcat()
Str_nconcat() strncat()
Str_compare() strcmp()
Str_ncompare() strncmp()
Str_search() strstr()

The Details

Use "design by contract." Design each function comment so it describes that function's "checked runtime errors." Design each function definition so it calls the assert() macro to enforce those checked runtime errors. (In that way your Str functions should differ from the standard string functions.) Specifically, design each function definition so it calls assert() to make sure that none of its pointer/array formal parameters is NULL.

Do not add any other calls to assert() to your code. But consider whether it would be possible to do so. In particular, provide answers to these two questions in your readme file:

  1. Is it possible for Str_copy(), Str_ncopy(), Str_concat(), or Str_nconcat() to call assert() to verify that the specified destination memory area is large enough? Explain.
  2. Is it possible for Str_ncopy(), Str_nconcat(), or Str_ncompare() to call assert() to verify that the specified length parameter is non-negative? Explain.

Create two implementations of your Str module. Place the first implementation in a file named stra.c. Design the function definitions in stra.c such that they use array notation, and traverse each given string using an index relative to the beginning of the string. For example, in stra.c you might define the Str_getLength() function like this:

size_t Str_getLength(const char pcSrc[])
{
   size_t uiLength = 0U;
   assert(pcSrc != NULL);
   while (pcSrc[uiLength] != '\0')
      uiLength++;
   return uiLength;
}

Note that the type of uiLength is size_t. The type size_t is defined in the standard header file stddef.h. It is a system-dependent unsigned integral type that is large enough to hold the length of any string. Typically it is defined to be identical to either unsigned int or unsigned long. On hats, it is identical to unsigned int. Several of the standard string functions use type size_t, and so several of your functions should use it too.

Note that the initial value of uiLength is 0U. The "U" suffix indicates that the literal is of type unsigned int instead of int. Simply initializing uiLength to 0 would work because C allows initialization of an unsigned int variable with an int literal. However, as a matter of style, we suggest that you avoid mixing types within expressions.

Place the second implementation in a file named strp.c. Design the function definitions in strp.c such that they use pointer notation, and traverse each given string using an incremented pointer -- not using an index relative to the beginning of the string. For example, in strp.c you might define the Str_getLength() function like this:

size_t Str_getLength(const char *pcSrc)
{
   const char *pcEnd = pcSrc;
   assert(pcSrc != NULL);
   while (*pcEnd != '\0')
      pcEnd++;
   return (size_t)(pcEnd - pcSrc);
}

It is unacceptable create the strp.c implementation from the stra.c implementation simply by replacing each occurrence of "a[i]" with "*(a+i)". For example, in strp.c this definition of Str_getLength() is unacceptable:

size_t Str_getLength(const char *pcSrc)  /* unacceptable */
{                                        /* unacceptable */
   size_t uiLength = 0U;                 /* unacceptable */
   assert(pcSrc != NULL);                /* unacceptable */
   while (*(pcSrc + uiLength) != '\0')   /* unacceptable */
      uiLength++;                        /* unacceptable */
   return uiLength;                      /* unacceptable */
}                                        /* unacceptable */

Note that the function traverses the given string by incrementing the integer uiLength, not by incrementing a pointer. Also note that the function effectively uses uiLength as an index relative to the beginning of the string pcStr.

Place the Str module's interface in a file named str.h. You may use either array or pointer notation in the interface. The two notations are equivalent to the compiler, so both implementations match the one interface. Use the "#ifndef...#define...#endif" construct to protect your str.h file against accidental multiple inclusion.

This assignment does not focus on efficiency. Nevertheless, your functions should not be grossly inefficient. For example, a function is grossly inefficient if it traverses a (potentially long) string more than once when a single traversal would suffice.

Design your Str functions so they do not call any of the standard string functions. In the context of this assignment, pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.

Beware of type mismatches. In particular, beware of the difference between type size_t and type int: a variable of type size_t can store larger integers than a variable of type int can. Your functions should (in principle) be able to handle strings whose lengths exceed the capacity of type int. Also beware of type mismatches related to the use of the "const" keyword. Using the Splint tool, as described below, will help you to detect type mismatches.

In your assignment solution you may use any of the definitions of the Str_getLength() function given in this assignment specification.


Logistics

Create your Str module on hats using the Bash shell, Emacs, gcc217, and GDB.

A client that you can use to test your Str module is available in the file /u/cos217/Assignment2/teststr.c.

Create a "readme" text file that contains:

Submit all source code files, the files that gcc217 generated from them, and your readme file electronically on hats via the commands:

submit 2 readme
submit 2 str.h
submit 2 stra.c stra.i stra.s stra.o
submit 2 strp.c strp.i strp.s strp.o
submit 2 teststr.i teststr.s teststr.o
submit 2 teststra teststrp

where teststra is an executable binary file built using teststr.o and stra.o, and teststrp is an executable file built using teststr.o and strp.o.


Extra Credit (up to 10 points)

Create a program named strreplace. The strreplace program should be a multi-file program consisting of a file named strreplace.c and the files that comprise your Str module: str.h and either stra.c or strp.c. That is, your strreplace.c file should define a client of your Str module.

The program should perform string replacements. It should be called with two command-line arguments. If the user does not supply exactly two command-line arguments, or argv[1] is the empty string, then the program should print an error message to stderr and return EXIT_FAILURE. Otherwise the program should read lines from stdin and write them to stdout, replacing each distinct occurrence of argv[1] with argv[2]. It should write to stderr the number of replacements made. Finally the program should return 0. For example, this command:

strreplace abc def < infile > outfile

should read from stdin (and thus from infile), replace each occurrence of "abc" with "def", write the result to stdout (and thus to outfile), write the number of replacements made to stderr, and return 0.

The program should assume that no line of stdin contains more than 1023 characters, including the trailing newline character. That is, the program can read each stdin line into an array of characters of length 1024.

You will find it helpful to learn about the standard C fgets() and fputs() functions, and about handling command-line arguments in C. Those topics are described in our King book.

The /u/cos217/Assignment2 directory contains an executable binary file named samplestrreplace. Your strreplace program should have the same behavior as samplestrreplace. That is, when given the same input as samplestrreplace, your strreplace program should write exactly the same data to stdout and stderr as samplestrreplace does.

Submit your program electronically on hats via the commands:

submit 2 strreplace.c strreplace.i strreplace.s strreplace.o
submit 2 strreplace

where strreplace is an executable binary file built using strreplace.o and either stra.o or strp.o.


Grading

We will grade your work on quality from the user's point of view and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc217 generates warning messages.

From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the Str module is defined by the previous sections of this assignment specification, and by the C90 specifications of the corresponding string.h functions.

From the programmer's point of view, a program has quality if it is well styled and thereby simple to maintain. In part, style is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. These additional rules apply:

Names: You should use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix "c" might indicate that the variable is of type char, "i" might indicate int, "pc" might mean pointer to char, "ui" might mean unsigned int, etc. But it is fine to use another style -- a style which does not include the type of a variable in its name -- as long as the result is a readable program.

Line lengths: Limit line lengths in your source code to 72 characters. Doing so allows us to print your work in two columns, thus saving paper.

Comments: Each source code file should begin with a comment that includes your name, the number of the assignment, and the name of the file.

Comments: Each function should begin with a comment that describes what the function does. The function comment should:

In short, a function's comment should describe the flow of data into and out of the function. For example, this is an appropriate way to comment the Str_getLength() function:

In file str.h:

...
size_t Str_getLength(const char pcSrc[]);
/* Return the length of string pcSrc.
   It is a checked runtime error for pcSrc to be NULL. */
   ...

In file strp.c:

...
size_t Str_getLength(const char *pcSrc)
/* Return the length of string pcSrc.
   It is a checked runtime error for pcSrc to be NULL. */
{
   const char *pcEnd = pcSrc;
   assert(pcSrc != NULL);
   while (*pcEnd != '\0')
      pcEnd++;
   return (size_t)(pcEnd - pcSrc);
}
...

Note that the comment explicitly states what the function returns, explicitly refers to the function's parameter (pcSrc), and explicitly describes the function's checked runtime error.


Note: Using Idioms

C programmers sometimes use idioms that rely on the fact that the end-of-string character, the NULL pointer, and FALSE have the same representation. You may use those idioms. For example, you may define your Str_getLength() function like this:

size_t Str_getLength(const char pcSrc[])
{
   size_t uiLength = 0U;
   assert(pcSrc);          /* NULL and FALSE are identical. */
   while (pcSrc[uiLength]) /* End-of-string and FALSE are identical. */
      uiLength++;
   return uiLength;
}

or like this:

size_t Str_getLength(const char *pcSrc)
{
   const char *pcEnd = pcSrc;
   assert(pcSrc);    /* NULL and FALSE are identical. */
   while (*pcEnd)    /* End-of-string and FALSE are identical. */
      pcEnd++;
   return (size_t)(pcEnd - pcSrc);
}

But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.


Note: "Const" and the Str_search() Function

The use of the "const" keyword within the Str_search() function is tricky, as this question/answer sequence indicates.

Question

According to the man pages, the formal parameters of the strstr() function are of type const char*. That implies that the formal parameters of Str_search() also should be of type const char*. Why aren't they of type char*?

Answer

Suppose you were to define your Str_search() function like this:

char *Str_search(char *pcHaystack, char *pcNeedle)
{
   ...
}

Further suppose the client then calls Str_search() like this:

const char *pcBig = "hello";
const char *pcSmall = "lo";;
...
... Str_search(pcBig, pcSmall) ...
...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcBig is of type const char* and that pcHaystack is of type char*, would generate a warning on the function call. Thus pcHaystack (and pcNeedle) should be of type const char*.

Question

According to the man pages, the return type of strstr() is char*. That implies that the return type of Str_search() also should be of type char*. Why isn't the return type const char*?

Answer

Suppose you were to define Str_search() like this:

const char *Str_search(const char *pcHaystack, const char *pcNeedle)
{
   ...
}

Further suppose the client then calls Str_search() like this:

char *pcBig = "hello";
char *pcSmall = "lo";
char *pc;
...
pc = Str_search(pcBig, pcSmall);
...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc is of type char* and that the value returned by Str_search() is of type const char*, would generate a warning on the assignment statement. Thus the return type of Str_search() should be char* and should not be const char*.

Question

Within the definition of Str_search(), I decided to define a local variable named pc that "points into" the first of the two given strings, as indicated by this code:

char *Str_search(const char *pcHaystack, const char *pcNeedle)
{
   ...
   pc = pcHaystack;
   ...
   /* Increment pc so it points to the appropriate character. */
   ...
   return pc;
}

If I define pc to be of type char*, then the assignment statement generates a warning. If I define pc to be of type const char*, then the return statement generates a warning. How can I resolve that problem?

Answer

Unfortunately, C provides no elegant solution. We recommend that you define pc to be of type const char* so the assignment statement is warningless. Then use a cast operator in the return statement:

return (char*)pc;

to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, often that inelegant technique is unavoidable.


Note: Using Splint

Splint is a high-powered static code analysis tool that was developed at the University of Virginia. It is available to you on the hats cluster.

Typically you execute Splint after your program builds cleanly. Splint writes to stdout a list of stylistic flaws that it finds in your source code. Some of Splint's flaw messages may be a bit cryptic; feel free to ask your preceptor about them as the need arises.

Assuming that you have configured your computing environment as described in the document entitled "A Minimal COS 217 Computing Environment" (from the first precept), using Splint is simple. At the bash prompt, execute the command:

splint sourcecodefiles

For example, to use Splint for this assignment, you would execute these commands:

splint teststr.c stra.c
splint teststr.c strp.c
splint strreplace.c stra.c  (extra credit)
splint strreplace.c strp.c  (extra credit)

You should critique your code using Splint before you submit, and should edit your code to eliminate all Splint warnings. Using Splint on your code might improve its quality, and so might result in higher grades.

If you would like to learn more about Splint, see www.splint.org.


Note: Function Return Types

The assignment requires that you use pointer notation in your strp.c file. So, for example, you should define your Str_copy() function like this:

char *Str_copy(char *pcDest, const char *pcSrc)
{
   ...
}

The assignment requires that you use array notation in your stra.c file. So it would be logical to try to define your Str_copy() function something like this:

char[] Str_copy(char pcDest[], const char pcSrc[])
{
   ...
}

Sadly, C doesn't allow "char[]" as a function return type. So in your stra.c file, you will need to define your Str_copy() function in this hybrid way:

char *Str_copy(char pcDest[], const char pcSrc[])
{
   ...
}

That same point applies to the definitions of several other functions as well.


Note: Descriptions of strncpy() and strncat()

The man pages for the strncpy() and strncat() functions are terse. Below are the descriptions of those functions from the book C: A Reference Manual by Harbison and Steele. They are more complete, and might be easier to understand than the man pages.

char *strncpy(char *dest, const char *src, size_t n);

"The function strncpy copies exactly n character to dest. It first copies up to n characters from src. If there are fewer than n characters in src before the terminating null character, then null characters are written into dest as padding until exactly n characters have been written. If there are n or more characters in src, then only n characters are copied, and so only a truncated copy of src is transferred to dest. If follows that the copy in dest is terminated with a null by strncpy only if the length of str (not counting the terminating null) is less than n. If the value of n is zero ..., then calling strncpy function has no effect. The value of dest is always returned."

char *strncat(char *dest, const char *src, size_t n);

"The strncat function appends up to n characters from the contents of src to the end of dest. If the null character that terminates src is encountered before n characters have been copied, then the null character is copied but no more. If no null character appears among the first n characters of src, then the first n characters are copied and a null character is supplied to terminate the destination string; that is, n+1 characters in all are written. If the value of n is zero ..., then calling strncat has no effect. The function always returns dest."