The purpose of this assignment is to help you learn (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the "design by contract" style of programming, and (4) how to use the GNU/Unix programming tools, especially bash
, emacs
, gcc217
, and gdb
.
Implementing the Str_search
function (as described below) is the "on your own" part of this assignment. That part is worth 14% of this assignment.
As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in header files. One of those header files is string.h
; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Appendix D of the King textbook and the Unix "man" pages describe the string functions.
Your task in this assignment is to use C to create a Str
module that contains versions of the most commonly used standard string functions. Specifically, design your Str
module so it contains these functions, each of which behaves the same as a corresponding standard C function:
Str Function
|
Standard C Function |
Str_getLength
|
strlen
|
Str_copy
|
strcpy
|
Str_ncopy
|
strncpy
|
Str_concat
|
strcat
|
Str_nconcat
|
strncat
|
Str_compare
|
strcmp
|
Str_ncompare
|
strncmp
|
Str_search
|
strstr
|
Design each function definition so it calls the assert
macro to validate the function's parameters. Specifically, design each function definition to assert
that each pointer/array formal parameter is not NULL
.
Do not add any other calls to assert
to your code. But consider whether it would be possible to do so. In particular, provide answers to these two questions in your readme
file:
Str_copy
, Str_ncopy
, Str_concat
, or Str_nconcat
to call assert
to verify that the destination memory area specified by the caller is large enough? Explain.Str_ncopy
, Str_nconcat
, or Str_ncompare
to call assert
to verify that the length parameter specified by the caller is non-negative? Explain.Create two implementations of your Str
module. Place the first implementation in a file named stra.c
. Design the function definitions in stra.c
such that they use array notation as much as possible, and traverse each given string using an index relative to the beginning of the string. For example, in stra.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char pcSrc[]) { size_t uiLength = 0U; assert(pcSrc != NULL); while (pcSrc[uiLength] != '\0') uiLength++; return uiLength; }
Note that the type of uiLength
is size_t
. The type size_t
is defined in the standard header file stddef.h
. It is a system-dependent unsigned integral type that is large enough to hold the length of any string. Typically it is defined to be identical to either unsigned int
or unsigned long
. On hats, it is identical to unsigned long
. Several of the standard string functions use type size_t
, and so several of your functions should use it too.
Note that the initial value of uiLength
is 0U
. The U
suffix indicates that the constant is of type unsigned int
instead of int
. Simply initializing uiLength
to 0
would work because C allows initialization of an unsigned int
variable with an int
constant. However, as a matter of style, we suggest that you avoid mixing types within expressions.
Place the second implementation in a file named strp.c
. Design the function definitions in strp.c
such that they use pointer notation as much as possible, and traverse each given string using an incremented pointer -- not using an index relative to the beginning of the string. For example, in strp.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd = pcSrc; assert(pcSrc != NULL); while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); }
It is unacceptable create the strp.c
implementation from the stra.c
implementation simply by replacing each occurrence of a[i]
with *(a+i)
. For example, in strp.c
this definition of Str_getLength
is unacceptable:
size_t Str_getLength(const char *pcSrc) /* unacceptable */ { /* unacceptable */ size_t uiLength = 0U; /* unacceptable */ assert(pcSrc != NULL); /* unacceptable */ while (*(pcSrc + uiLength) != '\0') /* unacceptable */ uiLength++; /* unacceptable */ return uiLength; /* unacceptable */ } /* unacceptable */
Note that the function traverses the given string by incrementing the integer uiLength
, not by incrementing a pointer. Also note that the function effectively uses uiLength
as an index relative to the beginning of the string pcStr
.
Place the Str
module's interface in a file named str.h
. You may use either array or pointer notation in the interface. The two notations are equivalent to the compiler, so both implementations match the one interface. Use the #ifndef...#define...#endif
construct to protect your str.h
file against accidental multiple inclusion.
This assignment does not focus on efficiency. Nevertheless, your functions should not be grossly inefficient. For example, a function is grossly inefficient if it traverses a (potentially long) string multiple times when a single traversal would suffice.
Design your Str
functions so they do not call any of the standard string functions. In the context of this assignment, pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.
Beware of type mismatches. In particular, beware of the difference between type size_t
and type int
: a variable of type size_t
can store larger integers than a variable of type int
can. Your functions should (in principle) be able to handle strings whose lengths exceed the capacity of type int
. Also beware of type mismatches related to the use of the const
keyword. Using the splint
tool, as described below, will help you to detect type mismatches.
In your assignment solution you may use any of the definitions of the Str_getLength
function given in this assignment specification.
Create your Str
module on hats using bash
, emacs
, gcc217
, and gdb
.
A client that you can use to test your Str
module is available in the file /u/cos217/Assignment2/teststr.c
.
Create a readme
text file that contains:
Use the gcc217
command to preprocess, compile, assemble, and link your program. Perform each step individually, and examine the intermediate results to the extent possible.
Submit all source code files, the files that gcc217
generated from them, and your readme
file electronically on hats via the commands:
submit 2 readme submit 2 str.h submit 2 stra.c stra.i stra.s stra.o submit 2 strp.c strp.i strp.s strp.o submit 2 teststr.i teststr.s teststr.o submit 2 teststra teststrp
where:
teststra
is an executable file built using teststr.o
and stra.o
.teststrp
is an executable file built using teststr.o
and strp.o
.splint
splint
is a high-powered static code analysis tool developed by researchers at the University of Virginia. It is available to you on the hats cluster.
Typically you execute splint
after your program builds cleanly. splint
writes to the standard output stream a list of stylistic flaws that it finds in your source code. Some of splint
's warnings may be a bit cryptic; feel free to ask your preceptor about them as the need arises.
Assuming that you have copied the .splintrc
file from the /u/cos217
directory to your HOME directory (as described in the document entitled A Minimal COS 217 Computing Environment from the first precept), using splint
is simple. At the bash
prompt, issue the command:
splint sourcecodefiles
For example, to use splint
for this assignment you should issue these commands:
splint teststr.c stra.c splint teststr.c strp.c
You should think of splint
and the given .splintrc
file as defining coding standards for the COS 217 course. You should critique your code using splint
before you submit, and should edit your code to eliminate all splint
warnings.
If you cannot eliminate a splint
warning on some particular line of your source code and you believe that the warning is without merit, then explain in your readme
file. Your explanation should refer to the source code line by number. Your readme
file should contain one explanation for each splint
warning.
If you would like to learn more about splint
, see www.splint.org.
We will grade your work on quality from the user's point of view and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc217
generates warning/error messages. We also will deduct points if splint
generate warning messages that are not explained in your readme
file.
From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the Str
module is defined by the previous sections of this assignment specification, and by the C90 specifications of the corresponding string.h
functions.
From the programmer's point of view, a program has quality if it is well styled and thereby simple to maintain. In part, style is defined by splint
(as noted above). Style also is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. These additional rules apply:
Names: You should use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix c
might indicate that the variable is of type char
, i
might indicate int
, pc
might mean char*
, ui
might mean unsigned int
, etc. But it is fine to use another style -- a style which does not include the type of a variable in its name -- as long as the result is a readable program.
Line lengths: Limit line lengths in your source code to 72 characters. Doing so allows us to print your work in two columns, thus saving paper.
Comments: Each source code file should begin with a comment that includes your name, the number of the assignment, and the name of the file.
Comments: Each function should begin with a comment. It's fine to place a function's comment either before or after the function's heading. A function's comment should describe what the function does. A function's comment should:
In short, a function's comment should describe the flow of data into and out of the function. For example, this is an appropriate way to comment the Str_getLength
function:
In file str.h
:
... size_t Str_getLength(const char pcSrc[]); /* Return the length of string pcSrc. */ ...
In file strp.c
:
... size_t Str_getLength(const char *pcSrc) /* Return the length of string pcSrc. */ { const char *pcEnd = pcSrc; assert(pcSrc != NULL); while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); } ...
Note that the comment explicitly states what the function returns and explicitly refers to the function's parameter (pcSrc
).
C programmers sometimes use idioms that rely on the fact that the end-of-string character, the NULL
pointer, the integer 0, and the logical concept FALSE have the same representation. You may use those idioms. For example, you may define Str_getLength
like this:
size_t Str_getLength(const char pcSrc[]) { size_t uiLength = 0U; assert((int)pcSrc); /* NULL, 0, and FALSE are identical. */ while (pcSrc[uiLength]) /* End-of-string and FALSE are identical. */ uiLength++; return uiLength; }
or like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd = pcSrc; assert((int)pcSrc); /* NULL, 0, and FALSE are identical. */ while (*pcEnd) /* End-of-string and FALSE are identical. */ pcEnd++; return (size_t)(pcEnd - pcSrc); }
(Note the use of the cast operator within the calls of assert
to suppress splint
warnings.)
But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.
const
Keyword and Str_search
The use of the const
keyword within Str_search
is tricky, as this question-and-answer sequence indicates.
According to the man pages, the formal parameters of strstr
are of type const char*
. That implies that the formal parameters of Str_search
also should be of type const char*
. Why are they not of type char*
?
Suppose you were to define Str_search
like this:
char *Str_search(char *pcHaystack, char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
const char *pcBig = "hello"; const char *pcSmall = "lo";; ... ... Str_search(pcBig, pcSmall) ... ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcBig
is of type const char*
and that pcHaystack
is of type char*
, would generate a warning on the function call. Thus pcHaystack
(and pcNeedle
) should be of type const char*
.
According to the man pages, the return type of strstr
is char*
. That implies that the return type of Str_search
also should be of type char*
. Why is the return type not const char*
?
Suppose you were to define Str_search
like this:
const char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
char *pcBig = "hello"; char *pcSmall = "lo"; char *pc; ... pc = Str_search(pcBig, pcSmall); ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc
is of type char*
and that the value returned by Str_search
is of type const char*
, would generate a warning on the assignment statement. Thus the return type of Str_search
should be char*
and should not be const char*
.
Within the definition of Str_search
, I decided to define a local variable named pc
that "points into" the first of the two given strings, as indicated by this code:
char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... pc = pcHaystack; ... /* Increment pc so it points to the appropriate character. */ ... return pc; }
If I define pc
to be of type char*
, then the assignment statement generates a warning. If I define pc
to be of type const char*
, then the return statement generates a warning. How can I resolve that problem?
Unfortunately, C provides no elegant solution. We recommend that you define pc
to be of type const char*
so the assignment statement is warningless. Then use a cast operator in the return statement:
return (char*)pc;
to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, often that inelegant technique is unavoidable.
The assignment requires that you use pointer notation as much as possible in your strp.c
file. So, for example, you should define Str_copy
like this:
char *Str_copy(char *pcDest, const char *pcSrc) { ... }
The assignment requires that you use array notation as much as possible in your stra.c
file. So it would be logical to try to define your Str_copy
function something like this:
char[] Str_copy(char pcDest[], const char pcSrc[]) { ... }
Sadly, C disallows char[]
as a function return type. So in your stra.c
file, you will need to define Str_copy
in this hybrid way:
char *Str_copy(char pcDest[], const char pcSrc[]) { ... }
That same point applies to the definitions of several other functions as well.
strncpy
and strncat
The man pages for the strncpy
and strncat
functions are terse. Below are the descriptions of those functions from the book C: A Reference Manual by Harbison and Steele. They are more complete, and might be easier to understand than the man pages.
char *strncpy(char *dest, const char *src, size_t n);The functionstrncpy
copies exactlyn
character todest
. It first copies up ton
characters fromsrc
. If there are fewer thann
characters insrc
before the terminating null character, then null characters are written intodest
as padding until exactlyn
characters have been written. If there aren
or more characters insrc
, then onlyn
characters are copied, and so only a truncated copy ofsrc
is transferred todest
. If follows that the copy indest
is terminated with a null bystrncpy
only if the length ofsrc
(not counting the terminating null) is less thann
. If the value ofn
is zero ..., then callingstrncpy
function has no effect. The value ofdest
is always returned.
char *strncat(char *dest, const char *src, size_t n);Thestrncat
function appends up ton
characters from the contents ofsrc
to the end ofdest
. If the null character that terminatessrc
is encountered beforen
characters have been copied, then the null character is copied but no more. If no null character appears among the firstn
characters ofsrc
, then the firstn
characters are copied and a null character is supplied to terminate the destination string; that is,n+1
characters in all are written. If the value ofn
is zero ..., then callingstrncat
has no effect. The function always returnsdest
.
This assignment was written by Robert M. Dondero, Jr. with input from other faculty members