The purpose of this assignment is to help you learn (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the design by contract style of programming, and (4) how to use the GNU/Unix programming tools, especially bash
, emacs
, gcc217
, and gdb
.
Composing the two Str_search
functions (as described below) is the "extra challenge" part of this assignment. While doing the "extra challenge" part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional constraint: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements.
The "extra challenge" part is worth 16 percent of this assignment. So if you don't do any of the "extra challenge" part and all other parts of your assignment solution are perfect, then your grade for the assignment will be 84 percent.
As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in header files. One of those header files is string.h
; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Chapter 13 and Appendix D of the King textbook and the Unix "man" pages describe the string functions.
Your task is to use C to compose a Str
module that contains versions of the most commonly used standard string functions. Specifically, design your Str
module so it contains these functions, each of which behaves the same as a corresponding standard C function:
Str Function
|
Standard C Function |
Str_getLength
|
strlen
|
Str_copy
|
strcpy
|
Str_concat
|
strcat
|
Str_compare
|
strcmp
|
Str_search
|
strstr
|
Design each function definition so it calls the assert
macro to validate the function's parameters. Specifically, design each function definition to assert that each formal parameter is not NULL
.
Don't add any other calls of assert
to your code. But consider whether it would be possible to do so. In particular, provide an answer to this question in your readme
file:
Is it possible forStr_copy
orStr_concat
to callassert
to verify that the destination memory area specified by the caller is large enough? Explain.
Create two implementations of your Str
module. Place the first implementation in a file named stra.c
. Design the function definitions in stra.c
such that they use array notation as much as possible, and traverse each given string using an index relative to the beginning of the string. For example, in stra.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char pcSrc[]) { size_t uiLength = 0U; assert(pcSrc != NULL); while (pcSrc[uiLength] != '\0') uiLength++; return uiLength; }
Note that the type of uiLength
is size_t
. The type size_t
is defined in the standard header file stddef.h
. It is a system-dependent unsigned integral type that is large enough to hold the length of any string. Typically it is defined to be identical to either unsigned int
or unsigned long
. On nobel, it is identical to unsigned int
. Several of the standard string functions use type size_t
, and so several of your functions should use it too.
Note that the initial value of uiLength
is 0U
. The U
suffix indicates that the literal is of type unsigned int
instead of int
. Simply initializing uiLength
to 0
would work because C allows initialization of an unsigned int
variable with an int
literal. However, as a matter of style, we suggest that you avoid mixing types within expressions.
Place the second implementation in a file named strp.c
. Design the function definitions in strp.c
such that they use pointer notation as much as possible, and traverse each given string using an incremented pointer -- not using an index relative to the beginning of the string. For example, in strp.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert(pcSrc != NULL); pcEnd = pcSrc; while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); }
It is unacceptable to create the strp.c
implementation from the stra.c
implementation simply by replacing each occurrence of a[i]
with *(a+i)
. For example, in strp.c
this definition of Str_getLength
is unacceptable:
size_t Str_getLength(const char *pcSrc) /* unacceptable */ { /* unacceptable */ size_t uiLength = 0U; /* unacceptable */ assert(pcSrc != NULL); /* unacceptable */ while (*(pcSrc + uiLength) != '\0') /* unacceptable */ uiLength++; /* unacceptable */ return uiLength; /* unacceptable */ } /* unacceptable */
Note that the function traverses the given string by incrementing the integer uiLength
, not by incrementing a pointer. Also note that the function effectively uses uiLength
as an index relative to the beginning of the string pcStr
.
Place the Str
module's interface in a file named str.h
. You may use either array or pointer notation in the interface. The two notations are equivalent to the compiler, so both implementations match the one interface. Use the #ifndef...#define...#endif
construct to protect your str.h
file against accidental multiple inclusion.
This assignment does not focus on efficiency. Nevertheless, your functions should not be grossly inefficient. For example, a function is grossly inefficient if it traverses a (potentially long) string multiple times when a single traversal would suffice.
Design your Str
functions so they do not call any of the standard string functions. In the context of this assignment, pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.
Beware of type mismatches. In particular, beware of the difference between type size_t
and type int
: a variable of type size_t
can store larger integers than a variable of type int
can. Your functions should (in principle) be able to handle strings whose lengths exceed the capacity of type int
. Also beware of type mismatches related to the use of the const
keyword. Using the splint
tool, as described below, will help you detect type mismatches.
In your assignment solution you may use any of the definitions of the Str_getLength
function given in this assignment specification.
A client that you can use to test your Str
module is available on nobel in the file /u/cos217/Assignment2/teststr.c
.
In days gone by, web designers used the <i>
(italics) and <b>
(bold) tags to inform page rendering software -- typically web browsers -- that designated text is "important" or "very important" respectively. These days many web designers view such tags as strictly presentation elements; they believe that we should not use the <i>
and <b>
tags to indicate structural information such as level of importance. (Consider the fact that "italics" and "bold" are not meaningful when a web page is rendered by a speech synthesizer instead of a browser.)
Instead they believe we should use the structural tags <em>
(emphasis) and <strong>
(strong emphasis) to inform page rendering software -- web browsers or speech synthesizers -- that designated text is "important" or "very important" respectively. So many web designers are faced with the task of editing legacy web pages to change <i>
tags to <em>
tags and <b>
tags to <strong>
tags.
Your job in this part of the assignment is to compose a C program named fixhtml
that performs such edits. More specifically, you are given a partial program named fixhtml.c
in the /u/cos217/Assignment2
directory on nobel. Your job is to copy the file to your working directory, and edit the file to make the program complete.
Doing so, in part, will involve completing the definitions of replaceFirst
and replaceAll
functions. Define those functions so they call the assert
macro to make sure that each formal parameter is not NULL
. Don't be concerned about the context of the replacements; for example, define your functions to perform replacements even if the text to be replaced occurs within quotes, comments, etc.
Design your program so it is a client of your Str
module. That is, design your program so it calls functions declared in your str.h
file and defined in either your stra.c
file or your strp.
c file in meaningful ways, whenever reasonably possible, to perform its task. It's fine for your program to do low-level string manipulation, of the kind found within the definitions of your Str
functions, in places where it would be awkward to call your Str
functions. In such low-level string-manipulation code you may use either array notation or pointer notation.
If and only if your Str
functions are not working, then design your program so it calls the standard C string-handling functions declared in string.h
instead of your Str
functions. In that case don't forget to change #include "str.h"
to #include <string.h>
near the beginning of the program.
The nobel /u/cos217/Assignment2
directory contains an executable binary file named samplefixhtml
. Your fixhtml
program should have the same behavior as samplefixhtml
. That is, when given the same input as samplefixhtml
, your program should write exactly the same data to stdout
and stderr
as samplefixhtml
does.
Test your fixhtml
program thoroughly by running it on many input files. The /u/cos217/Assignment2
directory contains some .txt
files that you can use as input to fixhtml
. That directory also contains scripts named testfixhml
and testfixhtmldiff
that automate the testing process.
Do your work on the nobel cluster using bash
, emacs
, gcc217
, and gdb
.
Critique your code using the splint
tool, as described in the next section of this document. Each time splint
generates a warning on your code, you should either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
Similarly, critique your code using the critTer
tool, as described in the next section of this document. Each time critTer
generates a warning on your code, you should either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
Create a readme
file by copying the file /u/cos217/Assignment2/readme
to your project directory, and editing the copy by replacing each area marked "<Complete this section.>" as appropriate.
Use the gcc217
command to preprocess, compile, assemble, and link your programs. Perform each step individually, and examine the intermediate results to the extent possible.
Submit all source code files, the files that gcc217
generated from them, and your readme
file electronically on nobel using these commands:
submit 2 readme submit 2 str.h submit 2 stra.c stra.i stra.s stra.o submit 2 strp.c strp.i strp.s strp.o submit 2 teststr.i teststr.s teststr.o submit 2 teststra teststrp submit 2 fixhtml.c fixhtml.i fixhtml.s fixhtml.o submit 2 fixhtml
where:
teststra
is an executable file built using teststr.o
and stra.o
.
teststrp
is an executable file built using teststr.o
and strp.o
.
fixhtml
is an executable binary file built using fixhtml.o
and either stra.o
or strp.o
.
splint
and critTer
splint
is a high-powered code analysis tool developed by researchers at the University of Virginia. critTer
(critique from the terminal) is a more COS 217-specific code analysis tool that was developed by Erin Rosenbaum '11, a former COS 217 student. Both are available on the nobel cluster.
Execute splint
and critTer
after your programs build cleanly. splint
and critTer
write to stdout
a list of stylistic flaws that they find in your source code. Some of the warnings may be a little cryptic; feel free to ask your preceptor about them as the need arises.
Assuming that you've copied the .splintrc
file from the /u/cos217
directory to your HOME
directory (as described in the document entitled A Minimal COS 217 Computing Environment from the first precept), using splint
is easy. At the bash
prompt, issue this command:
splint allsourcecodefilescomprisingyourprogram
For example, issue these commands to use splint
for this assignment:
splint teststr.c stra.c splint teststr.c strp.c splint fixhtml.c stra.c splint fixhtml.c strp.c
Using critTer
is even easier. At the bash
prompt, issue this command:
critTer anysourcecodefile
For example, issue these commands to use critTer
for this assignment:
critTer stra.c critTer strp.c critTer fixhtml.c
Think of splint
(with the given .splintrc
file) and critTer
as defining coding standards for the COS 217 course. Critique your code using those tools, and edit your code to eliminate all splint
and critTer
warnings before you submit.
splint
and critTer
are aggressive; some of the warnings that they write might not indicate true stylistic errors in your code. If you disagree with a splint
or critTer
warning, then copy the warning to your readme
file, and explain your disagreement with it in your readme
file. Also feel free to discuss the matter with your preceptor before you submit.
If you would like to learn more about splint
, see www.splint.org.
We will grade your work on quality from the user's and programmer's points of view. From the user's point of view, a program has quality if it behaves as it should. The correct behavior of the Str
module is defined by the previous sections of this assignment specification, and by the C90 specifications of the corresponding string.h
functions. The correct behavior of the fixhtml
program is defined by the previous sections of this assignment specification, and by the given samplefixhtml
program.
From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, good style is defined by the splint
and critTer
tools as noted above, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specification also apply, as do these:
Modularity: Divide your program into logically coherent modules. Each module should have an interface (.h) file and an implementation (.c) file.
Modularity: A module's interface should be minimal; it should reveal only those aspects of the module that clients must see. In particular, A module's interface should declare only those functions that clients must see. Other functions should be defined as "static" in the implementation.
Parameter Validation: Validate function parameters via asserts whenever possible.
Global Variables: If you must define a global variable in a multi-file program, then make it static
if you can and const
if you can.
Function Comments: A function's comment should appear in both the interface (.h) file for the sake of the clients of the function and the implementation (.c) file for the sake of the maintainers of the function. For example, this is an appropriate way to comment the Str_getLength
function:
In file str.h
:
... /* Return the length of string pcSrc. */ size_t Str_getLength(const char pcSrc[]); ...
In file strp.c
:
... /* Return the length of string pcSrc. */ size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert(pcSrc != NULL); pcEnd = pcSrc; while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); } ...
Note that the comment explicitly states what the function returns and explicitly refers to the function's parameter (pcSrc
).
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
C programmers sometimes use idioms that rely on the fact that the null character ('\0'), the NULL
address, the integer 0, and the logical concept FALSE have the same representation. You may use those idioms. For example, you may define Str_getLength
like this:
size_t Str_getLength(const char pcSrc[]) { size_t uiLength = 0U; assert((int)pcSrc); /* NULL address, 0, and FALSE are identical. */ while (pcSrc[uiLength]) /* null character and FALSE are identical. */ uiLength++; return uiLength; }
or like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert((int)pcSrc); /* NULL address, 0, and FALSE are identical. */ pcEnd = pcSrc; while (*pcEnd) /* null character and FALSE are identical. */ pcEnd++; return (size_t)(pcEnd - pcSrc); }
(Note the use of the cast operator within the calls of assert
to suppress splint
warnings.)
But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.
const
Keyword and Str_search
The use of the const
keyword within Str_search
is tricky, as this question-and-answer sequence indicates.
According to the man pages, the formal parameters of strstr
are of type const char*
. That implies that the formal parameters of Str_search
also should be of type const char*
. Why are they not of type char*
?
Suppose you were to define Str_search
like this:
char *Str_search(char *pcHaystack, char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
const char *pcBig = "hello"; const char *pcSmall = "lo";; ... ... Str_search(pcBig, pcSmall) ... ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcBig
is of type const char*
and that pcHaystack
is of type char*
, would generate a warning on the function call. Thus pcHaystack
(and pcNeedle
) should be of type const char*
.
According to the man pages, the return type of strstr
is char*
. That implies that the return type of Str_search
also should be of type char*
. Why is the return type not const char*
?
Suppose you were to define Str_search
like this:
const char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
char *pcBig = "hello"; char *pcSmall = "lo"; char *pc; ... pc = Str_search(pcBig, pcSmall); ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc
is of type char*
and that the value returned by Str_search
is of type const char*
, would generate a warning on the assignment statement. Thus the return type of Str_search
should be char*
and should not be const char*
.
Within the definition of Str_search
, I decided to define a local variable named pc
that "points into" the first of the two given strings, as indicated by this code:
char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... pc = pcHaystack; ... /* Increment pc so it points to the appropriate character. */ ... return pc; }
If I define pc
to be of type char*
, then the assignment statement generates a warning. If I define pc
to be of type const char*
, then the return
statement generates a warning. How can I resolve that problem?
Unfortunately, C provides no elegant solution. We recommend that you define pc
to be of type const char*
so the assignment statement is warningless. Then use a cast operator in the return
statement:
return (char*)pc;
to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, often that inelegant technique is unavoidable.
The assignment requires that you use pointer notation as much as possible in your strp.c
file. So, for example, you should define Str_copy
like this:
char *Str_copy(char *pcDest, const char *pcSrc) { ... }
The assignment requires that you use array notation as much as possible in your stra.c
file. So it would be logical to try to define your Str_copy
function something like this:
char[] Str_copy(char pcDest[], const char pcSrc[]) { ... }
Sadly, C disallows char[]
as a function return type. So in your stra.c
file, you will need to define Str_copy
in this hybrid way:
char *Str_copy(char pcDest[], const char pcSrc[]) { ... }
That same point applies to the definitions of other functions as well.
critTer
Warnings on teststr.c
When given the teststr.c
file, critTer
generates warnings about the use of "magic numbers," the length of some functions, and the length of the file. Don't be concerned about those warnings. You need not address them in your readme
file. We judged that using magic numbers and defining long functions in teststr.c
was clearer than the alternative, and that splitting teststr.c
into multiple files would have complicated the build process unnecessarily.
This assignment was written
by Robert M. Dondero, Jr.
with input from other faculty members