The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/Unix programming tools, especially bash
, emacs
, and gcc217
.
Make sure you study the course Policies web page before doing this assignment or any of the COS 217 assignments. In particular, note that you may use a variety of "human" sources of information while doing assignments, including the course staff members, the lab teaching assistants, and other current students via Piazza.
Each assignment has a "extra challenge" part. While doing the "extra challenge" part of an assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements.
The "extra challenge" part is designed to be a kind of relief valve: if you find that an assignment is taking too much time, then you might consider not doing the assignment's "extra challenge" part -- thereby accepting a (typically small) penalty on your grade for that assignment.
Usually the "extra challenge" part is, as its name implies, more challenging than the rest of the assignment. Sometimes it involves material from the course's readings that hasn't (yet) been covered in lectures or precepts.
For this assignment, avoiding the use of global variables (as described below) is the "extra challenge" part. That part is worth 5 percent of this assignment. So if you don't do the "extra challenge" part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 95 percent. In subsequent assignments the "extra challenge" part will be worth more.
The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs:
Merge physical lines of source code into logical lines. That is, when the preprocessor detects a line that ends with the backslash character, it merges that physical line with the next physical line to form one logical line. More precisely, if the preprocessor detects a backslash character immediately followed by a newline character, then it simply removes both characters.
Remove comments from ("de-comment") the source code.
Handle preprocessor directives (#define
, #include
, etc.) that reside in the source code.
The second of those jobs -- the de-comment job -- is more substantial than you might think. For example, when de-commenting a program the C preprocessor must be sensitive to:
The fact that a comment is a token delimiter. After removing a comment, the C preprocessor must make sure that a white space character is in its place.
Line numbers. After removing a comment, the C preprocessor sometimes must insert blank lines in its place to preserve the original line numbering.
String and character literal boundaries. The preprocessor must not consider the character sequence (/*...*/
) to be a comment if it occurs inside a string literal ("..."
) or character literal ('...'
).
Your task is to compose a C program named decomment
that performs a subset of the de-comment job of the C preprocessor, as defined below.
Your program should be a Unix filter. A filter is a program that reads characters from the standard input stream, and writes characters to the standard output stream and possibly to the standard error stream. Specifically, your program should (1) read text, presumably a C program, from the standard input stream, (2) write that same text to the standard output stream with each comment replaced by a space, and (3) write error and warning messages as appropriate to the standard error stream. A typical execution of your program from the shell might look like this:
decomment < somefile.c > somefileWithoutComments.c 2> errorsAndWarnings
In the following examples a space character is shown as "s
" and a newline character as "n
".
Your program should replace each comment with a space. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc/*def*/ghin
|
abcsghin
|
|
abc/*def*/sghin
|
abcssghin
|
|
abcs/*def*/ghin
|
abcssghin
|
Your program should define "comment" as in the C90 standard. In particular, your program should consider text of the form (/*...*/
) to be a comment. It should not consider text of the form (//...
) to be a comment. Example:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc//defn
|
abc//defn
|
Your program should allow a comment to span multiple lines. That is, your program should allow a comment to contain newline characters. Your program should add blank lines as necessary to preserve the original line numbering. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc/*defnghi*/jklnmnon
|
abcsnjklnmnon
|
|
abc/*defnghinjkl*/mnonpqrn
|
abcsnnmnonpqrn
|
Your program should not recognize nested comments. Example:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc/*def/*ghi*/jkl*/mnon
|
abcsjkl*/mnon
|
Your program should handle C string literals. In particular, your program should not consider text of the form (/*...*/
) that occurs within a string literal ("..."
) to be a comment. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc"def/*ghi*/jkl"mnon
|
abc"def/*ghi*/jkl"mnon
|
|
abc/*def"ghi"jkl*/mnon
|
abcsmnon
|
|
abc/*def"ghijkl*/mnon
|
abcsmnon
|
Similarly, your program should handle C character literals. In particular, your program should not consider text of the form (/*...*/
) that occurs within a character literal ('...'
) to be a comment. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc'def/*ghi*/jkl'mnon
|
abc'def/*ghi*/jkl'mnon
|
|
abc/*def'ghi'jkl*/mnon
|
abcsmnon
|
|
abc/*def'ghijkl*/mnon
|
abcsmnon
|
Note that the C compiler would consider the first of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not.
Your program should handle escaped characters within string literals. That is, when your program reads a backslash (\
) while processing a string literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ("...\"..."
) to be a valid string literal which happens to contain the double quote character. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc"def\"ghi"jkln
|
abc"def\"ghi"jkln
|
|
abc"def\'ghi"jkln
|
abc"def\'ghi"jkln
|
Similarly, your program should handle escaped characters within character literals. That is, when your program reads a backslash (\
) while processing a character literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ('...\'...'
) to be a valid character literal which happens to contain the quote character. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc'def\'ghi'jkln
|
abc'def\'ghi'jkln
|
|
abc'def\"ghi'jkln
|
abc'def\"ghi'jkln
|
Note that the C compiler would consider both of those examples to be erroneous (multiple characters in a character literal). But many C preprocessors would not, and your program should not.
Your program should handle newline characters in C string literals without generating errors or warnings. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc"defnghi"jkln
|
abc"defnghi"jkln
|
|
abc"defnghinjkl"mno/*pqr*/stun
|
abc"defnghinjkl"mnosstun
|
Note that a C compiler would consider those examples to be erroneous (newline character in a string literal). But many C preprocessors would not, and your program should not.
Similarly, your program should handle newline characters in C character literals without generating errors or warnings. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc'defnghi'jkln
|
abc'defnghi'jkln
|
|
abc'defnghinjkl'mno/*pqr*/stun
|
abc'defnghinjkl'mnosstun
|
Note that a C compiler would consider those examples to be erroneous (multiple characters in a character literal, newline character in a character literal). But many C preprocessors would not, and your program should not.
Your program should handle unterminated string and character literals without generating errors or warnings. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc"def/*ghi*/jkln
|
abc"def/*ghi*/jkln
|
|
abc'def/*ghi*/jkln
|
abc'def/*ghi*/jkln
|
Note that a C compiler would consider those examples to be erroneous (unterminated string literal, unterminated character literal, multiple characters in a character literal). But many C preprocessors would not, and your program should not.
Your program should detect an unterminated comment. If your program detects end-of-file before a comment is terminated, it should write the message "Error: line X: unterminated comment" to the standard error stream. "X" should be the number of the line on which the unterminated comment begins. Examples:
Standard Input Stream | Standard Output Stream | Standard Error Stream |
abc/*defnghin
|
abcsnn
|
Error:slines1:sunterminatedscommentn
|
abcdefnghi/*n
|
abcdefnghisn
|
Error:slines2:sunterminatedscommentn
|
abc/*def/ghinjkln
|
abcsnn
|
Error:slines1:sunterminatedscommentn
|
abc/*def*ghinjkln
|
abcsnn
|
Error:slines1:sunterminatedscommentn
|
abc/*defnghi*n
|
abcsnn
|
Error:slines1:sunterminatedscommentn
|
abc/*defnghi/n
|
abcsnn
|
Error:slines1:sunterminatedscommentn
|
Your program (more precisely, its main
function) should return EXIT_FAILURE
if it is unsuccessful, that is, if it detects an unterminated comment and so is unable to remove comments properly. Otherwise it should return EXIT_SUCCESS
or, equivalently, 0.
Your program should work for standard input lines of any length.
Your program may assume that the backslash-newline character sequence does not occur in the standard input stream. That is, your program may assume that logical lines are identical to physical lines in the standard input stream. So your de-comment program need not perform the first of the three jobs described above in the "Background" section.
Create your program on the nobel cluster using bash
, emacs
, and gcc217
.
Design a deterministic finite state automaton (DFA, alias FSA) that expresses the required de-commenting logic. The DFA concept is described in lectures, and in this Web page written by Professors Sedgewick and Wayne: http://www.cs.princeton.edu/introcs/java/73dfa/.
Design your DFA so it "accepts" a given sequence of characters if the sequence contains no unterminated comments. That is, when given a sequence of characters that does not contain an unterminated comment, your DFA should end in an "accepting" state. Conversely design your DFA so it "rejects" a given sequence of characters if the sequence contains an unterminated comment. That is, when given a sequence of character that contains an unterminated comment, your DFA should end in a "rejecting" state.
Express your DFA using the traditional "labeled ovals and labeled arrows" notation. Let each oval represent a state. Give each state a descriptive name, and indicate whether it is an "accepting" state or a "rejecting" state. Let each arrow represent a transition from one state to another. Label each arrow with the single character, or class of characters, that causes the transition to occur. We encourage (but do not require) you also to label each arrow with action(s) that should occur (e.g. "print the character") when the corresponding transition occurs.
Express as much of the de-commenting logic as you can within your DFA. The more de-commenting logic you express in your DFA, the better your grade on the DFA will be.
(To properly report unterminated comments, your program must contain logic to keep track of the current line number of the standard input stream. You need not show that logic in your DFA.)
Convert your "labeled ovals and labeled arrows" DFA to a textual representation, placing the result in a file named dfa
. For example, the last DFA in the Sedgewick and Wayne Web page could be expressed as this Textual DFA. Make sure you indicate explicitly which state is the DFA's start state, and whether each state is an accepting state or a rejecting state. The name of the file should be dfa
, not dfa.txt
, not DFA
, not Dfa
, etc.
Use emacs
to create source code in a file named decomment.c
that implements your DFA.
If your DFA exits while in an "accepting" state, then your program's exit status should be EXIT_SUCCESS
or 0. If your DFA exits while in a "rejecting" state, then your program's exit status should be EXIT_FAILURE
. In other words, if your program detects no unterminated comments, then its exit status should be EXIT_SUCCESS
or 0; if your program detects an unterminated comment, then its exit status should be EXIT_FAILURE
.
Your program should not consist of one large main
function. Instead your program should consist of multiple small functions, each of which performs a single well-defined task. In this program you should create one function to implement each state of your DFA, as described in lectures.
Generally, a (large) C program should consist of multiple source code files. For this assignment, you need not split your source code into multiple files. Instead, place all source code in a single source code file. Subsequent assignments will ask you to write programs consisting of multiple source code files.
We suggest that your program use the standard C getchar
function to read characters from the standard input stream.
Use the gcc217
command to preprocess, compile, assemble, and link your program. Perform each step individually, and examine the intermediate results to the extent possible.
Execute your program multiple times on various input files that test all logical paths through your code.
We have provided files in nobel directory /u/cos217/Assignment1
that you will find helpful:
sampledecomment
is an executable version of a correct assignment solution. Your program should write exactly (character for character) the same data to the standard output stream and the standard error stream as sampledecomment
does. You should test your program using commands similar to these:
sampledecomment < somefile.c > output1 2> errors1 decomment < somefile.c > output2 2> errors2 diff -c output1 output2 diff -c errors1 errors2 rm output1 errors1 output2 errors2
The Unix diff
command finds differences between two given files. The executions of the diff
command shown above should produce no output. If the command diff -c output1 output2
produces output, then sampledecomment
and your program have written different characters to the standard output stream. Similarly, if the command diff -c errors1 errors2
produces output, then sampledecomment
and your program have written different characters to the standard error stream.
Many .txt
files (that is, files whose names end with .txt
) can serve as input files to your program.
testdecomment
and testdecommentdiff
are bash
scripts that automate the testing process. Comments at the beginning of those files describe how to use them. After copying the scripts to your project directory, you may need to execute the commands chmod 700 testdecomment
and chmod 700 testdecommentdiff
to give them "executable" permissions.
Copy those files to your project directory, and use them to help you test your program.
You also should test your program against its own source code using a command sequence such as this:
sampledecomment < decomment.c > output1 2> errors1 decomment < decomment.c > output2 2> errors2 diff -c output1 output2 diff -c errors1 errors2 rm output1 errors1 output2 errors2
readme
FileCopy the file /u/cos217/Assignment1/readme
to your project directory, thus creating a file named readme
in your project directory. Then use emacs
to edit your readme
file, replacing each area marked "<Complete this section.>" as appropriate.
Descriptions of your code should not be in the readme
file. Instead they should be integrated into your code as comments.
Your readme
file should be a plain text file. Don't create your readme
file using Microsoft Word or any other word processor. The name of the file should be readme
, not readme.txt
, not README
, not Readme
, etc.
Submit your work. Submit your dfa
file, your decomment.c
file, the files that gcc217
generated from it, and your readme
file electronically by issuing these commands on nobel:
submit 1 dfa submit 1 decomment.c submit 1 decomment.i decomment.s decomment.o decomment submit 1 readme
We can accept your files only if you submit them by executing submit
commands on nobel. In particular, we cannot accept your files via e-mail. We cannot accept your DFA in any form other than as a file containing plain text.
We will grade your work on two kinds of quality: quality from the user's point of view, and quality from the programmer's point of view. From the user's point of view, a program has quality if it behaves as it should. The correct behavior of your program is defined by the previous sections of this assignment specification, and by the behavior of the given sampledecomment
program.
From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, good style is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. For this assignment we will pay particular attention to rules 1-24.
These additional more course-specific rules apply:
Indentation: Indent using spaces, not tabs. emacs
does that when using the .emacs
configuration file that we have provided.
Line lengths: Limit line lengths in your source code to 72 characters. Doing that allows us to print your code in two columns, and so saves paper. When using the .emacs configuration file that we have provided, emacs
indicates lines that exceed 72 characters. Specifically, emacs
uses a green background to mark the character in column 72, and a gray background to mark the character in column 73. So it's OK to see a green background, but a gray background indicates that the line is too long.
Names: Use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix c
might indicate that the variable is of type char
, i
might indicate int
, pc
might mean char*
, ui
might mean unsigned int
, etc. But it is fine to use another style -- a style that does not include the type of a variable in its name -- as long as the result is a clear and readable program.
File Comments: Begin each source code file with a comment that includes your name, the number of the assignment, and the name of the file.
Variable Comments: Compose a comment for each global variable. The comment should appear immediately before the definition/declaration of the global variable. Compose a comment for each local variable definition/declaration whose purpose is unclear. The comment should appear immediately before the definition/declaration of the local variable, or at the end of the line containing the definition/declaration.
Function Comments: Compose a comment for each function. A function definition/declaration's comment should immediately precede the function's definition/declaration.
A function's comment should describe what the function does from the point of view of the function's callers. A function's comment should refer explicitly to the function's parameters (by name) and the function's return value. A function's comment should state what, if anything, the function reads from standard input or any other stream, and what, if anything, the function writes to standard output, standard error, or any other stream. In short, a function's comment should describe the flow of data into and out of the function.
A function's comment should not describe how the function works. If a programmer cannot reasonably determine how the function works by reading the function's definition, then either (1) rewrite the function definition so it is clearer or, if that is impossible, (2) add some local comments (that is, comments with the function definition) to explain the code.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
Suppose a program contains functions f
and g
, and that f
calls g
. Further suppose that f
wishes to pass some values to g
. f
can do that using ordinary parameters.
Now suppose g
wishes to pass some values back to f
, its caller. g
can pass the first value back via its return value. But how can g
pass additional values back to f
?
One approach is to use global variables, where a global variable is one which is defined outside of all functions. g
could assign the additional values to global variables; f
then could fetch those values by accessing the global variables.
However, as prescribed by Kernighan and Pike style rule 25, and for reasons that we will discuss later in the course, generally you should avoid using global variables. Instead all communication of values into and out of a function should occur via the function's parameters and its return value.
Indeed in your decomment
program you should avoid using global variables. You can do that using either of these two approaches:
Express the program's logic such that no function must pass more than one value back to its caller. You can do that by enhancing the logic in your main
function.
Use pointer parameters, as described in Section 11.4 of our King book.
Some notes:
The COS 217 course has not covered pointers yet. So you'll need to work ahead of the pace of the course if you use the second approach.
Although you should avoid defining variables at the global level, it's fine to define data types at the global level. For example, code of this form:
enum SomeEnum {SOMEVALUE1, SOMEVALUE2, ...};
at the global level is fine.
Avoiding global variables is worth only 5 percent of this assignment. We will not think less of you if you decide to accept the small deduction instead of doing that part of the assignment.
This assignment was written by Robert M. Dondero, Jr.
with input from many other faculty members