COS 333: Assignment 2

COS 333 Assignment 2: Feeping Creaturism

Due Thursday February 21 at 10:00 pm

Mon Feb 11 09:30:59 EST 2019

One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything.

This assignment is an exercise in working with a program that we have talked about in class, but whose innards are almost certainly unfamiliar. It also involves

scripting language design and implementation
the basics of how compilers and interpreters work
working with and extending existing code
fetching source from Github
the patch mechanism
testing, little languages, shell programming

To get started, become familiar enough with Awk that you know in broad outline what it does; this Awk help file might help. Then clone the source from Github at https://github.com/onetrueawk/awk, skim the source code to see how it is implemented, and be sure that you can build it and test it before you make any changes. (You may have to fiddle the makefile to use yacc instead of bison; it's a 1-character change.)

Warning: Most of the assignments in this course are best done on Linux or MacOSX, and that is especially true of this one. If you are a Windows user, it is strongly recommended that you use ssh, putty or MobaXterm to log into a CS cluster like cycles or an OIT cluster like nobel, or use a virtual Linux machine. The alternative is likely to be a great deal of beating your head against lots of walls, and it's just not worth it. (For general use, Cygwin is highly recommended; it makes Windows more convenient for programmers familiar with Unix. You might also try WSL, the Windows Subsystem for Linux, a compatibility layer that lets you run Linux binaries on Windows.)

1. What to do

Your specific tasks are to

Add a new variant of the for loop that is loosely based on the indexed loops in languages like Fortran and Basic:
```
for (var from expr1 to expr2 by expr3) {
    statements
}
```
repeats the statements in the body with values of var running from expr1 to expr2 inclusive in steps of expr3. (The klunky syntax is so that it will fit smoothly into the existing grammar, which is fragile.) The by clause is optional; if omitted, the step size is 1. Negative values of by are legal. The values of the expressions are fixed when the statement begins; statements within the loop do not affect the limits or the step size, and changes to the index variable within the loop do not affect the iterations either. Other than that, the behavior should be the same as
```
for (var = expr1; var <= expr2; var += expr3) {
    statements
}
```
(with >= instead if expr3 is negative). Thus, for example,
```
for (i from 1 to 10) { print i }
```
prints 1 through 10, while
```
for (i from 1 to 10 by 2) { i = 0; print i }
```
prints 0 five times. (This is the simplest of several plausible designs; please implement exactly this even if you think something else is more sensible.)
Add a new built-in function trim(s) that returns a copy of the string s with all leading and trailing white space removed. Use ctype.h to test for white space.
Fix a bug. This Awk doesn't complain if you try to read a directory, and arguably it should. Modify the program so that trying to read a directory as an input file is an immediate fatal error. The error message must include exactly this string:
filename is a directory
so we can test your code mechanically. For example, the program
a.out '{print}' .
should produce the error message a.out: . is a directory.
Create enough test cases to exercise your code thoroughly.

2. How to do it

For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; most of the job amounts to intelligent copy and paste, using grep and your favorite text editor to locate places that might be relevant.

For the new kind of for statement, the Awk grammar needs two new rules (parallel to existing rules for for) that specify its syntax and create the right kind of node in the parse tree. Use the same syntactic category for expressions as in the existing for statement.

Lexical analysis has to recognize the three new keywords from, to and by, and their syntactic categories have to be added to the grammar. A new function has to be written to provide the semantics of the new for statement; it will be similar to the functions that handle the two existing for loops.

You also have to recognize the name of the new built-in function trim as part of lexical analysis and add semantics for it.

For detecting directory access, there are no grammar changes but you have to add one function and direct all file opens to it. Use the stat system call to determine whether a file is a directory; it's fine to search online for how to use this system call. Note that stat itself can return an error; that should be treated the same as a failure to access the specified file.

For the last part, testing, create a shell file awktests.sh that contains at least a dozen tests that ensure that your features are thoroughly tested. The file awktests.sh must be self-contained, requiring no input from a user and generating its own test data somehow. It should produce only one line per test if the test works, and one additional line per failure, of the form

Error: test N failed [further text...]

if a test fails. It should assume that the program being tested is named a.out and is in the current directory. For example, this file contains two minimal tests of the new for:

#!/bin/bash

echo 'test 1: count from 1 to 10'
a.out '
BEGIN {
	for (i from 1 to 10)
		print i
}'>temp1
for i in `seq 1 10`; do echo $i; done >temp2
cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 1 failed: 1..10' 2>&1

echo 'test 2: count from 1 to 10 by 3'
a.out '
BEGIN {
	for (i from 1 to 10 by 3)
		print i
}' >temp1
for i in 1 4 7 10; do echo $i; done >temp2
cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 2 failed: 1..10 3' 2>&1

This file is here; you might find it easiest to just paste your tests into it. Your awktests.sh should contain at least ten further tests: the echo line should state what each test does. Don't forget to test for syntax errors as well as boundary conditions.

You also want to be sure that your changes don't break anything else. The Github distribution includes the Awk regression suite, which you can run with the single command REGRESS. A couple of tests in that will fail when you add new keywords like from and trim but otherwisd everything should work.

3. Advice

Start early.

Talking over the precise meaning of the assignment with friends is encouraged. You must write your code yourself, however, so once you start coding, you may no longer discuss specific programming details with other students.
It is almost always best to attack such a task in stages, making sure each one works before the next one is started. First compile what you downloaded and test it. Then make a tiny change, recompile, and test again.
Prepare some good test cases ahead of time. Test early, test often, know what you are testing. Automate the testing as much as you can.
I think that for this assignment gdb is not as helpful as a few well-positioned fprintf statements, but if you want to use it, this short gdb help file might help you get started or remind you of basic commands.

For my version, I added about 50 lines for the for statement, spread over 6 source files. Testing directories added less than 10 lines, plus changing a handful of function calls in main.c, run.c and lib.c. For trim, there are no grammar changes but there is a new function name to be recognized, and you have to add new semantics in the right place in run.c. Figure out how some analogous one-argument built-in function works, then copy, paste and edit.

If you're doing a lot more, you are off on the wrong track. If you get stuck, here are some hints that might help. No penalty for using them, no reward for not using them, but you will probably find it most instructive to try hard before looking at the hints.

Make sure you don't break anything. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by carriage returns (Mac) or CRLF (Windows), and do not reformat the code. And try to make your tests correct; it would be nice if we could run your tests over everyone else's implementations.

4. Submission

Updates to significant pieces of software are often distributed as "patch" files, that is, the set of changes necessary to convert the old version into the new version. For Unix and Linux source, a patch file is usually made by running diff to create a file of editing commands on the system where the changes were made, and running patch with that file as input on the system where the changes are to be applied.

For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, binary files like a.out and *.o, and editor temporaries like run.c~. (The make cleaner rule in the makefile doesn't get rid of all of ytab*, nor does it get rid of editor temporaries.) Then in the parent of these directories, say

diff -ur old new >awk.patch

The recipient (the grader in this case) will say, on his/her system,

cd old
patch --verbose --backup <../awk.patch

to update the old version with your changes, in place.

Try the patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!! My patch file is just under 300 lines long; if yours is a lot bigger (say over 1000 lines), you are probably including something like proctab.c or a Yacc output file, or you have somehow changed line terminations or tabs in your source files. Fix those up and try again.

When you're all done, submit awk.patch and awktests.sh, using the TigerFile link tigerfile.cs.princeton.edu/COS333_S2019/asgn2.