Due Thursday February 21 at 10:00 pm
Mon Feb 11 09:30:59 EST 2019
One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything.
This assignment is an exercise in working with a program that we have talked about in class, but whose innards are almost certainly unfamiliar. It also involves
To get started, become familiar enough with Awk that you know in broad outline what it does; this Awk help file might help. Then clone the source from Github at https://github.com/onetrueawk/awk, skim the source code to see how it is implemented, and be sure that you can build it and test it before you make any changes. (You may have to fiddle the makefile to use yacc instead of bison; it's a 1-character change.)
Warning: Most of the assignments in this course are best done on Linux or MacOSX, and that is especially true of this one. If you are a Windows user, it is strongly recommended that you use ssh, putty or MobaXterm to log into a CS cluster like cycles or an OIT cluster like nobel, or use a virtual Linux machine. The alternative is likely to be a great deal of beating your head against lots of walls, and it's just not worth it. (For general use, Cygwin is highly recommended; it makes Windows more convenient for programmers familiar with Unix. You might also try WSL, the Windows Subsystem for Linux, a compatibility layer that lets you run Linux binaries on Windows.)
Your specific tasks are to
for (var from expr1 to expr2 by expr3) { statements }repeats the statements in the body with values of var running from expr1 to expr2 inclusive in steps of expr3. (The klunky syntax is so that it will fit smoothly into the existing grammar, which is fragile.) The by clause is optional; if omitted, the step size is 1. Negative values of by are legal. The values of the expressions are fixed when the statement begins; statements within the loop do not affect the limits or the step size, and changes to the index variable within the loop do not affect the iterations either. Other than that, the behavior should be the same as
for (var = expr1; var <= expr2; var += expr3) { statements }(with >= instead if expr3 is negative). Thus, for example,
for (i from 1 to 10) { print i }prints 1 through 10, while
for (i from 1 to 10 by 2) { i = 0; print i }prints 0 five times. (This is the simplest of several plausible designs; please implement exactly this even if you think something else is more sensible.)
For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; most of the job amounts to intelligent copy and paste, using grep and your favorite text editor to locate places that might be relevant.
For the new kind of for statement, the Awk grammar needs two new rules (parallel to existing rules for for) that specify its syntax and create the right kind of node in the parse tree. Use the same syntactic category for expressions as in the existing for statement.
Lexical analysis has to recognize the three new keywords from, to and by, and their syntactic categories have to be added to the grammar. A new function has to be written to provide the semantics of the new for statement; it will be similar to the functions that handle the two existing for loops.
You also have to recognize the name of the new built-in function trim as part of lexical analysis and add semantics for it.
For detecting directory access, there are no grammar changes but you have to add one function and direct all file opens to it. Use the stat system call to determine whether a file is a directory; it's fine to search online for how to use this system call. Note that stat itself can return an error; that should be treated the same as a failure to access the specified file.
For the last part, testing, create a shell file awktests.sh that contains at least a dozen tests that ensure that your features are thoroughly tested. The file awktests.sh must be self-contained, requiring no input from a user and generating its own test data somehow. It should produce only one line per test if the test works, and one additional line per failure, of the form
Error: test N failed [further text...]if a test fails. It should assume that the program being tested is named a.out and is in the current directory. For example, this file contains two minimal tests of the new for:
#!/bin/bash echo 'test 1: count from 1 to 10' a.out ' BEGIN { for (i from 1 to 10) print i }'>temp1 for i in `seq 1 10`; do echo $i; done >temp2 cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 1 failed: 1..10' 2>&1 echo 'test 2: count from 1 to 10 by 3' a.out ' BEGIN { for (i from 1 to 10 by 3) print i }' >temp1 for i in 1 4 7 10; do echo $i; done >temp2 cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 2 failed: 1..10 3' 2>&1This file is here; you might find it easiest to just paste your tests into it. Your awktests.sh should contain at least ten further tests: the echo line should state what each test does. Don't forget to test for syntax errors as well as boundary conditions.
You also want to be sure that your changes don't break anything else. The Github distribution includes the Awk regression suite, which you can run with the single command REGRESS. A couple of tests in that will fail when you add new keywords like from and trim but otherwisd everything should work.
Start early.
For my version, I added about 50 lines for the for statement, spread over 6 source files. Testing directories added less than 10 lines, plus changing a handful of function calls in main.c, run.c and lib.c. For trim, there are no grammar changes but there is a new function name to be recognized, and you have to add new semantics in the right place in run.c. Figure out how some analogous one-argument built-in function works, then copy, paste and edit.
If you're doing a lot more, you are off on the wrong track. If you get stuck, here are some hints that might help. No penalty for using them, no reward for not using them, but you will probably find it most instructive to try hard before looking at the hints.
Make sure you don't break anything. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by carriage returns (Mac) or CRLF (Windows), and do not reformat the code. And try to make your tests correct; it would be nice if we could run your tests over everyone else's implementations.
For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, binary files like a.out and *.o, and editor temporaries like run.c~. (The make cleaner rule in the makefile doesn't get rid of all of ytab*, nor does it get rid of editor temporaries.) Then in the parent of these directories, say
diff -ur old new >awk.patchThe recipient (the grader in this case) will say, on his/her system,
cd old patch --verbose --backup <../awk.patchto update the old version with your changes, in place.
Try the patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!! My patch file is just under 300 lines long; if yours is a lot bigger (say over 1000 lines), you are probably including something like proctab.c or a Yacc output file, or you have somehow changed line terminations or tabs in your source files. Fix those up and try again.
When you're all done, submit awk.patch and awktests.sh, using the TigerFile link tigerfile.cs.princeton.edu/COS333_S2019/asgn2.