Last update: Sat Feb 10 21:12:38 EST 2001
This page describes the testing strategies that have evolved over the past 15 years for maintaining The One True AWK. In an attempt to keep the program working, and to maintain our sanity as bugs are fixed and the language changes slowly, we have developed a large number of ad hoc and systematic tests, and tools for running them. At the moment, there are somewhat over 1000 tests, which can be run automatically by a single command.
This description is mainly for a software engineering course, to illustrate one pragmatic approach to testing a small but important real program over a very long time; we also hope that the tests themselves will be helpful to developers of other versions of AWK (all five of them), and perhaps of interest to others. And of course if you would like to contribute new and better tests, we'd be happy to hear from you.
	{ print }

which prints each input line; the second example is

	{ print $1, $3 }

which prints the first and third fields.
	echo 4000004 >foo1
	$awk '
	BEGIN {
		x1 = sprintf("%1000000s\n", "hello")
		x2 = sprintf("%-1000000s\n", "world")
		x3 = sprintf("%1000000.1000000s\n", "goodbye")
		x4 = sprintf("%-1000000.1000000s\n", "goodbye")
		print length(x1 x2 x3 x4)
	}' >foo2
	cmp -s foo1 foo2 || echo '^GBAD: T.overflow huge sprintfs'
	oawk=${oawk-awk}
	awk=${awk-../a.out}
	echo oawk=$oawk, awk=$awk
	for i
	do
		echo "$i:"
		$oawk -f $i test.data >foo1
		$awk -f $i test.data >foo2
		if cmp -s foo1 foo2
		then true
		else echo -n "$i: BAD^G ..."
		fi
		diff -b foo1 foo2 | sed -e 's/^/ /' -e 10q
	done
	awk=${awk-../a.out}
	$awk 'NR%2 == 1 { print >>"foo" }
	      NR%2 == 0 { print  >"foo" }' /etc/passwd
	diff foo /etc/passwd || echo 'BAD: T.redir (print > and >>"foo")'

This applies the ">" and ">>" output operators to alternate input lines; the result at the end should be that the input file has been copied.
This example is an extreme test of the function call mechanism; it computes Ackermann's function for several pairs of argument values and compares the results to values computed earlier by a C program:
	$awk '
	function ack(m,n) {
		k = k+1
		if (m == 0) return n+1
		if (n == 0) return ack(m-1, 1)
		return ack(m-1, ack(m, n-1))
	}
	{ k = 0; print ack($1,$2), "(" k " calls)" }
	' <<! >foo2
	0 0
	1 1
	2 2
	3 3
	3 4
	3 5
	!
	cat <<! >foo1
	1 (1 calls)
	3 (4 calls)
	7 (27 calls)
	61 (2432 calls)
	125 (10307 calls)
	253 (42438 calls)
	!
	diff foo1 foo2 || echo 'BAD: T.func (ackermann)'
Although this kind of test is the most useful, since it is the most portable and least dependent on other things, it is among the hardest to create, especially for large volumes, since each test has to be carefully written out by hand.
	^a.$   ~    ax
	            aa
	       !~   xa
	            aaa
	            axy
	            ""

into a sequence of test cases. In effect, this is a simple language for regular expression tests: it reads
^a.$ ~ ax "the pattern ^a.$ matches ax" aa "and matches aa" !~ xa "but does not match xa" aaa "and does match aaa" axy "and does not match axy" "" "and does not match the empty string"
Another such language describes substitute commands, and a third language describes input and output relations for expressions. The test expression follows the word "try", and after that are inputs and correct outputs; an AWK program generates and runs the tests.
	try { print ($1 == 1) ? "yes" : "no" }
	1       yes
	1.0     yes
	1E0     yes
	0.1E1   yes
	10E-1   yes
	01      yes
	10      no
	10E-2   no

There are nearly 300 regular expression tests, 130 substitution tests, and over 100 expression tests; more are easily added.
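As a sketch of how a "try" spec might be driven (the actual runner is an AWK program not reproduced here, and the file name espec is invented), a small shell loop can take the program from the "try" line and check each input/output pair:

```shell
# Hypothetical sketch of a runner for the "try" language.
cat >espec <<'EOF'
try { print ($1 == 1) ? "yes" : "no" }
1 yes
10 no
EOF
while read -r first rest
do
	if [ "$first" = try ]
	then prog=$rest                        # the program under test
	else got=$(echo "$first" | awk "$prog")
	     [ "$got" = "$rest" ] ||
	         echo "BAD: input $first gave $got, wanted $rest"
	fi
done <espec
```

Again a clean run is silent; only a mismatch between actual and expected output produces a BAD line.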
	{ i++ }
	END { if (i != NR) print "error" }

Splitting an input line into fields should produce NF fields:
	{ if (split($0, x) != NF) print "error" }

Deleting all elements of an array should leave no elements in the array, so this code should print 0 at the end:
	BEGIN {
		for (i = 0; i < 100000; i++)
			x[i] = i
		for (i in x)
			delete x[i]
		n = 0
		for (i in x)
			n++
		print n
	}
Mechanize. This is the main lesson. The more automated your testing process, the more likely it is that you will run it routinely and often. And the more that tests and test data are generated automatically from compact specifications, the easier it will be to extend them. For AWK, the single command REGRESS runs all the tests. The process takes a couple of minutes. It produces several hundred lines of output, but most of it consists just of filenames printed as the tests progress. Having this large and easy-to-run set of tests has saved us from much embarrassment -- it's all too easy to think that some fix to the program is benign, when in fact something has been broken. The tests find such problems with high probability.
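As a minimal sketch of what such a driver can look like (the real REGRESS script is not reproduced here, and T.sample is an invented test name):

```shell
# Hypothetical sketch of a REGRESS-style driver: each test lives in a
# file named T.something, and prints a BAD message only when it fails.
cat >T.sample <<'EOF'
echo hello >foo1
echo hello >foo2
cmp -s foo1 foo2 || echo 'BAD: T.sample'
EOF
for t in T.*
do
	echo "$t"        # progress report: which test is running
	sh "$t"
done
```

The driver itself contains no knowledge of individual tests, so adding a test is just a matter of dropping a new T.* file into the directory.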
Make test output self-identifying. You have to know what tests ran and especially which ones caused error messages, core dumps, etc.
Make sure you can reproduce a test that fails. Reset random number generators and files and anything else that might inadvertently preserve state from one test to the next. Each test should start with a clean slate.
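For example, an AWK test that uses rand() can be made repeatable by seeding the generator explicitly; a sketch:

```shell
# Two runs seeded identically must produce identical output; without
# the srand(42) call, successive runs could differ.
awk 'BEGIN { srand(42); for (i = 0; i < 3; i++) print int(100 * rand()) }' >run1
awk 'BEGIN { srand(42); for (i = 0; i < 3; i++) print int(100 * rand()) }' >run2
cmp -s run1 run2 || echo 'BAD: rand() run was not reproducible'
```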
Add a test for each bug. Ideally, better tests would have caught the bug in the first place; at the least, the new test should keep you from having to find the same bug again.
Add tests for each new feature or change. While the new thing is fresh is a good time to figure out how to test whether it works correctly; presumably there was some testing anyway, so make sure it's preserved.
Never throw away a test. A corollary to the previous point.
Check your tests and scaffolding. It's easy to get into a rut and assume that your tests are working because they produce the expected (i.e., mostly empty) output. Go back from time to time and take a fresh look -- data files may no longer be appropriate, or may have changed underfoot. (In preparing to write this note, we found that the "big" data set we thought we were using had somehow mutated into a tiny one.) Paths to programs and data may have changed and you could be testing the wrong things.
Make your tests portable. Tests should run on more than one system; otherwise, it's too easy to miss errors in both your tests and your programs. Commands like the shell, built-ins (or not) like echo, search paths for commands, and the like are all potentially different on different machines, and just because something works one place is no assurance that it will work elsewhere.
Make sure that your tester reports progress. Too much output is bad, but there has to be some. The AWK tests report the name of each file that is being tested; if something seems to be taking too long, this is a clue about where the problem is.
Watch out for things that break. Make the test framework robust against the many things that can go wrong: infinite loops, tests that prompt for user input, tests that print spurious output, and tests that don't really distinguish success from failure.
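One hedged way to defend against hangs, assuming the coreutils timeout command is available (it is not part of the original AWK test suite):

```shell
# Hypothetical sketch: run a test under a time limit with an empty
# stdin, so an infinite loop or a read from the terminal cannot stall
# the whole run.  T.loop is an invented test that loops forever.
cat >T.loop <<'EOF'
while true; do :; done
EOF
timeout 2 sh T.loop </dev/null
test $? -eq 124 && echo 'BAD: T.loop (hung, killed by timeout)'
```

Redirecting stdin from /dev/null also makes a test that prompts for input fail immediately instead of waiting forever.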