COS 126 Counting Words |
Due: Wed. 4/2/97, 11:59pm |
Write, in wf.c
, a program that prints, in lexicographic order,
the identifiers in the standard input and the number of times each one appears.
For example:
% a.out </u/cs126/examples/bsearch.c | pr -3 -t 1 Quicksort 5 i 2 quicksort 1 Read 4 if 5 return 1 and 2 include 1 scanf 2 argc 1 input 1 sort 2 argv 10 int 1 sscanf 4 array 1 integers 1 standard 4 bsearch 1 is 1 stdio 1 char 4 k 1 them 5 d 4 lb 1 to 1 element 6 m 4 ub 4 else 1 main 1 up 1 for 8 n 1 using 1 found 1 not 1 while 1 from 2 printf 10 x 2 h 5 q
a.out
emits just a single-column list, but this output has
been folded into 3 columns with the
pr
command to save space. This output shows, for example, that there are 10
occurrences of "int
" and "x
" in
/u/cs126/examples/bsearch.c
.
In this assignment, you'll use pointers and structures to hold the identifiers
and their counts in a table, and you'll adapt the recursive binary search in
bsearch.c
to search this table as you read identifiers from the input. You'll also learn
more about multi-file programs, because you'll use
/u/cs126/examples/getword.c
.
Your program is actually more general than suggested above, because an "identifier"
is defined only by getword
, and you could use a different
implementation of getword
to count a different set of "words".
The implementation in
getword.c
defines a word as a C identifier. To acknowledge this generality, we'll use "word"
below instead of "identifier". Here's the general form for your
program and pseudocode for the main
function; you can copy this
skeleton from this page and edit it into your wf.c
.
#include <stdio.h> #include "getword.h" struct word { char *str; int count; }; struct word words[1000], *ptrs[1000]; /* Function to search for a word in ptrs[0...] */ /* Function to add a word to words and ptrs */ int main(void) { for each word in the input if word is in ptrs[0...] then increment its count field else add word to words and ptrs set its count field to 1 for each element in ptrs[0...] print its count and str fields }
The global variable words
is an array of struct
word
structures; each structure holds a pointer to a word in its
str
field and the number of times that word occurs in its count
field. The global variable ptrs
is an array of pointers
to struct
word
structures. The elements in words
are in the order in which the words appear in input; the elements in ptrs
are in lexicographic order of the words in the struct
word
structures to which they point. As words are read and added
to words
and ptrs
, the elements in ptrs
are rearranged so that ptrs[0
...]
are kept in sorted
order. ptrs
and words
are related in the same way as
are deck
and cards
shown on page 13-7 in
lecture 13.
The search function suggested in the pseudocode above looks for a word by
doing a binary search of ptrs
. Copy the binary search function
described in lecture 10, available in
/u/cs126/examples/bsearch.c
,
and edit it as necessary to search an array of pointers to struct
word
structures. You can adapt bsearch
by changing
only a few lines.
The add function suggested in the pseudocode above adds a word to words
and ptrs
. Adding a word to words
is easy: Keep track
of the number of elements currently used, fill in the next element, and
increment the number of elements used. A similar scheme can be used to add a
pointer to ptrs
, but after you initialize the last element of
ptrs
, you need to move that element to its proper location. You do
not have to sort ptrs
; just use exchanges to percolate
the new element down to its proper location. (Technically, this percolation is
one step of an insertion sort.)
int getword(char *word, int size)
, declared in
/u/cs126/examples/getword.h
and defined in /u/cs126/examples/getword.c
,
reads the next word from the standard input, stores it as a null-terminated
string in word
, and returns its length. When the input is
exhausted, getword
returns EOF
. getword
assumes that word can hold up to size
characters including the
null character.
As you read in the words, you'll need to store the new words some place
permanent, because the str
fields point to them. You can store
them end-to-end in a large, say, 8000-element, character array. The first time
you see a word, append it and its terminating null character to this
array; the address of its first character in this array is thus a character
pointer to the word.
The figure below depicts the values of words
,
ptrs
, and the input array for the one-line input
int main(int argc, char *argv[]) {
Note: You should not useand
you may not usethe
C library functions malloc
, calloc
, realloc
,
or free
in this program. You many assume there are no more than
1000 words and that they fit in 8000 bytes.
You don't have to copy getword.h
and getword.c
;
just compile getword.c
directly with your wf.c
with
the command
% lcc -I/u/cs126/examples wf.c /u/cs126/examples/getword.c
The -I
dir option causes lcc
to look in
dir for header files. You can also compile getword.c
once and load the resulting object code when you compile wf.c
.
To compile just getword.c
, use the command
% lcc -c -I/u/cs126/examples /u/cs126/examples/getword.c
The -c
option causes lcc
to leave the object
code for getword.c
in getword.o
in your directory.
You can load getword.o
when you compile wf.c
:
% lcc -I/u/cs126/examples wf.c getword.o
This assignment isn't difficult, but it's full of pitfalls for the unwary,
because pointer bugs are particularly difficult to find. A common error is to
dereference null pointers, e.g., using an expression like p->count
when p is NULL
. lcc
's -n
option can
help catch these errors; -n
causes lcc
to generate
code that checks for null pointers and, if it trips across one, to issue a
one-line diagnostic and terminate the program. See the
lcc man page
for more information.
Turn in your code and your documentation with the command
/u/cs126/bin/submit 7 readme wf.c
Add a command-line option "-n
" to your program that
causes it to print the identifiers in decreasing order of occurrence instead of
in alphabetical order. Use the sorting code from
/u/cs126/examples/quicksort.c
,
or use the qsort
function from the C library (see Dietel and
Deitel, p. 879). To use qsort
, you'll need to include stdlib.h
,
and you'll need to use a different name for bsearch
, because
there's a bsearch
in stdlib.h
, too.