Princeton University
COS 217: Introduction to Programming Systems

Assignment 3: A Symbol Table ADT


Purpose

The purpose of this assignment is to help you learn C dynamic memory management, and how to create abstract data types (ADTs) in C. It also will give you the opportunity to gain more experience with the Linux operating system and the GNU programming tools, especially bash, emacs, gdb, and make.

Students from past semesters reported taking, on average, 12 hours to complete this assignment.


Rules

Implementing hash table expansion (as described below) is the "extra challenge" part of this assignment. While doing the "extra challenge" part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements.

The extra challenge part is worth 10 percent of this assignment. So if you don't do any of the "extra challenge" part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 90 percent.


Background

A symbol table is an unordered collection of bindings. A binding consists of a key and a value. A key is a string that uniquely identifies its binding; a value is data that is somehow pertinent to its key. A symbol table allows its client to insert (put) new bindings, to retrieve (get) the values of bindings with specified keys, and to remove bindings with specified keys. Symbol tables are used often in programming systems; compilers, assemblers, and execution profilers use them extensively.

There are several reasonable ways to implement a symbol table. A simple implementation might store the bindings in a linked list. Linked lists are described in Section 2.7 of The Practice of Programming (Kernighan and Pike) and Section 17.5 of C Programming: A Modern Approach (King). A more efficient implementation might use a hash table. Hash tables are described in Section 2.9 of The Practice of Programming (Kernighan & Pike).


Your Task

Your task in this assignment is to create an ADT named SymTable. Each SymTable object must be a symbol table. A SymTable object must be generic. That is, a SymTable object must contain values which are void pointers, and thus can point to data of any type.

You must create two implementations of your SymTable ADT: one that uses a linked list and another that uses a hash table.


The SymTable Interface

Store your SymTable interface in a file named symtable.h. It must contain these function declarations:

SymTable_T SymTable_new(void);

void SymTable_free(SymTable_T oSymTable);

size_t SymTable_getLength(SymTable_T oSymTable);

int SymTable_put(SymTable_T oSymTable, const char *pcKey, const void *pvValue);

void *SymTable_replace(SymTable_T oSymTable, const char *pcKey,
   const void *pvValue);

int SymTable_contains(SymTable_T oSymTable, const char *pcKey);

void *SymTable_get(SymTable_T oSymTable, const char *pcKey);

void *SymTable_remove(SymTable_T oSymTable, const char *pcKey);

void SymTable_map(SymTable_T oSymTable,
   void (*pfApply)(const char *pcKey, void *pvValue, void *pvExtra),
   const void *pvExtra);

SymTable_new must return a new SymTable object that contains no bindings, or NULL if insufficient memory is available.

SymTable_free must free all memory occupied by oSymTable.

SymTable_getLength must return the number of bindings in oSymTable.

If oSymTable does not contain a binding with key pcKey, then SymTable_put must add a new binding to oSymTable consisting of key pcKey and value pvValue and return 1 (TRUE). Otherwise the function must leave oSymTable unchanged and return 0 (FALSE). If insufficient memory is available, then the function must leave oSymTable unchanged and return 0 (FALSE).

If oSymTable contains a binding with key pcKey, then SymTable_replace must replace the binding's value with pvValue and return the old value. Otherwise it must leave oSymTable unchanged and return NULL.

SymTable_contains must return 1 (TRUE) if oSymTable contains a binding whose key is pcKey, and 0 (FALSE) otherwise.

SymTable_get must return the value of the binding within oSymTable whose key is pcKey, or NULL if no such binding exists.

If oSymTable contains a binding with key pcKey, then SymTable_remove must remove that binding from oSymTable and return the binding's value. Otherwise the function must not change oSymTable and return NULL.

SymTable_map must apply function *pfApply to each binding in oSymTable, passing pvExtra as an extra parameter. That is, the function must call (*pfApply)(pcKey, pvValue, pvExtra) for each pcKey/pvValue binding in oSymTable.

A SymTable object is responsible for allocating the memory in which its keys reside. Specifically, SymTable_put must not simply store the value of pcKey within the binding that it creates. Instead SymTable_put must make a defensive copy of the string to which pcKey points, and store the address of that copy within the new binding. You will find the standard C functions strlen, malloc, and strcpy useful for making the copy. Thereafter a SymTable object must own its keys. That is, a SymTable object must free the memory in which its keys reside when that memory is no longer required.

Conversely, a SymTable object is not responsible for allocating the memory in which its values reside. Specifically, SymTable_put must simply store the value of pvValue within the binding that it creates. SymTable_put must not make a defensive copy of the object to which pvValue points. In fact SymTable_put cannot make a copy of the object pointed to by pvValue; since SymTable_put cannot determine the type of that object, SymTable_put cannot make a copy of that object. Thus a SymTable object must not own its values; it must not free the memory in which its values reside. (That memory might not even be in the heap! Freeing it could be a disaster!)


The SymTable Linked List Implementation

Your SymTable linked list implementation must:


The SymTable Hash Table Implementation

Your SymTable hash table implementation must:

/* Return a hash code for pcKey that is between 0 and uBucketCount-1,
   inclusive. */
   
static size_t SymTable_hash(const char *pcKey, size_t uBucketCount)
{
   enum {HASH_MULTIPLIER = 65599};
   size_t u;
   size_t uHash = 0;

   assert(pcKey != NULL);

   for (u = 0; pcKey[u] != '\0'; u++)
      uHash = uHash * (size_t)HASH_MULTIPLIER + (size_t)pcKey[u];
      
   return uHash % uBucketCount;
}

Concerning hash table expansion:


The Procedure

Develop on CourseLab using emacs to create source code, gdb to debug, splint and critTer to check for stylistic errors, and meminfo and valgrind to check for memory management errors.


Step 1: Create and Populate a Project Directory

The CourseLab /u/cos217/Assignment3 directory contains files that you will find useful. Subsequent steps describe them. Create a project directory, and copy all files from the /u/cos217/Assignment3 directory to your project directory.


Step 2: Compose symtablelist.c

Compose symtablelist.c, as described in previous sections of this specification.


Step 3: Test symtablelist.c

The given testsymtable.c is a test client. As its name implies, testsymtable.c thoroughly tests your SymTable implementations.

Use testsymtable.c and symtablelist.c to build a program named testsymtablelist. Then run testsymtablelist. Confirm that the output is correct.

The testsymtablelist program requires you to provide a single command-line argument, which must be an integer that specifies a binding count. The module tests your SymTable ADT by using several SymTable objects. One of those SymTable objects contains the specified number of bindings. The module writes to stdout an indication of how much CPU time it consumed while manipulating that SymTable object.

The given buzz.c file is another test client. Whereas testsymtable.c tests your SymTable implementations thoroughly but performs no meaningful task, buzz.c tests your SymTable implementations less thoroughly, but performs a meaningful task. Comments in the buzz.c file describe what the client does.

Use buzz.c and symtablelist.c to build a program named buzzlist. Then run buzzlist, redirecting its stdin to the given file huck.txt. Confirm that the output seems reasonable.

Study buzz.c. An upcoming lecture requires you to understand buzz.c, and the midterm exam may contain questions that involve knowledge of buzz.c.

Create additional test clients, as you deem necessary, to test symtablelist.c.

Make sure you use meminfo and valgrind to check the memory management of your testsymtable.c/symtablelist.c and buzz.c/symtablelist.c programs.


Step 4: Critique symtablelist.c

Critique your testsymtable.c/symtablelist.c and buzz.c/symtablelist.c programs using the splint tool. Each time splint generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.

When given testsymtable.c, splint generates warning messages about the use of the sprintf function, and suggests using the snprintf function instead. splint complains because sprintf does not check the length of the array (alias buffer) into which it assigns characters, and so could cause a buffer overflow if the given array is too short. It suggests using snprintf because that function allows the caller to specify the length of the array.

Don't be concerned about those warning messages. You need not copy them to your readme file or explain them. Contrary to splint's suggestion, testsymtable.c uses sprintf instead of snprintf because:

Similarly, critique symtablelist.c using critTer. Each time critTer generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.

When given testsymtable.c, critTer generates warning messages about the use of "magic numbers," the number of statments in some functions, the number of function definitions in the file, and the number of lines in the file. Don't be concerned about those warning messages. You need not copy those warning messages to your readme file or explain them. We judged that using magic numbers and defining long functions in testsymtable.c was clearer than the alternative. And splitting testsymtable.c into multiple files would have complicated the build process unnecessarily.
When given buzz.c, critTer generates a few warning messages, for which Professor Appel apologizes. You need not copy those warning messages to your readme file or explain them.

Step 5: Compose symtablehash.c

Compose symtablehash.c, as described in previous sections of this specification.


Step 6: Test symtablehash.c

Use testsymtable.c and symtablehash.c to build a program named testsymtablehash. Then run testsymtablehash. Confirm that the output is correct.

Use buzz.c and symtablehash.c to build a program named buzzhash. Then run buzzhash, redirecting its stdin to the given file huck.txt. Confirm that the output seems reasonable.

Create additional test clients, as you deem necessary, to test symtablehash.c.

Make sure you use meminfo and valgrind to check the memory management of your testsymtable.c/symtablehash.c and buzz.c/symtablehash.c programs.


Step 7: Critique symtablehash.c

Critique your testsymtable.c/symtablehash.c and buzz.c/symtablehash.c programs using splint. Each time splint generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.

Similarly, critique symtablehash.c using critTer. Each time critTer generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.


Step 8: Compose a Makefile

Compose a Makefile. The first dependency rule of the Makefile must command make to build four executable files:

That is, the first dependency rule of your Makefile must be:

all: testsymtablelist testsymtablehash buzzlist buzzhash

Your Makefile must:

We recommend that you create your Makefile early in your development process — earlier than its "Step 8" placement would imply. Doing so will allow you to use and test your Makefile during development.


Step 9: Compose a readme File

Edit your copy of the given readme file by answering each question that is expressed therein.

One of the sections of the readme file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.

Place in your readme file the CPU times reported by testsymtable.c with binding counts 100, 1000, 10000, 100000, and 1000000 using (1) your linked list implementation, (2) your non-expanding hash table implementation, and (3) your expanding hash table implementation. You can create a non-expanding hash table implementation by temporarily commenting-out your expansion code; don't forget to "comment in" that code before submitting your work. If the CPU time consumed is more than 5 minutes, then the testsymtable.c client module will abort execution. In that case you must write "More than 5 minutes."

Execute this command:

time ./buzzlist < huck.txt
and report in your readme file the amount of "user" time consumed. Similarly, execute this command:
time ./buzzhash < huck.txt

and report the amount of "user" time consumed.

Finally, complete the section of the readme file that asks how thoroughly you studied buzz.c.


Step 10: Provide Feedback

Provide the instructors with your feedback on the assignment. To do that, issue this command:

FeedbackCOS217.py 3

and answer the questions that it asks. That command stores its questions and your answers in a file named feedback in your working directory.


Step 11: Submit

Submit your work electronically on CourseLab using these commands:

submit 3 symtable.h symtablelist.c symtablehash.c
submit 3 Makefile readme feedback
The given huck.txt file is large. It would be wise to delete it from your project directory when you no longer need it. (Of course you always can recopy it from the /u/cos217/Assignment3 directory if necessary.) If you don't delete huck.txt, then the probability is greater that you eventually will exhaust your disk quota in the CourseLab file system.

Program Style

In part, good program style is defined by the splint and critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.

The more course-specific style rules listed in the previous assignment specifications also apply, as do these:


Grading

Minimal requirement to receive credit for the SymTable linked list implementation:

Minimal requirement to receive credit for the SymTable hash table implementation:

We will grade your work on two kinds of quality:

To encourage good coding practices, we will deduct points if gcc217 generates warning messages.


This assignment was created by Robert M. Dondero, Jr. and Andrew Appel,
with input from other faculty members