The purpose of this assignment is to help you learn C dynamic memory management, and how to create abstract data types (ADTs) in C. It also will give you the opportunity to gain more experience with the GNU/Linux programming tools, especially bash
, emacs
, and gdb
.
Implementing hash table expansion (as described below) is the "on your own" part of this assignment. That part is worth 10% of this assignment.
A symbol table is an unordered collection of bindings. A binding consists of a key and a value. A key is a string that uniquely identifies its binding; a value is data that is somehow pertinent to its key. A symbol table allows its client to insert (put) new bindings, to retrieve (get) the values of bindings with specified keys, and to remove bindings with specified keys. Symbol tables are used often in programming systems; compilers, assemblers, and execution profilers use them extensively.
There are several reasonable ways to implement a symbol table. A simple implementation might store the bindings in a linked list. Linked lists are described in Section 2.7 of The Practice of Programming (Kernighan and Pike) and Section 17.5 of C Programming: A Modern Approach (King). A more efficient implementation might use a hash table. Hash tables are described in Section 2.9 of The Practice of Programming (Kernighan & Pike) and Chapter 14 of Algorithms in C, Parts 1-4 (Sedgewick).
Your task in this assignment is to create an ADT named SymTable
. Each SymTable
object should be a symbol table. A SymTable
object should be generic. That is, a SymTable
object should contain values which are void pointers, and thus can point to data of any type.
You should create two implementations of your SymTable
ADT: one that uses a linked list and another that uses a hash table.
SymTable
InterfaceStore your SymTable
interface in a file named symtable.h
. It should contain these function declarations:
SymTable_T SymTable_new(void); void SymTable_free(SymTable_T oSymTable); int SymTable_getLength(SymTable_T oSymTable); int SymTable_put(SymTable_T oSymTable, const char *pcKey, const void *pvValue); void *SymTable_replace(SymTable_T oSymTable, const char *pcKey, const void *pvValue); int SymTable_contains(SymTable_T oSymTable, const char *pcKey); void *SymTable_get(SymTable_T oSymTable, const char *pcKey); void *SymTable_remove(SymTable_T oSymTable, const char *pcKey); void SymTable_map(SymTable_T oSymTable, void (*pfApply)(const char *pcKey, void *pvValue, void *pvExtra), const void *pvExtra);
SymTable_new
should return a new SymTable
object that contains no bindings, or NULL if insufficient memory is available.
SymTable_free
should free all memory occupied by oSymTable
.
SymTable_getLength
should return the number of bindings in oSymTable
.
If oSymTable
does not contain a binding with key pcKey
, then SymTable_put
should add a new binding to oSymTable
consisting of key pcKey
and value pvValue
and return 1 (TRUE). Otherwise the function should leave oSymTable
unchanged and return 0 (FALSE). If insufficient memory is available, then the function should leave oSymTable
unchanged and return 0 (FALSE).
If oSymTable
contains a binding with key pcKey
, then SymTable_replace
should replace the binding's value with pvValue
and return the old value. Otherwise it should leave oSymTable
unchanged and return NULL
.
SymTable_contains
should return 1 (TRUE) if oSymTable
contains a binding whose key is pcKey
, and 0 (FALSE) otherwise.
SymTable_get
should return the value of the binding within oSymTable
whose key is pcKey
, or NULL
if no such binding exists.
If oSymTable
contains a binding with key pcKey
, then SymTable_remove
should remove that binding from oSymTable and return the binding's value. Otherwise the function should not change oSymTable
and return NULL
.
SymTable_map
should apply function *pfApply
to each binding in oSymTable
, passing pvExtra
as an extra parameter. That is,the function should call (*pfApply)(pcKey, pvValue, pvExtra)
for each pcKey/pvValue
binding in oSymTable
.
A SymTable
object should "own" its keys. That is, a SymTable
object is responsible for allocating and freeing the memory in which its keys reside. Specifically, SymTable_put
should not simply store the value of pcKey
within the binding that it creates. Rather, SymTable_put
should make a copy of the string pointed to by pcKey
, and store the address of that copy within the new binding. You will find the standard C functions strlen
, malloc
, and strcpy
useful for making the copy. A SymTable
object also should free the memory in which its keys reside when that memory is no longer required.
Conversely, a SymTable
object should not own its values. Indeed it cannot own its values; since it cannot determine the types of its values, it cannot create copies of them.
SymTable
Linked List ImplementationYour SymTable
linked list implementation should:
symtablelist.c
.assert
macro.SymTable
ADT will use dynamically allocated memory. It should explicitly free all dynamically allocated memory when that memory is no longer required. For each call of malloc
or calloc
in your ADT, eventually there should be exactly one call of free
.SymTable
Hash Table ImplementationYour SymTable
hash table implementation should:
symtablehash.c
.enum {HASH_MULTIPLIER = 65599}; ... static int SymTable_hash(const char *pcKey, int iBucketCount) /* Return a hash code for pcKey that is between 0 and iBucketCount-1, inclusive. Adapted from the COS 217 lecture notes. */ { int i; unsigned int uiHash = 0U; for (i = 0; pcKey[i] != '\0'; i++) uiHash = uiHash * (unsigned int)HASH_MULTIPLIER + (unsigned int)pcKey[i]; return (int)(uiHash % (unsigned int)iBucketCount); }
assert
macro.SymTable
ADT will use dynamically allocated memory. It should explicitly free all dynamically allocated memory when that memory is no longer required. For each call of malloc
or calloc
in your ADT, eventually there should be exactly one call of free
.SymTable_put
causes the number of bindings to become too large. If an expansion attempt fails because of insufficient memory, then your implementation simply should proceed with the execution of SymTable_put
.Concerning hash table expansion:
SymTable_put
detects that the new binding count exceeds 509, it should increase the SymTable
object's bucket count to 1021. When the function detects that the new binding count exceeds 1021, it should increase the bucket count to 2039. Etc. When SymTable_put
detects that the new binding count exceeds 65521, it should not increase the bucket count. Thus 65521 is the maximum number of buckets that a SymTable
object should contain.symtablehash.c
file as comments, and describe your attempts in your readme file.Develop on hats using emacs
to create source code, splint
to check your source code for stylistic errors, and gdb
to debug.
A client program named testsymtable.c
is available in the /u/cos217/Assignment3
directory on hats. That program requires you to provide a single command-line argument, which should be an integer that specifies a binding count. The program tests your SymTable
ADT by manipulating several SymTable
objects. One of those SymTable
objects contains the specified number of bindings. The program prints to the standard output stream an indication of how much CPU time it consumed while manipulating that SymTable
object. You should use testsymtable.c
to test your SymTable
ADT. You should create additional test programs, as you deem necessary, to test your SymTable
ADT.
Create a readme
text file that contains:
testsymtable.c
with binding counts 100, 1000, 10000, 100000, and 1000000 using (1) your linked list implementation, (2) your non-expanding hash table implementation, and (3) your expanding hash table implementation. You can create a non-expanding hash table implementation by temporarily commenting-out your expansion code; don't forget to "comment in" that code before submitting your work. If the time consumed is more than 5 minutes, you may abort execution and report "Greater than 5 minutes."Submit your work electronically on hats via the command:
submit 3 symtable.h symtablelist.c symtablehash.c readme
We will grade your work on quality from the user's point of view and from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc217
generates warning messages. We also will deduct points if splint
generates warning messages that are not explained in your readme file. But see the next section of this document regarding splint
warnings on the given testsymtable.c
file.
From the user's point of view, your module has quality if it behaves as it should. The correct behavior of the SymTable
ADT is defined by the previous sections of this assignment specification.
In part, style is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. These additional rules apply:
Names: You should use a clear and consistent style for variable and function names. One example of such a style is to prefix each variable name with characters that indicate its type. For example, the prefix c
might indicate that the variable is of type char
, i
might indicate int
, pc
might mean char*
, ui
might mean unsigned int
, etc. But it is fine to use another style -- a style which does not include the type of a variable in its name -- as long as the result is a readable program.
Line lengths: Limit line lengths in your source code to 72 characters. Doing so allows us to print your work in two columns, thus saving paper.
Comments: Each source code file should begin with a comment that includes your name, the number of the assignment, and the name of the file.
Comments: Each function should begin with a comment that describes what the function does from the caller's point of view. The function comment should:
Comments: Each structure type definition and each structure field definition should have a comment that describes it.
Comments: The interface of an ADT should contain a comment that describes what an object of that type is. It would be reasonable to place that comment adjacent to the definition of the opaque pointer type.
splint
Warnings on testsymtable.c
When you run splint
on the given testsymtable.c
file, it generates these warnings:
$ splint testsymtable.c symtablelist.c Splint 3.1.1 --- 19 Jul 2006 testsymtable.c: (in function testLargeTable) testsymtable.c:583:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf Use of function that may lead to buffer overflow. (Use -bufferoverflowhigh to inhibit warning) testsymtable.c:599:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:605:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:614:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:627:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:634:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:644:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c: (in function testMultipleTables) testsymtable.c:693:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf testsymtable.c:739:7: Buffer overflow possible with sprintf. Recommend using snprintf instead: sprintf Finished checking --- 9 code warnings
Don't be concerned about those warnings. You need not explain them in your readme
file. splint
complains about the calls of the sprintf
function in testsymtable.c
. It complains because sprintf
does not check the length of the array (alias buffer) into which it assigns characters, and so could cause a buffer overflow if the given array is too short. splint
suggests using snprintf
instead.
Contrary to splint
's suggestion, testsymtable.c
uses sprintf
instead of snprintf
because:
testsymtable.c
the number of characters being assigned is known, and so the array certainly is not too short.snprintf
is not part of the C90 standard, and so is not declared in stdio.h
. So gcc217
complains about the use of snprintf
. A splint
warning is less important than a gcc217
warning.This assignment was created by Robert M. Dondero, Jr. with input from other faculty members