The purpose of this assignment is to help you learn C dynamic memory management, and how to create abstract data types (ADTs) in C. It also will give you the opportunity to gain more experience with the Linux operating system and the GNU programming tools, especially bash
, gdb
, and make
.
Students from past semesters reported taking, on average, 15 hours to complete this assignment. Anecdotally, this assignment is one of two on which students used a disproportionate number of late days.
This assignment is an individual assignment, not a partnered assignment.
Implementing hash table expansion (as described below) is the challenge part of this assignment. While doing the challenge part of the assignment you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the Intro Lab TAs, other current students via Ed, or any other people while working on the challenge part of an assignment, except for clarification of requirements.
The challenge part is worth 10 percent of this assignment. So if you don't do any of the challenge part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 90 percent.
A symbol table is an unordered collection of bindings. A binding consists of a key and a value. A key is a string that uniquely identifies its binding; a value is data that is somehow pertinent to its key. A symbol table allows its client to insert (put) new bindings, to retrieve (get) the values of bindings with specified keys, and to remove bindings with specified keys. Symbol tables are used often in programming systems; compilers, assemblers, and execution profilers use them extensively.
There are several reasonable ways to implement a symbol table. A simple implementation might store the bindings in a linked list. Linked lists are described in Section 2.7 of The Practice of Programming (Kernighan and Pike) and Section 17.5 of C Programming: A Modern Approach (King). A more efficient implementation might use a hash table. Hash tables are described in Section 2.9 of The Practice of Programming (Kernighan & Pike).
Your task in this assignment is to create an ADT named SymTable
. Each SymTable
object must be a symbol table. A SymTable
object must be generic. That is, a SymTable
object must contain values which are void pointers, and thus can point to data of any type.
You must create two implementations of your SymTable
ADT: one that uses a linked list and another that uses a hash table.
SymTable
InterfaceStore your SymTable
interface in a file named symtable.h
. It must contain these function declarations:
SymTable_T SymTable_new(void); void SymTable_free(SymTable_T oSymTable); size_t SymTable_getLength(SymTable_T oSymTable); int SymTable_put(SymTable_T oSymTable, const char *pcKey, const void *pvValue); void *SymTable_replace(SymTable_T oSymTable, const char *pcKey, const void *pvValue); int SymTable_contains(SymTable_T oSymTable, const char *pcKey); void *SymTable_get(SymTable_T oSymTable, const char *pcKey); void *SymTable_remove(SymTable_T oSymTable, const char *pcKey); void SymTable_map(SymTable_T oSymTable, void (*pfApply)(const char *pcKey, void *pvValue, void *pvExtra), const void *pvExtra);
SymTable_new
must return a new SymTable
object that contains no bindings, or NULL
if insufficient memory is available.
SymTable_free
must free all memory occupied by oSymTable
.
SymTable_getLength
must return the number of bindings in oSymTable
.
If oSymTable
does not contain a binding with key pcKey
, then SymTable_put
must add a new binding to oSymTable
consisting of key pcKey
and value pvValue
and return 1 (TRUE). Otherwise the function must leave oSymTable
unchanged and return 0 (FALSE). If insufficient memory is available, then the function must leave oSymTable
unchanged and return 0 (FALSE).
If oSymTable
contains a binding with key pcKey
, then SymTable_replace
must replace the binding's value with pvValue
and return the old value. Otherwise it must leave oSymTable
unchanged and return NULL
.
SymTable_contains
must return 1 (TRUE) if oSymTable
contains a binding whose key is pcKey
, and 0 (FALSE) otherwise.
SymTable_get
must return the value of the binding within oSymTable
whose key is pcKey
, or NULL
if no such binding exists.
If oSymTable
contains a binding with key pcKey
, then SymTable_remove
must remove that binding from oSymTable
and return the binding's value. Otherwise the function must not change oSymTable
and return NULL
.
SymTable_map
must apply function *pfApply
to each binding in oSymTable
, passing pvExtra
as an extra parameter. That is, the function must call (*pfApply)(pcKey, pvValue, pvExtra)
for each pcKey/pvValue
binding in oSymTable
.
A SymTable
object is responsible for allocating the memory in which its keys reside. Specifically, SymTable_put
must not simply store the value of pcKey
within the binding that it creates. Instead SymTable_put
must make a defensive copy of the string to which pcKey
points, and store the address of that copy within the new binding. You will find the standard C functions strlen
, malloc
, and strcpy
useful for making the copy. Thereafter a SymTable
object must own its keys. That is, a SymTable
object must free the memory in which its keys reside when that memory is no longer required.
Conversely, a SymTable
object is not responsible for allocating the memory in which its values reside. Specifically, SymTable_put
must simply store the value of pvValue
within the binding that it creates. SymTable_put
must not make a defensive copy of the object to which pvValue
points. In fact SymTable_put
cannot make a copy of the object pointed to by pvValue
; since SymTable_put
cannot determine the type of that object, SymTable_put
cannot make a copy of that object. Thus a SymTable
object must not own its values; it must not free the memory in which its values reside. (That memory might not even be in the heap! Freeing it could be a disaster!)
SymTable
Linked List ImplementationYour SymTable
linked list implementation must:
Reside in a file named symtablelist.c
.
Validate function parameters by calling the standard assert
macro.
Contain no memory leaks. Your SymTable
ADT will use dynamically allocated memory. It must explicitly free all dynamically allocated memory when that memory is no longer required. For each chunk of dynamically allocated memory there must be exactly one call of free
.
Avoid gross inefficiencies. In particular, it would be grossly inefficient to traverse the list multiple times when a single time would suffice.
Define the SymTable_getLength
function so it runs in constant time.
SymTable
Hash Table ImplementationYour SymTable
hash table implementation must:
Reside in a file named symtablehash.c
.
Contain 509 buckets initially.
Use a reasonable hash function. For consistency, you should use this one:
/* Return a hash code for pcKey that is between 0 and uBucketCount-1, inclusive. */ static size_t SymTable_hash(const char *pcKey, size_t uBucketCount) { const size_t HASH_MULTIPLIER = 65599; size_t u; size_t uHash = 0; assert(pcKey != NULL); for (u = 0; pcKey[u] != '\0'; u++) uHash = uHash * HASH_MULTIPLIER + (size_t)pcKey[u]; return uHash % uBucketCount; }
Validate function parameters, where appropriate, by calling the standard assert
macro.
Contain no memory leaks. Your SymTable
ADT will use dynamically allocated memory. It must explicitly free all dynamically allocated memory when that memory is no longer required. For each chunk of dynamically allocated memory there must be exactly one call of free
.
Avoid gross inefficiencies. It would be grossly inefficient to examine any bucket other than the one the given key belongs in. It would also be grossly inefficient to hash the given key multiple times when a single time would suffice.
Define the SymTable_getLength
function so it runs in constant time.
Use separate chaining, and not linear probing. That is, each bucket should be implemented as a distinct linked list, as illustrated in lectures and precepts.
Expand. That is, for efficiency your implementation must dynamically increase the number of buckets (and, necessarily, reposition all bindings) whenever a call of SymTable_put
causes the number of bindings to become too large. If an expansion attempt fails because of insufficient memory, then your implementation simply must proceed with the execution of SymTable_put
.
Concerning hash table expansion:
Use this sequence of integers as bucket counts: 509, 1021, 2039, 4093, 8191, 16381, 32749, and 65521. (Those integers are primes that are close to powers of two.) When SymTable_put
detects that the new binding count exceeds 509, it must increase the SymTable
object's bucket count to 1021. When the function detects that the new binding count exceeds 1021, it must increase the bucket count to 2039. Etc. When SymTable_put
detects that the new binding count exceeds 65521, it must not increase the bucket count. Thus 65521 is the maximum number of buckets that a SymTable
object must contain.
You will receive a better grade if you submit a working non-expanding hash table implementation instead of a non-working expanding hash table implementation. If your attempts to develop the expansion code fail, then leave the expansion code in your symtablehash.c
file as comments, and describe your attempts in the "what bugs are in your submission" section of your readme
file.
Create your program in the environment of your choice, staging the files to the armlab cluster to build (with gcc217
), test, debug (with gdb
, meminfo
, and valgrind
), critique (with splint
and critTer
), and submit.
Read all of the following steps before you perform any of them. Note that you absolutely should perform Step 8 earlier than its "Step 8" placement would imply. That is, you should create your Makefile
early enough that you can use it to generate your .o
files and executable binary files during development.
Get a working copy of your own version of the COS 217 assignment repository for this assignment by completing the Setup Step 5 from the Git and GitHub Primer document from the first precept. The COS 217 assignment repository for this assigment is: https://github.com/COS217/SymTable
The repository contains files that you will find useful, which are described in subsequent steps.
symtablelist.c
Compose symtablelist.c
, as described in previous sections of this specification.
symtablelist.c
The given testsymtable.c
is a test client. As its name implies, testsymtable.c
thoroughly tests your SymTable
implementations.
Use testsymtable.c
and symtablelist.c
to build a program named testsymtablelist
. Then run testsymtablelist
. Confirm that the output is correct.
The testsymtablelist
program requires you to provide a single command-line argument, which must be an integer that specifies a binding count. The module tests your SymTable
ADT by using several SymTable
objects. One of those SymTable
objects contains the specified number of bindings. The module writes to stdout
an indication of how much CPU time it consumed while manipulating that SymTable
object.
Create additional test clients, as you deem necessary, to test symtablelist.c
.
Make sure you use meminfo
and valgrind
to check the memory management of your testsymtable.c/symtablelist.c
program.
symtablelist.c
Critique your testsymtable.c/symtablelist.c
program using the splint
tool. Each time splint
generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
When given
testsymtable.c
,splint
generates warning messages about the use of thesprintf
function, and suggests using thesnprintf
function instead.splint
complains becausesprintf
does not check the length of the array (alias buffer) into which it assigns characters, and so could cause a buffer overflow if the given array is too short. It suggests usingsnprintf
because that function allows the caller to specify the length of the array.Don't be concerned about those warning messages. You need not copy them to your
readme
file or explain them. Contrary tosplint
's suggestion,testsymtable.c
usessprintf
instead ofsnprintf
because:
- In
testsymtable.c
the number of characters being assigned is known, and so the array certainly is not too short.snprintf
is not part of the C90 standard, and so is not declared instdio.h
. Sogcc217
complains about the use ofsnprintf
. Asplint
warning is less important than agcc217
warning.
Similarly, critique symtablelist.c
using critTer
. Each time critTer
generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
When giventestsymtable.c
,critTer
generates warning messages about the use of "magic numbers," the number of statments in some functions, the number of function definitions in the file, and the number of lines in the file. Don't be concerned about those warning messages. You need not copy those warning messages to yourreadme
file or explain them. We judged that using magic numbers and defining long functions intestsymtable.c
was clearer than the alternative. And splittingtestsymtable.c
into multiple files would have complicated the build process unnecessarily.
symtablehash.c
Compose symtablehash.c
, as described in previous sections of this specification.
symtablehash.c
Use testsymtable.c
and symtablehash.c
to build a program named testsymtablehash
. Then run testsymtablehash
. Confirm that the output is correct.
Create additional test clients, as you deem necessary, to test symtablehash.c
.
Make sure you use meminfo
and valgrind
to check the memory management of your testsymtable.c/symtablehash.c
program.
symtablehash.c
Critique your testsymtable.c/symtablehash.c
program using splint
. Each time splint
generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
Similarly, critique symtablehash.c
using critTer
. Each time critTer
generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme
file.
Makefile
Compose a Makefile
. The first dependency rule of the Makefile
must command make
to build two executable files:
testsymtablelist
(using, indirectly, testsymtable.c
and symtablelist.c
),testsymtablehash
(using, indirectly, testsymtable.c
and symtablehash.c
),That is, the first dependency rule of your Makefile
must be:
all: testsymtablelist testsymtablehash
Your Makefile
must:
Maintain object (.o) files to allow for partial builds of the two executable binary files.
Encode the dependencies among the files that comprise your programs.
Use only the fundamental features of make
that are covered in the Building lecture, not the pattern or implicit dependency rules.
As noted above, we strongly recommend that you create your Makefile
early in your development process — earlier than its "Step 8" placement would imply. Doing so will allow you to use and test your Makefile
during development.
readme
FileEdit your copy of the given readme
file by answering each question that is expressed therein.
One of the sections of the readme
file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme
file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.
Place in your readme
file the CPU times reported by testsymtable.c
with binding counts 50, 500, 5000, 50000, and 500000 using (1) your linked list implementation, (2) your non-expanding hash table implementation†, and (3) your expanding hash table implementation. If the CPU time consumed for any of the tests is more than 5 minutes, then the testsymtable.c
client module will abort execution. In that case you must write "More than 5 minutes."
† Surely you've made your hash table expansion modular, such that timing the non-expanding hash table after completing the challenge requires only commenting out a single short if
statement temporarily (and then "comment it back in" before submitting). Hint: if this isn't the case, you should fix that!
Provide the instructors with your feedback on the assignment. To do that, issue this command:
FeedbackCOS217.py 3
and answer the questions that it asks. That command stores its questions and your answers in a file named feedback
in your working directory.
Submit your work electronically on armlab using these commands:
submit 3 symtable.h symtablelist.c symtablehash.c submit 3 Makefile readme feedback
In part, good program style is defined by the splint
and critTer
tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specifications also apply, as do these:
Modularity: A module's interface must not reveal the module's data. Data must be encapsulated with functions. When defining an ADT, use opaque pointer type definitions to hide structure type definitions from the ADT's clients.
ADT Comments: The interface (.h
) of an ADT must contain a comment that describes what an object of that type is. The comment must appear immediately before the definition of the opaque pointer type.
Structure Type Definition Comments: Compose a comment for each structure type definition. A structure type definition's comment must immediately precede the structure type definition.
Field Definition Comments: Compose a comment for each field definition that occurs within a structure type definition. A field definition's comment must immediately precede the field definition.
Minimal requirement to receive credit for the SymTable
linked list implementation:
The symtablelist.c
implementation must build with the testsymtable.c
client.
Minimal requirement to receive credit for the SymTable
hash table implementation:
The symtablehash.c
implementation must build with the testsymtable.c
client.
We will grade your work on two kinds of quality:
Quality from the user's point of view. From the user's point of view, your code has quality if it behaves as it must. The correct behavior of the SymTable
ADT is defined by the previous sections of this assignment specification.
Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous section of this assignment specification.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
This assignment was created by Robert M. Dondero, Jr. and Andrew Appel
with input from other faculty members