Princeton University
COS 217: Introduction to Programming Systems
Assignment 2: An Efficient Symbol Table ADT
Purpose
The purpose of this assignment is to help you learn more about advanced C
programming, creating ADTs, and the GNU/UNIX programming tools.
Background
Recall the symbol table concept, as described in Assignment 1.
A hash table is a particularly efficient way to implement a symbol
table. Section 2.9 of The Practice of Programming (Kernighan and Pike),
chapter 8 of C Interfaces and Implementations (Hanson), and chapter 14 of
Algorithms in C, Parts 1-4 (Sedgewick) describe hash tables.
Your Task
As with Assignment 1, your task in this assignment is to create a generic ADT named SymTable,
each of whose instances is a symbol table. Your SymTable ADT should be the same as the one from Assignment 1, except:
- You should implement your SymTable ADT as a hash table instead of
as a linked list.
As such, its functions will be much more time efficient.
- You should implement the SymTable_free and SymTable_map functions.
- Your SymTable should produce no "memory leaks." That is, your
SymTable ADT should, when appropriate, free the memory that it previously
dynamically allocated.
Note that subsequent assignments may ask you to create
programs that are clients of your SymTable ADT.
The SymTable Interface
The SymTable interface should be the same as it was in Assignment 1. In
particular, recall that:
- The SymTable_free function should free all memory occupied by
oSymTable.
- The SymTable_map function should apply function *pfApply to each binding in
oSymTable. It should be a checked runtime error for oSymTable or pfApply to be NULL.
The SymTable Implementation
The SymTable implementation should be the same as for Assignment 1, except
for these substantial changes:
- The SymTable data structure should be a hash table.
The hash table should use a reasonable hash function. See the lecture
notes for an example. See p.
578 of the
Sedgewick textbook for an alternative.
Initially the hash table should contain 509 buckets.
- The SymTable should produce no memory leaks.
The SymTable_free function should free all memory occupied by the given
SymTable. That includes the memory occupied by the SymTable structure, its
bindings array, its bindings, and its keys. The
SymTable_remove function should free the memory occupied by the removed
binding and its key.
- You should implement the SymTable_map function.
Implementing those changes is worth 90% of the assignment. To receive
full credit for the assignment, you also should implement this change:
- To minimize collisions, you should design the SymTable_put
function so it dynamically increases the number of buckets (and,
necessarily, rehashes all bindings) whenever the number
of bindings becomes too large. More precisely, p. 126 of the Hanson textbook provides a
sequence of integers that are appropriate bucket counts: 509, 1021,
2053, 4093, 8191, 16381, 32771, 65521, and INT_MAX. (Note that INT_MAX is
defined in file limits.h.) When the SymTable_put function
detects that the binding count exceeds 509, it should increase the bucket
count to 1021. When the function detects that the binding count exceeds
1021, it should increase the bucket count to 2053. Etc.
Logistics
We encourage you to develop on arizona, using Emacs to create
source code and gdb to debug.
A minimal test program is available in the file
/u/cs217/Assignment2/testsymtable.c. You may use it as a starting point for
creating your own test programs.
Create a "readme" text file that contains:
- Your name.
- A description of whatever help (if any) you received from others while
doing the assignment, and the names of any individuals with whom you
collaborated (as prescribed by the course "Policies" web page).
- The CPU times consumed by testsymtable.c when the given binding count is
100, 1000, 10000, and 100000. If the time consumed is more than 5
minutes, you may abort execution and report "Greater than 5
minutes."
- The CPU times consumed by testsymtable.c when the given binding count is
100, 1000, 10000, and 100000 and your SymTable ADT does not dynamically
increase the number of buckets. You can determine those times by
temporarily commenting-out your bucket-array-expansion code. If the
time consumed is more than 5 minutes, you may abort execution and report
"Greater than 5 minutes."
- (Optionally) An indication of how much time you spent doing the assignment.
- (Optionally) Your assessment of the assignment.
- (Optionally) Any information that
will help us to grade your work in the most favorable light. In particular
you should describe all known bugs.
Submit your work electronically on arizona via the command:
/u/cs217/bin/submit 2 symtable.h symtable.c readme
Grading
As always, we will grade your work on functionality and design, and will consider
understandability to be an important aspect of good design. To encourage good coding practices, we will compile using
"gcc -Wall -ansi -pedantic" and take off points based on warning messages during compilation.