The purpose of this assignment is to increase your experience with the following important software engineering fundamentals:
bash
, gdb
,
and make
.
Part 1 consists of no code writing, but will have lots of code reading. Part 2 consists of some additional reading, (hopefully) some deep thinking about invariants, and some code writing. Part 3 consists of (hopefully) some practical thinking about design decisions and (potentially, but not necessarily) lots of code writing.
As fair warning, we expect it may be more challenging to get started on this assignment than your assignments thus far because of the learning curve to understand the inner workings of an initially unfamiliar codebase.
In the past two terms, students have reported taking on average 25 hours to complete the assignment. Note that this measure may not be as representative as it is for other assignments, because this assignment is our "youngest" assignment (first offered in fall 2020) and further it has been substantially rewritten for this semester.
You may work with one partner on this assignment. You are not required to work with a partner, but we strongly prefer and suggest that you do: in a recent semester, the ratio of generally working versus catastrophically broken A4 submissions for partnerships was approximately 11:1, whereas the ratio for solo efforts was only 1:1! Your preceptor will help you with your partner search if you so desire.
There is no "challenge" portion for this assignment.
As you learned in COS 126, a tree is a linked data structure that represents a hierarchical organization of data. There is exactly one node with no incoming links, which we call the root of the tree. There is a unique path from the root to every other node in the tree. Every other node in the tree has exactly one incoming link (from its parent). A node with no outgoing links (and thus no children) is called a leaf of the tree.
As you learned earlier in this course, the Linux file system is
structured as a tree. There is one root directory, technically
with no name but colloquially called /
, that serves
as the ancestor of all other directories in the directory
hierarchy. Both directories and files may be leaves of the tree,
and nodes that are not leaves may have either or both of
directories and files as children. Each node in the hierarchy has
a canonical name (absolute path or full path)
derived from an enumeration of the nodes visited traveling from
the root directory to that node, delimited by slashes. As an
example, /usr/bin/ls
describes all four nodes visited
in reaching the ls binary executable: the unnamed directory that
is the tree's root, the usr
directory that is a child
of the root, the bin
directory that is a child
of usr
, and the ls
file leaf that is a
child of bin
.
In this assignment you will be working with three different tree data structures, each of which represents an increasingly complicated file system hierarchy, eventually achieving the full description from the first paragraph above:
a/b/c
would be a valid full path and /a//b/c
would not be,
unlike in Linux).
There are three parts to this assignment, corresponding to the three tree data structures listed above.
In Part 1, you are given object files for
several faulty Binary Directory Tree
implementations, as well as for a correct implementation. You are
also given the source for a minimalist BDT client that exhibits
behavioral or memory management errors when run after building
with any of the faulty implementations. You will locate the bugs
in the faulty implementations by debugging the client program
using gdb
.
In Part 2, you are given object files for
several faulty Directory Tree implementations, as
well as the source for a correct implementation and a minimalist
DT client. This time the client the object files were not
configured to facilitate stepping through the code in those files
while debugging in gdb
. So instead of the approach
from Part 1, here you will write an internal validation module
that will verify the validity of the data structure's
representation such that it correctly identifies the invalid state
of the data structure and terminates the program when checked at
the leading and trailing edges of each API function that mutates
the data structure
In Part 3, you are given an expanded API that is appropriate for hierarchies that contain both directories and files. You will implement the File Tree interface, using the Directory Tree implementation from Part 2 as a jumping off point. Along the way, you may need to reassess the program design and modularity decisions made for the DT implementation if they are no longer the best choices for the FT requirements.
a4def.h
: a utility interface including various
convenient typedef
s and constants shared across
all three partsbdt.h
: the interface file for the BDT data structurebdt_client.c
: a very simple BDT driver programbdtGood.o
: the object file for a correct BDT
implementation (built with -g
)bdtBad[1-5].o
: object files for buggy BDT
implementations (built with -g
)dynarray.{c,h}
: a module implementing an expandable
array-like structurepath.{c,h}
: an ADT representing a path in a treeMakefile
: automates building each of the BDT versionsreadme
: a template readme file that has sections to
complete in Part 1dt.h
: the interface file for the DT data structuredt_client.c
: a very simple DT driver programdtGood.c
: the source for a reasonable DT implementationnodeDT.h
: the interface file for the newly-modular
Node_T
data structure used by the given DT implementationnodeDTGood.c
: the source for a reasonable Node_T
implementation used by the given DT implementationdtBad{1a,1b,2,3,4}.o and nodeDTBad{1a,1b,2,3,4}.o
: five
pairs object files for buggy Node_T
and DT
implementations (not built with -g
now!)checkerDT.h
: the interface file for the
representation invariant validation modulecheckerDT.c
: the scaffolding for the validation
module, with a few sample checks Makefile
: automates building each of the DT versionsa4def.h
: the same as in Part 1dynarray.{c,h}
: the same as in Part 1path.{c,h}
: the same as in Part 1readme
: the same readme from Part 1 also has a
section to complete in Part 2ft.h
: the interface file for the FT data structureft_client.c
: a very simple FT driver programsampleft.o
: the object file for a reasonable (other
than being done all in one file) FT implementationMakefile.sampleft
a Makefile
that
will build a reference sampleft
out of the
given sampleft.o
and the (modified or
unmodified) ft_client.c
a4def.h
: the same as in Parts 1 and 2dynarray.{c,h}
: the same as in Parts 1 and 2path.{c,h}
: the same as in Parts 1 and 2As you have been doing thus far in the course, you may write code in whatever development environment and on whatever computer you see fit.
In this assignment, we provide pre-built .o
files that
contain various correct and buggy solutions. These were built
using gcc217
on armlab, so you will have to do the
testing and debugging there.
We continue to encourage you to embrace the discipline of committing frequently in GitHub. When you are finished, pull your final version to armlab to issue the appropriate submit command(s).
As with the other assignments, you will create your own repository, separate from ours, that has the same contents as ours. As a reminder, you can follow these procedures to import your own version of our contents into your GitHub account, or follow the Setup Step 5 from the Git and GitHub Primer document from the first precept.
Note, though, that only one partner should do this. After having done so, that partner should continue with Setup Step 6. Only after that will the other partner obtain a working copy, by executing a (regular, non-bare) clone of the first partner's repository.
The COS 217 assignment repository for this assignment is:
https://github.com/COS217/DirectoryFileTrees
Study the given source files, a4def.h
, bdt.h
and bdt_client.c
, to learn about what operations a BDT
supports and how they might be called by a client of the BDT
module. The client program is not a true external client of the
module, but instead a test client that validates (some of) the
expected behavior of the module.
The BDT internal implementation is not revealed to the interface
or the client — this is good data structure design! However, to get
you started, we will tell you that the BDT is implemented as an
Abstract Object with the following static
state variables:
/* 1. a flag for being in an initialized state (TRUE) or not (FALSE) */ static boolean bIsInitialized; /* 2. a pointer to the root node in the hierarchy */ static struct node *psRoot; /* 3. a counter of the number of nodes in the hierarchy */ static size_t ulCount;where
struct node *
is a pointer to one of these structures:
/* A node in a BDT */ struct node { /* the object corresponding to the node's absolute path */ Path_T oPPath; /* the node's parent, or NULL if the node is the root */ struct node *psParent; /* the node's child directories, each NULL if there is no such child. invariant: if child1 == NULL, then child2 == NULL as well. */ struct node *psChild1, *psChild2; };
Using the Makefile
you can issue all these commands to build the programs from the client's C source, the two provided ADT's C source, and the good and bad implementations' object code:
$ make gcc217 -g -c dynarray.c gcc217 -g -c path.c gcc217 -g -c bdt_client.c gcc217 -g dynarray.o path.o bdtGood.o bdt_client.o -o bdtGood gcc217 -g dynarray.o path.o bdtBad1.o bdt_client.o -o bdtBad1 gcc217 -g dynarray.o path.o bdtBad2.o bdt_client.o -o bdtBad2 gcc217 -g dynarray.o path.o bdtBad3.o bdt_client.o -o bdtBad3 gcc217m -g -c dynarray.c -o dynarrayM.o gcc217m -g -c path.c -o pathM.o gcc217m -g -c bdt_client.c -o bdt_clientM.o gcc217m -g dynarrayM.o pathM.o bdtBad4.o bdt_clientM.o -o bdtBad4 gcc217m -g dynarrayM.o pathM.o bdtBad5.o bdt_clientM.o -o bdtBad5Note that the last two programs are built with
gcc217m
to
facilitate use of the meminforeport
tool in helping
you debug.
Each of the buggy implementations has a simple-but-serious bug
that will manifest itself in some observable way, e.g. a failed
assert, a runtime crash, output that diverges from the good
implementation, or a memory issue identifiable by meminforeport
.
Run the programs to observe their behavior:
$ ./bdtGood ... $ ./bdtBad1 ...
One at a time, conduct one or more gdb
sessions to
step through the operations that eventually result in an error and
explore to home in on the point at which it all goes wrong.
Refer to the gdb
guide for tips on how to debug
efficiently.
Write the location of your identified bug locations (simply the function is fine, no need to be more granular) in the designated section of the readme. Note that this should be the location where the bug is introduced (and where it should be fixed), not necessarily where it manifests in an error or becomes observable.
Once you have identified the bug in each of the buggy implementations, you are done with Part 1.
Study the new given source files for part 2,
dt.h
, nodeDT.h
, and dt_client.c
.
The interface is extremely similar to the BDT interface from
part 1 of the assignment, but the client exhibits some of the
different properties of the new data structure (e.g., that a
parent can now have more than two children).
Also study the given implementation dtGood.c
and
nodeDTGood.c
. This implementation is sometimes
cryptic in its heavy use of static
helper functions,
making it imperative that you really study it to tease out the design,
algorithms, and invariants of Node_T and DT. But you will need to do
so in order to construct your internal testing, because the buggy
versions share the same internal representation and overall structure.
You will also need to have a strong grasp of this implementation to
complete the required critique of its design.
Using the Makefile
you can issue all these commands to build the programs from the client's C source, the two provided ADT's C source, the good and bad implementations' object code, and your validation module's C source:
$ make gcc217 -g -c dynarray.c gcc217 -g -c path.c gcc217 -g -c checkerDT.c gcc217 -g -c nodeDTGood.c gcc217 -g -c dtGood.c gcc217 -g -c dt_client.c gcc217 -g dynarray.o path.o checkerDT.o nodeDTGood.o dtGood.o dt_client.o -o dtGood gcc217 -g dynarray.o path.o checkerDT.o nodeDTBad1a.o dtBad1a.o dt_client.o -o dtBad1a gcc217 -g dynarray.o path.o checkerDT.o nodeDTBad1b.o dtBad1b.o dt_client.o -o dtBad1b gcc217 -g dynarray.o path.o checkerDT.o nodeDTBad2.o dtBad2.o dt_client.o -o dtBad2 gcc217 -g dynarray.o path.o checkerDT.o nodeDTBad3.o dtBad3.o dt_client.o -o dtBad3 gcc217 -g dynarray.o path.o checkerDT.o nodeDTBad4.o dtBad4.o dt_client.o -o dtBad4Notice that the
Makefile
automatically recognizes
which files do and do not need to be rebuilt after any
changes. This was also the case with the BDTs, however in part 2
you will have to rebuild often as you develop your validation
module, so the partial-build capabilities become more important.
As in Step 1.3, each of the buggy implementations has a bug that
is observable: dtBad2
fails a test in the client,
dtBad3
produces different output versus the reference
solution, and dtBad4
fails an assert
inside
a known-good ADT module. But unlike with the BDTs, trying to
debug these with gdb
runs into a problem: the DTs
were not compiled with -g
, so there is no source
code associated with the memory locations in the executable.
This is a good indication that a developer will be well served
taking a different approach to debugging. (But do note that
gdb
can still debug programs built with these object
files, it just cannot help you line-by-line through that code.)
Lacking the line-by-line analysis capabilities of gdb
,
your will instead approach the problem at a lower level by beefing
up the internal testing used in the DT modules.
Each function listed in DT's interface that changes the state of
the data structure (as opposed to just observing / reporting it)
has a call to CheckerDT_isValid
at the
beginning of its function
body and just before each possible return
point. Similarly, each function listed in the given node interface
that produces a new Node_T
for the client or mutates
an existing one for the client has a call
to CheckerDT_Node_isValid
at the function's leading
and trailing edges. The implementation of these checker functions
should thoroughly exercise checks of every invariant of the data
structures' internal representations and their interfaces' stated
restrictions.
(What's a data structure invariant? These are constraints on the
set of values of individual state variables within the data
structure's implementation or well-defined relationships between
several state variables' values. For example, in your
Assignment 3 SymTable list implementation,
if oSymTable->firstNode
is NULL
,
then oSymTable->uCount
must be 0,
and vice versa. Another example is in your hash table
implementation: if the binding for a particular key is found in
bucket i
, the hash value
of that key must be i
!)
The provided checkerDT.h
gives a specification for
the two isValid
functions. To get you started, you
are also given a few supplied sample tests in
checkerDT.c
. (This is why dtBad1a
and dtBad1b
already abort with a correct description of
the bug.) In particular, as some of the tests for DT validity require
a proper method of traversing the tree, we have provided working
scaffolding for this algorithm in checkerDT.c
Our advice is that your checkerDT.c
implementation
should identify all or nearly all of the buggy DTs' issues before you
proceed to the next step. (This isn't a hard prerequisite, it's just
that writing more checker tests will further familiarize you with the
implementation upon which we assume you will be basing Part 3.)
In the code given for Part 2, there may exist functions that are
not used or not used to the best of their abilities, module bloat,
questionable naming conventions or parameter order, etc. There could
even be a couple potential errors in handling allocation error cases
(thankfully, ones not exercised by the sample client). Some things
were improved between Part 1 and Part 2 (Node_T
is
implemented with a fully fledged module, not an overgrown
struct
within another object's module, among other
things), but real code you sit down in front of does not typically
have the polish of perfect academic examples, and we've
represented that here to a limited degree. Give one last look back at
the code and offer a critique of how a refactorer could/should clean
up the interfaces and dtGood.c
and nodeGood.c
implementations to better meet the goals from the Modularity
lecture and accord to what we know about good program design and
style. You will enter this in the appropriate place in the readme
— either paragraph or bulleted list form will be fine.
Study the two new given source files, ft.h
and ft_client.c
, to learn about what additional
operations an FT supports beyond those from the DT interface, and how
they might be called by a client of the FT module. Again, the client
program is not a true external client of the module, but instead a
test client that validates (some of) the expected behavior of the
module.
In Part 3, you will implement the FT interface to further expand
the capabilities of a DT to include representing files: the leaves
in the tree that have contents associated with them. Your first
inclination for doing so might be to rush out and start "hacking"
by adding new fields to the Node_T
implementation to
fit the first idea that comes to mind, making slapdash changes to
the DT code where its invariants no longer hold for the new
expansion, etc.
While this could end up working — and indeed, your eventual product will likely end up making some of these changes — a better software engineering approach is to reconsider the overall design of your program. Modularity is the manifestation of abstraction: a strong program designer understands how to find abstractions in a large program and convey them via a cogent set of modules, each with a sensible interface.
Before you begin coding, consider the answers to these design decision questions (there may be no right answers):
(Aside: this assignment's author was recently on the receiving end of the lamentations of the software engineer in charge of recruiting for a major tech company, who indicated that candidates — including Princetonians, alack! — often fail her modularity-focused design question in interviews because they charge into coding instead of drawing out a coherent diagram of the architecture of modules they will be building.)
Your design choices will inform how easy or difficult it will be to complete subsequent steps in Part 3. So do not become wedded to your choices to the point of becoming susceptible to the sunk-cost fallacy: you may have to iterate on your design decisions as you delve into the next steps of actually implementing the FT. The perfect, however, is the enemy of the good — encouraging deep thoughts about design doesn't mean we want you to never start coding!
Realtalk: you are given a wide berth to proceed here, but there are two sensible options that most submissions will select from:
Makefile
Create a first version of a Makefile
that
corresponds to the files you have chosen in the previous step. You
may have to refine this Makefile
as you refine your
program design.
Your Makefile
must:
.o
) files to allow for partial builds of all modules.make
that
are covered in the Building lecture versions 2 and 3. For
example, your Makefile
should not contain implicit
or pattern dependency rules from that lecture. (You are still
welcome to look at the Makefile
s provided in Parts
1 and 2 of this assignment that do use more advanced features,
of course.)ft
(note the naming
requirement) using the client ft_client.c
and your modules
when make
is run without command line
arguments.Having settled on a set of modules and their interfaces in Step 3.2, you will now implement these modules. As you write the implementation(s), you may find that you are never using some functions from the API, yet you feel as though other useful ones are missing — use this insight to iterate back to Step 3.2 and refine your design choices. (If these are leftover elements from DT, consider also whether this observation applies to the given code from Part 2 as well, and if so whether you should augment your answer from Step 2.5.)
You will want to test each new module, perhaps with a simple test client for your own benefit (that is, you need not submit such a client), before relying on the module as a dependency of a larger build. If you completed an organizational diagram in Step 3.2, it can provide the appropriate order for composing and testing your new modules that minimizes the entanglement of multiple untested modules with each other.
The final module that you will implement is the only one with its
interface specified for you: the core FT abstract object. The FT
interface is very similar to the DT interface, except almost every
function is now split out into a directory version and a file
version. In addition, there are functions to manipulate a file's
contents, and a new FT_stat
function that will allow a
client to access metadata about a node in the File Tree.
Note: your implementation of the FT_toString
function
must maintain the same format from the corresponding
BDT_toString
and DT_toString
functions. In dealing with the new features, it adds
the ordering requirement specified in the comment
in the client that a directory's children should be printed depth
first in lexicographic order but with file children listed before
their directory children siblings.
After you have composed your FT implementation, you can test
using the provided reference implementation. You can build the
given sampleft.o
with the provided test
client ft_client.c
using the
provided Makefile.sampleft
, which is named in such a
way as not to conflict with your own Makefile
. Use
the -f Makefile.sampleft
option to make
in order to specify this file instead of your
own Makefile
. You may choose to write your own
simple client, to expand ft_client.c
with your own
tests, or even to write a FT validation checker module if we have
sufficiently convinced you that these are valuable. (If you do the
latter, you'll need to name it differently than the module from
Part 2 in order to submit it without a naming conflict.
CheckerFT
would be a natural choice.)
Critique the code for your new modules
using critTer
. Each time critTer
generates a
warning on your code, you must either (1) edit your code to eliminate
the warning, or (2) explain your disagreement with the warning in your
readme file.
Critique the program consisting of the new modules and the client
using splint
. Each time splint
generates a
warning on your code, you must either (1) edit your code to eliminate
the warning, or (2) explain your disagreement with the warning in your
readme file.
Exceptions: You do not have to address warnings about too many functions, too long loops, functions, or files, or too deeply nested code. You also do not have to address warnings that result solely from your repurposing of the DT code we gave you, but you might consider if those warnings are additional fodder for your comments in Step 2.5.
Edit your copy of the given readme file by answering each question that is expressed therein.
In step 1.5, you should have entered the bug locations from Part 1. If you didn't, do it now.
In step 2.5, you should have entered your critique and refactoring suggestions from Part 2. If you didn't, do it now.
Provide the instructors with your feedback on the assignment. To do that, issue this command on armlab:
FeedbackCOS217.py 4and answer the questions that it asks. (If you worked with a partner, then when answering the numeric questions, please enter the average of the responses that you and your partner would provide individually.) That command stores its questions and your answers in a file named
feedback
in your working directory. You need not store
it in your repository, though you can.
Note: this portion is particularly helpful for this assignment, given that this is still a "young" assignment relative to the others in this course that have developed and evolved over the course of decades.
Submit your work electronically on armlab. If you worked with a partner, then one of the partners must submit all of your team's files, and the other partner must submit none of your team's files.
Your readme file must contain both your name and your partner's name in the appropriate locations.
For Part 1 you must complete the appropriate section of the readme
and submit that file.
For Part 2 you must submit checkerDT.c
.
For Part 3 you must submit the .c
and .h
files for all the modules used to implement the FT interface and build your ft
executable, with the exception of the ft.h
interface file itself and the ft_client.c
client file itself. If you have not modified dynarray.c
, dynarray.h
, path.c
, path.h
, or a4def.h
, you do not have to submit these either. (Be sure you get all the required files! Seriously. Go triple-check. We will deduct several points if we have to track you down to submit missing files after-the-fact.)
For Part 3 you also must submit the Makefile
that uses all these files to build the ft
executable out of ft_client.c
.
You must submit the completed readme
and the feedback
transcript.
Finally, if you have worked with a partner, you must submit the appropriate .partner
file as described on the Policies page:
touch netid.partner submit 4 netid.partnerwhere netid is the non-submitting partner's NetID. Non-submitting partners should make sure the submitting partner did this, because otherwise we have no way to know to give you credit!
Really finally, go do an nth check that you have submitted all the required files for us to build and run your code.
In part, good program style is defined by the splint
and critTer
tools, and by the rules given in The
Practice of Programming (Kernighan and Pike) as summarized by
the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specifications also apply, as do these:
Modularity and Encapsulation: A module's interface must not reveal the module's data. Data must be encapsulated with functions. When defining an ADT, use opaque pointer type definitions to hide structure type definitions from the ADT's clients. Module implementations must have good function-level modularity
AO State Definition Comments: Compose a comment for each definition that defines the state of the abstract object. The state member's comment must immediately precede its definition.
ADT Comments: The interface
(.h
) of an ADT must contain a comment that describes
what an object of that type is. The comment must appear immediately
before the definition of the opaque pointer type.
Structure Type Definition Comments: Compose a comment for each structure type definition. A structure type definition's comment must immediately precede the structure type definition.
Field Definition Comments: Compose a comment for each field definition that occurs within a structure type definition. A field definition's comment must immediately precede the field definition.
This assignment was created by Christopher Moretti and Vikash Modi '23, with input from Xiaoyan Li and Donna Gabai.