The purpose of this assignment is to increase your experience with the following important software engineering fundamentals: bash, gdb, and make.
This is the first term the assignment is being offered, so we cannot provide empirical time estimates for it based on former 217 students. Our intent is that it will be less intensive in terms of code writing than Assignment 3, but we expect it may be more challenging to get started because of the learning curve to understand the inner workings of an initially unfamiliar code base. We have also designated it a partnered assignment (see below) with hopes of this lessening the uncertainty.
You may work with one partner on this assignment. You need not work with a partner, but we prefer that you do. Your preceptor will help you with your partner search if you so desire.
There is no "challenge" portion for this assignment.
As you learned in COS 126, a tree is a linked data structure that represents a hierarchical organization of data. Formally a connected acyclic graph, a tree is a collection of nodes in which exactly one node is the root, with no incoming edges, and every other node has exactly one incoming edge (from its parent). A node with no outgoing edges (and thus no children) is called a leaf of the tree. A node that is neither the root nor a leaf is called an internal node.
As you learned earlier in this course, the Linux file system is structured as a tree. There is one root directory, technically with no name but colloquially called /, that serves as the ancestor of all other directories in the directory hierarchy. The root and any internal nodes may have either or both of directories and files as children, and both directories and files may be leaves of the tree. Each node in the hierarchy has a canonical name (absolute path or full path) derived from an enumeration of the nodes visited traveling from the root directory to that node, delimited by slashes. As an example, /usr/bin/ls describes all four nodes visited in reaching the ls binary executable: the unnamed directory that is the tree's root, the usr and bin directories that are internal nodes, and the ls file that is a leaf.
In this assignment you will be working with three different tree data structures, each of which represents an increasingly complicated file system hierarchy, eventually achieving the full description above: the Binary Directory Tree (BDT) in Part 1, the Directory Tree (DT) in Part 2, and the File Tree (FT) in Part 3. Note that in these data structures, unlike in Linux, full paths are not prefixed by a slash (so a/b/c would be a valid full path and /a//b/c would not be), in order to avoid complications with empty strings within paths.
There are three parts to this assignment, corresponding to the three tree data structures listed above.
In Part 1, you are given object files for a series of faulty Binary Directory Tree implementations, as well as for a correct implementation. You are also given the source for a minimalist BDT client that exhibits errors when run after building with any of the faulty implementations. You will locate the bugs in each faulty implementation by debugging the client program using gdb, and will also report a transcript of a gdb session that appropriately shows how the debugger can identify the location of the bug.
In Part 2, you are given object files for a series of faulty Directory Tree implementations, as well as the source for a correct implementation and a minimalist DT client. This time the client does not necessarily exhibit errors when built with the faulty implementations, and the object files were not configured to facilitate stepping through the code while debugging in gdb. So instead of the approach from Part 1, here you will write an internal validation module that verifies the validity of the data structure, such that it correctly identifies the invalid state of the data structure and terminates the program when checked at the leading and trailing edges of each DT API function.
In Part 3, you are given an expanded API that is appropriate for hierarchies that contain both directories and files. You will implement the File Tree interface, using the Directory Tree implementation from Part 2 as a jumping off point, but departing from it in order to fit the expanded requirements and to make appropriate decisions about program design and modularity.
For Part 1:
- bdt.h: the interface file for the BDT data structure
- bdt_client.c: a very simple BDT driver program
- bdtGood.o: the object file for a correct BDT implementation (built with -g)
- bdtBad[1-5].o: object files for buggy BDT implementations (built with -g)
- dynarray.{c,h}: a module implementing an expandable array-like structure
- Makefile: a complete Makefile that automates building each of the versions in Part 1
- gdbinit[1-5]: gdb initialization files to facilitate gdb session transcripts
- readme: a template readme file that has sections to complete in Part 1

For Part 2:
- dt.h: the interface file for the DT data structure
- checker.h: the interface file for the representation invariant validation module
- a4def.h: a utility interface including various convenient typedefs and constants
- dt_client.c: a very simple DT driver program
- dtGood.c: the source for a reasonable DT implementation
- node.h: the interface file for the newly-modular Node data structure used by the given DT implementation
- nodeGood.c: the source for a reasonable Node implementation used by the given DT implementation
- checker.c: the scaffolding for the validation module, with a few sample checks
- dtBad[1a,1b,2-5].o and nodeBad[1a,1b,2-5].o: six pairs of object files for buggy Node and DT implementations (not built with -g this time!)
- dynarray.{c,h}: the same as in Part 1
- Makefile: a complete Makefile for all versions in Part 2
- readme: the same readme from Part 1, which also has a section to complete in Part 2

For Part 3:
- ft.h: the interface file for the FT data structure
- a4def.h: the same utility library from Part 2
- ft_client.c: a very simple FT driver program
- sampleft: a reference version of the client executable

As you have been doing thus far in the course, you may write code in whatever development environment and on whatever computer you see fit.
In this assignment, we provide pre-built .o files that contain various correct and buggy solutions. These were built using gcc217 on armlab, so you will have to do the testing and debugging there.
We continue to encourage you to use the model of committing frequently in GitHub, pushing to the cloud, then pulling down the updates into a repository on armlab for testing, debugging, and critiquing. When you are finished, pull down your final version to armlab to issue the appropriate submit command(s).
As with the other assignments, you will create your own repository, separate from ours, that has the same contents as ours. As a reminder, you can follow these procedures to import your own version of our contents into your GitHub account.
The repository that you will import into your own is available
here:
https://github.com/COS217/DirectoryFileTrees
Study the two given source files, bdt.h and bdt_client.c, to learn about what operations a BDT supports and how they might be called by a client of the BDT module. The client program is not a true external client of the module, but instead a test client that validates (some of) the expected behavior of the module.
Using the Makefile you can, in one line with no arguments, issue all these commands to build the programs from the client's C source and the good and bad implementations' object code:

gcc217 -g dynarray.o bdtGood.o bdt_client.c -o bdtGood
gcc217 -g dynarray.o bdtBad1.o bdt_client.c -o bdtBad1
gcc217 -g dynarray.o bdtBad2.o bdt_client.c -o bdtBad2
gcc217 -g dynarray.o bdtBad3.o bdt_client.c -o bdtBad3
gcc217 -g dynarray.o bdtBad4.o bdt_client.c -o bdtBad4
gcc217 -g dynarray.o bdtBad5.o bdt_client.c -o bdtBad5
Each of the buggy implementations has a simple-but-serious bug that will manifest itself in either a failed assert in the client, a crash or other serious problem, or output that diverges from that of bdtGood. Run each of the programs to observe the behavior:

./bdtGood
./bdtBad1
...
gdb Sessions

Because you will submit not only a diagnosis of the bug's location, but also a transcript of a reasonable gdb session demonstrating the discovery, you will need to record your debugging session.
To almost fully automate the process of recording your gdb sessions, we will rely on gdb's ability to load a configuration file that contains many commands (much like the .bashrc file that bash loads each time we start a new shell). We will use these configuration files to batch-issue a series of commands that turn on logging, so that every gdb command you type and the output from that command will be saved in a sensibly named log file.
In the repository, there are configuration files to use when debugging each of the buggy Part 1 implementations (each separately, so that the transcripts from debugging on different implementations are not intermingled). In order to use them, you must issue this as the first command to gdb, where N is the buggy implementation number from 1-5:

(gdb) source .gdbinitN

This setup will create and append to ./badN.log (where N is again 1-5) as the log files. We will expect you to submit a curated-but-representative version of the contents of each of these files for Part 1. If you are curious about gdb logging options, refer to the documentation here.
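The provided files handle this for you, but the heart of such a configuration is gdb's standard logging commands. A minimal sketch (using documented gdb commands, not necessarily the actual contents of the provided files) might be:

```
# Echo each command as it executes, and mirror all gdb output
# into a log file. Sketch only; the provided gdbinit files may differ.
set trace-commands on
set logging file bad1.log
set logging on
```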
One at a time, conduct one or more gdb sessions to step through the operations that eventually result in an error, and explore to home in on the point at which it all goes wrong. Refer to the gdb guide for tips on how to debug efficiently.
Write the identified bug locations (simply the function is fine, no need to be more granular) in the designated section of the readme, and ensure you have a gdb session transcript that is representative of the exploratory-but-structured nature of tracking down the bug.
Once you have identified and logged all the given buggy implementations, you are done with Part 1.
Study the given source files, dt.h, dt_client.c, and the given implementation dtGood.c and nodeGood.c. The interface is extremely similar to the BDT interface from Part 1 of the assignment, but the client exhibits some of the different behavior of the new data structure (e.g., that a parent can now have more than two children). The given implementation is sometimes cryptic and under-documented, making it imperative that you really study it to tease out the design, algorithms, and invariants of Node and DT, in order to construct your internal testing and to complete the required critique of the given implementation's design.
Using the Makefile you can, in one line, issue (better, partial-build-capable versions of) all these commands to build the programs from the available C source and obscured object code:

gcc217 dynarray.c nodeBad1a.o checker.c dtBad1a.o dt_client.c -o dtBad1a
gcc217 dynarray.c nodeBad1b.o checker.c dtBad1b.o dt_client.c -o dtBad1b
gcc217 dynarray.c nodeBad2.o checker.c dtBad2.o dt_client.c -o dtBad2
gcc217 dynarray.c nodeBad3.o checker.c dtBad3.o dt_client.c -o dtBad3
gcc217 dynarray.c nodeBad4.o checker.c dtBad4.o dt_client.c -o dtBad4
gcc217 dynarray.c nodeBad5.o checker.c dtBad5.o dt_client.c -o dtBad5
gcc217 dynarray.c nodeGood.c checker.c dtGood.c dt_client.c -o dtGood
In Step 1.3, each of the buggy implementations had a bug that was client-observable. That isn't the case in this step: while dtBad2 crashes, dtBad3 produces different output, and dtBad5 fails an assert, dtBad4 yields the same behavior from the client as the correct version. Furthermore, trying to debug these with gdb runs into a problem: they were not compiled with -g, so there is no source code associated with the memory locations in the executable. This is a good indication that a developer will be well served taking a different approach to debugging.
While one option in this situation would be to expand the client to span a broader and more systematic testing suite than its current scattershot series of calls, instead you will approach the problem at a lower level by beefing up the internal testing used in the DT modules.
Each function listed in DT's interface has a call to Checker_DT_isValid at the beginning of its function body and just before each possible return point. (It would be reasonable to have each function in the given Node implementation do the same with calls to Checker_Node_isValid, but as you can see in nodeGood.c, this isn't actually the case.) These checker functions should thoroughly exercise checks of every invariant of the data structures' internal representations and their interfaces' stated restrictions. See the provided checker.h for the specification.
To get you started, you are given a few supplied sample tests in checker.c. (This is why dtBad1a and dtBad1b already abort with a correct description of the bug.) In particular, as some of the tests for DT validity require a proper method of traversing the tree, we have provided working scaffolding for this algorithm in checker.c.
Our advice is that your checker.c implementation should identify all or nearly all of the buggy DTs' issues before you proceed to the next step. (This isn't a hard prerequisite; it's just that writing more checker tests will further familiarize you with the implementation upon which we assume you will be basing Part 3.)
In the code given for Part 2, there exist functions that are not used or not used to the best of their abilities, module bloat, etc. Some things were improved between Part 1 and Part 2 (Node is a fully fledged module, not an overgrown struct within another object's module, among other things), but real code you sit down in front of does not typically have the polish of perfect academic examples, and we've represented that here. Give one last look back at the code and offer a critique of how a refactorer could/should clean up the interfaces, implementation, etc. to better meet the goals from the Modularity lecture. You will enter this in the appropriate place in the readme.
Study the two given source files, ft.h and ft_client.c, to learn about what additional operations an FT supports beyond those from the DT interface, and how they might be called by a client of the FT module. Again, the client program is not a true external client of the module, but instead a test client that validates (some of) the expected behavior of the module.
In Part 3, you will implement the FT interface to further expand the capabilities of a DT to include representing files: the leaves in the tree that have contents associated with them. Your first inclination for doing so might be to rush out and start "hacking" by adding new fields to the Node structure, changing DT code where the invariants no longer work for the new expansion, etc.
While this could end up working, a better software engineering approach is to reconsider the overall design of your program. Modularity is the manifestation of abstraction: a strong program designer understands how to find abstractions in a large program and convey them via a cogent set of modules, each with a sensible interface.
Before you begin coding, consider the answers to these design decision questions:
Your design choices will inform how easy or difficult it will be to complete subsequent steps in Part 3. So do not become wedded to your choices to the point of becoming susceptible to the sunk-cost fallacy: you may have to iterate on your design decisions as you delve into the next steps of actually implementing the FT.
Makefile

Create a first version of a Makefile that corresponds to the files you have chosen in the previous step. You will refine this Makefile as you refine your program design. Your Makefile must:
- produce intermediate object (.o) files to allow for partial builds of all modules.
- use only the features of make that are covered in the Building lecture. For example, your Makefile should not contain implicit or pattern dependency rules from that lecture. (Note that this means you will largely be writing your Makefile from scratch, as the Makefiles provided in Parts 1 and 2 intentionally use these advanced features.)
- build the executable ft using the client ft_client.c and your modules when run without command line arguments.

Having settled on a set of modules and their interfaces in Step 3.2, you will now implement these modules. As you write the implementation(s), you may find that you are never using some functions from the API, yet you feel as though other useful ones are missing. Use this insight to iterate back to Step 3.2 and refine your design choices. (If these are leftover elements from DT, consider also whether this observation applies to the given code from Part 2 as well, and if so whether you should augment your answer from Step 2.5.)
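To make the Makefile rules above concrete, here is a minimal sketch of the shape such a Makefile might take. The module names (ft.c, node.c) are illustrative assumptions only; your design may use different files:

```
# Default target: build the ft executable from explicitly listed
# object files. Module names here are illustrative only.
ft: ft_client.o ft.o node.o dynarray.o
	gcc217 ft_client.o ft.o node.o dynarray.o -o ft

ft_client.o: ft_client.c ft.h a4def.h
	gcc217 -c ft_client.c

ft.o: ft.c ft.h node.h a4def.h
	gcc217 -c ft.c

node.o: node.c node.h a4def.h
	gcc217 -c node.c

dynarray.o: dynarray.c dynarray.h
	gcc217 -c dynarray.c
```

Recall that each recipe line must begin with a tab character, and that running make with no arguments builds the first target, here ft.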
You will want to test each new module, perhaps with a simple test client for your own benefit, before relying on it as a dependency of a larger build. (But you do not have to submit these additional test clients.) If you completed a flowchart or mind map in Step 3.2, it can provide the appropriate order for composing and testing your new modules that minimizes the entanglement of multiple untested modules with each other.
The final module that you will implement is the only one with its interface specified for you: the core FT abstract object. The FT interface is very similar to the DT interface, except almost every method is now split out into a directory version and a file version. In addition, there are methods to manipulate a file's contents, and a new FT_stat method that will allow a client to access metadata about a node in the File Tree.
Note: your implementation of the FT_toString method must maintain the same format as the corresponding BDT_toString and DT_toString methods. To deal with the new features, it adds the requirement, specified in the comment in the client, that a directory's children should be printed in lexicographic order, depth first, with file children before directory children.
After you have composed your FT implementation, test using the provided client program, your own expanded test client, or even a validation checker module if we have sufficiently convinced you that these are valuable.
Critique the code for your new modules using critTer. Each time critTer generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.
Critique the program consisting of the new modules and the client using splint. Each time splint generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.
You do not have to explicitly explain away warnings that result from your repurposing of the DT code we gave you, but you might consider if those warnings are additional fodder for your comments in Step 2.5.
Edit your copy of the given readme file by answering each question that is expressed therein.
In step 1.5, you should have entered the bug locations from Part 1. If you didn't, do it now.
In step 2.5, you should have entered your critique and refactoring suggestions from Part 2. If you didn't, do it now.
Provide the instructors with your feedback on the assignment. To do that, issue this command on armlab:
FeedbackCOS217.py 4

and answer the questions that it asks. (If you worked with a partner, then when answering the numeric questions, please enter the average of the responses that you and your partner would provide individually.) That command stores its questions and your answers in a file named feedback in your working directory. You need not store it in your repository, though you can.
Note: this portion is particularly helpful for this assignment, given that this is the first term that this assignment is being offered.
Submit your work electronically on armlab. If you worked with a partner, then one of the partners must submit all of your team's files, and the other partner must submit none of your team's files.
Your readme file and your source code must contain both your name and your partner's name.
For Part 1 you must submit your gdb transcripts.

For Part 2 you must submit checker.c.

For Part 3 you must submit the .c and .h files for all the modules used to implement the FT interface, and the Makefile that uses them to build the ft executable out of ft_client.c. (There is no need to submit ft.h, nor any provided or expanded client code.)
You must submit the readme file and the feedback transcript.
Finally, if you have worked with a partner, you must submit the appropriate .partner file as described on the Policies page:

touch netid.partner
submit 4 netid.partner
In part, good program style is defined by the splint and critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course-specific style rules listed in the previous assignment specifications also apply, as do these:
Modularity and Encapsulation: A module's interface must not reveal the module's data. Data must be encapsulated with functions. When defining an ADT, use opaque pointer type definitions to hide structure type definitions from the ADT's clients. Module implementations must have good function-level modularity.
AO State Definition Comments: Compose a comment for each definition that defines the state of the abstract object. The state member's comment must immediately precede its definition.
ADT Comments: The interface (.h) of an ADT must contain a comment that describes what an object of that type is. The comment must appear immediately before the definition of the opaque pointer type.
Structure Type Definition Comments: Compose a comment for each structure type definition. A structure type definition's comment must immediately precede the structure type definition.
Field Definition Comments: Compose a comment for each field definition that occurs within a structure type definition. A field definition's comment must immediately precede the field definition.
This assignment was created by Christopher Moretti and Vikash Modi '23, with input from Xiaoyan Li and Donna Gabai.