COS 429 - Computer Vision
Fall 2019
Assignment 4: Deep Learning
Due Thursday, Dec. 5
Part II. Adding a hidden layer
The "network" that you trained in part 1 was rather simple, and still only
supported a linear decision boundary. The next step up in complexity
is to add additional nodes to the network, so that the decision boundary
can be made nonlinear. That will allow classifying datasets such as this
one:
We will implement a simple network using only "fully connected" layers, the ReLU nonlinearity, and the logistic function.
Here is a simple example of the architecture that we will use. It contains one "hidden" fully connected layer with two neurons (u and v) and one output layer with one neuron:
For convenience, we've given the names $u$ and $v$ to the outputs of the two hidden neurons, and $\hat{z}$ to the final output after it has been squished by the logistic function. As before, $z$ is
the ground-truth label. The various $w_i$ are the weights to be learned
during training and then used during testing.
To train this network using SGD, we need to evaluate
$$
\nabla L = \begin{pmatrix}
\frac{\partial L}{\partial w_1} \\
\frac{\partial L}{\partial w_2} \\
\frac{\partial L}{\partial w_3} \\
\vdots
\end{pmatrix}
$$
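For reference, once the gradient is available, an SGD update simply steps each weight a small distance against its partial derivative; a minimal sketch (the names below are illustrative, not the starter code's):
```python
def sgd_step(weights, grads, learning_rate=0.1):
    """One SGD update: move each weight opposite its partial derivative of the loss."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]
```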
As discussed in class, evaluating these partial derivatives is done using
"backpropagation", which is just a fancy name for applying the chain rule of
differentiation layer by layer, collapsing the accumulated derivatives into a row vector for efficiency at each step.
For example, suppose we wish to evaluate
$\frac{\partial L}{\partial w_5}$. Tracing back through the network to
find the dependency, we know that $L$ depends on $\hat{z}$, which depends
on $v$, which in turn depends on $w_5$. So, using the chain rule, we can
write
$$
\frac{\partial L}{\partial w_5} =
\frac{\partial L}{\partial \hat{z}}
\frac{\partial \hat{z}}{\partial v}
\frac{\partial v}{\partial w_5}
$$
and then proceed to write out all of those partial derivatives:
$$
\frac{\partial L}{\partial w_5} =
\bigl( 2 \; (\hat{z} - z) \bigr)
\bigl( \hat{z} \; (1 - \hat{z}) \; w_9 \bigr)
\bigl( Z(v) \; x \bigr)
$$
Here $Z$ is a function that returns 1 if its argument is $>$ 0 and 0
otherwise.
(Note that it's critical for this to
be defined as $>$ 0 and not $\geq$ 0.)
$Z$ is the derivative of ReLU, just as $\hat{z}\;(1-\hat{z})$
is the derivative of the sigmoid.
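To make that product concrete, here is a tiny numerical sketch of the same computation in Python, assuming the squared-error loss $L = (\hat{z} - z)^2$ implied by the $2(\hat{z} - z)$ term above, and that $w_5$ is the weight connecting the input $x$ to the pre-activation of $v$; the variable names are illustrative, not the starter code's:
```python
def Z(a):
    """Derivative of ReLU: 1 if the argument is > 0, and 0 otherwise."""
    return 1.0 if a > 0 else 0.0

# Toy values; in the real network these all come from a forward pass.
x = 1.5        # the input coordinate that w5 multiplies
z = 1.0        # ground-truth label
v = 0.7        # output of hidden neuron v (after ReLU)
z_hat = 0.6    # network output after the logistic function
w9 = -0.3      # weight from v to the output neuron

# dL/dw5 = (dL/dz_hat) * (dz_hat/dv) * (dv/dw5)
dL_dzhat = 2.0 * (z_hat - z)             # derivative of (z_hat - z)**2
dzhat_dv = z_hat * (1.0 - z_hat) * w9    # derivative of the logistic output w.r.t. v
dv_dw5 = Z(v) * x                        # derivative of the ReLU hidden neuron w.r.t. w5
dL_dw5 = dL_dzhat * dzhat_dv * dv_dw5
print(dL_dw5)
```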
In practical implementations, for efficiency and modularity, each layer of the network
supports a backward pass that takes as input the derivatives of the loss with respect
to that layer's outputs, and returns the derivatives of the loss with respect to that layer's
inputs (where "inputs" includes both the input vector 'x' to the layer and the parameters of that layer).
In order to get comfortable with backprop, we suggest that you write out the
partial derivatives of $L$ at each layer before moving forward.
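As a concrete example of this convention, the ReLU layer has no parameters, so its backward pass only needs to map the derivative with respect to its output into the derivative with respect to its input. A minimal sketch follows; the actual signatures in the starter code may differ, so treat this as an illustration rather than a template:
```python
import numpy as np

def relu_forward(x):
    """Forward pass of ReLU applied elementwise to a row vector x."""
    return np.maximum(x, 0)

def relu_backward(dL_dout, x):
    """Backward pass: given dL/d(output) and the layer input x, return dL/d(input).

    ReLU has no parameters, so the input gradient is the only quantity to return.
    Note the strict > 0, matching the definition of Z above.
    """
    return dL_dout * (x > 0)
```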
The starter code for this part defines a function tinynet_sgd that
supports the above architecture, but with a few important generalizations:
- The input vector can be of arbitrary dimensionality (not just two scalars 'x' and 'y').
It is a row vector in the code to be consistent with standard matrix calculus conventions.
- Rather than passing ones around, a separate bias vector 'b' is stored at each fully connected layer. Each
fully connected layer is then defined as x*W + b, where x is a row-vector input (see the sketch after this list for the shapes this implies).
This simplifies the derivatives you will have to compute.
- There can be an arbitrary number of chained fully-connected hidden layers before the output layer (chaining them
together is handled for you), and each can have an arbitrary number of
neurons. The network architecture is specified as a vector 'layers', which for
the diagram above would be '[2]' (i.e. one hidden layer with two neurons, 'u' and 'v'; the output layer is implicit). For a network
with two hidden layers, the first with three neurons and the second with two,
'layers' would be '[3,2]'.
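To illustrate these conventions (a row-vector input, a per-layer bias 'b', and the 'layers' vector), here is a rough sketch of one forward step and the shapes implied by layers = [3,2] on a 2-dimensional input; the starter code's actual interfaces may differ:
```python
import numpy as np

def fully_connected_forward(x, W, b):
    """One fully connected layer on a row-vector input: x*W + b."""
    return x @ W + b

# Shapes implied by layers = [3, 2] with a 2-dimensional input:
#   hidden layer 1: W1 is (2, 3), b1 is (1, 3)  ->  1x3 row vector out
#   hidden layer 2: W2 is (3, 2), b2 is (1, 2)  ->  1x2 row vector out
#   output layer:   W3 is (2, 1), b3 is (1, 1)  ->  scalar fed to the logistic
x = np.array([[0.5, -1.2]])                      # 1x2 row-vector input
W1, b1 = np.random.randn(2, 3), np.zeros((1, 3))
h1 = np.maximum(fully_connected_forward(x, W1, b1), 0)   # ReLU after the layer
print(h1.shape)                                  # (1, 3)
```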
Do the following:
- Download the starter code for this part.
- Read through the codebase, starting with test_tinynet.py and tinynet_sgd.py.
They contain the main entrypoints to this part of the assignment.
There are many hints and clarifications throughout the code comments, so please read all code in this folder.
- Implement logistic.py, relu.py, and fully_connected.py.
These are the forward layers of the network.
- Implement logistic_backprop.py, relu_backprop.py, and fully_connected_backprop.py. These are the backwards layers of the network.
Because fully_connected_backprop.py is the trickiest part of the assignment, yet later code depends on it, we have provided fully_connected_backprop_gt.pyc.
This compiled file contains a correct implementation of fully_connected_backprop(); after running from fully_connected_backprop_gt import fully_connected_backprop, you can call it exactly like your own version.
Feel free to write unit tests against this code, or to use it in place of fully_connected_backprop.py if (and only if) you get stuck.
This file is just meant to act as a timesaver, since there is an always-available, standard way of writing unit tests for backprop code: using a finite difference approximation to estimate the gradients from the forward pass (a minimal sketch of such a check appears after this list).
You are on the honor code not to try to decompile the .pyc, and to use your own version of the function for the remaining parts of the assignment if you want full credit.
- Implement the body of tinynet_predict.py.
- Implement the backpropagation pass of the network in tinynet_sgd.py.
- Train the default network using SGD by running test_tinynet.py. Run it
several times - you should see the network converging to different results. It should
converge to a good result some, but not all, of the time.
- Experiment with different network architectures to get improved convergence
and performance. Do additional layers always help?
What about more neurons in each layer? How do the number of epochs and the learning
rate affect the network? Try a network with a constant number of neurons per layer (e.g. 6,6,6)
versus one with an hourglass shape (e.g. 9,6,3). Does one exhibit better
accuracy or convergence properties than the other?
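As mentioned in the note about fully_connected_backprop_gt.pyc above, the standard unit test for any backward pass is a finite difference check of the forward pass. A minimal sketch, with hypothetical helper names that are not part of the starter code:
```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f.

    x must be a float array; f(x) should run a forward pass and return the loss.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Compare against the analytic gradient from your backward pass, e.g.:
#   assert np.allclose(numerical_gradient(loss_as_function_of_W, W),
#                      analytic_dL_dW, atol=1e-4)
```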
What to turn in:
- In Assignment 4 code: all files from the starter code, with your well-commented implementations, inside a folder called q2.
- In Assignment 4 written: a section in the README.pdf containing your short-answer
responses to the final bullet point's questions
and your modified code snippets (please refer to
the main page for Assignment 4).
Last update: 25-Nov-2019 14:19:12