COS 429 - Computer Vision
Fall 2016
Now we are ready to train a simple ConvNet to classify images. The data set, known as CIFAR-10, contains 32x32 RGB images, each belonging to one of 10 categories. MatConvNet conveniently includes an example that trains a basic network on it (and a more complex one, too).
The code is located in
matconvnet/examples/cifar/
Replace the files found there with these.
This version changes a few features. In particular, it adds a helper for displaying a classification result:
imdb.showExample(imdb, ex_index, cls_result);
where cls_result is a 10x1 vector of softmax outputs.
Task 0:
Train the network for 10 epochs using:
[net, info, opts, imdb] = cnn_cifar('train', struct('numEpochs', 10), 'runName', 'baseVersion');
This will start by downloading and saving the data set.
In the train & test loss graph, the test loss is lower for the first 1-2 epochs, but after that the training loss becomes the smaller of the two. Can you figure out why this is?
Task 1:
Evaluate the network on a batch of 100 test examples (hints:
find(imdb.images.set==3) and
[ims, labels] = imdb.getBatch(imdb, batch))
and compute the normalized softmax scores for each class (recall Part
4.2 of the tutorial). Now look at several examples, including the best,
worst, and some average ones, and display them using the
showExample() function.
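One possible way to do this (a sketch only, not the required solution; variable names like testIdx, p, and scores are illustrative, and it assumes net is the network returned by cnn_cifar above):

testIdx = find(imdb.images.set == 3);          % indices of the test images
batch = testIdx(1:100);                        % a batch of 100 test examples
[ims, labels] = imdb.getBatch(imdb, batch);

net_eval = net;                                % copy so the training net keeps its loss layer
net_eval.layers{end}.type = 'softmax';         % softmax instead of softmaxloss for evaluation
res = vl_simplenn(net_eval, ims);              % forward pass
scores = squeeze(res(end).x);                  % 10x100 matrix of normalized class scores

% Probability assigned to the correct class, used to rank the examples:
p = scores(sub2ind(size(scores), labels(:)', 1:numel(labels)));
[~, best] = max(p);
[~, worst] = min(p);
imdb.showExample(imdb, batch(best), scores(:, best));
imdb.showExample(imdb, batch(worst), scores(:, worst));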
Task 2:
Now train the network for more iterations until it looks like it has
converged (calling cnn_cifar() again will continue training from
where you left off). You probably want to give the resulting net a
different variable name so you can still access the previous one, for example:
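(A sketch; the epoch count of 40 is illustrative. Reusing the same runName is what lets training resume from the saved checkpoint.)

[net2, info2] = cnn_cifar('train', struct('numEpochs', 40), 'runName', 'baseVersion');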
Look at the same examples as before. Has the classification improved? How did it change for the good and bad examples?
Task 3:
Now you will start to modify the network and its training options. These are defined in cnn_cifar_init.m.
First, play with the learning rate and see if you can make it learn faster. What is the range of reasonable learning rates for this network?
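As a hedged sketch (the exact field name may differ in the provided cnn_cifar_init.m, so check that file), the learning rate is typically set on the training options there, and a fresh run name avoids resuming the old checkpoint:

% In cnn_cifar_init.m (illustrative; the provided file may use a different field):
net.meta.trainOpts.learningRate = 0.01;   % try values over a few orders of magnitude

% Retrain from scratch under a new run name:
[net_lr, info_lr] = cnn_cifar('train', struct('numEpochs', 10), 'runName', 'lr0p01');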
Task 4:
Play with the network and see if you can make it better. Try changing the
number of channels. Also try adding some additional conv+relu
layers (without pooling and with padding so the layer size does not change);
a sketch of such a pair of layers is given below.
Can you find a network that achieves better classification accuracy on the test
set?
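For reference, a SimpleNN-style conv+relu pair might look like the following (a sketch only; the 32 filters and the 0.01 initialization scale are assumptions, and the filter depth must match the number of channels output by the preceding layer):

net.layers{end+1} = struct('type', 'conv', ...
    'weights', {{0.01*randn(3,3,32,32,'single'), zeros(1,32,'single')}}, ...
    'stride', 1, ...
    'pad', 1);                               % 3x3 filters with pad 1 keep the spatial size
net.layers{end+1} = struct('type', 'relu');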