COS 429 - Computer Vision

Fall 2017

Assignment 2: Face Detection and Model Fitting

Due Thursday, Oct. 19

Part II. Training a Face Classifier

Dalal-Triggs: The 2005 paper by Dalal and Triggs proposes performing pedestrian detection by training a classifier on Histograms of Oriented Gradients (HoG) and then applying that classifier throughout an image. Your next task is to train that classifier and test how well it works.

Training: At training time, you will need two sets of images: ones containing faces and ones containing nonfaces. The set of faces provided to you comes from the Caltech 10,000 Web Faces dataset. The dataset has been trimmed to 6,000 faces by eliminating images that are not large or frontal enough. Each image is then cropped to just the face and resized to a uniform 36x36 pixels. All of these images are grayscale.

The non-face images come from Wu et al. and the SUN scene database. You'll need to do a bit more work to use these, though, since they come in a variety of sizes and include entire scenes. So, you will need to randomly sample patches from these images and resize each patch to the same 36x36 pixels as the face dataset.
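One way to sample a nonface patch is sketched below in Python. The starter code itself is MATLAB, so treat this only as an illustration.

The sketch makes these assumptions:
- The scene image has already been converted to grayscale and is stored as a list of rows.
- The function names random_square and crop_and_resize are hypothetical.
- The nearest-neighbor resize is a simple stand-in for a proper resizing routine such as MATLAB's imresize.
- The side length is drawn uniformly between 36 and the shorter image dimension, which is only one reasonable choice.

```python
import random

PATCH = 36  # side length of the face patches

def random_square(width, height, rng=random):
    """Pick a random square (x, y, side) lying fully inside a width x height image.

    The side is drawn uniformly from PATCH..min(width, height); the top-left
    corner is then drawn uniformly among positions that keep the square inside.
    """
    max_side = min(width, height)
    if max_side < PATCH:
        raise ValueError("image is smaller than the 36x36 patch size")
    side = rng.randint(PATCH, max_side)   # randint is inclusive on both ends
    x = rng.randint(0, width - side)      # column of the top-left corner
    y = rng.randint(0, height - side)     # row of the top-left corner
    return x, y, side

def crop_and_resize(image, x, y, side, out=PATCH):
    """Crop the side x side square at (x, y) and shrink it to out x out.

    Uses nearest-neighbor sampling: output pixel (r, c) copies the source
    pixel at offset floor(r * side / out), floor(c * side / out).
    """
    scale = side / out
    return [[image[y + int(r * scale)][x + int(c * scale)] for c in range(out)]
            for r in range(out)]
```

Usage: call random_square(width, height) once per desired patch, then call crop_and_resize(image, x, y, side) to get a 36x36 grayscale patch.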

Once you have your positive and negative examples (faces and nonfaces, respectively), you'll compute the HoG descriptor (a partial implementation of which is provided for you) on each one. Finally, you'll train the logistic regression classifier you wrote in Part I to run on the feature vectors.
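Here is a minimal Python sketch of stacking positive and negative examples into one data matrix, with faces labeled 1 and nonfaces labeled 0.

The descriptor argument is a hypothetical stand-in for hog36. The list-of-lists layout (one row per example) is an assumption, not necessarily what get_training_data.m returns.

```python
def build_dataset(faces, nonfaces, descriptor):
    """Return (X, y): one descriptor row per patch, label 1 for faces, 0 for nonfaces.

    descriptor is any function mapping a 36x36 patch to a flat sequence of numbers.
    """
    X, y = [], []
    for patches, label in ((faces, 1), (nonfaces, 0)):
        for patch in patches:
            X.append([float(v) for v in descriptor(patch)])
            y.append(label)
    # Every row must have the same length for X to be a proper matrix.
    if len({len(row) for row in X}) > 1:
        raise ValueError("descriptors must all have the same length")
    return X, y
```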

As mentioned in the Dalal and Triggs paper and in class, the HoG descriptor has a number of parameters that affect its performance. Two of these are exposed as inputs to the hog36 function, but the code that uses them has yet to be implemented. These parameters are the number of orientation bins in each cell's histogram, and whether those bins cover the full 360 degrees or orientations 180 degrees apart are collapsed into the same bin. Implement the code needed to use the parameters orientations and wrap180, and then experiment with these parameters to see whether the conclusions reached in the paper (i.e., performance does not improve beyond about 9 orientations, and it doesn't matter whether or not you wrap at 180 degrees) hold as well for face detection as they do for pedestrian detection.
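The orientations and wrap180 parameters come down to how a gradient direction is mapped to a histogram bin. The Python sketch below makes two simplifications:
- Each pixel's gradient is assigned to a single bin (hard assignment), weighted by its magnitude.
- The helper names orientation_bin and cell_histogram are hypothetical.

Dalal and Triggs additionally interpolate votes between neighboring bins, and the binning details of hog36.m may differ.

```python
import math

def orientation_bin(dx, dy, orientations, wrap180):
    """Map a gradient (dx, dy) to a bin index in 0..orientations-1.

    With wrap180=True the bins span [0, pi), so directions 180 degrees apart
    share a bin; otherwise the bins span the full [0, 2*pi).
    """
    angle = math.atan2(dy, dx)                  # in [-pi, pi]
    span = math.pi if wrap180 else 2 * math.pi
    angle %= span                               # Python's % maps into [0, span)
    b = int(angle / span * orientations)
    return min(b, orientations - 1)             # guard against float round-up

def cell_histogram(gradients, orientations, wrap180):
    """Magnitude-weighted orientation histogram for one cell (hard binning)."""
    hist = [0.0] * orientations
    for dx, dy in gradients:
        hist[orientation_bin(dx, dy, orientations, wrap180)] += math.hypot(dx, dy)
    return hist
```

With wrap180 set, a gradient and its negation land in the same bin. For example, a dark-to-light edge and a light-to-dark edge at the same angle are counted together.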

Predicting: One of the nice things about logistic regression is that its output ranges from 0 to 1 and is naturally interpreted as a probability. (In fact, if the two classes are distributed according to Gaussian models with different means but the same covariance, the posterior probability of each class takes exactly the logistic form.) In lecture, we thresholded the output of the learned model to get a 0/1 prediction for the class, which effectively thresholded at a probability of 0.5. But for face detection you may wish to bias the detector to give fewer false positives (i.e., be less likely to mark non-faces as faces) or fewer false negatives (i.e., be less likely to mark faces as non-faces). Therefore, you will look at graphs that plot the false-negative rate vs. the false-positive rate as the probability threshold is changed. Curves that lie closer to the bottom-left corner indicate better performance. Of course, you will look at performance on both the training set (on which you expect great performance) and a separate test set (on which the classifier may or may not perform as well).
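Each point on a false-negative vs. false-positive curve comes from one threshold. A rough Python sketch of that computation follows, under these assumptions:
- probs holds predicted probabilities of the face class.
- labels uses 1 for faces and 0 for nonfaces.
- Both classes are present, since each rate divides by a class count.

The function names are hypothetical; the starter code already produces these plots for you.

```python
def error_rates(probs, labels, threshold):
    """Return (false-positive rate, false-negative rate) when an example is
    called a face iff its predicted probability is >= threshold.

    Assumes labels contains at least one face (1) and one nonface (0).
    """
    n_pos = sum(1 for t in labels if t == 1)
    n_neg = len(labels) - n_pos
    fn = sum(1 for p, t in zip(probs, labels) if t == 1 and p < threshold)
    fp = sum(1 for p, t in zip(probs, labels) if t == 0 and p >= threshold)
    return fp / n_neg, fn / n_pos

def error_curve(probs, labels, steps=100):
    """Sweep the threshold over 0, 1/steps, ..., 1 and collect (fp, fn) points."""
    return [error_rates(probs, labels, k / steps) for k in range(steps + 1)]
```

Raising the threshold moves along the curve toward fewer false positives and more false negatives. At threshold 0 every example is called a face, and at threshold 1 essentially none are.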

Do this:

Download the starter code and dataset (about 30 MB) for this part. It contains the following files:
- logistic_prob.m - you will modify this to compute the predicted probability that new datapoints belong to class 1 (vs 0), given a model trained by logistic_fit.
- get_training_data.m - you will modify this to create a matrix of training data, including both positive and negative examples (i.e., faces and nonfaces).
- get_testing_data.m - in a similar way, you will implement this function to create a matrix of testing data.
- test_face_classifier.m - this function calls the above two functions, calls logistic_fit to learn a model, and produces graphs of false-negative vs false-positive rate.
- hog36.m - A simple implementation of the HoG descriptor, specialized for 36x36 images, 6x6 cells, and 2x2 blocks. It currently runs with a fixed number of orientations and no wrapping at 180 degrees, but you will modify it to allow a variable number of orientations and optional wrapping at 180 degrees.
- face_data/training_faces/*.jpg - 6,000 images, 36x36, grayscale, each containing one centered face
- face_data/training_nonfaces/*.jpg - 250 images, varying sizes, color
- face_data/testing_faces/*.jpg - 500 images, 36x36, grayscale, each containing one centered face
- face_data/testing_nonfaces/*.jpg - 500 images, 36x36, grayscale
Copy over logistic_fit.m from Part I; you will need it for this part.
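The quantity logistic_prob.m should compute is the logistic (sigmoid) function of a linear score. Below is a hedged Python sketch under two assumptions, which may not match the conventions of your logistic_fit.m from Part I:
- The model is a single weight vector w with one entry per feature column.
- Any bias term has already been folded into X as a column of ones.

The name predict_prob is hypothetical.

```python
import math

def predict_prob(X, w):
    """Return P(class 1 | x) = 1 / (1 + exp(-x . w)) for each row x of X."""
    probs = []
    for x in X:
        z = sum(xi * wi for xi, wi in zip(x, w))   # linear score x . w
        if z >= 0:
            probs.append(1.0 / (1.0 + math.exp(-z)))
        else:
            # Same value, rewritten so math.exp never sees a large positive argument.
            ez = math.exp(z)
            probs.append(ez / (1.0 + ez))
    return probs
```

Splitting on the sign of z keeps Python's math.exp from raising OverflowError for large negative scores. In MATLAB, exp simply returns Inf in that case, so the direct formula 1 ./ (1 + exp(-z)) still yields 0 there.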

Do this and turn in:

Implement logistic_prob.m, get_training_data.m and get_testing_data.m. Look for sections marked "Fill in here". The trickiest part is likely to be selecting random squares (i.e., random positions and random sizes no smaller than 36x36) from the nonface images. Turn in these three files.
Run
test_face_classifier(250,100,4,true)
which trains the classifier on 250 faces and 250 nonfaces, tests it on 100 faces and 100 nonfaces, using 4 orientations that are wrapped at 180 degrees. If all goes well, the training should complete in a few seconds, and you should see two plots, for training and testing performance. The training plot should be boring: running down the bottom and left sides of the graph, indicating perfect performance. The testing plot, though, should indicate imperfect performance. Turn in the training and testing plots.
Train the classifier with 6,000 training images and 500 test images. Training will take longer but should still finish within a few minutes, depending on your CPU. Note that testing accuracy should increase significantly with more training data. Turn in the training and testing plots.
Train the classifier with the number of orientations increased from 4 to 6, 9, and 12. Use 6,000 training images and 500 test images as before. Briefly describe what happens to test accuracy.
Using your modified hog36.m, disable wrapping of orientations at 180 degrees (i.e., set wrap180 to false). Train the classifier with 6,000 training images, 500 test images, 9 orientations, and no wrapping. Do you see the same behavior as Dalal and Triggs, in that turning off the wrapping of orientations at 180 degrees makes little difference to accuracy? Briefly explain why (or why not) that is the case.
In parts III and IV of this assignment, you will run this detector at many locations throughout an image that may or may not contain some faces. Would you prefer to run the detector with a threshold that favors fewer false positives, fewer false negatives, or some balance? Briefly explain why.

Acknowledgment: idea and datasets courtesy of James Hays (see his assignment at Georgia Tech).

Last update 5-Oct-2017 01:45:09