COS 429 - Computer Vision
Fall 2005
Assignment 2
Due Thursday, Oct. 27
Submission Guidelines
Please submit your write-up as an HTML file, with links to your code
and images (or even better, IMG tags). Nothing fancy is required. If
you received no comments on the formatting of your Assignment 1
write-up, it was probably just fine.
1. Questions (30%)
- When we were discussing the Hough transform for lines, we saw that
parameterizing lines by slope and intercept led to a less uniform
parameterization than angle and distance from center. To analyse this
nonuniformity, assume that you are given 1000 lines (of random orientations)
and you must assign them to 5 buckets based on orientation.
- How many lines fall in each bucket if the buckets are assigned
uniformly based on angle?
- How many lines fall in each bucket if the buckets are assigned
uniformly based on slope? For concreteness, assume that
the slopes of the buckets are -2, -1, 0, 1, 2 and that
each line is assigned to the bucket that is nearest
to its slope.
- The finite size of an image implies that, on average, the length in pixels
of the visible portions of lines close to the image center C is greater
than that of lines distant from C. How does this bias the Hough
transform? How could you counter this bias?
- Imagine you are given two vectors, a "signal" S and a
"template" T. Assume T is shorter than S.
Now, you want to find the position within S at which T
is the best match according to the sum of squared differences (SSD)
criterion. That is, given an offset k at which you are looking
for T within S, you want to find the k that minimizes
$$\mathrm{SSD}(k) = \sum_{i} \bigl( S[i+k] - T[i] \bigr)^2$$
where i ranges over the elements of T.
Show how you can do this without an explicit loop over k
by computing two convolutions:
- S convolved with some variant of T
Hint: this won't necessarily be T itself, but
some simple transformation. Look at the equation above, and compare
with the definition of discrete convolution.
- The vector consisting of the square of each element of S,
convolved with a vector the same length as T but consisting
of all ones.
You will need this result for the face detection portion of the
assignment below.
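For the bucket question at the start of this section, the following is a minimal sketch of how the expected counts could be computed. It assumes that "random orientations" means the line angle is uniformly distributed over a 180-degree range, and that slope buckets are split at the midpoints between neighbouring bucket slopes. The function names are hypothetical, and the code is plain Python rather than Matlab.

```python
import math

def expected_counts_by_slope(centers, n_lines=1000):
    """Expected lines per slope bucket when angles are uniform on (-90, 90) degrees."""
    centers = sorted(centers)
    # Nearest-slope assignment puts each boundary midway between two bucket slopes.
    edges = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # Convert slope boundaries to angles; the two outer buckets extend to +/-90 degrees.
    angles = [-math.pi / 2] + [math.atan(e) for e in edges] + [math.pi / 2]
    # Each bucket receives a share of lines proportional to the angle range it covers.
    return [n_lines * (hi - lo) / math.pi for lo, hi in zip(angles, angles[1:])]

def expected_counts_by_angle(n_buckets, n_lines=1000):
    """Equal-width angle buckets each receive the same expected share."""
    return [n_lines / n_buckets] * n_buckets

slope_counts = expected_counts_by_slope([-2, -1, 0, 1, 2])
angle_counts = expected_counts_by_angle(5)
```

Comparing slope_counts with angle_counts shows how uneven the slope-based assignment is under these assumptions.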
2. Aligned database of faces (20%)
The remainder of this assignment will be to implement a face detection system
trained on a database of examples. The first step is to align and normalize
the examples. Download the following images and make sure you can load them
into Matlab:
cos429_f05_faces_scaled.zip (2 MB)
Note: this archive includes the pictures taken Oct 11. Download it again
if you have the earlier version.
These images are rescaled and slightly cropped from the originals to make
them more manageable. If you would like to work with the originals instead
(including pictures with glasses, etc.), they are here:
cos429_f05_faces_orig.zip (72 MB)
Then, implement code that:
- Presents each face image in turn
- Lets the user click on the centers of the eyes, and stores those
coordinates (using the getpts function)
- Converts the image to grayscale
- Warps the image so that the eye points are mapped to fixed locations,
100 pixels apart horizontally (look up the imtransform and
cp2tform functions)
- Crops out an appropriate section (e.g., 300 pixels tall by 200 wide)
of the image (the easiest way is to use the 'XData' and
'YData' options to imtransform)
- Saves the results, so you don't have to do the clicking more than once
The result should be a collection of well-aligned equal-sized images.
To verify, look at (and submit) the average of the images, and confirm
that it looks like a face. If you're feeling ambitious, use more
features than just the eyes to compute alignment. You could also try
to localize the eyes and align faces automatically.
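The geometry behind the eye-based warp can be sketched as follows. Two point correspondences determine a rotation, uniform scale, and translation, and treating each (x, y) point as a complex number makes that transform a single multiply and add. This is an illustration in plain Python, not the Matlab code the assignment asks for. The function names, the example click coordinates, and the target eye positions (100 pixels apart inside a 200-wide by 300-tall crop) are all assumptions.

```python
def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """Return (a, b) such that z -> a*z + b maps the clicked eyes onto the targets."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    if p1 == p2:
        raise ValueError("eye points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # rotation and uniform scale
    b = q1 - a * p1            # translation
    return a, b

def apply_similarity(transform, point):
    """Map one (x, y) point through the transform."""
    a, b = transform
    z = a * complex(*point) + b
    return (z.real, z.imag)

# Hypothetical target eye positions: 100 pixels apart on the same row.
TARGET_LEFT, TARGET_RIGHT = (50.0, 120.0), (150.0, 120.0)

# Hypothetical clicked eye centers in one input image.
clicked_left, clicked_right = (212.0, 305.0), (298.0, 290.0)
transform = similarity_from_eyes(clicked_left, clicked_right, TARGET_LEFT, TARGET_RIGHT)
```

Applying the same transform to every pixel coordinate (or its inverse to every output pixel) is what the image warp does. Using more than two features, as suggested above, would turn this into a least-squares fit.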
3. Face detection (40%)
We're just going to use the average face we computed above as a
template. To find faces in a test image, we'll find where subsets
of the test image match the template best, using the SSD metric.
As you showed above, this can be done using a couple of convolutions.
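As a sanity check before moving to images, the SSD-by-convolution idea can be tried on a 1-D signal. The sketch below expands the squared difference into three terms: the windowed energy of S, a cross term between S and T, and the constant energy of T. It then compares the result with a direct SSD computation. It is plain Python with hypothetical function names. The helper convolve_valid still loops internally, standing in for whatever convolution routine is available.

```python
def convolve_valid(signal, kernel):
    """Discrete convolution, keeping only offsets where the kernel fully overlaps."""
    n, m = len(signal), len(kernel)
    return [sum(signal[k + j] * kernel[m - 1 - j] for j in range(m))
            for k in range(n - m + 1)]

def ssd_by_convolution(signal, template):
    """SSD at every valid offset, built from two convolutions and a constant."""
    # Cross term: convolving with the reversed template gives sum_i S[i+k]*T[i].
    cross = convolve_valid(signal, template[::-1])
    # Windowed energy of S: squared signal convolved with a vector of ones.
    energy = convolve_valid([s * s for s in signal], [1] * len(template))
    # Energy of T does not depend on the offset.
    t_energy = sum(t * t for t in template)
    return [e - 2 * c + t_energy for e, c in zip(energy, cross)]

def ssd_direct(signal, template):
    """Reference implementation with an explicit loop over offsets."""
    m = len(template)
    return [sum((signal[k + i] - template[i]) ** 2 for i in range(m))
            for k in range(len(signal) - m + 1)]

signal = [1, 3, 2, 5, 4, 2]
template = [2, 5, 4]
scores = ssd_by_convolution(signal, template)
best_offset = scores.index(min(scores))
```

The same decomposition carries over to 2-D, with the vector of ones replaced by a template-sized matrix of ones.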
There are two wrinkles to deal with:
- Because the filter is now rather larger than the ones we were using
in the edge detector assignment, you'll get much better performance by
using the FFT to do the convolution. Implement an FFT-based convolution
(using the fft and ifft functions), and compare its
performance to conv2 on an image and filter of your choosing.
- Since the face can appear at any size, you'll need to do your
convolution at multiple scales. This can be done by scaling either
the face template or the input image. Pick one of these options,
and justify why you chose the one you did.
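The FFT-based convolution in the first item rests on the convolution theorem: zero-pad both inputs to a common length, multiply their transforms, and transform back. The sketch below shows this in 1-D in plain Python, with a small recursive radix-2 FFT standing in for a library fft. All function names are hypothetical. The 2-D case needed for images follows the same pattern with a 2-D transform.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_convolve(a, b):
    """Full linear convolution of two real sequences via zero-padded FFTs."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:  # pad to a power of two that avoids wrap-around
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    product = [u * v for u, v in zip(fa, fb)]
    result = fft(product, inverse=True)
    # Divide by size to complete the inverse transform, then drop the padding.
    return [v.real / size for v in result[:out_len]]

def direct_convolve(a, b):
    """Reference O(len(a) * len(b)) convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

fast = fft_convolve([1, 2, 3, 4], [1, 0, -1])
slow = direct_convolve([1, 2, 3, 4], [1, 0, -1])
```

The padding step matters. Without it the product of transforms computes a circular convolution, and the tail of the result wraps around onto the start.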
Run your face detector on some test images of crowds of people.
You can use your own, or here are some taken in class:
Show the SSD score image for each test. Devise a method based on
thresholding and/or nonmaximum suppression for narrowing the output down
to a set of discrete locations in each image where you think there's a face.
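One possible way to combine thresholding with nonmaximum suppression is sketched below in plain Python. Because low SSD means a good match, it keeps positions whose score is below a threshold and strictly lower than every neighbour within a given radius. The function name, the threshold, and the radius are assumptions to be tuned. Exact ties between neighbours produce no detection in this version.

```python
def detect_minima(scores, threshold, radius=1):
    """Return (row, col) positions below threshold that are strict local minima."""
    rows, cols = len(scores), len(scores[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            value = scores[r][c]
            if value >= threshold:
                continue  # not a good enough match
            # Collect all other scores inside the square neighbourhood.
            neighbours = [
                scores[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
                if (i, j) != (r, c)
            ]
            # Suppress this position unless it beats every neighbour.
            if all(value < n for n in neighbours):
                hits.append((r, c))
    return hits

# Toy score image with two well-separated minima.
toy_scores = [
    [9, 9, 9, 9, 9],
    [9, 2, 9, 9, 9],
    [9, 9, 9, 9, 9],
    [9, 9, 9, 3, 9],
    [9, 9, 9, 9, 9],
]
near = detect_minima(toy_scores, threshold=5, radius=1)
far = detect_minima(toy_scores, threshold=5, radius=2)
```

A larger radius suppresses weaker detections near a stronger one, which is useful when one face produces several nearby low-SSD responses.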
4. Eigenfaces (10%)
Run PCA with whitening on your database of faces (read up on the Matlab
svd function, especially the svd(X,0) syntax). Show
us the top 5 principal components. If you're feeling ambitious, try
some recognition experiments with the faces you detected in part (3) -
can you recognize faces based on their projection onto the top few
principal components and a nearest-neighbor classifier?
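For reference, one common formulation of PCA with whitening via the SVD is written out below. The notation is an assumption, not something fixed by the assignment: the n aligned faces are flattened into column vectors x_i of length d, mu is their mean, and k is the number of components kept.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
X = \bigl[\, x_1 - \mu \;\; \cdots \;\; x_n - \mu \,\bigr] \in \mathbb{R}^{d \times n}

X = U \Sigma V^{\top} \quad \text{(economy-size SVD: } U \in \mathbb{R}^{d \times n} \text{ when } d > n \text{)}

\text{principal components: columns } u_j \text{ of } U, \qquad
\lambda_j = \frac{\sigma_j^2}{n-1} \quad \text{(variance along } u_j \text{)}

y = \Lambda_k^{-1/2}\, U_k^{\top} (x - \mu), \qquad
\Lambda_k = \operatorname{diag}(\lambda_1, \ldots, \lambda_k)
```

Because the mean has been subtracted, at most n - 1 singular values are nonzero. Components with very small sigma_j should be dropped before whitening, since dividing by them amplifies noise.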
Last update
29-Dec-2010 12:00:22
smr at princeton edu