COS 496 - Computer Vision
Assignment #3
Spring, 2002
Due Wed. Mar. 27
I. Questions (25%)
- Show that perspective projection (i.e., a transformation
that may be represented by some 4x4 matrix in homogeneous coordinates)
maps straight lines into straight lines.
- What problems would you expect when performing stereo correspondence
on images of a ball? (Hint: silhouettes)
- Given a stereo system operating on a pair of rectified images,
determine the error in the reconstructed 3D position of a point, assuming
a 1-pixel error in finding the correspondence. Express the answer as a
function of the angle subtended by each pixel, the distance between the
cameras, the distance from the baseline to the 3D point, and any other
parameters you need.
- Assuming a camera model with 6 extrinsic and 2 intrinsic parameters,
what is the theoretical minimum number of observed points (with known 3D
coordinates) necessary to calibrate the camera?
II. Camera Calibration (75%)
Implement a system for determining the intrinsics and extrinsics of a
camera, given a few views of a calibration target.
Camera Model
You should use a camera model consisting of:
- An arbitrary position and orientation for each image.
- Perspective projection.
- A uniform pixel scale.
- First-order radial distortion.
- Translation to the center of the image.
That is, given a 3-D point p = (x,y,z), the camera model should consist of a
translation and rotation:
p' = Rz * Ry * Rx * T * p
where Rx, Ry, Rz are rotations around the x, y, and z axes, and T is
a translation. Writing p' = (x', y', z'), next comes a perspective
projection and scale:
u' = s * x' / z'
v' = s * y' / z'
where s is just a constant scale factor. Next comes radial distortion:
u'' = u' (1 + k * (u'*u'+v'*v'))
v'' = v' (1 + k * (u'*u'+v'*v'))
where k is the radial distortion coefficient. Finally comes translation to
the center of the image:
u = u'' + cu
v = v'' + cv
You do not need to solve for (cu, cv) - just assume they are at the center of
the image (i.e., for a 640x480 image assume that cu = 320 and cv = 240).
Thus, this camera model consists of two intrinsic parameters to solve for
(the scale factor s and the radial distortion coefficient k) and six
extrinsic parameters (the three rotation angles and the three components of
translation).
You will be given several views of a test target. The intrinsics of the
camera may be assumed to remain the same between views, but the extrinsics
vary between views. Thus, if there are n views, you must solve for a
total of 6*n + 2 parameters (the intrinsics and all the extrinsics).
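As a concrete illustration of the full model above, here is a sketch in Python/NumPy (the assignment itself asks for Matlab; the function and parameter names are hypothetical) that projects a single 3D point through translation, rotation, perspective, radial distortion, and recentering:

```python
import numpy as np

def project_point(p, rx, ry, rz, t, s, k, cu=320.0, cv=240.0):
    """Apply the camera model to a 3D point p = (x, y, z).
    rx, ry, rz are rotation angles in radians, t is a 3-vector."""
    # Rotation matrices about the x, y, and z axes
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    # Translate, then rotate: p' = Rz * Ry * Rx * T * p
    xp, yp, zp = Rz @ Ry @ Rx @ (np.asarray(p, float) + np.asarray(t, float))
    # Perspective projection and uniform scale
    u1 = s * xp / zp
    v1 = s * yp / zp
    # First-order radial distortion
    r2 = u1 * u1 + v1 * v1
    u2 = u1 * (1 + k * r2)
    v2 = v1 * (1 + k * r2)
    # Translation to the center of the image
    return u2 + cu, v2 + cv
```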
Solution method
You should use a solution method based on nonlinear least squares.
The input is a set of pictures of a checkerboard pattern. Each corner
of the checkerboard defines a point in 3D space relative to some fixed
position (e.g. the lower left corner). Since the spacing of the checkerboard
is known (assume that the checkers are precisely 1 cm on a side), and since
the target is assumed to be planar, the 3D locations of these points are
known. Together with the (u,v) locations of the corners in the image, each
of these corners places some constraint on the parameters in the camera model.
Since there are more corners than free parameters in the model, we have an
overconstrained system, and will minimize it via least squares. In particular,
we will be minimizing the sum of squared distances between the actual (u,v)
locations and the (u,v) locations predicted by the camera model.
Matlab has built-in functions to perform nonlinear minimization. You
should read the help for fmins (if using Matlab 5) or
fminsearch (for Matlab 6) to learn how to use them.
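Both fmins and fminsearch implement Nelder-Mead simplex minimization and allow extra arguments to be forwarded to the objective function. The calling pattern can be illustrated with SciPy's analogous fmin (a Python sketch with a made-up toy objective, not part of the assignment):

```python
import numpy as np
from scipy.optimize import fmin  # Nelder-Mead, analogous to fmins/fminsearch

def objective(params, data):
    """Toy objective: minimized at params[0] = mean(data), params[1] = 0."""
    a, b = params
    return np.sum((data - a) ** 2) + b ** 2

data = np.array([1.0, 2.0, 3.0])
# Extra arguments are forwarded to the objective via args, much as
# fmins/fminsearch accept additional parameters for the objective.
best = fmin(objective, x0=[0.0, 1.0], args=(data,), disp=False)
```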
Solving for parameters
Do the following:
- Load each of the eight provided test images (images 1 through 8).
- For each image, let the user click on the corners in the image and record
the (u,v) positions. Use the function getpts (if you don't have
the getpts function, you can get it here: getpts.m).
Be sure to click on the points in a predetermined order (e.g., left-to-right,
bottom-to-top) to make it easy to assign 3D coordinates to these points.
During debugging, you will probably want to use only a few images, and only
click on a few points (e.g., 9) per image.
- Determine the 3D coordinates for each 2D point (assume the target is
planar, the origin is at the lower-left corner, and the squares are 1 cm on
a side). Note that some coordinate (e.g. the z coordinate) will be zero
for all these 3D points.
- Form a matrix with all the data points. This matrix should have as
many rows as there are corners, and six columns: the x, y, and z
coordinates, the u and v coordinates of the corner in the image, and the
number of the image from which this data came (this is necessary so that
the function implementing the camera model can use the correct set of
extrinsics).
- To avoid having to click on points many times during debugging, save
this matrix to disk (help save).
- Write a function that takes the above matrix of data values, as well as
a 6*n+2 dimensional vector with the camera intrinsics and extrinsics, and
computes the sum of squared differences between the observed (u,v) points and
the result of applying the camera model to the (x,y,z) points.
- Use fmins or another nonlinear minimizer to minimize the above
function (you will have to use the form of the minimizer that allows additional
arguments to be passed to the objective function, unless you use a global
variable for the matrix of data values).
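Putting the steps above together, the data matrix, 3D coordinate assignment, and objective function might look as follows in Python/NumPy (a sketch only; the assignment asks for Matlab, and every function name here is hypothetical):

```python
import numpy as np

def grid_points_3d(n_cols, n_rows, square_cm=1.0):
    """3D coordinates of checkerboard corners, ordered left-to-right,
    bottom-to-top, origin at the lower-left corner.  The target is
    planar, so z = 0 for every corner."""
    return np.array([(c * square_cm, r * square_cm, 0.0)
                     for r in range(n_rows) for c in range(n_cols)])

def rot(rx, ry, rz):
    """Rotation Rz * Ry * Rx from the three extrinsic angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def reprojection_error(params, data, cu=320.0, cv=240.0):
    """Sum of squared differences between observed and predicted (u, v).
    params = [s, k, rx_0, ry_0, rz_0, tx_0, ty_0, tz_0, rx_1, ...]
    Rows of data are (x, y, z, u, v, image_index), indices from 0."""
    params = np.asarray(params, float)
    s, k = params[0], params[1]
    err = 0.0
    for x, y, z, u, v, idx in np.asarray(data, float):
        e = params[2 + 6 * int(idx): 8 + 6 * int(idx)]  # this view's extrinsics
        xp, yp, zp = rot(*e[:3]) @ (np.array([x, y, z]) + e[3:])
        u1, v1 = s * xp / zp, s * yp / zp      # projection and scale
        r2 = u1 * u1 + v1 * v1                 # first-order radial distortion
        pu = u1 * (1 + k * r2) + cu
        pv = v1 * (1 + k * r2) + cv
        err += (u - pu) ** 2 + (v - pv) ** 2
    return err
```

A function of this shape, together with an initial guess for all 6*n + 2 parameters, is what gets handed to the nonlinear minimizer.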
Determining feature points
Make the user's job easier by not requiring clicking precisely on every
corner in every image. There are many levels of sophistication possible,
and you will get more credit for more automation:
- (Minimum required level) Find corners in the image using the corner finder
discussed in class. When the user clicks on some point in the image, "snap"
that point to the nearest corner.
- (Harder, some extra credit) Let the user click on the four corners of
the checkerboard pattern, and specify how many checkers there are. Use this
information to guess the locations of the corners, then snap each guess to
the nearest output of a corner finder.
- (Hardest) Use a Hough transform to find lines in the image, then guess
candidate locations for corners at the intersections of these lines. Again,
use the output of a corner detector to refine the positions.
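The minimum-level snapping step is essentially a nearest-neighbor lookup against the corner detector's output. A small Python/NumPy sketch (function name hypothetical):

```python
import numpy as np

def snap_to_nearest(click_uv, corner_uvs):
    """Snap a user's click to the nearest detected corner.
    corner_uvs: (N, 2) array of (u, v) corner-detector output."""
    corner_uvs = np.asarray(corner_uvs, float)
    d2 = np.sum((corner_uvs - np.asarray(click_uv, float)) ** 2, axis=1)
    return corner_uvs[np.argmin(d2)]
```

The same routine serves the harder variants as well: each guessed corner location (from the four outer clicks, or from Hough-line intersections) is snapped with the identical nearest-neighbor lookup.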
Submitting
This assignment is due Wednesday, March 27, 2002 at 11:59 PM Eastern
Standard Time. Please see the general
notes on submitting your assignments, as well as the
late policy and the
collaboration policy.
Please submit your code (as one or more .m files), the results of your code
on the provided test images, and a README or README.html file containing
your answers to the questions in part I. If you find it more convenient to
hand in the written portion of the assignment on paper, that will be
accepted as well.
Last update: 12:05:14 29-Dec-2010
cs496@princeton.edu