COS 496 - Computer Vision
Assignment #3
Spring, 2002
Due Wed. Mar. 27
I. Questions (25%)
- Show that perspective projection (i.e., a transformation
that may be represented by some 4x4 matrix in homogeneous coordinates)
maps straight lines into straight lines.
- What problems would you expect when performing stereo correspondence
on images of a ball? (Hint: silhouettes)
- Given a stereo system operating on a pair of rectified images,
determine the error in the reconstructed 3D position of a point, assuming
a 1-pixel error in finding the correspondence. Express the answer as a
function of the angle subtended by each pixel, the distance between the
cameras, the distance from the baseline to the 3D point, and any other
parameters you need.
- Assuming a camera model with 6 extrinsic and 2 intrinsic parameters,
what is the theoretical minimum number of observed points (with known 3D
coordinates) necessary to calibrate the camera?
II. Camera Calibration (75%)
Implement a system for determining the intrinsics and extrinsics of a
camera, given a few views of a calibration target.
Camera Model
You should use a camera model consisting of:
- An arbitrary position and orientation for each image.
- Perspective projection.
- A uniform pixel scale.
- First-order radial distortion.
- Translation to the center of the image.
That is, given a 3-D point p = (x,y,z), the camera model should consist of a
translation and rotation:
p' = Rz * Ry * Rx * T * p
where Rx, Ry, Rz are rotations around the x, y, and z axes, and T is
a translation. Writing p' = (x', y', z'), next comes a perspective
projection and scale:
u' = s * x' / z'
v' = s * y' / z'
where s is just a constant scale factor. Next comes radial distortion:
u'' = u' (1 + k * (u'*u'+v'*v'))
v'' = v' (1 + k * (u'*u'+v'*v'))
where k is the radial distortion coefficient. Finally comes translation to
the center of the image:
u = u'' + cu
v = v'' + cv
You do not need to solve for (cu, cv) - just assume they are at the center of
the image (i.e., for a 640x480 image assume that cu = 320 and cv = 240).
Thus, this camera model consists of two intrinsic parameters to solve for
(the scale factor s and the radial distortion coefficient k) and six
extrinsic parameters (the three rotation angles and the three components of
translation).
You will be given several views of a test target. The intrinsics of the
camera may be assumed to remain the same between views, but the extrinsics
vary between views. Thus, if there are n views, you must solve for a
total of 6*n + 2 parameters (the intrinsics and all the extrinsics).
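As a concrete illustration of the full model above, here is a sketch in Python/NumPy (the assignment itself asks for Matlab; the function and parameter names are hypothetical) that projects a single 3D point through translation, rotation, perspective, radial distortion, and recentering:

```python
import numpy as np

def project_point(p, rx, ry, rz, t, s, k, cu=320.0, cv=240.0):
    """Apply the camera model to a 3D point p = (x, y, z).
    rx, ry, rz are rotation angles in radians, t is a 3-vector."""
    # Rotation matrices about the x, y, and z axes
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    # Translate, then rotate: p' = Rz * Ry * Rx * T * p
    xp, yp, zp = Rz @ Ry @ Rx @ (np.asarray(p, float) + np.asarray(t, float))
    # Perspective projection and uniform scale
    u1 = s * xp / zp
    v1 = s * yp / zp
    # First-order radial distortion
    r2 = u1 * u1 + v1 * v1
    u2 = u1 * (1 + k * r2)
    v2 = v1 * (1 + k * r2)
    # Translation to the center of the image
    return u2 + cu, v2 + cv
```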
Solution method
You should use a solution method based on nonlinear least squares.
The input is a set of pictures of a checkerboard pattern. Each corner
of the checkerboard defines a point in 3D space relative to some fixed
position (e.g. the lower left corner). Since the spacing of the checkerboard
is known (assume that the checkers are precisely 1 cm on a side), and since
the target is assumed to be planar, the 3D locations of these points are
known. Together with the (u,v) locations of the corners in the image, each
of these corners places some constraint on the parameters in the camera model.
Since there are more corners than free parameters in the model, we have an
overconstrained system, and will minimize it via least squares. In particular,
we will be minimizing the sum of squared distances between the actual (u,v)
locations and the (u,v) locations predicted by the camera model.
Matlab has built-in functions to perform nonlinear minimization. You
should read the help for fmins (if using Matlab 5) or
fminsearch (for Matlab 6) to learn how to use them.
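Both fmins and fminsearch implement Nelder-Mead simplex minimization and allow extra arguments to be forwarded to the objective function. The calling pattern can be illustrated with SciPy's analogous fmin (a Python sketch with a made-up toy objective, not part of the assignment):

```python
import numpy as np
from scipy.optimize import fmin  # Nelder-Mead, analogous to fmins/fminsearch

def objective(params, data):
    """Toy objective: minimized at params[0] = mean(data), params[1] = 0."""
    a, b = params
    return np.sum((data - a) ** 2) + b ** 2

data = np.array([1.0, 2.0, 3.0])
# Extra arguments are forwarded to the objective via args, much as
# fmins/fminsearch accept additional parameters for the objective.
best = fmin(objective, x0=[0.0, 1.0], args=(data,), disp=False)
```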
Solving for parameters
Do the following:
- Load each of the eight provided test images (images 1 through 8).
- For each image, let the user click on the corners in the image and record
the (u,v) positions. Use the function getpts (if you don't have
the getpts function, you can get it here: getpts.m).
Be sure to click on the points in a predetermined order (e.g., left-to-right,
bottom-to-top) to make it easy to assign 3D coordinates to these points.
During debugging, you will probably want to use only a few images, and only
click on a few points (e.g., 9) per image.
- Determine the 3D coordinates for each 2D point (assume the target is
planar, the origin is at the lower-left corner, and the squares are 1 cm on
a side). Note that some coordinate (e.g. the z coordinate) will be zero
for all these 3D points.
- Form a matrix with all the data points. This matrix should have as
many rows as there are corners, and six columns: the x, y, and z
coordinates, the u and v coordinates of the corner in the image, and the
number of the image from which this data came (this is necessary so that
the function implementing the camera model can use the correct set of
extrinsics).
- To avoid having to click on points many times during debugging, save
this matrix to disk (help save).
- Write a function that takes the above matrix of data values, as well as
a 6*n+2 dimensional vector with the camera intrinsics and extrinsics, and
computes the sum of squared differences between the observed (u,v) points and
the result of applying the camera model to the (x,y,z) points.
- Use fmins or another nonlinear minimizer to minimize the above
function (you will have to use the form of the minimizer that allows additional
arguments to be passed to the objective function, unless you use a global
variable for the matrix of data values).
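Putting the steps above together, the data matrix, 3D coordinate assignment, and objective function might look as follows in Python/NumPy (a sketch only; the assignment asks for Matlab, and every function name here is hypothetical):

```python
import numpy as np

def grid_points_3d(n_cols, n_rows, square_cm=1.0):
    """3D coordinates of checkerboard corners, ordered left-to-right,
    bottom-to-top, origin at the lower-left corner.  The target is
    planar, so z = 0 for every corner."""
    return np.array([(c * square_cm, r * square_cm, 0.0)
                     for r in range(n_rows) for c in range(n_cols)])

def rot(rx, ry, rz):
    """Rotation Rz * Ry * Rx from the three extrinsic angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def reprojection_error(params, data, cu=320.0, cv=240.0):
    """Sum of squared differences between observed and predicted (u, v).
    params = [s, k, rx_0, ry_0, rz_0, tx_0, ty_0, tz_0, rx_1, ...]
    Rows of data are (x, y, z, u, v, image_index), indices from 0."""
    params = np.asarray(params, float)
    s, k = params[0], params[1]
    err = 0.0
    for x, y, z, u, v, idx in np.asarray(data, float):
        e = params[2 + 6 * int(idx): 8 + 6 * int(idx)]  # this view's extrinsics
        xp, yp, zp = rot(*e[:3]) @ (np.array([x, y, z]) + e[3:])
        u1, v1 = s * xp / zp, s * yp / zp      # projection and scale
        r2 = u1 * u1 + v1 * v1                 # first-order radial distortion
        pu = u1 * (1 + k * r2) + cu
        pv = v1 * (1 + k * r2) + cv
        err += (u - pu) ** 2 + (v - pv) ** 2
    return err
```

A function of this shape, together with an initial guess for all 6*n + 2 parameters, is what gets handed to the nonlinear minimizer.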
Determining feature points
Make the user's job easier by not requiring clicking precisely on every
corner in every image. There are many levels of sophistication possible,
and you will get more credit for more automation:
- (Minimum required level) Find corners in the image using the corner finder
discussed in class. When the user clicks on some point in the image, "snap"
that point to the nearest corner.
- (Harder, some extra credit) Let the user click on the four corners of
the checkerboard pattern, and specify how many checkers there are. Use this
information to guess the locations of the corners, then snap each guess to
the nearest output of a corner finder.
- (Hardest) Use a Hough transform to find lines in the image, then guess
candidate locations for corners at the intersections of these lines. Again,
use the output of a corner detector to refine the positions.
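The minimum-level snapping step is essentially a nearest-neighbor lookup against the corner detector's output. A small Python/NumPy sketch (function name hypothetical):

```python
import numpy as np

def snap_to_nearest(click_uv, corner_uvs):
    """Snap a user's click to the nearest detected corner.
    corner_uvs: (N, 2) array of (u, v) corner-detector output."""
    corner_uvs = np.asarray(corner_uvs, float)
    d2 = np.sum((corner_uvs - np.asarray(click_uv, float)) ** 2, axis=1)
    return corner_uvs[np.argmin(d2)]
```

The same routine serves the harder variants as well: each guessed corner location (from the four outer clicks, or from Hough-line intersections) is snapped with the identical nearest-neighbor lookup.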
Submitting
This assignment is due Wednesday, March 27, 2002 at 11:59 PM Eastern
Standard Time. Please see the general
notes on submitting your assignments, as well as the
late policy and the
collaboration policy.
Please submit your code (as one or more .m files), the results of your code
on the provided test images, and a README or README.html file containing
your answers to the questions in part I. If you find it more convenient to
hand in the written portion of the assignment on paper, that will be
accepted as well.
Last update: 12:05:14 29-Dec-2010
cs496@princeton.edu