COS 496 - Computer Vision
Assignment #4
Spring, 2002
Due Wed. Apr. 10
I. Questions (20%)
- Describe the probable effects of running a voxel coloring algorithm using
pictures of a constant-color cube (against a different-colored background).
How would these artifacts be affected by the camera positions?
- What happens when voxel coloring is run with the color similarity
threshold set to be too large? Too small?
- Describe the effects of incorrect camera calibration on
correspondence-based stereo and on voxel coloring. Which algorithm do you
expect would produce worse results given a slightly-miscalibrated camera?
II. Voxel Coloring (80%)
Implement a system for 3D reconstruction using voxel coloring, as described
in Seitz and Dyer's 1997 paper.
As a review, the basic steps are as follows:
- Load a number of images, together with their camera positions and
intrinsics.
- For each image, create a mask that is the same size as the image and
initialize it to all zeros.
- For each voxel V in some volume of space:
- Project V into each of the camera images. Since the projection
of a voxel (which is a cube) will, in general, be some sort of
hexagon, it is simpler to approximate the set of pixels to be
considered by projecting the eight corners of the voxel and taking
all pixels within the image-space bounding box of these projections.
Let P be this set of pixels.
- Remove from P all pixels that belong to the background (you may
assume a pixel to be background if its red, green, and blue components
are all less than thresh_bg).
- Remove from P all pixels for which the value of the mask is 1.
- Find the standard deviation of the colors of the pixels remaining
in P.
- If the standard deviation is smaller than thresh_color:
- Record the voxel being considered as "visible" and
assign it the average color in P.
- For each pixel in P, set the corresponding bit in the
mask to 1.
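The per-voxel test above can be sketched in Python roughly as follows. This is only an illustration, not a reference solution: the function names are made up, images and masks are assumed to be nested lists indexed as image[v][u], and "standard deviation of the colors" is interpreted here as the RMS distance of the pixel colors from their mean over all three channels. The handout does not say what to do when P ends up empty, so this sketch simply treats such a voxel as not visible.

```python
import math

def surviving_pixels(bbox, image, mask, thresh_bg):
    """Return (u, v, color) for pixels in bbox that are neither background nor masked.

    bbox is (u0, v0, u1, v1), inclusive; image[v][u] is an (r, g, b) tuple.
    """
    u0, v0, u1, v1 = bbox
    kept = []
    for v in range(v0, v1 + 1):
        for u in range(u0, u1 + 1):
            r, g, b = image[v][u]
            if r < thresh_bg and g < thresh_bg and b < thresh_bg:
                continue  # background pixel
            if mask[v][u]:
                continue  # already claimed by an earlier voxel
            kept.append((u, v, (r, g, b)))
    return kept

def voxel_consistency(colors, thresh_color):
    """Return (is_consistent, mean_color) for a list of (r, g, b) tuples."""
    if not colors:
        return False, None  # assumption: an empty P means "not visible"
    n = len(colors)
    mean = tuple(sum(c[k] for c in colors) / n for k in range(3))
    # Assumption: one scalar deviation, the RMS distance from the mean color.
    var = sum((c[k] - mean[k]) ** 2 for c in colors for k in range(3)) / n
    return math.sqrt(var) < thresh_color, mean
```

In a full implementation, the pixels returned by surviving_pixels for every image would be pooled into P before calling voxel_consistency, and the mask entries for those pixels would be set to 1 only when the voxel is accepted.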
Cube data set
Run the voxel coloring algorithm on these
(computer-generated) images of a cube. For each image, there is also a
".xf" file containing a 4x4 matrix representing the camera translation and
rotation. To project a 3D point (x,y,z,1) into a camera image, you first
multiply the point by the corresponding matrix:
[x']     [x]
[y'] = M [y]
[z']     [z]
[1 ]     [1]
then obtain camera (u,v) coordinates as follows:
u = 256 - 443.405 * x' / z'
v = 256 + 443.405 * y' / z'
All this assumes that the pixel origin is at the upper-left corner.
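As an illustration, the projection and the bounding-box approximation could look something like the Python sketch below. The function names are hypothetical, M is assumed to be a 4x4 matrix stored as a list of rows, and the 512x512 image size is only a guess based on the principal point at (256, 256). The sketch does not handle points with z' equal to zero.

```python
import math
from itertools import product

FOCAL = 443.405   # focal length in pixels, from the formulas above
CENTER = 256      # principal point, from the formulas above

def project(M, point):
    """Project a 3D point (x, y, z) into pixel coordinates (u, v)."""
    x, y, z = point
    h = (x, y, z, 1.0)
    # Multiply the homogeneous point by the first three rows of M.
    xp, yp, zp = (sum(M[r][c] * h[c] for c in range(4)) for r in range(3))
    u = CENTER - FOCAL * xp / zp
    v = CENTER + FOCAL * yp / zp
    return u, v

def voxel_bbox(M, corner, size, width=512, height=512):
    """Image-space bounding box (u0, v0, u1, v1) of a voxel, clipped to the image.

    corner is the voxel's minimum (x, y, z) corner. Returns None if the box
    falls entirely outside the image. width and height are assumptions.
    """
    us, vs = [], []
    for dx, dy, dz in product((0, size), repeat=3):  # all eight corners
        u, v = project(M, (corner[0] + dx, corner[1] + dy, corner[2] + dz))
        us.append(u)
        vs.append(v)
    u0 = max(0, math.floor(min(us)))
    u1 = min(width - 1, math.ceil(max(us)))
    v0 = max(0, math.floor(min(vs)))
    v1 = min(height - 1, math.ceil(max(vs)))
    if u0 > u1 or v0 > v1:
        return None
    return u0, v0, u1, v1
```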
For this data set, use the following settings:
- Make your voxel grid extend from -0.1 to 1.1 in x, y, and z
- Perform the voxel sweep in one pass, proceeding in the negative x
direction. That is, you should first consider the voxels with x = 1.1,
then consider planes of voxels with smaller and smaller x.
- Start with a small number of voxels (e.g. 10x10x10) and increase
the resolution once your code is working.
- Use 5 for thresh_bg, and experiment with different values of thresh_color.
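One possible way to enumerate the voxels in the required order is sketched below in Python. The generator name is hypothetical, and each voxel is identified by its minimum corner, which is a choice made for this sketch rather than something the assignment specifies.

```python
def sweep_order(n, lo=-0.1, hi=1.1):
    """Yield (min_corner, size) for an n x n x n grid, largest x first."""
    size = (hi - lo) / n
    for i in reversed(range(n)):      # plane touching x = hi comes first
        for j in range(n):
            for k in range(n):
                yield (lo + i * size, lo + j * size, lo + k * size), size
```

With n = 10 this yields 1000 voxels of side 0.12, starting with the plane whose far face lies at x = 1.1 and ending with the plane whose near face lies at x = -0.1.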
Visualizing your results
vxlview is a really simple viewer for visualizing your results.
Versions are available for Windows and Linux. You will need to have OpenGL
installed, and you will also need the GLUT library (if you don't already
have it, here are the files you need for Windows and Linux).
The program takes the name of a file containing a list of voxels on the
command line. Under some versions of Windows, you can also drag and drop
a file of voxel data onto the application to run it. Here is a
sample file: bunny.vxl
When the program is running, you can:
- Drag with the left mouse button to rotate the model
- Drag with the right mouse button to translate the model
- Drag with both left and right buttons pressed to move towards and away
from the model
- Press the space bar to reset the view to the original
- Press Escape to exit
The format of the ".vxl" file is as follows:
- Everything is plain text
- The first line contains the number of occupied voxels
- The second line contains the size of each voxel
- The remaining lines contain six numbers each: the x, y, and z coordinates
of each occupied voxel, and the r,g,b color components
(as integers from 0 to 255).
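A writer for this format might look like the following Python sketch. The function names are made up, and it assumes the six numbers on each line are separated by single spaces; the description above does not say whether the coordinates refer to a voxel's center or its corner, so check your output against bunny.vxl in the viewer.

```python
def format_vxl(voxels, size):
    """Return the text of a .vxl file.

    voxels is a list of ((x, y, z), (r, g, b)) pairs; size is the voxel size.
    """
    lines = [str(len(voxels)), str(size)]
    for (x, y, z), color in voxels:
        # Round and clamp the color components to integers in 0..255.
        r, g, b = (min(255, max(0, int(round(c)))) for c in color)
        lines.append(f"{x} {y} {z} {r} {g} {b}")
    return "\n".join(lines) + "\n"

def write_vxl(path, voxels, size):
    """Write the voxel list to a plain-text .vxl file at path."""
    with open(path, "w") as f:
        f.write(format_vxl(voxels, size))
```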
Language
You may find that this assignment runs slowly in Matlab, especially for large
numbers of voxels. If you wish, you can implement the code in another language
of your choice (such as C or C++). In order to read in the images, we suggest
converting them to PPM (djpeg cube1.jpg > cube1.ppm), which is very simple
to read ("man ppm" for details).
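For reference, reading such a file takes only a few lines. The Python sketch below is illustrative (the function name is made up) and handles only the binary "P6" variant with a maximum value of at most 255, which is what djpeg writes by default for color images; the same logic carries over directly to C or C++.

```python
def parse_ppm(data):
    """Parse binary PPM (P6, maxval <= 255) bytes into (width, height, rows).

    rows[y][x] is an (r, g, b) tuple of integers.
    """
    tokens = []
    pos = 0
    while len(tokens) < 4:                      # magic, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":           # header comment: skip to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1                                    # single whitespace byte after maxval
    magic = tokens[0]
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6" or maxval > 255:
        raise ValueError("only binary PPM (P6) with maxval <= 255 is handled")
    count = 3 * width * height
    pixels = data[pos:pos + count]
    if len(pixels) != count:
        raise ValueError("truncated pixel data")
    rows = [[tuple(pixels[3 * (y * width + x):3 * (y * width + x) + 3])
             for x in range(width)]
            for y in range(height)]
    return width, height, rows
```

A file can then be loaded with parse_ppm(open("cube1.ppm", "rb").read()).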
Submitting
This assignment is due Tuesday, April 9, 2002 at 11:59 PM EDT.
Please see the general notes on submitting your assignments, as well as the
late policy and the collaboration policy.
Please submit your code, the results of your code on the provided test
images, and a README or README.html file containing your answers to the
questions in part I. If you find it more convenient to hand in the written
portion of the assignment on paper, that will be accepted as well.
Last update
12:05:14 29-Dec-2010
cs496@princeton.edu