COS 496 - Computer Vision

Assignment #4

Spring, 2002


Due Wed. Apr. 10

I. Questions (20%)

1. Describe the probable effects of running a voxel coloring algorithm using pictures of a constant-color cube (against a different-colored background). How would these artifacts be affected by the camera positions?
2. What happens when voxel coloring is run with the color similarity threshold set to be too large? Too small?
3. Describe the effects of incorrect camera calibration on correspondence-based stereo and on voxel coloring. Which algorithm do you expect would produce worse results given a slightly-miscalibrated camera?

II. Voxel Coloring (80%)

Implement a system for 3D reconstruction using voxel coloring, as described in Seitz and Dyer's 1997 paper.

As a review, the basic steps are as follows:

Load a number of images, together with their camera positions and intrinsics.
For each image, create a mask that is the same size as the image and initialize it to all zeros.
For each voxel V in some volume of space:
1. Project V into each of the camera images. Since the projection of a voxel (which is a cube) will, in general, be some sort of hexagon, it is simpler to approximate the set of pixels to be considered by projecting the eight corners of the voxel and taking all pixels within the image-space bounding box of these projections. Let P be this set of pixels.
2. Remove from P all pixels that belong to the background (you may assume a pixel to be background if its red, green, and blue components are all less than thresh_bg).
3. Remove from P all pixels for which the value of the mask is 1.
4. Find the standard deviation of the colors of the pixels remaining in P.
5. If the standard deviation is smaller than thresh_color:
  - Record the voxel being considered as "visible" and assign it the average color in P.
  - For each pixel in P, set the corresponding bit in the mask to 1.
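
The per-voxel test in steps 2, 4, and 5 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a reference solution: the function name voxel_consistency is hypothetical, the pixels passed in are assumed to have already survived the mask test of step 3, the "standard deviation of the colors" is interpreted as the RMS distance from the mean color (the steps above do not fix a definition), and a voxel with no remaining pixels is assumed to be not visible.

```python
import math

def voxel_consistency(pixels, thresh_bg, thresh_color):
    """Decide whether one voxel is color-consistent.

    pixels: list of (r, g, b) tuples collected from all images for this
            voxel, already filtered by the mask (step 3).
    Returns (visible, mean_color); mean_color is None if no pixels remain.
    """
    # Step 2: a pixel is background if all three channels are below thresh_bg.
    fg = [p for p in pixels if not all(c < thresh_bg for c in p)]
    if not fg:
        # Assumption: nothing left to compare, so treat the voxel as empty.
        return False, None

    n = len(fg)
    mean = tuple(sum(p[i] for p in fg) / n for i in range(3))

    # Step 4 (assumed definition): RMS Euclidean distance to the mean color.
    var = sum(sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in fg) / n
    std = math.sqrt(var)

    # Step 5: consistent if the spread is below the color threshold.
    return std < thresh_color, mean
```

If the voxel is reported visible, the caller would then record it with the returned mean color and set the mask to 1 for the pixels it used.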

Cube data set

Run the voxel coloring algorithm on these (computer-generated) images of a cube. For each image, there is also a ".xf" file containing a 4x4 matrix representing the camera translation and rotation. To project a 3D point (x,y,z,1) into a camera image, you first multiply the point by the corresponding matrix:

        [x']     [x]
        [y'] = M [y]
        [z']     [z]
        [1 ]     [1]

then obtain camera (u,v) coordinates as follows:

        u = 256 - 443.405 * x' / z'
        v = 256 + 443.405 * y' / z'

All this assumes that the pixel origin is at the upper-left corner.
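
As an illustration, the projection above and the bounding box of a voxel's eight projected corners might be written like this in Python. The names project and voxel_bbox are hypothetical, M is assumed to be the 4x4 matrix from the ".xf" file stored as a list of four rows, the voxel position is assumed to be its center, and the 512x512 image size is an assumption inferred from the principal point at 256; check it against the actual images.

```python
import math
from itertools import product

FOCAL = 443.405   # focal length in pixels, from the formulas above
CENTER = 256.0    # principal point, from the formulas above

def project(M, point):
    """Project a 3D point (x, y, z) to pixel coordinates (u, v).

    M is a 4x4 matrix given as a list of rows. Assumes z' is nonzero.
    """
    x, y, z = point
    h = (x, y, z, 1.0)
    # Only the first three rows of M are needed for x', y', z'.
    xc, yc, zc = (sum(M[r][c] * h[c] for c in range(4)) for r in range(3))
    u = CENTER - FOCAL * xc / zc
    v = CENTER + FOCAL * yc / zc
    return u, v

def voxel_bbox(M, center, size, width=512, height=512):
    """Integer bounding box (u0, v0, u1, v1), inclusive, of a voxel's
    eight projected corners, clamped to the image.

    If the voxel projects entirely off-image the box may be empty
    (u0 > u1 or v0 > v1); the caller should check for that.
    """
    half = size / 2.0
    cx, cy, cz = center
    corners = [(cx + dx * half, cy + dy * half, cz + dz * half)
               for dx, dy, dz in product((-1, 1), repeat=3)]
    uv = [project(M, c) for c in corners]
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    u0 = max(0, math.floor(min(us)))
    v0 = max(0, math.floor(min(vs)))
    u1 = min(width - 1, math.ceil(max(us)))
    v1 = min(height - 1, math.ceil(max(vs)))
    return u0, v0, u1, v1
```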

For this data set, you may assume the following:

Make your voxel grid extend from -0.1 to 1.1 in x, y, and z
Perform the voxel sweep in one pass, proceeding in the negative x direction. That is, you should first consider the voxels with x = 1.1, then consider planes of voxels with smaller and smaller x.
Start with a small number of voxels (e.g. 10x10x10) and increase the resolution once your code is working.
Use 5 for thresh_bg, and experiment with different values of thresh_color.
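
One way to generate the sweep order described above, as a hedged Python sketch: the generator name sweep_order is hypothetical, and it assumes the grid is divided into n cells per axis and that each voxel is identified by its center, so the first plane visited sits half a voxel inside x = 1.1.

```python
def sweep_order(n, lo=-0.1, hi=1.1):
    """Yield ((x, y, z), size) for an n*n*n grid, visiting planes of
    constant x from the largest x to the smallest."""
    size = (hi - lo) / n
    # Voxel centers along one axis (assumption: positions are centers).
    centers = [lo + (i + 0.5) * size for i in range(n)]
    for x in reversed(centers):      # sweep in the negative x direction
        for y in centers:
            for z in centers:
                yield (x, y, z), size
```

For example, list(sweep_order(10)) gives the 1000 voxels of a 10x10x10 grid, each 0.12 units on a side.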

Visualizing your results

vxlview is a really simple viewer for visualizing your results. Versions are available for Windows and Linux. You will need to have OpenGL installed, and you will also need the GLUT library (if you don't already have it, here are the files you need for Windows and Linux).

The program takes the name of a file containing a list of voxels on the command line. Under some versions of Windows, you can also drag 'n drop a file of voxel data onto the application and have it run. Here is a sample file: bunny.vxl

When the program is running, you can

Drag with the left mouse button to rotate the model
Drag with the right mouse button to translate the model
Drag with both left and right buttons pressed to move towards and away from the model
Press the space bar to reset the view to the original
Press Escape to exit

The format of the ".vxl" file is as follows:

Everything is plain text
The first line contains the number of occupied voxels
The second line contains the size of each voxel
The remaining lines contain six numbers each: the x, y, and z coordinates of each occupied voxel, and the r,g,b color components (as integers from 0 to 255).
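
A file in this format could be written with something like the following Python sketch. The function name write_vxl is hypothetical, and separating the six numbers with single spaces is an assumption; the description above only says each line contains six numbers.

```python
def write_vxl(path, voxels, size):
    """Write occupied voxels to a plain-text .vxl file.

    voxels: list of ((x, y, z), (r, g, b)) pairs.
    size:   edge length of each voxel.
    """
    def to_byte(c):
        # Colors must be integers from 0 to 255, so round and clamp.
        return min(255, max(0, int(round(c))))

    with open(path, "w") as f:
        f.write(f"{len(voxels)}\n")   # line 1: number of occupied voxels
        f.write(f"{size}\n")          # line 2: size of each voxel
        for (x, y, z), (r, g, b) in voxels:
            f.write(f"{x} {y} {z} {to_byte(r)} {to_byte(g)} {to_byte(b)}\n")
```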

Language

You may find that this assignment runs slowly in Matlab, especially for large numbers of voxels. If you wish, you can implement the code in another language of your choice (such as C or C++). In order to read in the images, we suggest converting them to PPM (djpeg cube1.jpg > cube1.ppm), which is very simple to read ("man ppm" for details).
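
For reference, here is a minimal sketch of a PPM reader in Python. It assumes the binary "P6" variant with a maximum value of at most 255 (one byte per channel), which is what djpeg is expected to produce for a color JPEG; the names read_ppm and get_pixel are hypothetical, and comments embedded inside a header token are not handled.

```python
def read_ppm(path):
    """Read a binary (P6) PPM file. Returns (width, height, raster),
    where raster is a bytes object of RGB triples in row-major order."""
    with open(path, "rb") as f:
        data = f.read()

    pos = 0
    tokens = []
    while len(tokens) < 4:            # magic, width, height, maxval
        # Skip whitespace and '#' comments between header tokens.
        while data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#":
            if data[pos:pos + 1] == b"#":
                while data[pos:pos + 1] not in (b"\n", b""):
                    pos += 1
            else:
                pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])

    if tokens[0] != b"P6":
        raise ValueError("only binary P6 PPM files are handled here")
    width, height, maxval = (int(t) for t in tokens[1:])
    if maxval > 255:
        raise ValueError("only one byte per channel is handled here")

    pos += 1                          # exactly one whitespace byte after maxval
    raster = data[pos:pos + 3 * width * height]
    if len(raster) != 3 * width * height:
        raise ValueError("truncated pixel data")
    return width, height, raster

def get_pixel(raster, width, u, v):
    """Return (r, g, b) at column u, row v; the origin is the upper-left."""
    i = 3 * (v * width + u)
    return tuple(raster[i:i + 3])
```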

Submitting

This assignment is due Tuesday, April 9, 2002 at 11:59 PM EDT. Please see the general notes on submitting your assignments, as well as the late policy and the collaboration policy.

Please submit your code, the results of your code on the provided test images, and a README or README.html file containing your answers to the questions in part I. If you find it more convenient to hand in the written portion of the assignment on paper, that will be accepted as well.

Last update 12:05:14 29-Dec-2010

cs496@princeton.edu