COS 429: Computer Vision, Fall 2013
Your goal for this part of the assignment is to write a MATLAB program that predicts the locations of long, straight lines in an input image. This is just like the warmup exercise, except that now you are detecting lines rather than eyes.
As in the previous assignment, "runme()" should write gray-level images in the output directory where the brightness of each pixel is proportional to the predicted probability of finding a straight line at that pixel in the corresponding image in the input directory. For example, running runme() for the input shown on the left below might produce the output shown in the middle. The rightmost image shows an overlay of the two (where the output has been added to the red channel of the input). More examples can be found here.
Input Output Overlay
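For reference, an overlay like the one above might be produced along the following lines (a minimal sketch; the file names are illustrative, and it assumes the input is an RGB image and the prediction has been scaled to [0, 1]):

% Illustrative only: blend a gray-level prediction into the red channel of
% an RGB input image of the same size.
input_rgb  = im2double(imread('input/example.jpg'));    % hypothetical file name
prediction = im2double(imread('output/example.jpg'));   % hypothetical file name
overlay = input_rgb;
overlay(:,:,1) = min(overlay(:,:,1) + prediction, 1);   % add prediction to the red channel
imwrite(overlay, 'overlay/example.jpg');                % hypothetical output path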
The general strategy we will use is to predict whether pixels appear to be on edges locally AND on long straight lines globally and then combine the two to produce an overall line detection prediction. This strategy can be implemented with the following four steps:
Step A: Gradient estimation
Implement a function

[xgradient_image, ygradient_image] = computegradients(input_image)
to find the x and y components Fx and Fy of the gradient of the image smoothed with a Gaussian (e.g., σ = 1). To improve efficiency, your implementation should use the facts that a 2D Gaussian is separable and that the derivative of a function convolved with a Gaussian is the same as the function convolved with the derivative of the Gaussian (as described on slide 65 of the Image Analysis lecture).
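One possible way to implement this is sketched below (assuming σ = 1, a filter radius of 3σ, and a gray-level or RGB input; this is only an illustration of the separable Gaussian-derivative approach, not a required implementation):

function [xgradient_image, ygradient_image] = computegradients(input_image)
    % Sketch: smooth with a Gaussian and differentiate in one pass by
    % convolving with separable 1D Gaussian / Gaussian-derivative filters.
    sigma = 1;
    r = ceil(3 * sigma);
    x = -r:r;
    g  = exp(-x.^2 / (2 * sigma^2));
    g  = g / sum(g);                      % 1D Gaussian, normalized to sum to 1
    dg = -(x / sigma^2) .* g;             % derivative of the 1D Gaussian
    I = im2double(input_image);
    if size(I, 3) > 1, I = rgb2gray(I); end
    % conv2(u, v, A) convolves the columns of A with u and the rows with v.
    xgradient_image = conv2(g', dg, I, 'same');   % smooth in y, differentiate in x
    ygradient_image = conv2(dg', g, I, 'same');   % smooth in x, differentiate in y
end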
Step B: Edge detection

Implement a function

canny_image = detectedges(xgradient_image, ygradient_image)
to detect edges using the Canny algorithm as follows:
- Gradient analysis:
- Compute the edge strength F (the magnitude of the gradient)
- Compute the edge orientation D = arctan(Fy/Fx) at each pixel (in practice, atan2(Fy, Fx) handles the case Fx = 0).
- Nonmaximum suppression:
Create a "thinned edge image" I(x,y) as follows:
- For each pixel, find the direction D* in {0, 45, 90, 135} degrees (or the equivalent in radians) that is closest to the orientation D at that pixel.
- If the edge strength F(x,y) is smaller than at least one of its neighbors along D*, set I(x,y) = 0, else set I(x,y) = F(x,y).
- Hysteresis thresholding:
Repeatedly do the following:
- Locate the next unvisited pixel (x,y) such that I(x,y) > T_h (e.g., T_h = 0.1).
- Starting from (x,y), follow the chain of connected local maxima in both directions as long as I(x,y) > T_l (e.g., T_l = 0.03), where two pixels are connected if they are adjacent in horizontal, vertical, or diagonal directions.
- Mark each pixel as it is visited.
- Output Edge image:
Return a gray-level Edge image, E, where all pixels found with hysteresis thresholding have values equal to their gradient magnitudes and all other pixels are zero.
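A minimal sketch of one way these steps might fit together is shown below (the example thresholds and the use of imreconstruct for the hysteresis step are illustrative choices; following chains of connected local maxima explicitly, as described above, is equally valid):

function canny_image = detectedges(xgradient_image, ygradient_image)
    % Sketch: gradient analysis, nonmaximum suppression, hysteresis thresholding.
    T_h = 0.1;  T_l = 0.03;                            % example thresholds
    F = hypot(xgradient_image, ygradient_image);       % edge strength
    D = atan2(ygradient_image, xgradient_image);       % edge orientation
    Dstar = mod(round(D / (pi/4)), 4);                 % quantize to 0, 45, 90, 135 degrees
    offsets = [1 0; 1 1; 0 1; -1 1];                   % (dx, dy) steps for each direction
    [h, w] = size(F);
    I = zeros(h, w);
    for y = 2:h-1                                      % nonmaximum suppression
        for x = 2:w-1
            dx = offsets(Dstar(y,x) + 1, 1);
            dy = offsets(Dstar(y,x) + 1, 2);
            if F(y,x) >= F(y+dy, x+dx) && F(y,x) >= F(y-dy, x-dx)
                I(y,x) = F(y,x);                       % keep only local maxima along D*
            end
        end
    end
    % Hysteresis: keep thinned pixels above T_l that are 8-connected to a pixel
    % above T_h (equivalent to following chains from strong pixels).
    weak   = I > T_l;
    strong = I > T_h;
    canny_image = I .* imreconstruct(strong, weak, 8);
end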
Step C: Hough Voting

Implement a function

hough_transform = votelines(canny_image, xgradient_image, ygradient_image)
that accumulates votes for lines using the Hough algorithm as follows:
- Initialization:
Create a Hough image, H, where the rows represent distances and the columns represent angles -- i.e., each pixel (d,a) in this image represents a line in the input image at distance "d" from the image center and angle "a" with respect to the x-axis. The number of rows should be large enough to accommodate the largest possible distance from the center (i.e., the diagonal radius of the input image in pixels), and the number of columns should be large enough to accommodate all possible angles (0 to 2*PI) in at most one-degree increments. All values should be initialized to zero.
- Voting:
Accumulate votes for lines in the Hough image. For every pixel p = (x,y) of the input image: first compute (d,a) = L(p), where L(p) is the line through p perpendicular to the image gradient (Fx,Fy), d is the shortest distance between L(p) and the center of the image (cx,cy), and a is the angle between L(p) and the x-axis (note that gradients in opposite directions at the same point should produce the same (d,a)). Then, add E(p) to the corresponding position (d,a) in the Hough image, using bilinear weights to augment the four closest pixels, since (d,a) probably will not fall on integer coordinates. The result will be a Hough image in which the brightest pixels represent lines on which many votes agree.
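A possible sketch of the accumulation loop is given below (the choice of image center, the 360 one-degree angle bins, and the way the d < 0 case is folded over are one reasonable set of conventions, not the only valid ones):

function hough_transform = votelines(canny_image, xgradient_image, ygradient_image)
    % Sketch: each edge pixel votes for the line through it that is
    % perpendicular to its gradient, with bilinear splatting into H.
    [h, w] = size(canny_image);
    cx = (w + 1) / 2;  cy = (h + 1) / 2;               % image center
    nrows = ceil(hypot(h, w) / 2) + 1;                 % distances 0 .. diagonal radius
    ncols = 360;                                       % one-degree angle bins over 0 .. 2*pi
    hough_transform = zeros(nrows, ncols);
    [ys, xs] = find(canny_image > 0);                  % only edge pixels contribute
    for k = 1:numel(xs)
        x = xs(k);  y = ys(k);
        n = [xgradient_image(y,x), ygradient_image(y,x)];
        n = n / norm(n);                               % unit gradient = line normal
        d = (x - cx) * n(1) + (y - cy) * n(2);         % signed distance from center
        if d < 0, n = -n; d = -d; end                  % opposite gradients -> same (d,a)
        a = mod(atan2(n(2), n(1)) + pi/2, 2*pi);       % angle of L(p) w.r.t. the x-axis
        % Bilinear splat of E(p) into the four nearest (d,a) bins.
        dr = d + 1;                     ac = a / (2*pi) * ncols + 1;
        r0 = floor(dr);  fr = dr - r0;  r1 = min(r0 + 1, nrows);
        c0 = floor(ac);  fc = ac - c0;
        c0w = mod(c0 - 1, ncols) + 1;   c1w = mod(c0, ncols) + 1;  % wrap angle columns
        v = canny_image(y,x);
        hough_transform(r0, c0w) = hough_transform(r0, c0w) + v * (1-fr) * (1-fc);
        hough_transform(r0, c1w) = hough_transform(r0, c1w) + v * (1-fr) * fc;
        hough_transform(r1, c0w) = hough_transform(r1, c0w) + v * fr * (1-fc);
        hough_transform(r1, c1w) = hough_transform(r1, c1w) + v * fr * fc;
    end
end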
Step D: Combination

Implement a function

output_image = predictlines(canny_image, hough_image, xgradient_image, ygradient_image, alpha, beta, gamma)
that combines the Canny Edge and Hough images into a final output image containing larger gray values where lines are predicted with more confidence. There are many ways to do this, but you should at least implement the following process:

for each pixel p in the input image
    output_image(p) = max over lines L containing p of [ E(p)^α · H(L)^β · (N(p)·N(L))^γ ]
end

where E(p) is the magnitude of the Edge image at point p; H(L) is the magnitude of the Hough image for a line L (again, use interpolation, since L may not fall on integer coordinates); (N(p)·N(L)) is a measure of how well the orientation of L agrees with the smoothed gradient at p; and the max is taken over all lines L containing p. This procedure predicts that long, straight lines appear where edges are detected locally, lines are detected globally, and the two agree in orientation. The exponents α, β, and γ weight the contributions of the three factors (we will use α=1, β=1, and γ=2 as defaults for this assignment).
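A brute-force sketch of this search is given below, assuming the Hough image uses the same (d, a) discretization as in the votelines sketch above (checking every angle bin at every edge pixel is slow but straightforward; only pixels with nonzero edge strength need to be considered, since the product is zero elsewhere):

function output_image = predictlines(canny_image, hough_image, ...
                                      xgradient_image, ygradient_image, ...
                                      alpha, beta, gamma)
    % Sketch: for each edge pixel p, search over candidate lines L through p
    % and keep the best E(p)^alpha * H(L)^beta * (N(p).N(L))^gamma.
    [h, w] = size(canny_image);
    [nrows, ncols] = size(hough_image);
    cx = (w + 1) / 2;  cy = (h + 1) / 2;
    thetas = (0:ncols-1) * (2*pi / ncols);           % candidate line-normal angles
    output_image = zeros(h, w);
    [ys, xs] = find(canny_image > 0);
    for k = 1:numel(xs)
        x = xs(k);  y = ys(k);
        g = [xgradient_image(y,x), ygradient_image(y,x)];
        g = g / norm(g);                             % N(p): unit smoothed gradient
        best = 0;
        for t = thetas
            n = [cos(t), sin(t)];                    % N(L): unit normal of candidate line
            d = (x - cx) * n(1) + (y - cy) * n(2);
            if d < 0, continue; end                  % this line is covered at t + pi
            a = mod(t + pi/2, 2*pi);                 % line angle, matching votelines
            % Bilinear interpolation of H at (d, a), with wrap-around in angle.
            dr = d + 1;                     ac = a / (2*pi) * ncols + 1;
            r0 = floor(dr);  fr = dr - r0;  r1 = min(r0 + 1, nrows);
            c0 = floor(ac);  fc = ac - c0;
            c0w = mod(c0 - 1, ncols) + 1;   c1w = mod(c0, ncols) + 1;
            HL = (1-fr) * ((1-fc) * hough_image(r0,c0w) + fc * hough_image(r0,c1w)) ...
               +    fr  * ((1-fc) * hough_image(r1,c0w) + fc * hough_image(r1,c1w));
            score = canny_image(y,x)^alpha * HL^beta * abs(g * n')^gamma;
            best = max(best, score);
        end
        output_image(y,x) = best;
    end
end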
For this part of the assignment, please make a hypothesis about a particular modification or enhancement to at least one of the four basic steps, implement it, and then analyze how and why your modification improves or hurts your results. Your writeup should include a description of your modification, an explanation of why you chose it, images comparing results with and without it, and quantitative comparisons of results for the provided input images (see next part).
You should execute your program using runme.m on a variety of input test images to investigate how well it works under different conditions. To get full credit for this part, you must show outputs of your program for all of the images in the "input" subdirectory of cos429_assignment1.zip, plus at least three images taken with your own camera, and at least three images downloaded from the web.
As in Assignment 0, please think about how to investigate how well your algorithm works. For example, you might consider a set of images with different levels of noise, radial distortion, dynamic range, etc. You could also show images demonstrating the impact of parameter settings in your program (e.g., the sigma used for the Gaussian blur, the low and high thresholds used for finding connected components in Canny, the resolution of the Hough image, etc.).
Please include the results of your program on the test data set and your own input images in a third section of your writeup titled "Results and Analysis." For each input image (or group of images), you should include a short discussion of how your algorithm works on that image. Overall, the goal of this section is to answer questions like: When does your program succeed? When does it fail? What are the key attributes of input images that affect its success? What parameter settings affect its success? You do not have to answer all of these questions, but you should discuss at least one characteristic of the input images and/or parameters that affects the quality of your results, demonstrated by results for a set of input images spanning cases where the algorithm does and doesn't work well.
Additionally, please analyze the results of your algorithm quantitatively on the provided input images using the binary ground truth results provided in the groundtruth subdirectory of the zip file. The evaluation metric will be computed using the 2D correlation coefficient between your predicted image and the ground truth image (see corr2 in MATLAB). This metric will produce a value S=1.0 if your prediction is perfect, S=-1.0 if it misses all ground truth pixels, and S in [-1.0,1.0] otherwise.
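For example, the score for a single image might be computed along these lines (the file name is illustrative):

% Illustrative evaluation of one prediction against its ground truth image.
predicted   = im2double(imread('output/example.jpg'));       % hypothetical path
groundtruth = im2double(imread('groundtruth/example.jpg'));  % hypothetical path
S = corr2(predicted, groundtruth);    % 2D correlation coefficient in [-1, 1]
fprintf('Correlation score S = %f\n', S);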
You should edit the MATLAB files (and possibly create new ones) to implement the algorithms, download new test images into the input subdirectory, execute runme.m to produce your results, and complete your writeup.
Please submit your solution via the dropbox link here.
Your submission should include a single file named "assignment1.zip" with the following structure:
- code: containing your source code. You should not change the API of the MATLAB functions provided, but you may add new MATLAB files as you see fit.
- groundtruth: containing all groundtruth files in ".jpg" format (this is provided).
- input: containing all test input files in ".jpg" format (include both your new examples and the test images provided).
- canny: containing all Edge images produced by your detectedges algorithm (should be produced automatically by runme.m).
- hough: containing all Hough transforms produced by your votelines algorithm (should be produced automatically by runme.m).
- stronglines: containing a visualization of which lines have accumulated the strongest support in the Hough transform for all images in your writeup (should be produced automatically by runme.m).
- output: containing all images produced by predictlines (should be produced automatically by runme.m).
- overlay: containing the images produced by overlaying an output image over its corresponding input (should be produced automatically by runme.m).
- writeup: containing a file named "writeup.html" with separate titled sections (you can start from the template provided in the .zip file).
Please follow the general policies for submitting assignments, including the late policy and collaboration policy.