COS 429 - Computer Vision
Fall 2016
In this assignment you will build a face tracker based on the
Lucas-Kanade algorithm. For background, you should read
the Lucas-Kanade paper
and look at the
notes for lecture 12.
Begin by downloading the starter code and data, unzipping it, changing to the directory containing the code, and running hw3start to put all the subdirectories on Matlab's path.
The script hw3test.m has some useful commands and tests for the assignment.
1.1 Motion
We will be working with two motion models: pure translation, parameterized by $(u,v)$, and translation plus scale, parameterized by $(u,v,s)$.
To define the motion of a point $(x,y)$ due to scale, one needs to define the "center" of motion. We will refer to this arbitrary point as $(x_0,y_0)$. Thus scaling the point $(x,y)$ means $$ x' = x+(x-x_0)\cdot s $$ $$ y' = y+(y-y_0)\cdot s $$
After scaling, we also apply the translation, producing the motion equations: $$ x' = x+u+(x-x_0)\cdot s $$ $$ y' = y+v+(y-y_0)\cdot s $$
In the code, the directory uvs/ contains functions that manipulate $(u,v,s)$ motion models. The data structure they use is simply a row vector of 5 elements: [u v s x0 y0]. For example, the function
wc = uvsCWarp(mot, c)
takes a motion model mot = [u v s x0 y0] and coordinate c = [x,y] and produces the new warped point wc = [x',y'].
Note that most of the functions are written in a vectorized form: they accept multiple rows of motions (and/or points), hence you will see code such as mot(:,U).
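For example, applying the motion equations above by hand (the numbers here are made up purely for illustration):

mot = [2 3 0.5 0 0];         % u=2, v=3, s=0.5, center of motion at the origin
wc  = uvsCWarp(mot, [1 1]);  % x' = 1 + 2 + (1-0)*0.5 = 3.5
                             % y' = 1 + 3 + (1-0)*0.5 = 4.5, so wc = [3.5 4.5]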
In the uvs/ subdirectory, write the body of the function
motinv = uvsInv(mot)
which inverts the motion mot. The location of motinv's center of motion [x0',y0'] should be the projection of [x0,y0] by the motion mot. Like the other functions, yours may receive several rows of motions as input and should output the same number of rows, with each motion inverted. Try to do it without a loop.
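One way to derive the inverse (a sketch, not necessarily the intended solution): composing the inverse with the forward motion must give the identity. With the center moved to $x_0' = x_0 + u$, $y_0' = y_0 + v$, solving $x' = x + u + (x-x_0)\cdot s$ for $x$ gives $u' = -u$, $v' = -v$, and $s' = -s/(1+s)$. A vectorized body could then look like:

motinv = [ -mot(:,1), ...                % u'  = -u
           -mot(:,2), ...                % v'  = -v
           -mot(:,3)./(1+mot(:,3)), ...  % s'  = -s/(1+s)
           mot(:,4)+mot(:,1), ...        % x0' = x0 + u
           mot(:,5)+mot(:,2) ];          % y0' = y0 + v

You can sanity-check any such formula by warping a point with mot, then with uvsInv(mot), and verifying that you get the original point back.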
Include in your readme.txt the output of the commands in part 1.1 of hw3test.m.
1.2 Rectangles
In frame-to-frame alignment, our goal is to estimate these parameters, $(u,v)$ or $(u,v,s)$, from a local part of a given pair of images. For simplicity, we will limit ourselves to rectangular areas of the image. Our rectangles are defined as rect = [xmin xmax ymin ymax], and the directory rect/ contains functions to manipulate rectangles. (Technical note: the sides of a rect can take on non-integer values, so a rectangle may include only part of some pixels. Pixels are centered on the integer grid; thus a rectangle that contains exactly the pixel [1,3] would be rect = [0.5 1.5 2.5 3.5].)
Given two rectangles (of the same aspect ratio), we can compute the $(u,v,s)$ motion between them. This can be useful later for defining an initial motion for tracking when you have two rectangles that are guesses for the object's location (e.g. from running the face detector).
In the rect/ subdirectory, finish the function
mot = rect2uvs(r1, r2)
that computes this motion. If the aspect ratios of the two rects are not exactly the same, just use the average of the two possible scales. (Hint: note the functions rectCenter(r) and rectSize(r) in the rect/ directory.)
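A possible sketch, assuming rectCenter(r) returns the center as [x y] and rectSize(r) returns [width height] (check the actual conventions in rect/):

c1 = rectCenter(r1);  c2 = rectCenter(r2);
s  = mean(rectSize(r2) ./ rectSize(r1)) - 1;  % average of the two scale estimates
mot = [c2 - c1, s, c1];                       % [u v s x0 y0], centered on r1's center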
Include in your readme.txt the output of the commands in part 1.2 of hw3test.m.
1.3 Image sequence
Included with the starter code (in the data/ subdirectory) are the 'girl' and 'david' video sequences of faces from http://vision.ucsd.edu/~bbabenko/project_miltrack.html .
Each frame of the video is just an image (here it is actually stored as a set of PNG files). We could represent the image as just a Matlab matrix, as you have done in the previous assignments, but we will be cutting and warping images, and it is useful for an image to carry a coordinate system that is not necessarily rooted at the top-left corner. Here we will use a "coordinate image" struct (called coimage or coi in the code). It is just a wrapper around the matrix with some extra fields, for example:
coi = 
        im: [240x320 double]
    origin: [30 20]
     label: 'img00005'
     level: 1
The level field will be useful later to keep track of which pyramid-level this image represents.
The directory coi/ contains functions to create and manipulate these images. See the constructor function coimage.m for more information.
The class FaceSequence (coi/FaceSequence.m) simplifies access to the tracking sequences downloaded above. To initialize:
fs = FaceSequence('path/to/data/girl');
Then, to access e.g. the 5th image and read it into a coimage structure, you can do:
coi = fs.readImage(5);
(NOTE: this is probably the image img00004.png - don't be confused.) If you want to access a sequence of images, say every 3rd image starting from the 2nd, do:
fs.next = 2; fs.step = 3;
coi1 = fs.readNextImage();  % the 2nd
coi2 = fs.readNextImage();  % the 5th
...
Additionally, fs contains the "ground truth" rectangles stored with the clip in
rect = fs.gt_rect(1,:); % rectangle for 1st image
Beware: only some frames have valid ground-truth rectangles; for the other frames, rect = [0 0 0 0].
In the coi/ subdirectory, write the function
drawFaceSequence(fs, from, step, number, rects)
that will display an animation of the video with the rectangles from rects drawn on it. Use the pause() command in your for loop to make sure Matlab draws each frame and the animation is not shown too fast. For example, including pause(0.2) should display the animation at roughly 5 frames per second.
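A minimal sketch of what such a function could look like (it assumes rects holds one [xmin xmax ymin ymax] row per displayed frame, and ignores the coimage origin, which is fine for frames read straight from a FaceSequence):

function drawFaceSequence(fs, from, step, number, rects)
fs.next = from;  fs.step = step;
for i = 1:number
    coi = fs.readNextImage();
    imshow(coi.im, []);  hold on;
    r = rects(i,:);
    if any(r)   % skip the all-zero "no ground truth" rectangles
        plot(r([1 2 2 1 1]), r([3 3 4 4 3]), 'g-', 'LineWidth', 2);
    end
    hold off;
    pause(0.2);  % roughly 5 frames per second
end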
Review slides 24-27 of the Lucas-Kanade lecture. The algorithm repeatedly warps the "current" image backward according to the current motion, making it similar to the "previous" image inside the area defined by the "previous" rectangle.
Let $N$ be the number of pixels inside the rectangle. Recall that we need to compute a motion update $x = (u,v)$ (for translation-only motion) that satisfies $$ Ax=b $$ where $A$ is an $N \times 2$ matrix, each of whose rows is the image gradient $(dx, dy)$ at some pixel of the previous image, and $b$ is an $N \times 1$ column vector of errors (image intensity differences) between the previous and current image. To solve this with least squares, we compute $$ x = (A^T A)^{-1} A^T b. $$
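In Matlab the least-squares solve is a one-liner. A sketch with illustrative variable names (not necessarily those used in the starter code):

A = [Ix(:) Iy(:)];           % Ix, Iy: gradients of the previous image inside the rect
x = (A' * A) \ (A' * b(:));  % b: intensity differences; backslash avoids an explicit inverse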
The new combined motion (input motion + update) should be
new_mot_u = mot_u + x_u;
new_mot_v = mot_v + x_v;
Notice that $A$, and hence $(A^T A)^{-1}$, does not change between iterations: we only need to re-warp the current image according to the updated motion.
The function LKonCoImage implements the LK algorithm on a single level:
[mot, err, imot] = LKonCoImage(prevcoi, curcoi, prect, init_mot, params)
A default set of params can be generated with params = LKinitParams();
2.1 (u,v) motion
Finish the missing code inside LKonCoImage.m to estimate the $(u,v)$ motion increment. You need to edit four places.
Now call LKonCoImage on pairs of nearby frames, using the ground truth rectangle fs.gt_rect(i,:) as the prect. The input motion can be blank: [0 0 0 x0 y0].
Note: for the scale estimation (which you will implement below) to be numerically stable, [x0,y0] should be close to the center of the rectangle (rectCenter() can help here).
If you set the show_fig parameter to 1, you should be able to see the animation of the iterations.
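Putting the pieces together, a test call could look like this (the frame indices are arbitrary, and the params.show_fig field name is an assumption based on the parameter described above):

fs = FaceSequence('path/to/data/girl');
prevcoi = fs.readImage(1);
curcoi  = fs.readImage(3);
prect   = fs.gt_rect(1,:);
c       = rectCenter(prect);   % keep [x0,y0] at the rectangle center
params  = LKinitParams();
params.show_fig = 1;           % watch the iterations
[mot, err] = LKonCoImage(prevcoi, curcoi, prect, [0 0 0 c(1) c(2)], params);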
2.2 (u,v,s) motion
Now we will implement motion with scale. The formulas are very similar: $x = (u, v, s)$ is the mot_update we want to solve for, so each row of $A$ gains a third column: $$ A_i = (dx_i \; dy_i \; ww_i) $$ where $ww = dx \cdot (x-x_0) + dy \cdot (y-y_0)$ for each pixel inside the rectangle. (Thus, $A^T A$ will be a $3 \times 3$ matrix.)
Finally, the motion update is a bit more complex:
new_mot_u = mot_u + x_u + x_u * mot_s;
new_mot_v = mot_v + x_v + x_v * mot_s;
new_mot_s = mot_s + x_s + mot_s * x_s;
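A sketch of the extra pieces for the scale case, again with illustrative names (X and Y stand for the pixel coordinates inside the rectangle):

ww = Ix(:).*(X(:) - x0) + Iy(:).*(Y(:) - y0);  % third column of A
A  = [Ix(:) Iy(:) ww];
x  = (A' * A) \ (A' * b(:));                   % x = [du; dv; ds]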
Finish the code in LKonCoImage to deal with the case do_scale=1.
2.3 Experimenting with single-scale LK
Experiment with LKonCoImage and answer the following questions:
The function LKonPyramid.m implements the multi-resolution LK. It should call LKonCoImage() for each level of a Gaussian image pyramid. The image pyramid is built using
pyr = coPyramid(coi, levs);
Each time we go "up" a level, the image is subsampled by a factor of 2. The origin of the coordinate system is kept at the same pixel, but the width and height of any object are reduced to 1/2. A feature at coordinate x_L2 on level 2 would have coordinate x_L1 = 2*x_L2 on level 1. The function rectChangeLevel() implements level conversions for rectangles.
3.1 Picking active levels
The first task is to figure out which levels of the pyramid to work on. The function defineActiveLevels should return a vector [L_start : L_end], inclusive, of the levels to use. For now, L_start should be the lowest level of the given pyramid (since we are willing to work on very big images), but the top level must be computed so that the number of pixels inside the rectangle does not fall below the parameter min_pix(1).
Fill in defineActiveLevels.m
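Since going up one level shrinks the rectangle's area by a factor of 4, the top level can be computed in closed form. A sketch (top_level is a hypothetical name; clamp to whatever your pyramid actually provides):

A = prod(rectSize(prect));                              % rectangle area at L_start
L_end = L_start + floor(log(A / min_pix(1)) / log(4));  % area shrinks 4x per level
L_end = min(L_end, top_level);                          % stay inside the pyramid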
3.2 Updating the motion
We also need to update the motion as we move from level to level. This is done by calling the function uvsChangeLevel() (in the uvs/ subdirectory). Implement the body of this function.
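A possible sketch, assuming a signature uvsChangeLevel(mot, l1, l2) that converts motions from level l1 to level l2: coordinates shrink by a factor of 2 per level going up, so u, v, x0, and y0 must be rescaled, while s is a ratio and stays unchanged.

f = 2^(l1 - l2);                           % e.g. f = 1/2 going up one level
mot(:,[1 2 4 5]) = mot(:,[1 2 4 5]) * f;   % rescale u, v, x0, y0; s is unchanged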
3.3 Experimenting with multi-scale LK
Experiment with LKonPyramid on various examples. How much further can init_motion be from the true motion?
The function LKonSequence runs LKonPyramid on a sequence in an incremental way: 1→2, 2→3, 3→4, ...
Implement the missing lines in the loop of LKonSequence.m. The code should call LKonPyramid for each pair of images, record the resulting motion and rectangle in the mots and rects matrices, and update init_motion for the next frame. You can assume that the motion to the next frame is similar to the motion to the current frame, perhaps with some dampening. Recall that the input motion's center $(x_0,y_0)$ should be close to the center of prect.
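A hedged sketch of the loop body (all names are illustrative, and the LKonPyramid signature is assumed to mirror LKonCoImage; the rectangle is propagated by warping two of its corners with the documented uvsCWarp):

[mot, err] = LKonPyramid(prevcoi, curcoi, prect, init_motion, params);
mots(i,:)  = mot;
p1 = uvsCWarp(mot, prect([1 3]));   % warp the [xmin ymin] corner
p2 = uvsCWarp(mot, prect([2 4]));   % warp the [xmax ymax] corner
prect = [p1(1) p2(1) p1(2) p2(2)];
rects(i,:) = prect;
c = rectCenter(prect);
init_motion = [0.5*mot(1:3), c];    % dampened guess, re-centered on the new prect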
How long can you track, and under what situations does it break?
The Dropbox link to submit your assignment is here.