ECE 847 Digital Image Processing
Fall 2004
This course introduces students to the basic concepts, issues, and algorithms in
digital image processing and computer vision. Topics include image formation,
projective geometry, convolution, Fourier analysis and other transforms,
pixel-based processing, segmentation, texture, detection, stereo, and motion.
The goal is to equip students with the skills and tools needed to manipulate
images, along with an appreciation for the difficulty of the problems. Students
will implement several standard algorithms, evaluate the strengths and weaknesses
of various approaches, and explore a topic of their own choosing in a course
project.
Syllabus
- 8/17/04: Welcome!
- 8/20/04: Matlab diary from today's lecture; miscellaneous computer vision resources
- 8/23/04: Matlab diary from today's tutorial session
- Resources:
- Deadline for declarations of intent to contribute to vislib: Wednesday 10/20/04, in class; bring a written description of function(s) and interface(s).
- Stereo ppt slides are in MyCLE under ClassMaterials
Week | Topic                         | Assignment
1    | Pixel-based processing I      | HW1: Warm-up, due 8/27
2    | Pixel-based processing II     | HW2: Processing pixels, due 9/3
3    | Filters and edge detection I  |
4    | Filters and edge detection II | HW3: Point distribution models, due 9/17
5    | Stereo                        |
6    | Motion                        | HW4: Head tracking, due 10/1
7    | Transforms I                  |
8    | Transforms II                 | HW5: Stereo matching, due 10/15
9    | Pattern detection             |
10   | Segmentation                  | HW6: Lucas-Kanade tracking, due 10/29
11   | Texture                       | start projects
12   | Projective geometry           |
13   | Image formation               |
14   | Advanced topics               |
15   | Project presentations         | projects due
In your final project, you will investigate some area of image processing or computer vision in more detail. Typically
this will involve formulating a problem, reading the literature, proposing a solution, implementing the solution,
evaluating the results, and communicating your findings. In the case of a survey project, the quality and depth of
the literature review should be increased significantly to compensate for the lack of implementation.
Project deadlines:
- 11/19/04: progress report (1 page)
- 12/3/04: final written report (up to 5 pages)
- 12/7/04: final oral presentation in class during the final exam slot, 1:00-4:00 (5 minutes)
- 12/9/04: peer reviews of written reports and oral presentations
Final project reports from our Fall 2004 internal mini-conference:
- Particle filter tracking with Isomap
- Image registration
- Background subtraction using color and gradient information
- Shape descriptor using polar plot for shape recognition
- 3D photography using shadows
- Digital image restoration techniques and automation
- Elliptical head tracker using intensity gradients and texture histograms
- Image Compression: Review and Comparison of Haar Wavelet Transform and Vector Quantization
- Facial Feature Detection and Gaze Tracking
- Fingerprint image enhancement and minutiae extraction
- Real time head tracker using frame grabber and a webcam
- Reading Barcodes Etched on Shiny Surfaces Using Basic Image Processing
- Face recognition using eigenfaces
- Skeet tracking: counting hits and misses
- Simultaneous localisation and mapping with a single camera
- Semi-automated feature based image morphing using graph cuts
- Image Morphing: Feature based, View and Mesh
- Head tracking using stereo
- Computer vision for computer games
- Tracking of objects in spatiotemporal volume by graph-cuts
- Hand recognition and tracking
- Star sensor: Star pattern recognition
- (omitted)
- Image synthesis for a CT simulator
- Vehicle boundary detection and tracking
- Tracking using intensity gradients and particle filtering
- Iris recognition for personal identification
- Image morphing: a comparative study
- Head tracking using learned linear subspaces
Homeworks:
- HW#1 (due 8/27; report due by 4:30pm in my office, code should be emailed to the grader by 11:59pm):
- Write code to detect and count the oranges in oranges1.pgm (or, in BMP format, oranges1.bmp).
- If you're using Matlab, then the code should be in a single file named username_hw1.m
(with your Clemson user id replacing username) and emailed to the GA with the
subject line "ECE 847 HW1" (without quotes). The code should read
'oranges1.pgm' in the same directory as the '.m' file (hardcode the filename but
not the path), and it should display one figure containing a color-coded image with a
different color for each region, and a cross at the center of each orange (the
cross should be a different color from all the regions). If you use
erosion, write your own code to do so (do not use Matlab's erode.m). To
ensure that your code is robust, the GA will run
your code on oranges1.pgm and on a similar image whose name will be changed to
oranges1.pgm. For this homework, all functions in the image processing
toolbox are disallowed (except imread and imwrite), unless
stated otherwise. All standard Matlab functions, including conv, conv2,
filter2, imagesc, colormap, etc., are okay.
- Note: Matlab is terribly inefficient with for-loops. As a
result, try to eliminate as many for-loops as you can by parallelizing your code
whenever possible. For example, erosion using 2-neighbors in the
horizontal direction can be accomplished without any for-loops as follows:
tmp1 = [im(:,2:end) zeros(size(im,1),1)]; % shift left
tmp2 = [zeros(size(im,1),1) im(:,1:end-1)]; % shift right
out = im & tmp1 & tmp2; % erode
Use this example code to help parallelize your own erosion and connected
components code. If you are unable to achieve reasonable efficiency this
way, then use bwlabel in the code you email the GA so that your code can be
graded, but also attach a file called username_hw1_slow.m that
contains your slow implementation. In your report include just the slow
implementation. You will not receive full credit if you use bwlabel, but
you will receive substantial partial credit. On the other hand, if
you only submit one file and that file takes an unreasonable amount of time
(more than a couple of minutes or so), then it will not be graded.
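The same trick extends to a full 3x3 erosion via conv2 (a sketch using the same variable names as the example above):

```matlab
% 3x3 erosion of a binary image without for-loops: a pixel survives
% only if all 9 pixels in its 3x3 neighborhood are set
out = conv2(double(im), ones(3), 'same') == 9;
% the horizontal 2-neighbor erosion above is equivalently
% out = conv2(double(im), [1 1 1], 'same') == 3;
```

Zero-padding at the borders makes boundary pixels fail the test, which is the desired erosion behavior.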
- If you're using C or C++, then email a Windows executable named
username_hw1 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The code should read 'oranges1.pgm' in the same directory
as the executable and write the result to 'output.ppm' in the same directory.
The executable should take no parameters. To help you, here is some C code
for reading / writing PGM and PPM images: pnmio.h and pnmio.c
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned. Submit a hardcopy to the GA.
- HW#2 (due 9/3; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- Write code to automatically detect and classify the fruit in the image
fruit1.pgm (or, in BMP format, fruit1.bmp).
Trace the boundaries of the objects using a wall-following algorithm. Display the original image with a one-pixel-thick
boundary overlaid on each object. The color of the boundaries is
important: Red indicates apple, green indicates grapefruit, and yellow
indicates banana. Classification may be done using any method of your
choosing, including region size, elongation, convexity, etc.
- Note: The apple boundaries will not come out well with a single threshold. One
solution is to use a double threshold. It is fine to hardcode threshold
values based upon your observations of this image. To help, I would
recommend thresholds of 120 and 100 for the double-thresholding. I'm
providing these numbers because you don't have access to the second image, and the
second image happens to be a bit brighter than the first. If your second
threshold is smaller than 100 (e.g., 80), then much of the background will blend
with one of the apples. Be sure not to double-threshold the bananas and
grapefruit or you will again get too much background. For this HW, it's
fine to hardcode the thresholds, sizes, eccentricities, etc. Just be sure to try
to choose your thresholds as conservatively as possible because you don't have
access to the second image. In the second image the camera is approximately the
same distance away, so the sizes and eccentricities will be similar. The
lighting is as similar as possible, but for some reason the image is lighter.
Observations such as these should be in your "lessons learned".
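The double threshold can be implemented as a small reconstruction loop. This is only a sketch: the variable names are illustrative, and the comparison direction assumes the fruit is brighter than the background (flip the inequalities if yours is darker):

```matlab
strict = im > 120;                 % conservative threshold: surely-fruit pixels
loose  = im > 100;                 % permissive threshold: possibly-fruit pixels
seed = strict;
prev = false(size(seed));
while any(seed(:) ~= prev(:))      % grow seeds into the loose mask until stable
    prev = seed;
    grown = conv2(double(seed), ones(3), 'same') > 0;   % 3x3 dilation
    seed = grown & loose;
end
mask = seed;                       % double-thresholded regions
```

Since the Image Processing Toolbox is allowed for this homework, imreconstruct(strict, loose) should compute the same result in one call.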
- If you're using Matlab, then your code should be in a single file named
username_hw2.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW2" (without quotes). If you choose to
use multiple .m files, then start the names of the other files with the prefix
username_hw2 to avoid conflict with other students' files. Note: For
this homework, you are free to use any command in Matlab's Image Processing
Toolbox (up to version 6.5, except for regionprops), as well as any other Matlab
command. For example, you may use imerode and bwlabel but not
bwtraceboundary and regionprops. Your code should read 'fruit1.pgm' in the
same directory as the '.m' file (hardcode the filename but not the path) and
display in figure 1 the original image with a single-pixel-thick boundary
overlaid on each object, with the appropriate color. Your goal is to get
the color of each boundary correct, as well as to get the boundary location as
accurate as possible.
- Because of the principal components analysis (for elongatedness), C and C++ are not recommended
for this homework. If you choose to use them anyway, that is fine.
Just email a Windows executable named
username_hw2 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The code should read 'fruit1.pgm' in the same directory
as the executable and write the result to 'output.ppm' in the same directory.
The executable should take no parameters, and the output images should appear as
described in the preceding Matlab section. To help you, C code for reading
/ writing PGM and PPM images can be found above.
- To ensure that your code is robust, the GA will run your detection code on
fruit1.pgm and on a similar image whose name will be changed to fruit1.pgm.
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned.
- HW#3 (due 9/17; hardcopy report due by 4:30pm in my office, code emailed to the grader by 4:30pm):
- Off-line, construct a point distribution model (PDM) of a banana. Run your detection
and wall-following code on
bananas1.pgm (or
bananas1.bmp) to obtain the boundary of each banana as a list of points.
Align the boundaries of all the bananas by overlaying their centroids, aligning their orientations, and
normalizing for scale; subsample each boundary to obtain 100 equally-spaced boundary points. Display all these shapes overlaid in a single figure. Perform principal components analysis (PCA) on the boundaries. How many
components do you think should be retained? Plot the mean banana shape,
along with the +/- one-sigma shape for each of the first three components.
Use Matlab's subplot command to show all the variations in one figure. Now
perturb one of the training shapes using Gaussian noise on all the boundary
points, as well as a larger amount of noise on one of the points. In each
case, project the perturbed shape to the manifold to get the closest shape in
the manifold. Display both shapes overlaid for both cases.
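The PCA and projection steps can be sketched as follows (the variable names and the number of retained components are illustrative, not prescribed):

```matlab
% X: 2N-by-M matrix whose M columns are the aligned, scale-normalized
% training shapes (x-coordinates stacked on top of y-coordinates)
xbar = mean(X, 2);                        % mean banana shape
[P, L] = eig(cov(X'));                    % principal axes and variances
[lambda, order] = sort(diag(L), 'descend');
k = 3;                                    % number of components retained (assumed)
P = P(:, order(1:k));
b = P' * (x - xbar);                      % model coordinates of a shape x
xhat = xbar + P * b;                      % closest shape in the manifold
```

The +/- one-sigma shapes for component i are then xbar +/- sqrt(lambda(i)) * P(:,i).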
- Extra credit: Now detect the boundaries of the objects in banana_stuff1.pgm (or, banana_stuff1.bmp), and project the boundaries onto your banana principal axes. (Note: The toy ringed fish at the top of the image, and the speckled moon near the center, will not threshold well; you are not expected to get their boundaries.) What do you notice? Display the banana projection overlaid on each object.
- Your code should
- display, in figure 1, all the shapes used for training overlaid on top of
one another, aligned and normalized for scale.
- display, in figure 2, four subplots, each showing the mean outline
along with the +/- one-sigma variation (+/- sqrt(lambda_i), where lambda_i is
the ith eigenvalue) in the ith principal component,
i=1,2,3,4.
- display, in figure 3, two subplots, each showing the perturbed shape
overlaid on the closest shape in the manifold. One subplot shows small
perturbations of all the points, while the other shows a large perturbation for
one of the points.
- extra credit: read 'banana_stuff1.pgm' and display in figure 4 the boundaries of the
banana projections overlaid on the image at the correct location, scale, and
orientation for each object.
- For this homework, the GA will only run your code on the images provided.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results. Be sure to describe which parts of
the process were automated and which were manual. For any steps that you
were unable to automate, outline some possible approaches to automate them given
more time.
- (Note about grading: It is very important for this
assignment that you actually build a PDM of a banana and project the
objects in the other image to the model. It is also very important that
you wrestle with the problem of automating the process, which you will
demonstrate in two ways: (1) actual code to automate some parts of the
process, and (2) describing in your report various approaches that could be used
to solve the problem along with their strengths and weaknesses. If you
tried something that failed, explain why it failed. Or if you have an idea
but don't have time to implement it, describe it and evaluate how well you think
it would do if you had more time. Automate the process as
much as you are able, but use manual processing wherever it is necessary for you
to complete the assignment in a reasonable amount of time.)
- HW#4 (due 10/1; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment, you will write code to track a person's head through a
video sequence using the gradient magnitude around the perimeter of the head,
modeled as an ellipse. The algorithm is described in detail in
"An Elliptical Head Tracker" (1997). You will run the code on the
images in the video sequence melissa_head.zip.
Here are the steps you should follow:
- First write code to compute the points along the boundary of a
vertically-oriented ellipse given its center point and the length of its two
axes. Fix the aspect ratio to a hard-coded value (1.4 is the suggested
value, even though the paper says 1.2). Also write code to compute the sum
of the gradient magnitude around the perimeter of an ellipse. Run this
code over all possible (x,y) positions in the first image of the sequence
(ignoring image boundaries), generating a likelihood map for the person's
location. Use a single scale determined manually. Plot the
likelihood map in one figure, and the most likely ellipse location in another
figure (overlaying the ellipse on the grayscale image).
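The ellipse boundary and its score can be sketched as follows (variable names are illustrative; gradmag is assumed to hold the precomputed gradient magnitude image):

```matlab
% boundary of a vertically-oriented ellipse centered at (xc, yc)
theta = linspace(0, 2*pi, 100);
ex = round(xc + s * cos(theta));          % s: horizontal semi-axis (the scale)
ey = round(yc + 1.4 * s * sin(theta));    % aspect ratio fixed at 1.4
% sum of gradient magnitude around the perimeter
score = sum(gradmag(sub2ind(size(gradmag), ey, ex)));
```

Sweeping (xc, yc) over the image and storing the score at each position yields the likelihood map.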
- Now write code to compute the location of the head in frame i+1 given its
location in frame i, using a local search around the current location varying x,
y, and scale. Run your tracker over the entire sequence, displaying the
best location in each image. Use pause(0.3) after each display, so that
you can see the results for each image. Evaluate the algorithm's
performance with and without constant-velocity prediction.
- Augment the code to use, instead of the gradient magnitude, the absolute
value of the dot product
of the gradient with the ellipse normal. (The equations for this
version can be found in Section 3 of
"Elliptical Head Tracking Using Intensity Gradients and Color Histograms" (1998).) Run this tracker over the entire
sequence, with constant-velocity prediction. Evaluate the performance of
the algorithm.
- Your code should display, for each image as the tracker is running,
- the
likelihood map of the image at the current scale in figure 1;
- the likelihood map at the two adjacent scales in figures 2 and 3,
respectively;
- the ellipse with the maximum score overlaid on the
grayscale image in figure 4.
- For this homework, the GA will only run your code on the images provided.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results. Include images of some
representative images from the sequence, but not every image of the entire
sequence. Explain the factors involved in choosing the search range, and
compare the dot product with the magnitude, and the constant-velocity prediction
with static prediction.
- If you're using Matlab, then
your code should be in a single file named
username_hw4.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW4" (without quotes). If you
choose to use multiple .m files, then start the names of the other
files with the prefix username_hw4 to avoid conflict with other
students' files. For this homework, you may NOT use Matlab's edge command;
instead compute the gradient yourself using conv2. If you're using C or
C++, then email a Windows executable named
username_hw4 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The executable should take no parameters, reading the
images in the same directory
as the executable and writing the results to output%04d.ppm in the same directory
(or displaying them on the screen). You may run your code on pgm images,
or another format, if that is more convenient than the jpg format provided.
- HW#5 (due 10/15; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment you will implement the basic correlation-based stereo
algorithm in Matlab and compare your output with those of more sophisticated
techniques. Recall that a (left) disparity map D(i,j) between a left
image L and a right image R which have been rectified is an
array such that the pixel corresponding to L(i,j) in the right image is
R(i,j-D(i,j)). (Don't forget that matrices in Matlab are indexed by (row,column).)
- Write code to match two rectified stereo images using brute-force correlation search,
computing a disparity map. The disparity map should be the same size as the two
input images, although the values at the left edge will be erroneous. Use SSD
(sum-of-squared differences) correlation and match from left to right (i.e., for
each window in the left image, search in the right image, so that the disparity
map is with respect to the left image). Use a window size of 5x5.
Hint: Do NOT write five nested for loops. Instead, to make the
code more efficient for Matlab, first compute the dissimilarities for all
the pixels and convolve them with a summing kernel (all ones) in both
directions, storing the result in a 3D "matrix." (The summing kernel plays the
role of the correlation window.) Then use [val,ind] = min(C,[],3), where C is
the 3D matrix, to find the disparity map. There should be just one for loop (over
disparity).
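The hint above can be sketched as follows (a sketch, not a reference implementation; L and R are assumed to be the rectified images converted to double):

```matlab
maxdisp = 15;
w = ones(5);                                % 5x5 summing window
[nr, nc] = size(L);
C = inf(nr, nc, maxdisp+1);                 % windowed dissimilarity volume
for d = 0:maxdisp                           % the single for-loop, over disparity
    diff = inf(nr, nc);
    diff(:, d+1:nc) = (L(:, d+1:nc) - R(:, 1:nc-d)).^2;   % SSD per pixel
    C(:, :, d+1) = conv2(diff, w, 'same');  % sum over the correlation window
end
[val, ind] = min(C, [], 3);
D = ind - 1;                                % left disparity map
```

The inf entries ensure that disparities reaching past the left image border never win the minimum.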
- Run your code on tsukuba_left.tif and
tsukuba_right.tif with maxdisp=15 and display it with imagesc.
What kind of errors do you see? Play with some of the parameters. For example, try setting maxdisp=8.
Try SAD (sum-of-absolute-differences) instead of SSD. Set the window size to
20x20. Try filtering the images before correlation using a Laplacian-of-Gaussian
kernel (i.e., convolve with [0 -1 0 ; -1 4 -1 ; 0 -1 0]). How do these changes
affect the output? Now run the algorithm on lamp_left.tif and
lamp_right.tif with maxdisp=6. What happens? Why is this image
difficult?
- Write code that implements
the left-right consistency check, which retains a
value in the left disparity map only if the corresponding point in the right
disparity map yielded the negative of that disparity. The resulting disparity
map should be valid only at the pixels that pass the consistency check; set
other pixels to zero. (Hint: Convert the left-right disparity map to a
vector and add it to a ramp vector; then check whether right-left disparity
map evaluated at this vector is the negative of the left-right disparity map.)
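The hint can be realized without loops as follows (a sketch; DL and DR are the left and right disparity maps, and the sign convention for DR is stated in the comments):

```matlab
% DL, DR: left and right disparity maps; following the convention above,
% a consistent pair satisfies DR(i, j - DL(i,j)) == -DL(i,j)
% (if your DR holds positive values, compare against +DL instead)
[nr, nc] = size(DL);
[jj, ii] = meshgrid(1:nc, 1:nr);
jr = jj - DL;                              % column of the matching right pixel
valid = jr >= 1 & jr <= nc;
idx = sub2ind([nr nc], ii(valid), jr(valid));
ok = false(nr, nc);
ok(valid) = (DR(idx) == -DL(valid));       % consistency test
DL(~ok) = 0;                               % zero out inconsistent pixels
```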
- Take a look at the results of the latest stereo research at
http://www.middlebury.edu/stereo
(click on the "Results" tab). Look only at the first
column (all) under the column Tsukuba. What errors do you
see in the best algorithm (the one [3] with minimum error in this column)?
Find the output of another algorithm that looks better to you. What does this
tell you about the difficulty of finding meaningful objective criteria to
evaluate such algorithms? What is the primary error of the dynamic programming
technique [1d]? How well does real-time correlation ([7]) fare on this image
(i.e., what is its rank compared to more expensive methods)?
- Your code should read the files tsukuba_left.tif and tsukuba_right.tif and
output
- in Figure 1, the disparity map computed using left-to-right correlation
- in Figure 2, the disparity map computed using correlation with the left-right consistency
check
For all the parameters (SAD vs. SSD, LoG prefilter or not, window size, etc.),
choose values that produce the best results.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results.
Homework procedures: Collaboration is allowed and encouraged, but each of
you must submit his or her own work. Copying and pasting of others' code
is not allowed. If you collaborate, please indicate the name of your collaborator(s) on the top of your report. Reports should be clear, concise,
and coherent, as well as neatly formatted (typewritten is preferred, but
handwritten is okay as long as it's neat and legible). Please write your
user ID on any written report. Late homework will
lose 10 points per day up to four days. Code that does not conform to the
stated format will result in points deducted (and may require a late
resubmission incurring late penalties).
- HW#6 (due 11/5; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment you will implement the basic 2D Lucas-Kanade tracking
algorithm with automatic feature selection. You will be using the image
sequence statue.zip.
- Compute the minimum eigenvalue of the 2x2 gradient matrix for each
pixel in the first image (statue583.bmp), ignoring the boundary pixels.
Write code to compute the eigenvalue yourself, i.e., don't use Matlab's eig or
svd function. Use a 5x5 window or any other size that works. Display
the results as a likelihood image, that is, the likelihood that each pixel is a
good feature.
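The minimum eigenvalue can be computed in closed form for all pixels at once (a sketch; the gradient kernels and window size are reasonable defaults, not prescriptions):

```matlab
gx = conv2(im, [-1 0 1]/2, 'same');       % horizontal gradient
gy = conv2(im, [-1; 0; 1]/2, 'same');     % vertical gradient
w = ones(5);                              % 5x5 window
Zxx = conv2(gx.^2,  w, 'same');           % entries of the 2x2 gradient matrix
Zxy = conv2(gx.*gy, w, 'same');
Zyy = conv2(gy.^2,  w, 'same');
% smaller eigenvalue of [Zxx Zxy; Zxy Zyy] in closed form
tr = Zxx + Zyy;
dt = Zxx.*Zyy - Zxy.^2;
lammin = (tr - sqrt(tr.^2 - 4*dt)) / 2;   % the likelihood image
```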
- From this likelihood map, automatically determine the 50 best features while
enforcing a minimum distance of 5 pixels between the center points of any pair
of features. Plot these 50 features overlaid on the original image, using something like
clf; imagesc(img); hold on; plot(x,y,'ro');.
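Greedy selection with non-maximum suppression enforces the minimum distance (a sketch; lammin is assumed to be the likelihood map from the previous step):

```matlab
map = lammin;
feat = zeros(50, 2);
for k = 1:50
    [val, ind] = max(map(:));             % strongest remaining feature
    [y, x] = ind2sub(size(map), ind);
    feat(k, :) = [x y];
    % suppress a neighborhood to enforce the minimum distance of 5 pixels
    r1 = max(1, y-5); r2 = min(size(map,1), y+5);
    c1 = max(1, x-5); c2 = min(size(map,2), x+5);
    map(r1:r2, c1:c2) = -inf;
end
```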
- Track 51 features (the 50 best along with one feature in the untextured sky
-- okay to hardcode this location) throughout the image sequence using the Lucas-Kanade
algorithm, with a translation-only model. Solve Zd=e iteratively for each
pair of consecutive images. Declare convergence when the incremental
displacement is less than 0.1 pixels or the number of iterations exceeds 10.
Declare a feature lost if the final residue is greater than a threshold. Display the features overlaid on the original images, using
pause(0.1) each time to enable the images to be visible during the execution of the code.
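One iteration of the solve can be sketched as follows (a sketch only; W is assumed to hold the indices of the 5x5 window around the current feature position, and sign conventions vary with how the temporal difference is defined):

```matlab
% gx, gy: spatial gradients of the first image; It = I2 - I1
Z = [sum(gx(W).^2)       sum(gx(W).*gy(W));
     sum(gx(W).*gy(W))   sum(gy(W).^2)    ];
e = -[sum(gx(W).*It(W)); sum(gy(W).*It(W))];
d = Z \ e;                  % incremental displacement [dx; dy]
x = x + d(1);  y = y + d(2);
% repeat until norm(d) < 0.1 or 10 iterations; then check the residue
```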
- Submit code that reads the images in the sequence and displays
- In Figure 1, the likelihood map
- In Figure 2, the 50 best features in the first frame
- In Figure 3, the locations of the 51 (or fewer, after some are lost)
features in each image frame as the sequence progresses
- If you're using Matlab, then your code should be in a single file named
username_hw6.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW6" (without quotes). If you choose to
use multiple .m files, then start the names of the other files with the prefix
username_hw6 to avoid conflict with other students' files.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results.
Homework grading:
- A. Report is coherent, concise, clear, and neat, with correct
grammar and punctuation. Code works correctly the first time and
achieves good results on both images.
- B. Report adequately
describes the work done, and code generally produces good results. There
are a small number of defects either in the implementation or the writeup, but
the essential components are there.
- C. Report or code is
inadequate. The report contains major errors or is illegible, the code
does not produce meaningful results or does not run at all, or instructions are
not followed.
- D or F. Report or code not attempted, not turned
in, or contains extremely serious deficiencies.
Instructor: Stan Birchfield, 207-A Riggs Hall, 656-5912, email: stb at clemson
Grader: Sunil Guduru, email: guduru at clemson (Note: Please
use this email address, not Sunil's regular one)
Lectures: 1:25 - 2:15 MWF, 307 Riggs Hall