ECE 847 Digital Image Processing
Fall 2004
This course introduces students to the basic concepts, issues, and algorithms in
digital image processing and computer vision. Topics include image formation,
projective geometry, convolution, Fourier analysis and other transforms,
pixel-based processing, segmentation, texture, detection, stereo, and motion.
The goal is to equip students with the skills and tools needed to manipulate
images, along with an appreciation for the difficulty of the problems. Students
will implement several standard algorithms, evaluate the strengths and weaknesses
of various approaches, and explore a topic of their own choosing in a course
project.
Syllabus
- 8/17/04: Welcome!
- 8/20/04: Matlab diary from today's lecture; miscellaneous computer vision resources
- 8/23/04: Matlab diary from today's tutorial session
- Resources:
- Deadline for declarations of intent to contribute to vislib: Wednesday 10/20/04, in class; bring a written description of function(s) and interface(s).
- Stereo ppt slides are in MyCLE under ClassMaterials
Week | Topic                         | Assignment
1    | Pixel-based processing I      | HW1: Warm-up, due 8/27
2    | Pixel-based processing II     | HW2: Processing pixels, due 9/3
3    | Filters and edge detection I  |
4    | Filters and edge detection II | HW3: Point distribution models, due 9/17
5    | Stereo                        |
6    | Motion                        | HW4: Head tracking, due 10/1
7    | Transforms I                  |
8    | Transforms II                 | HW5: Stereo matching, due 10/15
9    | Pattern detection             |
10   | Segmentation                  | HW6: Lucas-Kanade tracking, due 10/29
11   | Texture                       | start projects
12   | Projective geometry           |
13   | Image formation               |
14   | Advanced topics               |
15   | Project presentations         | projects due
In your final project, you will investigate some area of image processing or computer vision in more detail. Typically
this will involve formulating a problem, reading the literature, proposing a solution, implementing the solution,
evaluating the results, and communicating your findings. In the case of a survey project, the quality and depth of
the literature review should be increased significantly to compensate for the lack of implementation.
Project deadlines:
- 11/19/04: progress report (1 page)
- 12/3/04: final written report (up to 5 pages)
- 12/7/04: final oral presentation in class during the final exam slot, 1:00-4:00 (5 minutes)
- 12/9/04: peer reviews of written reports and oral presentations
Final project reports from our Fall 2004 internal mini-conference:
- Particle filter tracking with Isomap
- Image registration
- Background subtraction using color and gradient information
- Shape descriptor using polar plot for shape recognition
- 3D photography using shadows
- Digital image restoration techniques and automation
- Elliptical head tracker using intensity gradients and texture histograms
- Image Compression: Review and Comparison of Haar Wavelet Transform and Vector Quantization
- Facial Feature Detection and Gaze Tracking
- Fingerprint image enhancement and minutiae extraction
- Real time head tracker using frame grabber and a webcam
- Reading Barcodes Etched on Shiny Surfaces Using Basic Image Processing
- Face recognition using eigenfaces
- Skeet tracking: counting hits and misses
- Simultaneous localisation and mapping with a single camera
- Semi-automated feature based image morphing using graph cuts
- Image Morphing: Feature based, View and Mesh
- Head tracking using stereo
- Computer vision for computer games
- Tracking of objects in spatiotemporal volume by graph-cuts
- Hand recognition and tracking
- Star sensor: Star pattern recognition
- (omitted)
- Image synthesis for a CT simulator
- Vehicle boundary detection and tracking
- Tracking using intensity gradients and particle filtering
- Iris recognition for personal identification
- Image morphing: a comparative study
- Head tracking using learned linear subspaces
Homeworks:
- HW#1 (due 8/27; report due by 4:30pm in my office, code should be emailed to the grader by 11:59pm):
- Write code to detect and count the oranges in oranges1.pgm (or, in BMP format, oranges1.bmp).
- If you're using Matlab, then the code should be in a single file named username_hw1.m
(with your Clemson user id replacing username) and emailed to the GA with the
subject line "ECE 847 HW1" (without quotes). The code should read
'oranges1.pgm' in the same directory as the '.m' file (hardcode the filename but
not the path), and it should display one figure containing a color-coded image with a
different color for each region, and a cross at the center of each orange (the
cross should be a different color from all the regions). If you use
erosion, write your own code to do so (do not use Matlab's erode.m). To
ensure that your code is robust, the GA will run
your code on oranges1.pgm and on a similar image whose name will be changed to
oranges1.pgm. For this homework, all functions in the image processing
toolbox are disallowed (except imread and imwrite), unless
stated otherwise. All standard Matlab functions, including conv, conv2,
filter2, imagesc, colormap, etc., are okay.
- Note: Matlab is terribly inefficient with for-loops. As a
result, try to eliminate as many for-loops as you can by parallelizing your code
whenever possible. For example, erosion using 2-neighbors in the
horizontal direction can be accomplished without any for-loops as follows:
tmp1 = [im(:,2:end) zeros(size(im,1),1)]; % shift left
tmp2 = [zeros(size(im,1),1) im(:,1:end-1)]; % shift right
out = im & tmp1 & tmp2; % erode
Use this example code to help parallelize your own erosion and connected
components code. If you are unable to achieve reasonable efficiency this
way, then use bwlabel in the code you email the GA so that your code can be
graded, but also attach a file called username_hw1_slow.m that
contains your slow implementation. In your report include just the slow
implementation. You will not receive full credit if you use bwlabel, but
you will receive substantial partial credit. On the other hand, if
you only submit one file and that file takes an unreasonable amount of time
(more than a couple of minutes or so), then it will not be graded.
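The same trick extends to a full 3x3 erosion via conv2 (a sketch using the same variable names as the example above):

```matlab
% 3x3 erosion of a binary image without for-loops: a pixel survives
% only if all 9 pixels in its 3x3 neighborhood are set
out = conv2(double(im), ones(3), 'same') == 9;
% the horizontal 2-neighbor erosion above is equivalently
% out = conv2(double(im), [1 1 1], 'same') == 3;
```

Zero-padding at the borders makes boundary pixels fail the test, which is the desired erosion behavior.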
- If you're using C or C++, then email a Windows executable named
username_hw1 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The code should read 'oranges1.pgm' in the same directory
as the executable and write the result to 'output.ppm' in the same directory.
The executable should take no parameters. To help you, here is some C code
for reading / writing PGM and PPM images: pnmio.h and pnmio.c
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned. Submit a hardcopy to the GA.
- HW#2 (due 9/3; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- Write code to automatically detect and classify the fruit in the image
fruit1.pgm (or, in BMP format, fruit1.bmp).
Trace the boundaries of the objects using a wall-following algorithm. Display the original image with a one-pixel-thick
boundary overlaid on each object. The color of the boundaries is
important: Red indicates apple, green indicates grapefruit, and yellow
indicates banana. Classification may be done using any method of your
choosing, including region size, elongation, convexity, etc.
- Note: The apple boundaries will not come out well with a single threshold. One
solution is to use a double threshold. It is fine to hardcode threshold
values based upon your observations of this image. To help, I would
recommend thresholds of 120 and 100 for the double-thresholding. I'm
providing these numbers because you don't have access to the second image, and the
second image happens to be a bit brighter than the first. If your second
threshold is smaller than 100 (e.g., 80), then much of the background will blend
with one of the apples. Be sure not to double-threshold the bananas and
grapefruit or you will again get too much background. For this HW, it's
fine to hardcode the thresholds, sizes, eccentricities, etc. Just be sure to try
to choose your thresholds as conservatively as possible because you don't have
access to the second image. In the second image the camera is approximately the
same distance away, so the sizes and eccentricities will be similar. The
lighting is as similar as possible, but for some reason the image is lighter.
Observations such as these should be in your "lessons learned".
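The double threshold can be implemented as a small reconstruction loop. This is only a sketch: the variable names are illustrative, and the comparison direction assumes the fruit is brighter than the background (flip the inequalities if yours is darker):

```matlab
strict = im > 120;                 % conservative threshold: surely-fruit pixels
loose  = im > 100;                 % permissive threshold: possibly-fruit pixels
seed = strict;
prev = false(size(seed));
while any(seed(:) ~= prev(:))      % grow seeds into the loose mask until stable
    prev = seed;
    grown = conv2(double(seed), ones(3), 'same') > 0;   % 3x3 dilation
    seed = grown & loose;
end
mask = seed;                       % double-thresholded regions
```

Since the Image Processing Toolbox is allowed for this homework, imreconstruct(strict, loose) should compute the same result in one call.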
- If you're using Matlab, then your code should be in a single file named
username_hw2.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW2" (without quotes). If you choose to
use multiple .m files, then start the names of the other files with the prefix
username_hw2 to avoid conflict with other students' files. Note: For
this homework, you are free to use any command in Matlab's Image Processing
Toolbox (up to version 6.5, except for regionprops), as well as any other Matlab
command. For example, you may use imerode and bwlabel but not
bwtraceboundary and regionprops. Your code should read 'fruit1.pgm' in the
same directory as the '.m' file (hardcode the filename but not the path) and
display in figure 1 the original image with a single-pixel-thick boundary
overlaid on each object, with the appropriate color. Your goal is to get
the color of each boundary correct, as well as to get the boundary location as
accurate as possible.
- Because of the principal components analysis (for elongatedness), C and C++ are not recommended
for this homework. If you choose to use them anyway, that is fine.
Just email a Windows executable named
username_hw2 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The code should read 'fruit1.pgm' in the same directory
as the executable and write the result to 'output.ppm' in the same directory.
The executable should take no parameters, and the output images should appear as
described in the preceding Matlab section. To help you, C code for reading
/ writing PGM and PPM images can be found above.
- To ensure that your code is robust, the GA will run your detection code on
fruit1.pgm and on a similar image whose name will be changed to fruit1.pgm.
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned.
- HW#3 (due 9/17; hardcopy report due by 4:30pm in my office, code emailed to the grader by 4:30pm):
- Off-line, construct a point distribution model (PDM) of a banana. Run your detection
and wall-following code on
bananas1.pgm (or
bananas1.bmp) to obtain the boundary of each banana as a list of points.
Align the boundaries of all the bananas by overlaying their centroids, aligning their orientations, and
normalizing for scale; subsample each boundary to obtain 100 equally-spaced boundary points. Display all these shapes overlaid in a single figure. Perform principal components analysis (PCA) on the boundaries. How many
components do you think should be retained? Plot the mean banana shape,
along with the +/- one-sigma shape for each of the first three components.
Use Matlab's subplot command to show all the variations in one figure. Now
perturb one of the training shapes using Gaussian noise on all the boundary
points, as well as a larger amount of noise on one of the points. In each
case, project the perturbed shape to the manifold to get the closest shape in
the manifold. Display both shapes overlaid for both cases.
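The PCA and projection steps can be sketched as follows (the variable names and the number of retained components are illustrative, not prescribed):

```matlab
% X: 2N-by-M matrix whose M columns are the aligned, scale-normalized
% training shapes (x-coordinates stacked on top of y-coordinates)
xbar = mean(X, 2);                        % mean banana shape
[P, L] = eig(cov(X'));                    % principal axes and variances
[lambda, order] = sort(diag(L), 'descend');
k = 3;                                    % number of components retained (assumed)
P = P(:, order(1:k));
b = P' * (x - xbar);                      % model coordinates of a shape x
xhat = xbar + P * b;                      % closest shape in the manifold
```

The +/- one-sigma shapes for component i are then xbar +/- sqrt(lambda(i)) * P(:,i).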
- Extra credit: Now detect the boundaries of the objects in banana_stuff1.pgm (or, banana_stuff1.bmp), and project the boundaries onto your banana principal axes. (Note: The toy ringed fish at the top of the image, and the speckled moon near the center, will not threshold well; you are not expected to get their boundaries.) What do you notice? Display the banana projection overlaid on each object.
- Your code should
- display, in figure 1, all the shapes used for training overlaid on top of
one another, aligned and normalized for scale.
- display, in figure 2, four subplots, each showing the mean outline
along with the +/- one-sigma variation (+/- sqrt(lambda_i), where lambda_i is
the ith eigenvalue) in the ith principal component,
i=1,2,3,4.
- display, in figure 3, two subplots, each showing the perturbed shape
overlaid on the closest shape in the manifold. One subplot shows small
perturbations of all the points, while the other shows a large perturbation for
one of the points.
- extra credit: read 'banana_stuff1.pgm' and display in figure 4 the boundaries of the
banana projections overlaid on the image at the correct location, scale, and
orientation for each object.
- For this homework, the GA will only run your code on the images provided.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results. Be sure to describe which parts of
the process were automated and which were manual. For any steps that you
were unable to automate, outline some possible approaches to automate them given
more time.
- (Note about grading: It is very important for this
assignment that you actually build a PDM of a banana and project the
objects in the other image to the model. It is also very important that
you wrestle with the problem of automating the process, which you will
demonstrate in two ways: (1) actual code to automate some parts of the
process, and (2) describing in your report various approaches that could be used
to solve the problem along with their strengths and weaknesses. If you
tried something that failed, explain why it failed. Or if you have an idea
but don't have time to implement it, describe it and evaluate how well you think
it would do if you had more time. Automate the process as
much as you are able, but use manual processing wherever it is necessary for you
to complete the assignment in a reasonable amount of time.)
- HW#4 (due 10/1; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment, you will write code to track a person's head through a
video sequence using the gradient magnitude around the perimeter of the head,
modeled as an ellipse. The algorithm is described in detail in
"An Elliptical Head Tracker" (1997). You will run the code on the
images in the video sequence melissa_head.zip.
Here are the steps you should follow:
- First write code to compute the points along the boundary of a
vertically-oriented ellipse given its center point and the length of its two
axes. Fix the aspect ratio to a hard-coded value (1.4 is the suggested
value, even though the paper says 1.2). Also write code to compute the sum
of the gradient magnitude around the perimeter of an ellipse. Run this
code over all possible (x,y) positions in the first image of the sequence
(ignoring image boundaries), generating a likelihood map for the person's
location. Use a single scale determined manually. Plot the
likelihood map in one figure, and the most likely ellipse location in another
figure (overlaying the ellipse on the grayscale image).
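The ellipse boundary and its score can be sketched as follows (variable names are illustrative; gradmag is assumed to hold the precomputed gradient magnitude image):

```matlab
% boundary of a vertically-oriented ellipse centered at (xc, yc)
theta = linspace(0, 2*pi, 100);
ex = round(xc + s * cos(theta));          % s: horizontal semi-axis (the scale)
ey = round(yc + 1.4 * s * sin(theta));    % aspect ratio fixed at 1.4
% sum of gradient magnitude around the perimeter
score = sum(gradmag(sub2ind(size(gradmag), ey, ex)));
```

Sweeping (xc, yc) over the image and storing the score at each position yields the likelihood map.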
- Now write code to compute the location of the head in frame i+1 given its
location in frame i, using a local search around the current location varying x,
y, and scale. Run your tracker over the entire sequence, displaying the
best location in each image. Use pause(0.3) after each display, so that
you can see the results for each image. Evaluate the algorithm's
performance with and without constant-velocity prediction.
- Augment the code to use, instead of the gradient magnitude, the absolute
value of the dot product
of the gradient with the ellipse normal. (The equations for this
version can be found in Section 3 of
"Elliptical Head Tracking Using Intensity Gradients and Color Histograms" (1998).) Run this tracker over the entire
sequence, with constant-velocity prediction. Evaluate the performance of
the algorithm.
- Your code should display, for each image as the tracker is running,
- the
likelihood map of the image at the current scale in figure 1;
- the likelihood map at the two adjacent scales in figures 2 and 3,
respectively;
- the ellipse with the maximum score overlaid on the
grayscale image in figure 4.
- For this homework, the GA will only run your code on the images provided.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results. Include images of some
representative images from the sequence, but not every image of the entire
sequence. Explain the factors involved in choosing the search range, and
compare the dot product with the magnitude, and the constant-velocity prediction
with static prediction.
- If you're using Matlab, then
your code should be in a single file named
username_hw4.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW4" (without quotes). If you
choose to use multiple .m files, then start the names of the other
files with the prefix username_hw4 to avoid conflict with other
students' files. For this homework, you may NOT use Matlab's edge command;
instead compute the gradient yourself using conv2. If you're using C or
C++, then email a Windows executable named
username_hw4 (with your Clemson user id replacing username) to the
grader. Be sure to remove the .exe extension so that it is not blocked by
email delivery. The executable should take no parameters, reading the
images in the same directory
as the executable and writing the results to output%04d.ppm in the same directory
(or displaying them on the screen). You may run your code on pgm images,
or another format, if that is more convenient than the jpg format provided.
- HW#5 (due 10/15; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment you will implement the basic correlation-based stereo
algorithm in Matlab and compare your output with those of more sophisticated
techniques. Recall that a (left) disparity map D(i,j) between a left
image L and a right image R which have been rectified is an
array such that the pixel corresponding to L(i,j) in the right image is
R(i,j-D(i,j)). (Don't forget that matrices in Matlab are indexed by (row,column).)
- Write code to match two rectified stereo images using brute-force correlation search,
computing a disparity map. The disparity map should be the same size as the two
input images, although the values at the left edge will be erroneous. Use SSD
(sum-of-squared differences) correlation and match from left to right (i.e., for
each window in the left image, search in the right image, so that the disparity
map is with respect to the left image). Use a window size of 5x5.
Hint: Do NOT write five nested for loops. Instead, to make the
code more efficient for Matlab, first compute the dissimilarities for all
the pixels and convolve them with a summing kernel (all ones) in both
directions, storing the result in a 3D "matrix." (The summing kernel plays the
role of the correlation window.) Then use [val,ind] = min(C,[],3), where C is
the 3D matrix, to find the disparity map. There should be just one for loop (over
disparity).
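The hint above can be sketched as follows (a sketch, not a reference implementation; L and R are assumed to be the rectified images converted to double):

```matlab
maxdisp = 15;
w = ones(5);                                % 5x5 summing window
[nr, nc] = size(L);
C = inf(nr, nc, maxdisp+1);                 % windowed dissimilarity volume
for d = 0:maxdisp                           % the single for-loop, over disparity
    diff = inf(nr, nc);
    diff(:, d+1:nc) = (L(:, d+1:nc) - R(:, 1:nc-d)).^2;   % SSD per pixel
    C(:, :, d+1) = conv2(diff, w, 'same');  % sum over the correlation window
end
[val, ind] = min(C, [], 3);
D = ind - 1;                                % left disparity map
```

The inf entries ensure that disparities reaching past the left image border never win the minimum.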
- Run your code on tsukuba_left.tif and
tsukuba_right.tif with maxdisp=15 and display it with imagesc.
What kind of errors do you see? Play with some of the parameters. For example, try setting maxdisp=8.
Try SAD (sum-of-absolute-differences) instead of SSD. Set the window size to
20x20. Try filtering the images before correlation using a Laplacian-of-Gaussian
kernel (i.e., convolve with [0 -1 0 ; -1 4 -1 ; 0 -1 0]). How do these changes
affect the output? Now run the algorithm on lamp_left.tif and
lamp_right.tif with maxdisp=6. What happens? Why is this image
difficult?
- Write code that implements
the left-right consistency check, which retains a
value in the left disparity map only if the corresponding point in the right
disparity map yielded the negative of that disparity. The resulting disparity
map should be valid only at the pixels that pass the consistency check; set
other pixels to zero. (Hint: Convert the left-right disparity map to a
vector and add it to a ramp vector; then check whether right-left disparity
map evaluated at this vector is the negative of the left-right disparity map.)
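The hint can be realized without loops as follows (a sketch; DL and DR are the left and right disparity maps, and the sign convention for DR is stated in the comments):

```matlab
% DL, DR: left and right disparity maps; following the convention above,
% a consistent pair satisfies DR(i, j - DL(i,j)) == -DL(i,j)
% (if your DR holds positive values, compare against +DL instead)
[nr, nc] = size(DL);
[jj, ii] = meshgrid(1:nc, 1:nr);
jr = jj - DL;                              % column of the matching right pixel
valid = jr >= 1 & jr <= nc;
idx = sub2ind([nr nc], ii(valid), jr(valid));
ok = false(nr, nc);
ok(valid) = (DR(idx) == -DL(valid));       % consistency test
DL(~ok) = 0;                               % zero out inconsistent pixels
```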
- Take a look at the results of the latest stereo research at
http://www.middlebury.edu/stereo
(click on the "Results" tab). Look only at the first
column (all) under the column Tsukuba. What errors do you
see in the best algorithm (the one [3] with minimum error in this column)?
Find the output of another algorithm that looks better to you. What does this
tell you about the difficulty of finding meaningful objective criteria to
evaluate such algorithms? What is the primary error of the dynamic programming
technique [1d]? How well does real-time correlation ([7]) fare on this image
(i.e., what is its rank compared to more expensive methods)?
- Your code should read the files tsukuba_left.tif and tsukuba_right.tif and
output
- in Figure 1, the disparity map computed using left-to-right correlation
- in Figure 2, the disparity map computed using correlation with the left-right consistency
check
For all the parameters (SAD vs. SSD, LoG prefilter or not, window size, etc.),
choose values that produce the best results.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results.
Homework procedures: Collaboration is allowed and encouraged, but each of
you must submit his or her own work. Copying and pasting of others' code
is not allowed. If you collaborate, please indicate the name of your collaborator(s) on the top of your report. Reports should be clear, concise,
and coherent, as well as neatly formatted (typewritten is preferred, but
handwritten is okay as long as it's neat and legible). Please write your
user ID on any written report. Late homework will
lose 10 points per day up to four days. Code that does not conform to the
stated format will result in points deducted (and may require a late
resubmission incurring late penalties).
- HW#6 (due 11/5; hardcopy report due by 4:30pm in my office, code emailed to
the grader by 4:30pm):
- In this assignment you will implement the basic 2D Lucas-Kanade tracking
algorithm with automatic feature selection. You will be using the image
sequence statue.zip.
- Compute the minimum eigenvalue of the 2x2 gradient matrix for each
pixel in the first image (statue583.bmp), ignoring the boundary pixels.
Write code to compute the eigenvalue yourself, i.e., don't use Matlab's eig or
svd function. Use a 5x5 window or any other size that works. Display
the results as a likelihood image, that is, the likelihood that each pixel is a
good feature.
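The minimum eigenvalue can be computed in closed form for all pixels at once (a sketch; the gradient kernels and window size are reasonable defaults, not prescriptions):

```matlab
gx = conv2(im, [-1 0 1]/2, 'same');       % horizontal gradient
gy = conv2(im, [-1; 0; 1]/2, 'same');     % vertical gradient
w = ones(5);                              % 5x5 window
Zxx = conv2(gx.^2,  w, 'same');           % entries of the 2x2 gradient matrix
Zxy = conv2(gx.*gy, w, 'same');
Zyy = conv2(gy.^2,  w, 'same');
% smaller eigenvalue of [Zxx Zxy; Zxy Zyy] in closed form
tr = Zxx + Zyy;
dt = Zxx.*Zyy - Zxy.^2;
lammin = (tr - sqrt(tr.^2 - 4*dt)) / 2;   % the likelihood image
```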
- From this likelihood map, automatically determine the 50 best features while
enforcing a minimum distance of 5 pixels between the center points of any pair
of features. Plot these 50 features overlaid on the original image, using something like
clf; imagesc(img); hold on; plot(x,y,'ro');.
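Greedy selection with non-maximum suppression enforces the minimum distance (a sketch; lammin is assumed to be the likelihood map from the previous step):

```matlab
map = lammin;
feat = zeros(50, 2);
for k = 1:50
    [val, ind] = max(map(:));             % strongest remaining feature
    [y, x] = ind2sub(size(map), ind);
    feat(k, :) = [x y];
    % suppress a neighborhood to enforce the minimum distance of 5 pixels
    r1 = max(1, y-5); r2 = min(size(map,1), y+5);
    c1 = max(1, x-5); c2 = min(size(map,2), x+5);
    map(r1:r2, c1:c2) = -inf;
end
```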
- Track 51 features (the 50 best along with one feature in the untextured sky
-- okay to hardcode this location) throughout the image sequence using the Lucas-Kanade
algorithm, with a translation-only model. Solve Zd=e iteratively for each
pair of consecutive images. Declare convergence when the incremental
displacement is less than 0.1 pixels or the number of iterations exceeds 10.
Declare a feature lost if the final residue is greater than a threshold. Display the features overlaid on the original images, using
pause(0.1) each time to enable the images to be visible during the execution of the code.
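One iteration of the solve can be sketched as follows (a sketch only; W is assumed to hold the indices of the 5x5 window around the current feature position, and sign conventions vary with how the temporal difference is defined):

```matlab
% gx, gy: spatial gradients of the first image; It = I2 - I1
Z = [sum(gx(W).^2)       sum(gx(W).*gy(W));
     sum(gx(W).*gy(W))   sum(gy(W).^2)    ];
e = -[sum(gx(W).*It(W)); sum(gy(W).*It(W))];
d = Z \ e;                  % incremental displacement [dx; dy]
x = x + d(1);  y = y + d(2);
% repeat until norm(d) < 0.1 or 10 iterations; then check the residue
```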
- Submit code that reads the images in the sequence and displays
- In Figure 1, the likelihood map
- In Figure 2, the 50 best features in the first frame
- In Figure 3, the locations of the 51 (or fewer, after some are lost)
features in each image frame as the sequence progresses
- If you're using Matlab, then your code should be in a single file named
username_hw6.m (with your Clemson user id replacing username) and emailed to the
GA with the subject line "ECE 847 HW6" (without quotes). If you choose to
use multiple .m files, then start the names of the other files with the prefix
username_hw6 to avoid conflict with other students' files.
- Write a report describing your approach, including the actual code,
resulting images, and analysis of the results.
Homework grading:
- A. Report is coherent, concise, clear, and neat, with correct
grammar and punctuation. Code works correctly the first time and
achieves good results on both images.
- B. Report adequately
describes the work done, and code generally produces good results. There
are a small number of defects either in the implementation or the writeup, but
the essential components are there.
- C. Report or code is
inadequate. The report contains major errors or is illegible, the code
does not produce meaningful results or does not run at all, or instructions are
not followed.
- D or F. Report or code not attempted, not turned
in, or contains extremely serious deficiencies.
Instructor: Stan Birchfield, 207-A Riggs Hall, 656-5912, email: stb at clemson
Grader: Sunil Guduru, email: guduru at clemson (Note: Please
use this email address, not Sunil's regular one)
Lectures: 1:25 - 2:15 MWF, 307 Riggs Hall