ECE 847 Digital Image Processing
Fall 2005
This course introduces students to the basic concepts, issues, and algorithms in
digital image processing and computer vision. Topics include image formation,
projective geometry, convolution, Fourier analysis and other transforms,
pixel-based processing, segmentation, texture, detection, stereo, and motion.
The goal is to equip students with the skills and tools needed to manipulate
images, along with an appreciation for the difficulty of the problems. Students
will implement several standard algorithms, evaluate the strengths and weaknesses
of various approaches, and explore a topic of their own choosing in a course
project.
Syllabus
Week | Topic                      | Assignment
1    | Pixel-based processing     | HW1: Warm-up, due 9/2
2    | Pixel-based processing     | Quiz #1, 9/9
3    | Filters and edge detection | HW2: Pixels and regions, due 9/16
4    | Filters and edge detection | Quiz #2, 9/23
5    | Segmentation               | HW3: Canny edge detection, due 9/30
6    | Segmentation               | Quiz #3, 10/7
7    | Stereo                     | HW4: Split-and-merge segmentation, due 10/14
8    | Stereo                     | Quiz #4, 10/21
9    | Motion                     | HW5: Stereo matching, due 10/28
10   | Motion                     | Quiz #5, 11/4
11   | Image formation            | HW6: Lucas-Kanade tracking, due 11/11
12   | Projective geometry        | Quiz #6, 11/18
13   | Projective geometry        |
14   | Color                      | Quiz #7, 12/9
15   | Color                      | Projects due
Readings to complement the lectures:
Computer vision in the news:
- Help organizing your digital photos, CBS News, Feb. 9, 2006 (Riya)
- 'Silent drowning' pool girl saved by underwater cameras, Times Online, Aug. 31, 2005
- Courtrooms could host virtual crime scenes, New Scientist.com, March 10, 2005
- Sportvision virtual first-down markers
- Basketball buddies build a computerized shot doctor, USA Today, Feb. 7, 2003 (Noah Basketball)
Automotive applications:
- Infiniti advanced lane departure warning system
- Chrysler automobile uses CMOS cameras for smart headlights, IEEE Spectrum, Apr. 2006 (Gentex SmartBeam)
- Lexus uses computer vision for automatic parallel parking, IEEE Spectrum, Apr. 2006 (Intelligent Parking Assist)
- Electronic vision unblocks the 'blind spot', IEEE Spectrum, Apr. 2006 (Volvo's Blind-Spot Information System)
- Car, park thyself (Toyota's automatic parking feature), CBS News, Jan. 15, 2003
Vision in biological systems:
- P. Gurney, Is our 'inverted' retina really 'bad design'?, Technical Journal, 1999
- C. Wieland, Seeing back to front, Creation, 1996 (see also An eye for creation, Creation, 1996)
- J. Sarfati, Can it bee?, Creation, 2003 -- honeybees using optic flow for navigation
- Centeye -- obstacle avoidance using optic flow
- C. Stammers, Trilobite technology, Creation, 1993
- S. M. Gon, The trilobite eye
- J. Sarfati, Lobster eyes: brilliant geometric design, Creation, 2001
- Sight in British garden birds
- Color vision in birds
- P. Gurney, Our eye movements and their control: Part 1, Technical Journal, 2002
- P. Gurney, Our eye movements and their control: Part 2, Technical Journal, 2003
- C. Wieland, New eyes for blind cave fish, 2000
- T. Wagner, Darwin vs. the eye, Creation, 1994
In the assignments, you will implement several fundamental algorithms in C/C++,
documenting your findings in an accompanying report for each assignment.
The C/C++ languages are chosen for their fundamental importance, their ubiquity,
and their efficiency (which is crucial to image processing and computer vision).
For your convenience, you may use the latest version of the
Blepo computer vision library.
Assignments:
- HW#1 (due 9/2, 11:59pm):
- Implement the floodfill algorithm in C/C++. Create an executable
that allows the user to choose the filename and seed point; it is okay if you
hardcode the new color. The application should load the image from disk,
display the original image, run the algorithm, and display the resulting
image. (The specific interface is up to you: Either use command-line parameters,
such as: filename x y (in
that order), where 'filename' is the image filename and (x,y) are the
coordinates of the seed point; Or use a windows-based interface, such as
CFileDialog for selecting the file and GrabMouseClick for getting the seed
point.)
- To create a console app in Visual C++, follow these instructions: File -> New ->
Project -> Win32 Console Application. Give it a name and keep the checkbox
on "Create new workspace". Choose "An application that supports MFC." Now
compile and run (Build -> Build ..., and Build -> Execute, or F7 and
Ctrl-F5). Under FileView -> Source Files you will find the main cpp file.
(Also, I would recommend that you turn off Precompiled Headers: Project ->
Settings -> C/C++ -> Precompiled headers -> Not using precompiled headers.
Before you click on the radio button, though, first select All
configurations in the drop down box so that both Debug and Release versions
are affected.)
- Email the grader with the subject line "HW1" (without quotes). The email
should have the following files attached:
- username_hw1_exe -- This is the compiled executable, but with
the extension removed so that it is not blocked by the email system.
- username_hw1_zip -- This is a zipped version of the folder
containing all the important files in your workspace directory. When the
grader unzips this file, it
should create a folder and put all the individual files inside it, so that he can just open the .dsw file and
compile. Before zipping the folder, be sure to delete the Debug and Release directories, as well as
the .opt, .ncb, and .clw files, since these files take up a lot of disk space and will be
automatically generated by VC++ anyway. To make grading easier, your
directory should be at the same level as the blepo directory (if your main file is directly inside your directory, it will then have
#include "../blepo/src/blepo.h").
- all the source files that you actually wrote for the homework, attached
as individual text files (there will probably be only one or two of these).
- The grader will test your code on quantized.pgm and on another, similar image.
- For this assignment, you are encouraged to use the Blepo computer vision
library, a preliminary version of which can be found at the location emailed
to you. A tutorial on the library will be given in class. You may use
any part of the library except the Floodfill function itself.
- No report is due for this assignment.
- HW#2 (due 9/16, 11:59pm):
- Write code to automatically detect and classify the fruit in the images
fruit1.pgm and
fruit2.pgm (or, in BMP format,
fruit1.bmp and
fruit2.bmp). First detect all the foreground
regions of the image, distinguishing them from the background region. Then
compute the following properties of each foreground region:
- zeroth-, first- and second-order moments (regular and centralized)
- compactness, using the formula given in class
- eccentricity (or elongatedness), using eigenvectors (PCA)
- direction, using two methods: eigenvectors (PCA) and
the moments formula
Use the properties of your choice to classify the different types of fruit.
In addition, detect the banana stem using a method of your choice. The
same parameters should be used for all objects, and for both images.
- Display the original image with a one-pixel-thick boundary overlaid on each
object, the color of the boundary indicating the type of fruit: Red
indicates apple, green indicates grapefruit, and yellow indicates banana.
Draw a cross at the centroid of each object, and draw two perpendicular lines to
indicate its major and minor axes, with the relative length of each line
indicating the elongatedness. Indicate the banana stem by coloring the
boundary pixels there.
- Print all the region properties that you computed in the console window (or
in a message box or edit box).
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned.
- Submit the code to the grader, in the format mentioned above (changing "HW1"
to "HW2", of course). Also attach an electronic copy of your report to the
email.
- HW#3 (due 9/30, 6:00pm):
- Implement the Canny edge detector. There should be three steps to your
code: gradient estimation, non-maximum suppression, and thresholding (with
hysteresis). For the gradient estimation, convolve the image
with the derivative of a Gaussian, rather than computing the derivative of the
smoothed image. Automatically compute the threshold values, and produce
clean results near the boundaries even for large values of sigma. Run your
code on the following images: cat.pgm,
venice.pgm, and cameraman.pgm.
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned. Show the output at different stages
of processing, and show the changes in output that occur with changes in the
parameters (e.g., scale).
- Submit the code, along with an electronic copy of your report, to the grader in the format mentioned above.
In addition, turn in a hard copy of your report to Riggs 309.
- HW#4 (due 10/14, 11:59pm):
- Implement the split-and-merge algorithm for segmentation. The
homogeneity criterion should ensure that the graylevels of all the pixels within
each region have a standard deviation below a specified threshold. Adjacent regions,
if combined, should have a graylevel standard deviation above the threshold. Run your
code on the following images: holes.pgm,
hydrant.pgm with different thresholds.
- Your executable should take two parameters: input filename and
threshold value, specified either at the command line or through a graphical
user interface. The resulting label image should have one integer per
pixel indicating the region number. There should be no gaps in region
numbers; i.e., regions should be numbered 0,1,2,...,N-1, where N is the number
of regions found. Display the original image, the labeled image, a random
colorization of the label image, and a graylevel image resulting from
assigning the mean gray level to each pixel in a region. Also write the
labeled image to disk so the grader can test whether the result satisfies the
homogeneity criterion. Use the following function for writing to disk:
void WriteImgInt(const ImgInt& img)  // requires <cstdio> and blepo.h
{
ImgInt::ConstIterator p = img.Begin();
FILE* fp = fopen("out.ii", "wb");
if (fp == NULL) BLEPO_ERROR("Unable to open file for writing");
fprintf(fp, "%d %d ", img.Width(), img.Height());  // header: width, height
while (p != img.End()) fprintf(fp, "%d ", *p++);   // then one label per pixel
fclose(fp);
}
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned. Show the output at different stages
of processing (e.g., after split and after merge), and show the changes in output that occur with changes in the
threshold parameter (and any other parameters you use).
- Submit the code, along with an electronic copy of your report, to the grader in the format mentioned above.
In addition, turn in a hard copy of your report to Riggs 309.
- HW#5 (due 10/28, 11:59pm):
- Implement correlation-based matching of rectified stereo images. The
resulting disparity map should be the same size as the two input images,
although the values at the left edge will be erroneous. Match from left to
right (i.e., for each window in the left image, search in the right image), so
that the disparity map is with respect to the left image. Recall that a
(left) disparity map D(x,y) between a rectified left image L and right
image R is an array such that the pixel
corresponding to L(x,y) in the right image is R(x-D(x,y), y).
Your code should be as efficient as possible, on the order of several frames per
second. (Hint: First compute the dissimilarities of all the pixels and
convolve with a summing kernel (all ones) in both directions. Store the
results in an array of images, one per disparity. Be sure to use this
updated version of mmx_diff.asm or
xmm_diff.asm. Copy these files to blepo/src/Quick,
overwriting the old versions. To use these files, you'll need to copy the
declaration from the assembly file to your source file. Also, be sure to
call CanDoMmx() to make sure your machine supports mmx_diff, and call
CanDoSse2() to make sure it supports xmm_diff.) Suggestion: use
SAD (sum of absolute differences) to match raw intensities and use a window size
of 5x5.
- Implement the left-to-right consistency check, retaining a value in the left
disparity map only if the corresponding point in the right disparity map yields
the negative of that disparity. The resulting disparity map should be valid
only at the pixels that pass the consistency check; set other pixels to zero.
- Run your code on
tsukuba_left.pgm and
tsukuba_right.pgm. What kind of errors do you notice? Now run the algorithm on
lamp_left.pgm and
lamp_right.pgm. What happens? Why is this image
difficult?
- Take a look at the results of the latest stereo research at
http://www.middlebury.edu/stereo

(click on the "Results" tab). Look only at the first column (all) under
the column Tsukuba. What errors do you see in the best algorithm (the
one with minimum error in this column)? Find the output of another algorithm
that looks better to you. What does this tell you about the difficulty of
finding meaningful objective criteria to evaluate such algorithms?
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned. Show the output at different stages
of processing, and show the changes in output that occur with changes in the
parameters.
- Submit the code, along with an electronic copy of your report, to the grader
in the format mentioned above. In addition, turn in a hard copy of your
report to Riggs 309.
- HW#6 (due 11/11, 11:59pm):
- Implement Lucas-Kanade feature point detection and tracking.
- Detection. For each pixel in a graylevel image, construct the
2x2 covariance matrix of the gradients in the 5x5 window surrounding the pixel.
Then compute the minimum eigenvalue of this matrix for each pixel.
Arbitrate between pixels by enforcing a minimum distance of 10 pixels between
features. Detect the 100 most salient features, and display them overlaid on the original image.
- Tracking. For each feature, track its location from one image
frame to the next by solving the Lucas-Kanade equation Zd=e, where Z is the 2x2
gradient covariance matrix and e is the 2x1 vector of gradients multiplied
by the temporal derivative. Use the inverse compositional approach in
solving the equation. Display a movie of the original images with features
overlaid.
- Run your code on the following image sequence:
statue_sequence.zip .
- Of course, for this assignment you may not use any of the Lucas-Kanade
implementations in Blepo.
- Write a report describing your approach, including the actual code,
resulting images, and lessons learned.
- Submit the code, along with an electronic copy of your report, to the grader
in the format mentioned above. In addition, turn in a hard copy of your
report to Riggs 309.
Grading:
- A. Report is coherent, concise, clear, and neat, with correct
grammar and punctuation. Code works correctly the first time and
achieves good results on both images.
- B. Report adequately
describes the work done, and code generally produces good results. There
are a small number of defects either in the implementation or the writeup, but
the essential components are there.
- C. Report or code is
inadequate. The report contains major errors or is illegible, the code
does not produce meaningful results or does not run at all, or instructions are
not followed.
- D or F. Report or code not attempted, not turned
in, or contains extremely serious deficiencies.
Extra credit: Contributions to the Blepo computer vision library
will earn up to 10 points extra credit on your final grade. In general,
you should expect 1 point for a major bug fix, and 2-7 points for a significant
extension to an existing function or implementation of an algorithm or set of
functions. Contributions should be cleanly written, with code-level and
user-level documentation, and a test harness. To receive extra credit, you
must meet the following deadlines:
- announce (non-binding) intention to contribute (10/21)
- get interface approval (11/4)
- turn in final code and documentation (11/9)
In your final project, you will investigate some area of image processing or computer vision in more detail. Typically
this will involve formulating a problem, reading the literature, proposing a solution, implementing the solution,
evaluating the results, and communicating your findings. In the case of a survey project, the quality and depth of
the literature review should be increased significantly to compensate for the lack of implementation.
Project deadlines:
- 11/4: team and title
- 11/22: progress report (1 page)
- 11/29: final written
report (up to 5 pages)
- 12/12: final oral presentation in class during
final exam slot, 1:00-4:00
- 12/14: peer reviews of written
reports and oral presentations
Instructor: Stan Birchfield, 207-A Riggs Hall, 656-5912, email: stb at clemson
Grader: Prashant Oswal, email: prashao at clemson (please use this
account, not his regular account)
Lectures: 12:20 - 1:10 MWF, 301 Riggs Hall