ECE 847 Digital Image Processing
Fall 2012
This course introduces students to the basic concepts, issues, and algorithms in
digital image processing and computer vision. Topics include image formation,
projective geometry, convolution, Fourier analysis and other transforms,
pixel-based processing, segmentation, texture, detection, stereo, and motion.
The goal is to equip students with the skills and tools needed to manipulate
images, along with an appreciation for the difficulty of the problems. Students
will implement several standard algorithms, evaluate the strengths and weaknesses
of various approaches, and explore a topic of their own choosing in a course
project.
Syllabus
Week | Topic                       | Assignment
1    | Pixel-based processing      | HW1: Floodfill, due 8/31
2    | Pixel-based processing      | Quiz #1, 9/7
3    | Filters and edge detection  | HW2: Pixels and regions, due 9/14
4    | Filters and edge detection  | Quiz #2, 9/21
5    | Segmentation                | HW3: Edge detection, due 9/28
6    | Segmentation                | Quiz #3, 10/5
7    | Stereo                      | HW4: Segmentation, due 10/12
8    | Stereo                      | Quiz #4, 10/19
9    | Motion                      | HW5: Stereo matching, due 10/26
10   | Motion                      | Quiz #5, 11/2
11   | Image formation             | HW6: Lucas-Kanade tracking, due 11/9
12   | Projective geometry         | Quiz #6, 11/16
13   | Projective geometry         |
14   | Color                       | Quiz #7, 12/7
15   | Color                       | Projects due
Readings to complement the lectures:
- Sonka et al., Region-based shape representation and description
- Robyn Owens, Mathematical morphology (dilation and erosion)
- R. Fisher et al., Connected components
- Bill Green, Canny edge detection tutorial
- Bob Fisher et al., Canny edge detector
- Michael Bach, Muller-Lyer illusion
- Various authors, Split-and-merge segmentation
- Serge Beucher, Watershed segmentation; Roerdink and Meijster, The watershed transform; Matlab, watershed tutorial
- Sylvain Bougnoux, Learning epipolar geometry
- Nikos Paragios, Level set tutorial; J. Sethian's level set page
- R. Wang, various lectures
- Adobe TIFF specification document (color spaces and JPEG)
- AIM-DP (color spaces)
- Amara Graps, Introduction to Wavelets; wavelet resources
Computer vision in the news:
- Help organizing your digital photos, CBS News, Feb. 9, 2006 (Riya)
- 'Silent drowning' pool girl saved by underwater cameras, Times Online, Aug. 31, 2005
- Courtrooms could host virtual crime scenes, New Scientist.com, March 10, 2005
- Sportvision virtual first-down markers
- Basketball buddies build a computerized shot doctor, USA Today, Feb. 7, 2003 (Noah Basketball)
- Automotive applications:
  - Infiniti advanced lane departure warning system
  - Infiniti Around View Monitor; Nissan Around View Monitor, 2007
  - Chrysler automobile uses CMOS cameras for smart headlights, IEEE Spectrum, Apr. 2006 (Gentex SmartBeam)
  - Lexus uses computer vision for automatic parallel parking, IEEE Spectrum, Apr. 2006 (Intelligent Parking Assist)
  - Electronic vision unblocks the 'blind spot', IEEE Spectrum, Apr. 2006 (Volvo's Blind-Spot Information System)
  - Car, park thyself (Toyota's automatic parking feature), CBS News, Jan. 15, 2003
  - Mobileye EyeQ2
  - Ford's Lane Keeping System
- Content-Aware Image Sizing
- Fly-Eye Inspired Speed Sensor
- Sudoku solver: http://www.codeproject.com/KB/game/WebcamSudokuSolver.aspx
Vision in biological systems:
- P. Gurney, Is our 'inverted' retina really 'bad design'?, Technical Journal, 1999
- C. Wieland, Seeing back to front, Creation, 1996 (see also An eye for creation, Creation, 1996)
- J. Sarfati, Can it bee?, Creation, 2003 -- honeybees using optic flow for navigation
- Centeye -- obstacle avoidance using optic flow
- C. Stammers, Trilobite technology, Creation, 1993
- S. M. Gon, The trilobite eye
- J. Sarfati, Lobster eyes: brilliant geometric design, Creation, 2001
- Sight in British garden birds
- Color vision in birds
- P. Gurney, Our eye movements and their control: Part 1, Technical Journal, 2002
- P. Gurney, Our eye movements and their control: Part 2, Technical Journal, 2003
- C. Wieland, New eyes for blind cave fish, 2000
- T. Wagner, Darwin vs. the eye, Creation, 1994
- D. E. Stoltzmann, The specified complexity of retinal imagery, CRSQ, 43(1):4-12, June 2006
- Eye Design Book -- overview of eyes in animal world
- Human visual system:
Computer vision companies:
Software:
Additional computer vision resources
Resources for current students (restricted access, not open to the public)
In the assignments, you will implement several fundamental algorithms in C/C++,
documenting your findings in an accompanying report for each assignment.
C/C++ is chosen for its fundamental importance, ubiquity,
and efficiency (which is crucial to image processing and computer vision).
For your convenience, you are encouraged to use the latest version of the
Blepo computer vision library.
Your code must compile under Visual Studio 2010 or VC++ 6.0.
You should develop your code in Debug mode but test in Release mode before
submitting. The grader will test in Release mode. To make grading easier, your code should do one of the following:
- #include "blepo.h" (In this case it does not matter where your blepo directory is, because the grader can simply change the directory include settings (Tools->Options->Directories->Include files) for Visual Studio to automatically find the header file.)
or
- #include "../blepo/src/blepo.h" (assuming your main file is directly inside your assignment directory). In other words, your assignment directory should be at the same level as the blepo directory. Here is an example:
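For illustration only, a layout consistent with the second option might look like the following, where every name except blepo is a placeholder:

    some_parent_folder/
        blepo/
            src/
                blepo.h
        hw1/              <-- your assignment directory, at the same level as blepo
            main.cpp
            hw1.vcxproj
            hw1.sln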
To turn in your assignment, send an email to assign@assign.ece.clemson.edu. Be sure to do the following:
- use CMake, as explained here.
- make the subject line "ECE847-1,#n" (without quotes but with the # sign),
where 'n' is the assignment number.
- cc the instructor and grader, so we have a record of your
submission in case something is wrong with the assign server. We cannot grade what we do not receive.
- send this email from your @clemson.edu account, because the assign server is
not smart enough to know who you are if you use another account. If you want to use a web-based interface, do not use GMail but instead be sure to send from
webmail.clemson.edu.
- If you want to use Gmail, you have to make sure that it does NOT send from @g.clemson.edu. It is not sufficient to
change the 'send mail as:' to @clemson.edu. In fact, changing your Gmail settings as follows used to work but no longer seems to:
- log in to your account through gmail.com (not from Clemson's Google Apps)
- click on 'Settings -> Accounts and Import'. Under 'Send mail as:', select 'Add another email address', type in your userid@clemson.edu, click 'Next step', then select 'Send through clemson.edu SMTP servers', type 'smtp.clemson.edu' along with your userid and password, select 'Secured connection using SSL', then 'Add account'.
- attach a zip file containing all the files needed to compile your project. But do NOT include all
the other files that Visual Studio creates automatically. When in
doubt, extract your zip file into a new temporary directory and verify that it
compiles and runs. In other words,
- Do include files such as .h, .c, .cpp, .rc, .vcxproj, .sln, ... (or .dsp and .dsw if using VC6.0). Also, if you have built an MFC Windows application (as opposed to a console-based application), include the res directory that contains the .ico and .rc2 files.
- Do NOT include these files: .aps, .clw, .ncb, .opt, .plg, .suo, .sdf. Also, be sure to delete the Debug, Release, and ipch directories.
- include your report in the zip file (in any standard format
such as .pdf or .doc;
but NOT .docx).
Reports should be professionally written, with a title, a description of the
problem, a description of the algorithm, a detailed discussion of your
particular implementation, results, and analysis.
An example report is available for reference. Similarly, code should
be professionally and cleanly written, making use of standard programming
practices.
- the body of the email is not important and may be left blank
All assignments are due at 11:59pm on the due date shown. An 8-hour grace
period is extended, so that no points will be deducted for anything submitted
before 8:00am the next morning.
Assignments:
- HW#1 (Floodfill)
- Implement the floodfill algorithm in C/C++. Create an executable
that allows the user to choose the filename and seed point; it is okay if you
hardcode the new color. The application should load the image from disk,
display the original image, run the algorithm, and display the resulting
image. (The specific interface is up to you: Either use command-line parameters,
such as: filename x y (in
that order), where 'filename' is the image filename and (x,y) are the
coordinates of the seed point; Or use a windows-based interface, such as
CFileDialog for selecting the file and GrabMouseClick for getting the seed
point.)
- To create a console app in Visual C++ 6.0, follow these instructions: File -> New ->
Project -> Win32 Console Application. Give it a name and keep the checkbox
on "Create new workspace". Choose "An application that supports MFC." Now
compile and run (Build -> Build ..., and Build -> Execute, or F7 and
Ctrl-F5). Under FileView -> Source Files you will find the main cpp file.
(Also, I would recommend that you turn off Precompiled Headers: Project ->
Settings -> C/C++ -> Precompiled headers -> Not using precompiled headers.
Before you click on the radio button, though, first select All
configurations in the drop down box so that both Debug and Release versions
are affected.)
- The images that the grader will use to test your code are
quantized.pgm,
tillman.ppm, and others that are similar.
- Your code should work for either grayscale or color images, and it should allow the new value to be a Bgr color
(just load the image into ImgBgr, and treat it like a color image).
- For simplicity, use 4-neighbor connectedness (but 8-connected is fine,
too, if you want to do a little additional work).
- To make memory management easier, feel free to use std::stack or
std::vector (see the stack-based sketch after this assignment).
- A tutorial on the Blepo library will be given in class. You may use
any part of the library except the Floodfill function itself.
- No report is due for this assignment.
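For reference, here is a minimal sketch of the stack-based approach mentioned above. It uses a bare-bones grayscale image struct in place of the actual Blepo types (ImgGray/ImgBgr), so treat it as an outline rather than a drop-in solution:

    #include <stack>
    #include <utility>
    #include <vector>

    // Minimal grayscale image placeholder (stand-in for ImgGray/ImgBgr).
    struct Image {
      int width, height;
      std::vector<unsigned char> data;              // row-major
      unsigned char& operator()(int x, int y) { return data[y * width + x]; }
    };

    // 4-connected floodfill: repaint the region containing (sx, sy) with new_color.
    void FloodFill(Image& img, int sx, int sy, unsigned char new_color) {
      const unsigned char old_color = img(sx, sy);
      if (old_color == new_color) return;           // nothing to do (avoids infinite loop)
      std::stack<std::pair<int, int> > frontier;    // explicit stack instead of recursion
      frontier.push(std::make_pair(sx, sy));
      while (!frontier.empty()) {
        int x = frontier.top().first, y = frontier.top().second;
        frontier.pop();
        if (x < 0 || x >= img.width || y < 0 || y >= img.height) continue;
        if (img(x, y) != old_color) continue;       // already filled or a different region
        img(x, y) = new_color;
        frontier.push(std::make_pair(x + 1, y));    // push the 4-neighbors
        frontier.push(std::make_pair(x - 1, y));
        frontier.push(std::make_pair(x, y + 1));
        frontier.push(std::make_pair(x, y - 1));
      }
    }

The same structure carries over to the color case; only the pixel type and the equality test against the seed color change.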
- HW#2 (Fruit classification)
- Write code to automatically detect and classify fruit on a dark background.
- Implement double thresholding using two
thresholds that you determine by trial and error, which are hardcoded in your code.
- At any point before or during thresholding, perform noise removal (if
needed) using your own combination of erosion / dilation /
opening / closing.
- Implement connected components (by repeated applications of floodfill) to detect and count the foreground regions of
the graylevel image, distinguishing them from the background. Hint:
Use an ImgInt rather than
an ImgGray for the output labels, in case there are more than 256 regions due to
noise,
even if there are only a small number of objects in the image.
- Compute the properties of each foreground region (a sketch of the moment computations follows this assignment), including
- zeroth-, first- and second-order moments (regular and centralized)
- compactness (To compute the area, simply count the number of pixels. To compute the perimeter, apply the logical XOR to the
thresholded image and the result of eroding this image with a 3x3 structuring
element of all ones; the result will be the number of 4-connected foreground
boundary pixels.)
- * eccentricity (or elongatedness), using eigenvalues
- * direction, using either eigenvectors (PCA) or the moments formula (they are
equivalent)
- Using a combination of these properties or others that you develop, write an
algorithm to automatically classify each piece of fruit into one of three
categories: apple, grapefruit, and banana.
- * Also detect the banana stem using an idea that you come up with.
- Your output should look like this:
- One figure window should show the original image. Three additional figures
should show the result of thresholding the image with the low and high thresholds, along with the
output of double-thresholding. Be sure to set the title of each figure to an appropriate
human-readable string that indicates what is being displayed. Feel free to
display additional intermediate results in other figures if you like.
- In a final figure, display the original image with a one-pixel-thick boundary overlaid on each
object, the color of the boundary indicating the type of fruit: Red
indicates apple, green indicates grapefruit, and yellow indicates banana.
For each object, draw a cross at its centroid and draw* two perpendicular lines
(with appropriate lengths) to indicate the major and minor axes. Indicate the banana stem* by coloring
with magenta the
boundary pixels in that portion of the banana.
- Print out all the region properties you computed, either on the console
window (using printf, for example) or in the dialog window (using SetWindowText).
- The grader will test your code on the images fruit1.pgm
and fruit2.pgm (or, in BMP format,
fruit1.bmp and fruit2.bmp), along with other similar images
(same scale and lighting conditions, but the image dimensions, rotation, and
number of fruit instances may change). The same algorithm parameters should
be used for all objects and for both images.
- For this assignment, you may use any Blepo functions in ImageOperations.h,
except for the dilation and erosion functions. You may not use any Blepo functionality contained or
prototyped in ImageAlgorithms.h.
- As a debugging strategy, however, you may find it helpful to use various
Blepo functions (e.g., dilation, erosion, Floodfill, ConnectedComponents) as
stand-ins until you write your own versions.
- No report is due for this assignment.
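As a rough illustration of the region-property bullet above, here is a sketch of the moment computations, with elongatedness taken from the eigenvalues of the second-moment matrix and direction from the standard moments formula. The RegionProps struct, the plain label array, and the particular elongatedness definition (square root of the eigenvalue ratio) are illustrative choices, not requirements; in your code the labels would come from your connected-components output (e.g., an ImgInt):

    #include <cmath>
    #include <vector>

    // Properties of one labeled region in a row-major label array of size width*height.
    struct RegionProps {
      double area, xc, yc;        // zeroth-order moment and centroid (first moments / area)
      double mu20, mu02, mu11;    // centralized second-order moments
      double elongatedness;       // from the eigenvalue ratio of the second-moment matrix
      double direction;           // orientation of the major axis, radians
    };

    RegionProps ComputeProps(const std::vector<int>& labels, int width, int height, int label) {
      RegionProps p = {0, 0, 0, 0, 0, 0, 0, 0};
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
          if (labels[y * width + x] == label) { p.area += 1; p.xc += x; p.yc += y; }
      p.xc /= p.area;  p.yc /= p.area;
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
          if (labels[y * width + x] == label) {
            double dx = x - p.xc, dy = y - p.yc;
            p.mu20 += dx * dx;  p.mu02 += dy * dy;  p.mu11 += dx * dy;
          }
      // Eigenvalues of the 2x2 matrix [mu20 mu11; mu11 mu02] describe the axis lengths;
      // the major-axis direction comes from the standard moments formula.
      double tr = p.mu20 + p.mu02;
      double det = p.mu20 * p.mu02 - p.mu11 * p.mu11;
      double disc = std::sqrt(tr * tr / 4 - det);
      double lmax = tr / 2 + disc, lmin = tr / 2 - disc;
      p.elongatedness = std::sqrt(lmax / (lmin > 0 ? lmin : 1e-12));
      p.direction = 0.5 * std::atan2(2 * p.mu11, p.mu20 - p.mu02);
      return p;
    }

Compactness can be added in the same pass once you have the perimeter count from the XOR-with-eroded-image procedure described above.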
- HW#3 (Canny edge detection)
- Write code to perform low-pass and high-pass filtering on an image.
- Your code should accept a single
scale parameter (sigma) as input, along with the name of the image file. (An optional third parameter should specify the name of the file to use for chamfer matching; if the third parameter is not provided, then no chamfer matching is performed.)
Convolve the image with an isotropic Gaussian kernel and with derivatives of the Gaussian along the
x and y directions. You should use the separability property of the Gaussian and Gaussian derivative to speed computation, and you should construct the kernels automatically based on the sigma parameter (a kernel-construction sketch follows this assignment).
Do not worry about image borders -- the simplest
solution is to simply set the border pixels in the convolution result to zero, but if you want to extend the image to improve the quality of the results, that is fine, too.
- Implement the Canny edge detector. There should be three steps to your
code: gradient estimation, non-maximum suppression, and thresholding with
hysteresis (i.e., double-thresholding). The gradient estimation has already been done in the previous step (Be sure to convolve with the derivative of a Gaussian rather than to compute finite differences of the smoothed image). Non-maximum suppression sets to zero any pixel that is
not a local maximum along the direction of the gradient. For hysteresis thresholding, which is similar to floodfill, automatically compute the threshold values based upon
image statistics. Run your
code on the following images: cat.pgm and
cameraman.pgm.
- * Implement the chamfer distance algorithm using the Manhattan distance.
Convert the cherrypepsi.jpg image from color to grayscale before computing the Canny edges, then
compute the minimum chamfer distance from each pixel to the nearest Canny edge pixel. Perform an exhaustive search (for simplicity, only consider
locations for which the template is completely in bounds) for the
best location of the cherrypepsi_template.jpg template. This is done by computing, for each location, the sum of the distances from all pixels
in the template (when centered at that location) to the nearest Canny edge pixel in the image. The best location is the one that yields the minimum value. (If these values are
subtracted from a large constant, then they can be considered as a probability map.)
- Your output should look like this:
- Print the values of your 1D Gaussian and Gaussian derivative kernels.
- One figure should show the original image. Another figure should show the smoothed image (after convolving with the Gaussian in both x and y). Four additional figures
should show the gradient components in the x and y directions, along with the gradient magnitude and phase (angle).
- Another figure should show the edges after non-maximum suppression, and
another figure should show the final Canny result.
- If chamfer matching is performed, then display in two separate figures the probability map, and the original image with the rectangle corresponding to the peak
overlaid.
- For this assignment, you may not use any Blepo functionality contained or
prototyped in ImageAlgorithms.h (e.g., Chamfer), and you may not use the Gauss*,
Grad*, Convolve, Correlate, Smooth, etc.
functions prototyped in ImageOperations.h.
- No report is due for this assignment.
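To make the kernel-construction step concrete, here is a sketch that builds the 1D Gaussian and Gaussian-derivative kernels from sigma. The half-width of roughly 2.5*sigma and the particular normalizations (smoothing kernel sums to one; derivative kernel gives a response of one on a unit-slope ramp) are common conventions, not course requirements:

    #include <cmath>
    #include <vector>

    // Build 1D Gaussian and Gaussian-derivative kernels automatically from sigma.
    void BuildKernels(double sigma, std::vector<double>& gauss, std::vector<double>& deriv) {
      const int h = (int)std::ceil(2.5 * sigma);        // kernel covers offsets -h..+h
      gauss.assign(2 * h + 1, 0.0);
      deriv.assign(2 * h + 1, 0.0);
      double gsum = 0.0, dsum = 0.0;
      for (int i = -h; i <= h; ++i) {
        const double g = std::exp(-(i * i) / (2.0 * sigma * sigma));
        gauss[i + h] = g;                               // Gaussian sample
        deriv[i + h] = -i * g;                          // proportional to dG/dx
        gsum += g;                                      // for unit-sum normalization
        dsum += i * i * g;                              // so a unit-slope ramp yields response 1
      }
      for (int i = 0; i < 2 * h + 1; ++i) { gauss[i] /= gsum; deriv[i] /= dsum; }
    }
    // Separability: smooth by convolving rows with gauss and then columns with gauss;
    // the x-gradient uses deriv along rows and gauss along columns (and vice versa for y).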
- HW#4 (Watershed segmentation)
- Implement the simplified Vincent-Soille marker-based watershed segmentation algorithm.
- The
basic algorithm involves three steps: (1) Compute the magnitude of the image
gradient, quantized; (2) Construct a data
structure allowing fast access to all the pixels with a certain value (a sketch of steps 2-3 follows this assignment); (3) Apply
breadth-first search to flood the pixels one gray level at a time, starting with the
minimum value, assigning each pixel to either the nearest existing catchment
basin or to a new catchment basin.
- The algorithm is considered "simplified" because it does not explicitly use dams. Although dams are often included
in descriptions of the Watershed algorithm, they are unnecessary, having almost no effect on the final result.
- Define the watershed (boundary) pixels as those which
occur at a transition between basins.
- Your code should include the marker-based modifications to reduce oversegmentation.
- *Unlike previous assignments, care should be taken to ensure that your algorithm properly handles the pixels along
the image border,
so that objects touching the image border are segmented correctly.
- In separate figure windows, display the result of the algorithm at the various stages of the computation as
shown in the slide in class (threshold, chamfer,
initial watershed, edges of initial watershed, gradient magnitude, markers, and final result).
- The grader will test your code on the images: holes.pgm
and cells_small.pgm.
Due to the difficulty of
thresholding these images, it is okay for your code to have a command-line switch
to select the appropriate parameter(s).
- For this assignment, you may not use any Blepo code in Watershed.cpp.
- No report is due for this assignment.
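To make steps (2) and (3) above concrete, here is a rough sketch of the bucket data structure and the level-by-level breadth-first flooding. It omits the marker-based modifications, the border handling, and the boundary extraction, and the raw-array image representation is a placeholder for your own types:

    #include <queue>
    #include <vector>

    // Flooding stage of a simplified (dam-free, marker-free) watershed.  'grad' holds the
    // quantized gradient magnitude (0..255), row-major; 'labels' receives basin labels
    // (0 = unlabeled).  Watershed pixels are found afterward as transitions between labels.
    void Flood(const std::vector<unsigned char>& grad, int w, int h, std::vector<int>& labels) {
      labels.assign(w * h, 0);
      // Step (2): bucket the pixel indices by gray level for fast access to each level.
      std::vector<std::vector<int> > bucket(256);
      for (int i = 0; i < w * h; ++i) bucket[grad[i]].push_back(i);

      const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
      int next_label = 1;
      for (int level = 0; level < 256; ++level) {
        // (a) Grow existing basins into this level by breadth-first search: seed the queue
        // with pixels at this level that already touch a labeled basin.
        std::queue<int> q;
        for (size_t k = 0; k < bucket[level].size(); ++k) {
          const int i = bucket[level][k], x = i % w, y = i / w;
          for (int n = 0; n < 4; ++n) {
            const int xn = x + dx[n], yn = y + dy[n];
            if (xn >= 0 && xn < w && yn >= 0 && yn < h && labels[yn * w + xn] > 0) { q.push(i); break; }
          }
        }
        while (!q.empty()) {
          const int i = q.front(); q.pop();
          if (labels[i] > 0) continue;                   // may have been labeled already
          const int x = i % w, y = i / w;
          for (int n = 0; n < 4; ++n) {                  // inherit a neighboring basin label
            const int xn = x + dx[n], yn = y + dy[n];
            if (xn >= 0 && xn < w && yn >= 0 && yn < h && labels[yn * w + xn] > 0)
              labels[i] = labels[yn * w + xn];
          }
          for (int n = 0; n < 4; ++n) {                  // continue the BFS within this level
            const int xn = x + dx[n], yn = y + dy[n];
            if (xn >= 0 && xn < w && yn >= 0 && yn < h &&
                labels[yn * w + xn] == 0 && grad[yn * w + xn] == level)
              q.push(yn * w + xn);
          }
        }
        // (b) Pixels at this level that are still unlabeled form new local minima:
        // floodfill each such connected component with a fresh basin label.
        for (size_t k = 0; k < bucket[level].size(); ++k) {
          if (labels[bucket[level][k]] != 0) continue;
          std::queue<int> fq;
          fq.push(bucket[level][k]);
          labels[bucket[level][k]] = next_label;
          while (!fq.empty()) {
            const int i = fq.front(); fq.pop();
            const int x = i % w, y = i / w;
            for (int n = 0; n < 4; ++n) {
              const int xn = x + dx[n], yn = y + dy[n];
              if (xn >= 0 && xn < w && yn >= 0 && yn < h &&
                  labels[yn * w + xn] == 0 && grad[yn * w + xn] == level) {
                labels[yn * w + xn] = next_label;
                fq.push(yn * w + xn);
              }
            }
          }
          ++next_label;
        }
      }
    }

In the marker-based variant, basins are seeded from the marker image rather than from every remaining local minimum, which is what reduces oversegmentation.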
- HW#5 (Stereo matching)
- Implement correlation-based matching of rectified stereo images. The
resulting disparity map should be the same size as the two input images,
although the values at the left edge will be erroneous. Match from left to
right (i.e., for each window in the left image, search in the right image), so
that the disparity map is with respect to the left image. Recall that a
(left) disparity map D(x,y) between a left image L and a right
image R that have been rectified is an array such that the pixel
corresponding to L(x,y) is R(x-D(x,y), y).
- Implement the left-to-right consistency check, retaining a value in the left
disparity map only if the corresponding point in the right disparity map yields
the negative of that disparity (a sketch of this check follows this assignment). The resulting disparity map should be valid
only at the pixels that pass the consistency check; set other pixels to zero.
- Your code should be as efficient as possible, on the order of several frames per
second. (Hint: First compute the dissimilarities of all the pixels
for each disparity, storing the results in an array of images; then convolve
each image with a summing kernel (all ones) in both directions. Further
speedup can be obtained using mmx_diff and xmm_diff in Blepo, but this is not
required.)
- Suggestion: use SAD (sum of absolute differences) to match raw
intensities and use a window size of 5x5.
- Run your code on tsukuba_left.pgm and
tsukuba_right.pgm. Show the results both with and without the consistency
check. What kind of errors do you notice? Now run the algorithm on
lamp_left.pgm and lamp_right.pgm. What happens? Why is this image
difficult?
- *Your code should output a PLY file that can be read by
MeshLab. This will enable
you to visualize the matching results in 3D. Here is an
example PLY file created from a set of
Kermit images. PLY files are ASCII files with a
simple format: In the header you specify the number of vertices, along
with the properties stored for each vertex (e.g., x y z nx ny nz r g b); then
after the header there is one line per vertex. For your assignment, you
should just output six columns (x y z r
g b) for each matched pixel, ignoring the normal components. Manually set the focal
length and baseline to nominal values in order to achieve visually plausible results.
You can use
either perspective or orthographic projection to get your x,y,z coordinates.
Orthographic is simpler and will lead to a more aesthetically pleasing point
cloud, but it is less accurate mathematically. Your stereo matching code does not have to work in color space, but you should definitely
use RGB color to make your PLY file more pleasant to look at:
tsukuba_left.ppm and tsukuba_right.ppm
. (Note: When you click the "Import Mesh" button (the one that looks like File.Open), MeshLab merges the file that you select with the current mesh -- it does not replace the current mesh with the new one. To fix this problem, either be sure to click the "Reload" button (next to the "Import Mesh" button) or close MeshLab altogether and start it up again.)
- In separate figure windows, display the two input images, the disparity map resulting from
left-to-right matching, and the disparity map after applying the left-right consistency check. Optionally, you
may display other intermediate results if you want.
- Take a look at the results of the latest stereo research at
http://vision.middlebury.edu/stereo
(click on the "Evaluation" tab). Look only at the (nonocc) column under
Tsukuba. What errors do you see in the best algorithm (the
one with minimum error in this column)? What does this tell you about the
difficulty of the problem?
- No report is due for this assignment.
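For the consistency check, here is a minimal sketch. It assumes both disparity maps (left-to-right and right-to-left) have already been computed and are stored as nonnegative magnitudes in plain row-major arrays; with signed storage, the right-map value would be the negative of the left one, exactly as stated above:

    #include <vector>

    // Left-right consistency check on precomputed disparity maps (both row-major, w x h,
    // storing nonnegative disparity magnitudes).  dleft(x,y) means L(x,y) matches
    // R(x - dleft(x,y), y); dright(x,y) means R(x,y) matches L(x + dright(x,y), y).
    void ConsistencyCheck(const std::vector<int>& dleft, const std::vector<int>& dright,
                          int w, int h, std::vector<int>& dout) {
      dout.assign(w * h, 0);                          // pixels that fail the check stay zero
      for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
          const int d = dleft[y * w + x];
          const int xr = x - d;                       // corresponding column in the right image
          if (xr >= 0 && xr < w && dright[y * w + xr] == d)   // right map must point back to (x, y)
            dout[y * w + x] = d;
        }
      }
    }

The same plain-array representation also fits the speed hint above: one dissimilarity image per candidate disparity, each summed over the window by convolving with an all-ones kernel row-wise and then column-wise.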
- HW#6 (Lucas-Kanade)
- Implement Lucas-Kanade feature point detection and tracking.
- Detection. For each pixel in a grayscale image, construct the
2x2 covariance matrix of the gradients in the 5x5 window surrounding the pixel.
Then compute the minimum eigenvalue of the gradient covariance matrix for each pixel.
Perform non-maximal suppression to detect the n most salient features, separated from
each other by a distance of at least k pixels, where
n=100 and k=8.
- Tracking. For each feature, track its location from one image
frame to the next by iteratively solving the Lucas-Kanade equation Zd=e, where Z is the 2x2
gradient covariance matrix and e is the 2x1 vector of gradients multiplied
by the temporal derivative. Display a movie of the original images with features
overlaid. You will want to smooth the images first by convolving with a Gaussian
to increase the basin of attraction, particularly to handle swift camera motion,
and you should use a large window size, e.g., 11x11 or 17x17, for the same
reason. For more details, you may want to refer to Jean-Yves Bouguet's
technical report
(but ignore the pyramidal part) or the
KLT references. Keep
your feature coordinates as floating point values throughout the tracking
process, only rounding for display purposes. Use bilinear interpolation to compute the image pixel
values at non-integer locations (a sketch of the interpolation and the minimum-eigenvalue computation follows this assignment).
- Your code should accept as input parameters the filename, start frame, and end frame. An easy way
to do this is to specify the format string, then to use sprintf or CString::Format. Example:
CString str; str.Format("img%04d.bmp", i);
- In a figure window, display the current grayscale image with the feature points overlaid as
red dots. Call this display function within a for loop, so that a movie is displayed within the window.
- Run your code on a synthetic sequence generated by translating an image (such as from one of the
sequences below) a known amount, e.g., 1 pixel to the right each frame.
- Run your code on the following image sequences:
flowergarden.zip and
*statue_sequence.zip, overlaying the features
on the original images. Your code will be tested on these images. The feature points
should appear as though they are latched onto the image, not floating on top of it.
- For this assignment you may not use any of the Lucas-Kanade or KLT
implementations in Blepo, or any other existing implementations of Lucas-Kanade.
You also may not use any of the Interp functions.
- No report is due for this assignment.
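Two small pieces of the assignment above are easy to get subtly wrong, so here is a sketch of the minimum eigenvalue of the 2x2 gradient covariance matrix (used for detection) and of bilinear interpolation (used during tracking). The function names and the raw-array image representation are placeholders:

    #include <cmath>
    #include <vector>

    // Minimum eigenvalue of the 2x2 gradient covariance matrix Z = [gxx gxy; gxy gyy],
    // where gxx = sum(gx*gx), gxy = sum(gx*gy), gyy = sum(gy*gy) over the window.
    double MinEigenvalue(double gxx, double gxy, double gyy) {
      const double tr = gxx + gyy;
      const double det = gxx * gyy - gxy * gxy;
      return tr / 2.0 - std::sqrt(tr * tr / 4.0 - det);
    }

    // Bilinear interpolation of a grayscale image at a non-integer location (x, y);
    // 'img' is row-major with width w, and (x, y) is assumed at least one pixel
    // inside the right and bottom borders.
    double Interp(const std::vector<unsigned char>& img, int w, double x, double y) {
      const int x0 = (int)x, y0 = (int)y;          // top-left integer corner
      const double ax = x - x0, ay = y - y0;       // fractional offsets
      const double i00 = img[y0 * w + x0],       i10 = img[y0 * w + x0 + 1];
      const double i01 = img[(y0 + 1) * w + x0], i11 = img[(y0 + 1) * w + x0 + 1];
      return (1 - ax) * (1 - ay) * i00 + ax * (1 - ay) * i10
           + (1 - ax) * ay * i01 + ax * ay * i11;
    }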
Grading standard:
- A. Report is coherent, concise, clear, and neat, with correct
grammar and punctuation. Code works correctly the first time and
achieves good results on both images. All items marked (*)
are implemented.
- B. Report adequately
describes the work done, and code generally produces good results. There
are a small number of defects either in the implementation or the writeup, but
the essential components are there. Many or all items marked (*)
are not implemented.
- C. Report or code are
inadequate. The report contains major errors or is illegible, the code
does not run or produces significantly flawed results, or instructions are
not followed.
- D or F. Report or code not attempted, not turned
in, or contains extremely serious deficiencies.
Detailed grading breakdown is available in the
grading chart.
In your final project, you will investigate some area of image processing or computer vision in more detail. Typically
this will involve formulating a problem, reading the literature, proposing a solution, implementing the solution
(using the programming language/environment of your choice),
evaluating the results, and communicating your findings. In the case of a survey project, the quality and depth of
the literature review should be increased significantly to compensate for the lack of implementation.
Project deadlines:
- 11/2: team (1 or 2 people), title, and brief description
- 11/23: progress report (1 page)
- 12/10: final oral presentation in class during final exam slot, 8:00-10:30
- 12/12: final written report (5 pages)
To turn in your report, please send me a single email per group (do not email
the assign server) with two attachments:
- PDF file containing your 5-page report, conference format (title, authors,
abstract, introduction, method, experimental results, conclusion, references)
- PPT file containing your slides
Both files should have the same name, which should correspond somehow to
your topic. Use underscores instead of spaces. Do not send PPTX files.
Example: face_detection.pdf and face_detection.ppt.
You do *not* need to send me your code (although you may if you like).
Projects from previous years
Instructor: Stan Birchfield,
209 Riggs Hall, 656-5912, email: stb at clemson
Office hours: MWF afternoons
Grader: Brian Peasley, bpeasle at clemson
Lectures: 12:20 - 1:10 MWF, 223 Riggs Hall