ECE 877 Computer Vision
Spring 2011
This course builds upon ECE 847 by exposing students to fundamental concepts, issues, and algorithms in
digital image processing and computer vision. Topics include segmentation, texture, detection,
3D reconstruction, calibration, shape, and energy minimization.
The goal is to equip students with the skills and tools needed to manipulate
images, along with an appreciation for the difficulty of the problems. Students
will implement several standard algorithms, evaluate the strengths and weaknesses
of various approaches, and explore a topic in more detail in a course
project.
Syllabus
Week | Topic                  | Assignment
-----|------------------------|--------------------------------------------
1    | Classification         | HW1: Template matching, due 1/21
2    | Classification         | Quiz #1, 1/28
3    | Shape                  | HW2: Level sets, due 2/4
4    | Shape                  | Quiz #2, 2/11
5    | Texture                | HW3: Feature detection / matching, due 2/18
6    | Texture                | Quiz #3, 2/25
7    | Model fitting          | HW4: Mosaicking, due 3/4
8    | Model fitting          | Quiz #4, 3/11
9    | Camera calibration     | HW5: Two-view reconstruction, due 3/18
10   | [break]                | Quiz #5, 3/28
11   | Multiple view geometry | HW6: N-view reconstruction, due 4/8
12   | Multiple view geometry | Quiz #6, 4/15
13   | 3D reconstruction      |
14   | 3D reconstruction      | Quiz #7, 4/29
15   | Function optimization  | Projects due
See ECE 847 Readings and Resources.
In the assignments, you will implement several fundamental algorithms in C/C++,
documenting your findings in an accompanying report for each assignment. C/C++
is chosen for its fundamental importance, ubiquity, and efficiency (which is
crucial to image processing and computer vision). For your convenience, you
are encouraged to use the latest version of the
Blepo computer vision library.
Your code must compile under Visual Studio 2010, Visual Studio 2008, or VC++ 6.0.
To make grading easier, your code should do one of the following:
- #include "blepo.h" (In this case it does not matter where your blepo
directory is, because the grader can simply change the directory include
settings (Tools->Options->Directories->Include files) for Visual Studio
to automatically find the header file.)
or
- #include "../blepo/src/blepo.h" (assuming your main file is directly
inside your assignment directory). In other words, your assignment directory
should be at the same level as the blepo directory.
To turn in your assignment, send an email to
assign@assign.ece.clemson.edu
(and cc the instructor and grader) with the subject line "ECE877-1,#n"
(without quotes but with the # sign), where 'n' is the assignment number.
You may leave the body of the email blank. You must send this email from your Clemson account, because the assign server is
not smart enough to know who you are if you use another account. (E.g., do
not use @g.clemson.edu) Be sure that this file is actually
attached to the email rather than being automatically included in the body of
the email (Eudora, for example, has been known to include files inline, but this
behavior can be turned off). Also, be sure to change the extension of
your zip file (e.g., change .zip to _zip) so that the server
does not block the attachment!!! We cannot grade what we do not receive.
(Also be sure that you're not hiding extensions for known types; in Windows
explorer, uncheck the box "Tools.Folder Options.View.Hide extensions for known
file types".)
To your email, attach a zip file containing your report (in any standard format
such as .pdf or .doc;
but not .docx),
and all the files needed to compile your project (such as .h, .c, .cpp, .rc,
.vcproj, .sln, .dsp, .dsw). Also include the res directory that contains the .ico and .rc2
files. But do NOT include all the
other files that Visual Studio creates automatically, such as .aps, .clw, .ncb,
.opt, .plg, .suo, or the Debug or Release
directories. When in doubt, extract your zip file to a new temporary directory
and verify that it compiles and runs.
All assignments are due at 11:59pm on the due date shown. An 8-hour grace
period is extended, so that no points will be deducted for anything submitted
before 8:00am the next morning.
In addition to submitting your report electronically, please also turn in a
hardcopy. The deadline for the electronic copy is the same as for the
code, whereas the hardcopies should be brought to the instructor by the next
class period after the deadline (at the latest). Just leave it in the
pouch on my door (or slip it under the
door) if I'm not in. No points will be deducted for printing in
black-and-white, even if the report is in color.
An example report
Assignments:
- HW#1 (Detection)
- Using the images provided (textdoc-training.bmp
and textdoc-testing.bmp), pick a letter of the
alphabet and build a model of that letter's appearance (in the lower-case Times
font of the main body text -- do not worry about the title, heading, or figure
captions) using the training image. Then search in the testing image for
all occurrences of the letter. For this assignment, your detector may use
a simple model such as a single template built from a single example. The
output should be a colored (e.g., red) rectangle outlining each letter found in
the testing image.
- Do the same thing, but this time for the entire word
"texture" (all lowercase).
- Show receiver operating characteristic (ROC) curves for the detection
problems, with different thresholds. To make an ROC curve, vary the
threshold, measure the false positives and false negatives at each setting,
and plot the resulting points on a graph. Then connect the dots. Here is a
demonstration of ROC curves:
http://wise.cgu.edu/sdtmod/measures6.asp (or see
http://en.wikipedia.org/wiki/Receiver_operating_characteristic ).
- Write a report describing your approach, including your algorithms and
methodology, experimental results, and discussion.
- HW#2 (Level sets)
- Implement the level set segmentation algorithm described in the Chan-Vese
2001 paper. For simplicity, let the implicit surface initially surround
the image, excluding only a narrow band of pixels near the border. As the surface evolves, the zero level
set should conform to the boundaries of the object(s).
- Be sure to periodically reinitialize the level set function using the signed
distance to the contour.
- Run your code on the fruitfly.pgm image.
- Write a report describing your approach, including the algorithm,
methodology, experimental results, and discussion. In your report, show
results for different initial implicit surfaces, including one that is neither
completely outside nor completely inside the object(s).
- HW#3 (Feature detection and matching)
- Implement the SURF feature detector described in Herbert Bay, Andreas Ess,
Tinne Tuytelaars, Luc van Gool, "SURF: Speeded Up Robust Features", Computer
Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346--359, 2008.
Simplifications:
- Only one octave (four scales within octave)
- Implement U-SURF (no rotation)
- Use simple 1D quadratic interpolation between scales, none spatially
(optional)
- Note: First convert to grayscale and do all processing on the
grayscale image. Also, only detect features at the middle two scales (the
other two scales are for non-maximal suppression).
- Detect features in some pairs of these images:
tillman.zip . Choose from either of the two
resolutions: the original resolution, or downsampled by 8 in both
directions. Overlay rectangles/circles on the original image to display the
SURF features detected. Color the rectangles/circles red or blue
depending on the sign of the Laplacian (red = bright surrounded by dark, whereas
blue = dark surrounded by light, as shown in the slides).
- Extra: Display correspondences by showing the two images, the SURF
features detected, and the matches. For computing the descriptor, do not worry
about Gaussian weighting of the circle.
- Write a report describing your approach, including the algorithm,
methodology, experimental results, and discussion.
- HW#4 (Mosaicking)
- Create a mosaic from the Tillman input images (above).
- For reference, you may want to refer to Szeliski's classic
tutorial on
image alignment and stitching. However, for our purposes, we will
greatly simplify the problem:
- Use the images that are downsampled by a factor of 8 in both directions to make it easier
to fit the result on the screen
- Correspondence between pairs of images may be performed manually (feel
free to use these or
these). (Note that in a real
implementation, a feature detector/descriptor like SIFT/SURF/DAISY would be
used.)
- Using the correspondences, solve for the homography between pairs of
overlapping images using the normalized Direct Linear Transformation algorithm (DLT).
If you want, you may use Blepo's HomographyFit or OpenCV's cvFindHomography:
http://www.seas.upenn.edu/~bensapp/opencvdocs/ref/opencvref_cv.htm .
(You should be able to call OpenCV functions directly from Blepo; simply #include
the correct header files and then the OpenCV data structures and functions will
be available. See the top of FaceDetector.cpp for an example.) For more
detail on the DLT, see
http://w3.impa.br/~lvelho/outgoing/thales/visao/ex1.pdf .
- Choose the middle image as the reference. Warp the other images to it,
and feather the pixels to reduce the effect of seams. For feathering, you
may find the chamfer distance helpful. In a separate
display, show the outlines of the warped images (see Fig. 16 on p. 44 of the
above tutorial).
- Write a report describing your approach, including the algorithm,
methodology, experimental results, and discussion.
- HW#5 (Flat-world reconstruction)
- In this assignment, you will practice mapping image coordinates to metric
coordinates in a world plane. There are three parts:
- Write code to compute the homography between
this image and the ground plane, given the fact that the squares on the
floor are 16 inches on each side. Unwarp the image to a top-down / bird's
eye view, and provide an interface so that when a user clicks on two points in
the original image, the following is displayed: the two points in the
original image, the two points in the unwarped image connected by a line, and
the length (in inches) in a popup (modal) dialog box -- hint:
use AfxMessageBox(). It is okay to allow the user to click only a single
pair of points.
- Using this sequence of traffic images,
compute a BGR background image by calculating the mean of all the images.
Then compute the homography between this image and the ground plane, given known
coordinates (see file in directory). Now apply background subtraction,
along with morphological operations, to yield a binary segmentation of blobs for
each image separating the vehicles from the background. (Important:
A color background model yields much better results than a grayscale model.)
Now, for each image in the sequence, unwarp both the original image and the
binary result.
- When run, your program should display 6 windows total. First, it will
show the original and unwarped floor images (in two separate windows) and wait
for user input. Then it will show the original and unwarped traffic and
blob images in 4 separate windows, displaying the images in sequence as a video.
Hint: Use Figure::PlaceToTheRightOf() to easily prevent overlap
between figures in a row.
- Write a report describing your approach, including the algorithm,
methodology, experimental results, and discussion.
- HW#6 (3D reconstruction)
- This assignment has two parts:
- Write an application that computes the fundamental matrix between a pair of
uncalibrated stereo images using the normalized Eight-point algorithm.
Then, the application should display the two images and allow the user to
repeatedly click on a point in the first image. When a point is clicked,
the two epipolar lines (which come from the fundamental matrix) associated with that point should
be displayed. Allow at least 5 clicks. The application only needs to
work with these two images: burgher1_small.jpg and
burgher2_small.jpg, and you may hardcode
these correspondences (or your own) in
your program to simplify the problem.
- In a separate window, use OpenGL to display a looping video of the traffic
images (from the last assignment) in 3D. For simplicity, assume that the
background is a horizontal plane, that the vehicles are vertical cardboard
cutouts, and that the camera is vertically oriented with unity aspect ratio
pixels. For simplicity, display each pixel in the original image as a 3D
RGB point. Optional: use OpenGL to display oriented textured
planes. Here is a sample application using
OpenGL.
- Write a report describing your approach, including the algorithm,
methodology, experimental results, and discussion.
Grading standard:
- A. Report is coherent, concise, clear, and neat, with correct
grammar and punctuation. Code works correctly the first time and
achieves good results on both images.
- B. Report adequately
describes the work done, and code generally produces good results. There
are a small number of defects either in the implementation or the writeup, but
the essential components are there.
- C. Report or code is
inadequate. The report contains major errors or is illegible, the code
does not run or produces significantly flawed results, or instructions are
not followed.
- D or F. Report or code not attempted, not turned
in, or contains extremely serious deficiencies.
Detailed grading breakdown is available in the grading chart.
In your final project, you will investigate some area of image processing or computer vision in more detail. Typically
this will involve formulating a problem, reading the literature, proposing a solution, implementing the solution
(using the programming language/environment of your choice),
evaluating the results, and communicating your findings. In the case of a survey project, the quality and depth of
the literature review should be increased significantly to compensate for the lack of implementation.
Project deadlines:
- 4/01: team (1 or 2 people), title, and brief description
- 4/18: progress report (1 page)
- 5/05: final written report (5 pages)
- 5/06: final oral presentation in class during final exam slot, 3:00-5:30
To turn in your report, please send me a single email per group (do not email
the assign server) with two attachments:
- PDF file containing your 5-page report, conference format (title, authors,
abstract, introduction, method, experimental results, conclusion, references)
- PPT file containing your slides
Both files should have the same name, which should correspond somehow to your
topic. Use underscores instead of spaces. Do not send PPTX files. Example:
face_detection.pdf and face_detection.ppt. You do *not* need to send me your
code (although you may if you like).
Projects from previous years
Instructor: Stan Birchfield, 209 Riggs Hall, 656-5912, email: stb at clemson
Grader: Vidya Murali, 017 Riggs Hall, vmurali at clemson
Lectures: 1:25 - 2:15 MWF, 227 Riggs Hall