Blepo Overview
Blepo contains an extensive list of classes and functions for reading/writing image
files, displaying images and visualizing data, low-level image processing,
higher-level computer vision, and linear algebra.
The latest version of Blepo contains the following functionality:
- Images:
- image classes (8-bit graylevel, 24-bit blue-green-red, 1-bit binary,
integer, single-precision floating point)
- load / save image file (BMP, PGM/PPM, JPEG)
- save image to EPS file (helpful for including images in LaTeX documents)
- get / set (individual pixels and rectangular subimages)
- bitwise logical operations (and, or, xor, not)
- convert between image types
- comparison (equal, not equal, less than, greater than, less than or
equal, greater than or equal)
- resample, downsample, upsample
- bilinear interpolation
- Image processing:
- correlation, convolution
- gradient (Prewitt, Sobel, Gaussian, magnitude)
- median filter
- morphological operations (erode, dilate, grayscale erode and dilate)
- floodfill
- connected components
- Chamfer distance
- FFT / inverse FFT
- Computer vision:
- Lucas-Kanade feature detection and tracking
- Canny edge detection
- Viola-Jones face detection
- Mean shift segmentation
- Split-and-merge segmentation
- Watershed segmentation
- Elliptical head tracking
- Camera calibration
- Matrices:
- matrix classes (double-precision floating point)
- create identity, random matrices
- diag
- add, subtract, multiply, negate, transpose matrices
- Euclidean norm of a vector
- comparison
- Linear algebra:
- decomposition (SVD, QR, LU)
- solve linear equation
- eigenvalues and eigenvectors
- determinant, inverse
- Display:
- easy-to-use figure class
- display image in window on screen with mouse coordinates
- resize window
- get mouse input from image window (with wait, without wait, point, rect, etc.)
- file open and save directly from window
- draw line, rect, circle, ellipse, elliptic arc
- Capture:
- real-time capture of live video from single webcam (Logitech Quickcam Pro 4000) using DirectShow
- real-time capture of live video from single IEEE 1394 camera
- real-time capture of live video from DataTranslation DT3120 color
framegrabber
Design
All of the source code is written in C/C++, with some low-level operations being written in assembly
language to take advantage of computationally efficient SIMD operations (MMX/SSE/SSE2). The
library uses the facilities of C++ to automatically handle the allocation and deallocation
of memory, thus minimizing the possibility of memory leaks or invalid memory
accesses; and yet this management is done in a fairly transparent
way, without garbage collection or reference counting, so that programmers who
desire control over the details should feel comfortable in knowing at any given
time what is happening underneath the hood. Although the code is written primarily in C++, minimal use has been made of
advanced C++ facilities (such as generic programming and virtual functions) that
have the tendency to make the code opaque.
Instead, emphasis has been placed upon simplicity and ease of use, so that even
beginning C++ programmers, or advanced C programmers, should find the library
painless to learn. An attempt has been made to maintain a clean and
consistent interface to facilitate such use. Behind this interface, the
actual implementation is a combination of code written from scratch and code
borrowed from other open-source libraries, such as OpenCV and the GNU Scientific
Library (GSL).
Blepo is designed to meet the following three criteria:
- Easy to use. A new user should be able to start using the library in a short amount of time, without
a steep learning curve. The syntax should be clean, readable, and easy to remember. Low-level details such as memory management
should, as much as possible, be handled automatically.
- Efficient. Speed should not be sacrificed in order to achieve ease of use. Because of the overwhelming
amount of data in computer vision, the library should be able to process such data efficiently.
- Extensive. To maximize the usefulness of the library, its scope should be broad. Routines for general
functions (e.g., accessing pixels, reading/writing image files, displaying
images, image processing, linear algebra) common to all researchers in the field should be included,
as well as higher-level algorithms (e.g., texture, tracking, segmentation, stereo) across the spectrum of computer vision.
These criteria are achieved through a novel combination of C and C++, taking
advantage of the strengths of each. Instead of relying exclusively upon
either the procedural paradigm (C) or the object-oriented paradigm (C++), Blepo
uses what we call the object-augmented paradigm, which is a combination
of both. The concept is rather simple, namely to provide a number of well-designed classes along with functions that operate on
those classes. In this manner, some of the functionality resides in the
methods of the classes themselves, while other functionality resides in
functions outside the classes.
To illustrate how this works, consider a
simple example. Suppose we wish to compute the connected components of an
image. In C, the natural way to do this would be to store the image in a
struct, which would have to be allocated and deallocated manually. A
function would be called to do the work:
img_gray* img = alloc_image(320, 240);
img_int* labels = alloc_image(320, 240);
connected_components(img, labels);
free_image(img);
free_image(labels);
Having to allocate and deallocate the memory manually is not only tedious but also dangerous because it can easily lead to memory violations
or memory leaks. Moreover, the user has to know how much memory to allocate for the output, which
in turn requires
knowing something about the connected components algorithm.
Using C++, we can hide the memory allocation and deallocation in the constructor and destructor, respectively, leading to much
cleaner code. However, in C++'s object-oriented approach, there are three possible
ways of making connected components a method of a class:
Option #1:
ImgGray img(320, 240);
ImgInt labels(320, 240);
img.ConnectedComponents(&labels);
Option #2:
ImgGray img(320, 240);
ImgInt labels(320, 240);
labels.ConnectedComponents(&img);
Option #3:
ImgGray img(320, 240);
ImgInt labels(320, 240);
ConnectedComponentsEngine cc;
cc.DoIt(img, &labels);
All of these alternatives leave the programmer dissatisfied, because none of them appears to be a natural formulation.
Our approach is simply to retain the classes but provide a function outside them:
ImgGray img(320, 240);
ImgInt labels;
ConnectedComponents(img, &labels);
Here, the syntax is clean, and the ordering of the parameters is natural (input
before output). Only the memory for the input needs to be allocated before
calling the function, because the function itself allocates the memory for the
output. (But if the output has already been allocated, then the function
skips the allocation, so that no penalty is incurred.) All the memory is
automatically deallocated when the objects fall out of scope. Although
this memory allocation and deallocation happen automatically, they happen at
definite places in the code, so that the user remains in complete control by
paying attention to when the constructor and destructor are called.
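The allocate-on-demand behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Blepo's actual implementation: the Image template, its Reset and Begin methods, and the int return value of ConnectedComponents are hypothetical, and a template is used here purely to keep the sketch short. The labeling itself is a plain 4-connected flood fill chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an image class (not Blepo's real code).
template <typename T>
class Image {
public:
    Image() : width_(0), height_(0) {}
    Image(int width, int height)
        : width_(width), height_(height),
          data_(static_cast<std::size_t>(width) * height) {}

    int Width() const { return width_; }
    int Height() const { return height_; }

    // Reallocates only when the requested size differs from the current one.
    void Reset(int width, int height) {
        if (width == width_ && height == height_) return;
        width_ = width;
        height_ = height;
        data_.assign(static_cast<std::size_t>(width) * height, T());
    }

    T& operator()(int x, int y) {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }
    const T& operator()(int x, int y) const {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }
    const T* Begin() const { return data_.data(); }

private:
    int width_, height_;
    std::vector<T> data_;  // released automatically when the object leaves scope
};

typedef Image<unsigned char> ImgGray;
typedef Image<int> ImgInt;

// Free function outside the classes: input first, output last.
// Pixels of equal gray value that touch horizontally or vertically receive
// the same label. Returns the number of labels (an assumption of this sketch).
int ConnectedComponents(const ImgGray& img, ImgInt* labels) {
    assert(labels != 0);
    const int w = img.Width(), h = img.Height();
    labels->Reset(w, h);  // skipped if the caller already allocated the output
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            (*labels)(x, y) = -1;  // -1 marks "not yet labeled"

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    std::vector<std::pair<int, int> > stack;
    int next_label = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if ((*labels)(x, y) != -1) continue;
            const unsigned char value = img(x, y);
            (*labels)(x, y) = next_label;
            stack.push_back(std::make_pair(x, y));
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; ++k) {
                    const int nx = p.first + dx[k], ny = p.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if ((*labels)(nx, ny) != -1 || img(nx, ny) != value) continue;
                    (*labels)(nx, ny) = next_label;
                    stack.push_back(std::make_pair(nx, ny));
                }
            }
            ++next_label;
        }
    }
    return next_label;
}
```

In this sketch, calling ConnectedComponents(img, &labels) with a default-constructed labels allocates the output once; calling it again with the same labels object reuses the existing buffer, and both images release their memory when they go out of scope.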
Because all images are passed as references or pointers, no pixel data is copied
when a function is called. Memory is allocated only when
needed, and the user is free to reuse memory that has been allocated. In
contrast, reference counting (another option under C++) is not able to guarantee
this benefit:
ImgGray img(320, 240);
ImgInt labels = ConnectedComponents(img);
With reference counting, the function allocates the memory for the output,
then the memory is assigned to the variable 'labels' without reallocating.
But because the function does not know about 'labels', it will allocate the
memory no matter what, causing an inefficiency when 'labels' has already
been allocated. Reference counting has the added drawback that the
assignment operator is unnatural to interpret, because the code img2 =
img1 does not actually copy the data but rather causes both images to point to
the same block of data, which is confusing. In Blepo, the assignment
operator makes an exact replica of img1, while the built-in C++ mechanism of
references is used to cause two variables to point to the same block of data, if
that is desired.
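The difference between copying and aliasing can be illustrated with a small sketch. The ImgGray class below is a hypothetical minimal stand-in, not Blepo's actual class; it relies on std::vector so that the compiler-generated assignment operator copies the pixel data, which is one simple way (assumed here) to get the replica semantics described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image (not Blepo's real code).
class ImgGray {
public:
    ImgGray(int width, int height)
        : width_(width), height_(height),
          data_(static_cast<std::size_t>(width) * height) {}

    unsigned char& operator()(int x, int y) {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }

private:
    int width_, height_;
    std::vector<unsigned char> data_;  // copied element by element on assignment
};

// Assignment makes a replica: later changes to img1 do not affect img2.
int ValueSeenThroughCopy() {
    ImgGray img1(320, 240);
    img1(10, 20) = 7;
    ImgGray img2(1, 1);
    img2 = img1;        // img2 now owns its own 320x240 copy of the data
    img1(10, 20) = 99;  // modifies img1 only
    return img2(10, 20);
}

// A C++ reference makes two names refer to the same block of data.
int ValueSeenThroughReference() {
    ImgGray img1(320, 240);
    img1(10, 20) = 7;
    ImgGray& alias = img1;  // no copy; alias and img1 are the same object
    img1(10, 20) = 99;
    return alias(10, 20);
}
```

Here the copy keeps the old pixel value after the original is changed, while the reference observes the change, so sharing happens only where the code explicitly asks for it.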
Comparison with other libraries
A number of libraries have appeared over the years to facilitate computer
vision research, including the following:
- Matlab. Although designed as a generic platform for matrix analysis, Matlab is popular with computer vision researchers because it is extremely easy to use and is an excellent platform for quickly prototyping ideas. Nevertheless, it is extremely computationally inefficient; its visualization capabilities are not tailored for image sequences; and it is not suited for large projects due to the lack of advanced software features.
- OpenCV. This open-source library has become the most popular computer vision library to
date. It contains scores of useful computer vision functions and runs on Windows or Linux.
One drawback is that the code is primarily written in C using structs, often
leaving the user with the burden of mundane low-level tasks such as memory
management and type safety. Blepo provides an easy interface to many
OpenCV routines.
- IPL. This library contained an extensive collection of image processing functions
(but no computer vision routines), all hand-optimized by Intel programmers for various Pentium processors using MMX assembly language. Despite being free (no cost), this library was not open-source, and it is no longer available.
- IPP. As the successor to the Image Processing Library (IPL) and the Signal Processing Library (SPL), this library contains a large number of functions for image processing, signal processing, and small matrix analysis, along with a few computer vision routines, all hand-optimized for Pentium processors using MMX, SSE, and SSE2 assembly language.
The library is written completely in C, leaving memory management to the user.
The library is neither open-source nor free (no cost).
- CImg, cool image (David Tschumperlé).
An impressive library written as a single header file with a simple and
intuitive interface. Includes functions for file reading/writing, image
display, basic image processing, and 3D visualization. It is highly
portable and released under the CeCILL-C license (LGPL-like).
- vxl. Aiming to be for computer vision
what OpenGL is for graphics, this extensive open-source library (including numerics and display
as well as image processing and some computer vision) works on Windows or Linux.
The extensive use of templates makes for somewhat awkward syntax,
and there are no SIMD operations for efficient low-level processing.
- ImLib3D. This is a much smaller library written for 3D medical imaging on Linux. It has a clean syntax, uses templates and iterators, and interfaces with the shell for non-compiled use. It is released under the GNU GPL.
- vigra. This small library, written as part of a Ph.D. thesis, explores the application of advanced object-oriented and generic programming techniques such as templates, iterators, functors, and data accessors to computer vision. These techniques make the code very difficult for an outsider to read or use, and the license is not GPL-compatible.
- XVision.
- vista.
- VisLib.
- DARPA IUE.
- Khoros
- VisionLab (Netherlands)
- Diamond3D (MERL)
- Microsoft Vision SDK
- HIPR (Hypermedia Image Processing Reference, Java)
- LTI-Lib
- CMVision
- BV-Tool (split and merge)
- Generic Image Library (GIL) from Adobe
- Imalab (Augustin Lux, Machine Vision and Applications 2004)
- CIMPL Numerical Performance Library (Baris Sumengen). Efficient and easy to use. Version 0.1.
- ImgSource. A commercial
image processing package for reading/writing images, displaying them on the screen, and
manipulating them for human viewing.
- CVIPtools
- ITK
- RAVL (Recognition and Vision Library)
- Boost generic image library
- AllSeeingI (ASI). Visual programming environment.
- Etc. (ImageLib, VTK, ...)
Extensive list of vision software