Blepo Overview

Blepo contains an extensive list of classes and functions for reading/writing image files, displaying images and visualizing data, low-level image processing, higher-level computer vision, and linear algebra. The latest version of Blepo contains the following functionality:

Images:
- image classes (8-bit graylevel, 24-bit blue-green-red, 1-bit binary, integer, single-precision floating point)
- load / save image file (BMP, PGM/PPM, JPEG)
- save image to EPS file (helpful for including images in LaTeX documents)
- get / set (individual pixels and rectangular subimages)
- bitwise logical operations (and, or, xor, not)
- convert between image types
- comparison (equal, not equal, less than, greater than, less than or equal, greater than or equal)
- resample, downsample, upsample
- bilinear interpolation
Image processing:
- correlation, convolution
- gradient (Prewitt, Sobel, Gaussian, magnitude)
- median filter
- morphological operations (erode, dilate, grayscale erode and dilate)
- floodfill
- connected components
- Chamfer distance
- FFT / inverse FFT
Computer vision:
- Lucas-Kanade feature detection and tracking
- Canny edge detection
- Viola-Jones face detection
- Mean shift segmentation
- Split-and-merge segmentation
- Watershed segmentation
- Elliptical head tracking
- Camera calibration
Matrices:
- matrix classes (double-precision floating point)
- create identity, random matrices
- diag
- add, subtract, multiply, negate, transpose matrices
- Euclidean norm of a vector
- comparison
Linear algebra:
- decomposition (SVD, QR, LU)
- solve linear equation
- eigenvalues and eigenvectors
- determinant, inverse
Display:
- easy-to-use figure class
- display image in window on screen with mouse coordinates
- resize window
- get mouse input from image window (with wait, without wait, point, rect, etc.)
- file open and save directly from window
- draw line, rect, circle, ellipse, elliptic arc
Capture:
- real-time capture of live video from single webcam (Logitech Quickcam Pro 4000) using DirectShow
- real-time capture of live video from single IEEE 1394 camera
- real-time capture of live video from DataTranslation DT3120 color framegrabber

Design

All of the source code is written in C/C++, with some low-level operations being written in assembly language to take advantage of computationally efficient SIMD operations (MMX/SSE/SSE2). The library uses the facilities of C++ to automatically handle the allocation and deallocation of memory, thus minimizing the possibility of memory leaks or invalid memory accesses; and yet this management is done in a fairly transparent way, without garbage collection or reference counting, so that programmers who desire control over the details should feel comfortable in knowing at any given time what is happening underneath the hood. Although the code is written primarily in C++, minimal use has been made of advanced C++ facilities (such as generic programming and virtual functions) that have the tendency to make the code opaque. Instead, emphasis has been placed upon simplicity and ease of use, so that even beginning C++ programmers, or advanced C programmers, should find the library painless to learn. An attempt has been made to maintain a clean and consistent interface to facilitate such use. Behind this interface, the actual implementation is a combination of code written from scratch and code borrowed from other open-source libraries, such as OpenCV and the GNU Scientific Library (GSL).

Blepo is designed to meet the following three criteria:

Easy to use. A new user should be able to start using the library in a short amount of time, without a steep learning curve. The syntax should be clean, readable, and easy to remember. Low-level details such as memory management should, as much as possible, be handled automatically.
Efficient. Speed should not be sacrificed in order to achieve ease of use. Because of the overwhelming amount of data in computer vision, the library should be able to process such data efficiently.
Extensive. To maximize the usefulness of the library, its scope should be broad. Routines for general functions (e.g., accessing pixels, reading/writing image files, displaying images, image processing, linear algebra) common to all researchers in the field should be included, as well as higher-level algorithms (e.g., texture, tracking, segmentation, stereo) across the spectrum of computer vision.

These criteria are achieved through a novel combination of C and C++, taking advantage of the strengths of each. Instead of relying exclusively upon either the procedural paradigm (C) or the object-oriented paradigm (C++), Blepo uses what we call the object-augmented paradigm, which is a combination of both. The concept is rather simple, namely to provide a number of well-designed classes along with functions that operate on those classes. In this manner, some of the functionality resides in the methods of the classes themselves, while other functionality resides in functions outside the classes.

To illustrate how this works, consider a simple example. Suppose we wish to compute the connected components of an image. In C, the natural way to do this would be to store the image in a struct, which would have to be allocated and deallocated manually. A function would be called to do the work:

img_gray* img = alloc_image(320, 240);
img_int* labels = alloc_image(320, 240);
connected_components(img, labels);
free_image(img);
free_image(labels);

Having to allocate and deallocate the memory manually is not only tedious but also dangerous because it can easily lead to memory violations or memory leaks. Moreover, the user has to know how much memory to allocate for the output, which in turn requires knowing something about the connected components algorithm.

Using C++, we can hide the memory allocation and deallocation in the constructor and destructor, respectively, leading to much cleaner code. However, in C++'s object-oriented approach, there are three possible ways of making connected components a method of a class:

Option #1:

ImgGray img(320, 240);
ImgInt labels(320, 240);
img.ConnectedComponents(&labels);

Option #2:

ImgGray img(320, 240);
ImgInt labels(320, 240);
labels.ConnectedComponents(&img);

Option #3:

ImgGray img(320, 240);
ImgInt labels(320, 240);
ConnectedComponentsEngine cc;
cc.DoIt(img, &labels);

All of these alternatives leave the programmer dissatisfied, because none of them appears to be a natural formulation.

Our approach is simply to retain the classes but provide a function outside them:

ImgGray img(320, 240);
ImgInt labels;
ConnectedComponents(img, &labels);

Here, the syntax is clean, and the ordering of the parameters is natural (input before output). Only the memory for the input needs to be allocated before calling the function, because the function itself allocates the memory for the output. (But if the output has already been allocated, then the function skips the allocation, so that no penalty is incurred.) All the memory is automatically deallocated when the objects fall out of scope. Although this memory allocation and deallocation happen automatically, they happen at definite places in the code, so that the user remains in complete control by paying attention to when the constructor and destructor are called.
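The allocate-only-when-needed behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Blepo's actual implementation: SimpleImage, Reset, Allocations, Threshold, and DemoReuse are hypothetical names introduced here, a trivial threshold stands in for a real algorithm, and a template is used only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image class (an assumption, not Blepo's class).
template <typename T>
class SimpleImage {
public:
    SimpleImage() = default;
    SimpleImage(int width, int height) { Reset(width, height); }

    // Resizes the buffer only when the requested size differs from the current one.
    void Reset(int width, int height) {
        if (width == width_ && height == height_) return;  // already the right size
        width_ = width;
        height_ = height;
        data_.assign(static_cast<std::size_t>(width) * height, T());
        ++allocations_;  // counts how many times Reset actually resized the buffer
    }

    int Width() const { return width_; }
    int Height() const { return height_; }
    int Allocations() const { return allocations_; }

    T& operator()(int x, int y) {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }
    const T& operator()(int x, int y) const {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }

private:
    int width_ = 0;
    int height_ = 0;
    int allocations_ = 0;
    std::vector<T> data_;
};

typedef SimpleImage<unsigned char> GrayImage;
typedef SimpleImage<int> IntImage;

// Free function outside the classes: input by const reference, output by pointer.
// The function sizes the output itself, so the caller need not know its dimensions.
void Threshold(const GrayImage& in, int level, IntImage* out) {
    out->Reset(in.Width(), in.Height());  // no-op if 'out' already has this size
    for (int y = 0; y < in.Height(); ++y)
        for (int x = 0; x < in.Width(); ++x)
            (*out)(x, y) = (in(x, y) > level) ? 1 : 0;
}

// Calls the function twice on the same output and reports how many times the
// output buffer was resized.
int DemoReuse() {
    GrayImage img(4, 3);
    img(1, 2) = 200;
    IntImage out;               // nothing allocated yet
    Threshold(img, 128, &out);  // first call sizes the output
    Threshold(img, 128, &out);  // second call reuses the existing buffer
    assert(out(1, 2) == 1 && out(0, 0) == 0);
    return out.Allocations();
}
```

Both buffers are released when img and out go out of scope at the end of DemoReuse, with no explicit deallocation call.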

Because all images are passed by reference or pointer, no image data is copied at a function call. Memory is allocated only when needed, and the user is free to reuse memory that has already been allocated. In contrast, reference counting (another option under C++) is not able to guarantee this benefit:

ImgGray img(320, 240);
ImgInt labels = ConnectedComponents(img);

With reference counting, the function allocates the memory for the output, then the memory is assigned to the variable 'labels' without reallocating. But because the function does not know about 'labels', it will allocate the memory no matter what, causing an inefficiency when 'labels' has already been allocated. Reference counting has the added drawback that the assignment operator is unnatural to interpret, because the code img2 = img1 does not actually copy the data but rather causes both images to point to the same block of data, which is confusing. In Blepo, the assignment operator makes an exact replica of img1, while the built-in C++ mechanism of references is used to cause two variables to point to the same block of data, if that is desired.
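The difference between copying by assignment and aliasing by reference can be sketched with a small value-semantics type. TinyImage and both functions below are hypothetical names introduced for illustration, not Blepo's classes. The sketch assumes only that assignment copies the pixel data, as the text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal image type with value semantics (an assumption for
// illustration). std::vector's assignment copies every element, so the
// compiler-generated assignment operator replicates the pixel data.
struct TinyImage {
    int width;
    int height;
    std::vector<unsigned char> pixels;  // zero-initialized on construction

    TinyImage(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {}
};

// Assignment makes an independent replica: changing the copy leaves the
// original untouched.
bool CopyIsIndependent() {
    TinyImage img1(2, 2);
    TinyImage img2(1, 1);
    img2 = img1;           // img2 now holds its own 2x2 block of data
    img2.pixels[0] = 255;  // modify the copy only
    return img1.pixels[0] == 0 && img2.width == 2 && img2.height == 2;
}

// A C++ reference is a second name for the same object, so a write through
// the reference is visible through the original variable.
bool ReferenceIsAlias() {
    TinyImage img1(2, 2);
    TinyImage& alias = img1;  // no data is copied
    alias.pixels[0] = 255;
    return img1.pixels[0] == 255;
}
```

With this arrangement the reader can tell from the declaration alone whether two names share data: a plain variable owns its pixels, and a reference does not.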

Comparison with other libraries

A number of libraries have appeared over the years to facilitate computer vision research, including the following:

Matlab. Although designed as a generic platform for matrix analysis, Matlab is popular with computer vision researchers because it is extremely easy to use and is an excellent platform for prototyping quick ideas. Nevertheless it is extremely computationally inefficient; its visualization capabilities are not tailored for image sequences; and it is not suited for large projects due to the lack of advanced software features.
OpenCV. This open-source library has become the most popular computer vision library to date. It contains scores of useful computer vision functions and runs on Windows or Linux. One drawback is that the code is primarily written in C using structs, often leaving the user with the burden of mundane low-level tasks such as memory management and type safety. Blepo provides an easy interface to many OpenCV routines.
IPL. This library contained an extensive collection of image processing functions (but no computer vision routines), all hand-optimized by Intel programmers for various Pentium processors using MMX assembly language. Despite being free (no cost), this library was not open-source, and it is no longer available.
IPP. As the successor to the Image Processing Library (IPL) and the Signal Processing Library (SPL), this library contains a large number of functions for image processing, signal processing, and small matrix analysis, along with a few computer vision routines, all hand-optimized for Pentium processors using MMX, SSE, and SSE2 assembly language. The library is written completely in C, leaving memory management to the user. The library is neither open-source nor free (no cost).
CImg, cool image (David Tschumperlé). An impressive library written as a single header file with a simple and intuitive interface. Includes functions for file reading/writing, image display, basic image processing, and 3D visualization. It is highly portable and released under the CeCILL-C license (LGPL-like).
vxl. Aiming to be for computer vision what OpenGL is for graphics, this extensive open-source library (including numerics and display as well as image processing and some computer vision) works on Windows or Linux. The extensive use of templates makes for somewhat awkward syntax, and there are no SIMD operations for efficient low-level processing.
ImLib3D. This is a much smaller library written for 3D medical imaging on Linux. It has a clean syntax, uses templates and iterators, and interfaces with the shell for non-compiled use. It is released under the GNU GPL.
vigra. This small library, written as part of a Ph.D. thesis, explores the application of advanced object-oriented and generic programming techniques such as templates, iterators, functors, and data accessors to computer vision. These techniques make the code very difficult for an outsider to read or use, and the license is not GPL-compatible.
XVision.
vista.
VisLib.
DARPA IUE.
Khoros
VisionLab (Netherlands)
Diamond3D (MERL)
Microsoft Vision SDK
HIPR -- Hypermedia Image Processing Reference, Java
LTI-Lib
CMVision
BV-Tool (split and merge)
Generic Image Library (GIL) from Adobe
Imalab (Augustin Lux, Machine Vision and Applications 2004)
CIMPL Numerical Performance Library (Baris Sumengen) Efficient and easy to use. Version 0.1.
ImgSource. A commercial image processing package for reading/writing images, displaying them on the screen, and manipulating them for human viewing.
CVIPtools
ITK
RAVL (Recognition and Vision Library)
Boost generic image library
AllSeeingI (ASI). Visual programming environment.
Etc. (ImageLib, VTK, ...) Extensive list of vision software