Qualitative Vision-Based Mobile Robot Navigation

Zhichao Chen and Stan Birchfield

The ability of a mobile robot to follow a desired trajectory is necessary for many applications. For example, a courier robot may need to deliver items from one office to another in the same building, or even in a different building; a delivery robot may need to transport parts from one machine to another in an industrial setting; a robot may need to travel along a prespecified route to give a tour of a facility; or a team of robots may need to follow the path taken earlier by a scout robot.

Intuitively, the vastly overdetermined nature of the problem (thousands of pixels in an image versus a single turning command as output) has inspired us to seek an approach that is considerably simpler than those previously proposed. We have developed a simple technique that uses a single, off-the-shelf camera attached to the front of the robot. The technique follows the teach-replay approach, in which the robot is manually led through the path once during a teaching phase and then follows the path autonomously during the replay phase. Without any calibration (not even calibration for lens distortion), the robot is able to follow the path by making only qualitative comparisons between the feature coordinates computed during the teaching phase and those computed during replay. The technique does not involve Jacobians, homographies, or fundamental matrices, and it does not require an estimate of the focus of expansion.

The basic idea is illustrated below. As the robot moves from position C to position D along a straight line, the coordinate u of the landmark's projection onto the image plane moves monotonically from uC to uD. Therefore, as long as the robot remains on the path, the magnitude of the current coordinate remains less than that of the destination coordinate, and the landmark remains on the same side of the image. More precisely, |uC| < |uD| and sign(uC) = sign(uD).

This observation leads to a simple control algorithm: the robot continually moves forward, turning right whenever the first constraint, |uC| < |uD|, is violated and turning left whenever the second constraint, sign(uC) = sign(uD), is violated. The control decision space is depicted below:
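As a minimal sketch (in Python, with hypothetical function and variable names), the per-feature decision rule described above might look like this, where coordinates are measured horizontally from the image center:

```python
def feature_vote(u_current, u_destination):
    """Qualitative turn decision for one tracked feature.

    u_current: horizontal image coordinate of the feature in the
               current image, relative to the image center.
    u_destination: coordinate of the same feature in the destination
                   (milestone) image.
    Returns "right", "left", or "straight".
    """
    # Constraint 1: |uC| < |uD|. If the current magnitude has grown
    # past the destination magnitude, vote to turn right.
    if abs(u_current) >= abs(u_destination):
        return "right"
    # Constraint 2: sign(uC) = sign(uD). If the feature has crossed
    # to the other side of the image, vote to turn left.
    if u_current * u_destination < 0:
        return "left"
    # Both constraints hold: the robot appears to be on the path.
    return "straight"
```

This is only an illustration of the two qualitative tests; the actual system applies them per feature and combines the results by voting.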

Feature points are detected and tracked using the KLT tracker, and multiple features are aggregated in a voting scheme to make the final decision.  The desired path is divided into milestones separated by approximately 0.8 seconds of run time, and the robot sequentially moves toward the milestone images as it makes its way toward the goal.
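The voting step can be sketched as a simple majority over the per-feature turn decisions (hypothetical names; the real system operates on KLT feature tracks):

```python
from collections import Counter

def aggregate_votes(votes):
    """Combine per-feature turn votes ("left"/"right"/"straight")
    into a single steering command by majority vote."""
    if not votes:
        return "straight"  # no tracked features: keep moving forward
    tally = Counter(votes)
    command, _ = tally.most_common(1)[0]
    return command
```

Once the robot judges itself close enough to the current milestone image, it simply advances to the next milestone and repeats the same comparison.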

We have demonstrated the technique in several indoor and outdoor experiments, with slanted surfaces and dynamic occluding objects, over distances of more than 100 meters. Below are several videos of the robot successfully navigating a path. In each video, the upper left corner shows the current image seen by the robot with feature points overlaid as red squares. Features voting to turn right are outlined in green, while those voting to turn left are outlined in yellow. The milestone image is displayed in the upper right corner, and the lower left corner shows the robot captured by a camcorder. In the lower right corner, the white curve is the desired path captured during the teaching phase, while the red curve is the current path during the replay phase. These curves are obtained from odometry data, which are not used by the algorithm.

Down a ramp (47 seconds, 8 MB)

In a parking lot (1 min. 38 sec., 20 MB)

Along a sidewalk (1 min. 30 sec., 27 MB)

Rough terrain (48 sec., 8 MB)

Recovering from obstacle avoidance (detected using sonar) (54 sec., 2 MB)

Omnidirectional camera (74 sec., 10 MB)

Visual sensing will be essential for mobile robots to progress in the direction of increased robustness, reduced cost, and reduced power consumption. If robots can be made to use computationally efficient algorithms and off-the-shelf cameras with minimal setup (e.g., no calibration), then the door is opened for robots to be widely deployed (e.g., multiple inexpensive coordinating robots). The algorithm described here is one step in this direction.


This work was partially supported by a Ph.D. fellowship from the National Institute for Medical Informatics, Medical Media Lab.