Classification of Clothing using Interactive Perception

Bryan Willimon, Stan Birchfield, and Ian Walker


Laundry is a daily routine throughout the world for people of all walks of life. While this routine was changed significantly more than a century ago when the tasks of washing and drying were automated by modern-day appliances, the remaining tasks of sorting and folding clothes are still performed manually even today as they have been for thousands of years. Nevertheless, with the recent explosion of interest and development in household service robots, there is a realistic possibility that these remaining parts of the laundry process may be automated within the coming generations.

Laundry is a common household chore that is difficult to automate. The process of "doing the laundry" consists of several steps: handling, washing, drying, isolating/extracting, classifying, unfolding/flattening, folding, and putting the clothes away into a predetermined drawer or storage unit. We present a system for automatically extracting and classifying items in a pile of laundry. Using only visual sensors, the robot identifies and extracts items sequentially from the pile. When an item has been removed and isolated, a model of the shape and appearance of the object is captured and compared against a database of known items. The classification procedure relies upon silhouettes, edges, and other low-level image measurements of the articles of clothing. This research builds on our earlier work involving interactive perception.


Our laundry-handling system involves several parts:

(1) Isolation / Extraction
(2) Classification

(1) Isolation / Extraction: To isolate an item from the pile, an overhead image is first segmented, and the closest foreground segment (measured using stereo disparity) is selected. Chamfering is used to determine the grasp point, which is then used to extract the item from the pile in a fully automatic way using interactive perception.

(2) Classification: Once the robot removes and isolates an article of clothing from the pile, the arm lifts and swings so that the article hangs freely without touching the table or ground. This open area is monitored by a third, side-facing camera that captures both a "frontal view" and a "side view" image of the article, with the robot rotating the article 90 degrees about the vertical axis between images. These two views are subtracted from a background image to obtain two binary silhouettes of the clothing.
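The grasp-point step above can be sketched with a distance transform: the point deepest inside the selected segment (farthest from its boundary) is a natural place to grasp. The following is a minimal sketch using a city-block distance transform (a simplified form of chamfering; the function name, nested-list mask format, and 4-neighbor metric are assumptions, not the paper's exact implementation):

```python
def chamfer_grasp_point(mask):
    """Return the (row, col) of the foreground pixel farthest from the
    background, via a two-pass city-block distance transform.

    `mask` is a binary image as a list of lists (1 = clothing segment,
    0 = background).  The deepest interior point is a simple stand-in
    for the chamfer-based grasp-point selection described above.
    """
    rows, cols = len(mask), len(mask[0])
    INF = rows + cols  # upper bound on any city-block distance
    # Background pixels start at distance 0, foreground at "infinity".
    d = [[INF if mask[r][c] else 0 for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top-left.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom-right.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    # The maximum-distance pixel is the deepest interior point.
    best = max((d[r][c], (r, c)) for r in range(rows) for c in range(cols))
    return best[1]
```

For a 3x3 square of foreground pixels centered in a 5x5 image, the function returns the center pixel, which sits farthest from the segment boundary.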

Experimental results

The proposed approach was applied in a number of different scenarios to test its ability to perform practical interactive perception. A PUMA 500 robotic arm was used to interact with the objects, which rested upon a flat table with uniform appearance. The objects themselves and their types were unknown to the system. The entire system, from image input to manipulation to classification, is automatic.

Isolation / Extraction Experiment

The extraction and isolation process. From top to bottom:

(1) The image taken by one of the downward-facing stereo cameras
(2) The result of graph-based segmentation
(3) The object found along with its grasp point (red dot)
(4) The image taken by the side-facing camera
(5-6) The binary silhouettes of the front and side views of the isolated object.

Time flows from left to right, showing the progress as each individual article of clothing is removed from the pile and examined.
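The binary silhouettes in steps (5-6) come from subtracting each view against an image of the empty background. A minimal sketch of that step, assuming grayscale images stored as nested lists (the function name and threshold value are illustrative, not taken from the paper):

```python
def silhouette(frame, background, thresh=30):
    """Binary silhouette by background subtraction.

    Pixels whose grayscale intensity differs from the background image
    by more than `thresh` are marked as clothing (1); all others are
    background (0).  `frame` and `background` are equal-sized nested
    lists of intensities.
    """
    return [[1 if abs(f - b) > thresh else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```

In practice a fixed threshold is sensitive to lighting; the uniform background in the experimental setup makes this simple differencing workable.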


Classification Experiment


With 6 categories, 5 items per category, and 20 images per item, the database collected by the extraction / isolation procedure consists of 600 images. This database was labeled in a supervised manner, so that the corresponding category of each image was known. Eight tests were used to compare the training and test images.

We conducted three experiments:

(1) Leave-one-out classification
(2) Train-and-test classification
(3) Comparing interaction vs. non-interaction

(1) In leave-one-out classification, each of the 600 images was compared with the remaining 599 images in the database. If the nearest neighbor among these 599 images belonged to the same category as the query image, the classification was considered a success; otherwise, a failure.
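The leave-one-out protocol can be sketched as a 1-nearest-neighbor loop over the database. Here `dist` stands in abstractly for the silhouette/edge comparisons used in the paper; the function name and feature representation are assumptions:

```python
def leave_one_out_accuracy(features, labels, dist):
    """Leave-one-out 1-NN accuracy.

    Each sample is classified by its nearest neighbor among all other
    samples; the classification succeeds if that neighbor shares the
    sample's category label.  `dist` is any distance function over the
    feature representation.
    """
    n = len(features)
    correct = 0
    for i in range(n):
        # Nearest neighbor among the other n-1 samples.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(features[i], features[k]))
        correct += labels[j] == labels[i]
    return correct / n
```

With two well-separated clusters (e.g. features 0, 1 labeled "a" and 10, 11 labeled "b"), every sample's nearest neighbor shares its label and the accuracy is 1.0.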


(2) In train-and-test classification, three articles of clothing from each category were selected for the training set, and the remaining two articles were used for the test set. Each test image was therefore compared with the 360 images in the training set (3 items x 6 categories x 20 images), and the category of its nearest neighbor among those images was compared with the category of the test image to determine success or failure.
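The train-and-test variant differs from leave-one-out only in that the nearest neighbor is found within a fixed training set. A minimal sketch (names and feature representation assumed for illustration):

```python
def nn_classify(test_feature, train_features, train_labels, dist):
    """1-NN classification against a fixed training set.

    The test image receives the category of its nearest neighbor among
    the training images (360 of them in the experiment above).
    `dist` is any distance function over the feature representation.
    """
    i = min(range(len(train_features)),
            key=lambda k: dist(test_feature, train_features[k]))
    return train_labels[i]
```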


(3) The process of interacting with each article of clothing provided the system with multiple views using various grasp locations, allowing the system to collect 20 images of each object in total. In this experiment, therefore, we compared features from all 20 images of each object with the remaining images in the database, measuring the benefit of the additional views gained through interaction.
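One simple way to pool the 20 per-view predictions into a single decision for the object is a majority vote, sketched below (this aggregation rule is an assumption for illustration, not necessarily the paper's exact combination method):

```python
from collections import Counter

def vote_category(view_predictions):
    """Combine per-view nearest-neighbor predictions for one object.

    Takes the list of category labels predicted from each of the
    object's views (e.g. the 20 images gathered through interaction)
    and returns the most frequent one.
    """
    return Counter(view_predictions).most_common(1)[0][0]
```

The intuition behind the interaction experiment is that even if individual views are ambiguous, pooling many views obtained from different grasps lets correct predictions outvote occasional mistakes.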




This research was supported by the U.S. National Science Foundation under grants IIS-1017007, IIS-0844954, and IIS-0904116.