Objects in the world always occlude something behind them, even if it is nothing more interesting than the sky. The boundary between an object and the occluded background is often easy to model as a curve projected onto the image plane. However, this boundary is more complicated when we look at the pixels in an image. Since pixels are patches rather than points, they often pick up colors from both an object and its background. For most objects in the world, this is not really a problem. For natural objects such as trees, smoke, water, and hair, however, many pixels receive light from multiple objects, and if we were to extract such an object and place it on a different background, it would appear fake. The tree example shown here has hundreds of pixels that contain some combination of blue and green.
Many people in diverse fields like computer graphics, film and television, and remote sensing have already noticed this problem. In graphics, the alpha value of a pixel is often added to denote how much an object contributes to the color of a boundary pixel, which allows proper rendering of overlapping objects. If we start from an image rather than a 3D model, however, we must estimate alpha.
This page gives a brief summary of the technique and is intended to supplement the paper, published in the IEEE Conference on Computer Vision and Pattern Recognition 2000, whose images unfortunately had to be displayed without color. Readers are referred to the paper for more information.
Often the boundary can be specified, either through an edge detector, a segmentation algorithm, or a boundary-drawing tool such as Intelligent Scissors. This boundary can be dilated into a ribbon of appropriate thickness, as shown here:
Of course, this boundary region does not catch all the blue in the tree, which will cause problems later.
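Dilating a boundary into a ribbon can be sketched as a binary dilation. The snippet below is only an illustration under stated assumptions: the function name `dilate`, the square structuring element, and the toy 5x7 boundary mask are all hypothetical and are not taken from the paper.

```python
def dilate(mask, radius):
    """Binary dilation of a 2D grid of 0/1 values.

    Assumption: a square (2 * radius + 1) neighborhood is used; the paper
    does not specify the shape used for dilation.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Mark every pixel within `radius` of a boundary pixel.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[ny][nx] = 1
    return out


# Hypothetical one-pixel-wide vertical boundary in a 5x7 image.
boundary = [[1 if x == 3 else 0 for x in range(7)] for _ in range(5)]
ribbon = dilate(boundary, 1)  # three-pixel-wide ribbon around the boundary
```

A larger `radius` gives a thicker ribbon; pixels inside the ribbon are the ones treated as possibly mixed, and everything outside it is treated as belonging to one object or the other.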
Another way is to use a paint program to specify which parts of the image are "pure," meaning the colors they contain belong only to one of the objects. This is a more flexible method, but not without its drawbacks. Here is the tree example again:
Regardless of the method used, the purpose is to segment the image into three regions so that we can segment color space into three regions. Conceptually, mapping the tree and sky regions into color space looks like this:
We assume that all pixels along the boundary fall somewhere in the uncolored region, between the two distributions of object colors. We also assume that the two object regions do not intersect in color space; otherwise, there will be ambiguities in the computation.
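To make the idea of a boundary pixel lying "between" two object colors concrete, here is a deliberately simplified sketch. It assumes each object is summarized by a single representative color and estimates alpha by projecting the observed color onto the line segment joining them. This is a stand-in for illustration only, not the method of the paper, which works with distributions of colors; the function name and the sample colors are hypothetical.

```python
def estimate_alpha(observed, foreground, background):
    """Fraction of `foreground` in `observed`, clamped to [0, 1].

    Simplifying assumption: one pure color per object, and mixing is
    linear in the chosen color space.
    """
    direction = [f - b for f, b in zip(foreground, background)]
    length_sq = sum(d * d for d in direction)
    if length_sq == 0:
        # The two object colors coincide, so alpha cannot be determined.
        raise ValueError("foreground and background colors are identical")
    offset = [c - b for c, b in zip(observed, background)]
    t = sum(o * d for o, d in zip(offset, direction)) / length_sq
    return min(1.0, max(0.0, t))


# Hypothetical pure colors for the tree and the sky (RGB).
leaf_green = (40.0, 120.0, 30.0)
sky_blue = (90.0, 150.0, 230.0)

# A boundary pixel that is one quarter leaf and three quarters sky.
mixed = tuple(0.25 * f + 0.75 * b for f, b in zip(leaf_green, sky_blue))
alpha = estimate_alpha(mixed, leaf_green, sky_blue)
```

The error raised when the two colors coincide mirrors the assumption above: if the object regions overlap in color space, the mixing proportion is ambiguous.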
After computing the "unmixed" colors, one from each object, we are done with that pixel. To move an object from one image to another, we mix the background of the new image with the estimated foreground color in the proportion specified by each pixel's alpha value.
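The mixing step described above is the standard compositing relation, C = alpha * F + (1 - alpha) * B, applied per pixel. A minimal sketch follows; the function name `composite` and the sample colors are illustrative assumptions, not taken from the paper.

```python
def composite(foreground, alpha, new_background):
    """Blend an estimated foreground color over a new background color.

    `alpha` is the fraction of the pixel covered by the foreground object.
    """
    return tuple(
        alpha * f + (1.0 - alpha) * b
        for f, b in zip(foreground, new_background)
    )


# Hypothetical values: an unmixed leaf color placed over an orange sky.
leaf_green = (40.0, 120.0, 30.0)
sunset_orange = (250.0, 140.0, 60.0)
pixel = composite(leaf_green, 0.25, sunset_orange)
```

Note that the foreground color used here is the estimated unmixed color, not the observed pixel value; reusing the observed color would carry the old background into the new image.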
Of course, more details are in the paper.
As expected, the object specification method produces better results. As a comparison, we gave (nearly) the same input to the Extractor tool in Adobe Photoshop 5.5 (TM), which does the same thing. Here is the result:
The background used is slightly different, but the tree is very different in appearance.
When the background consists only of sky, however, the algorithm is equivalent to blue screen matting, a problem extensively studied in the film and television industry. The advantage of this technique is that it works against more complicated backgrounds.
Here is another tree example, but the background is also trees:
Again, the object specification method is used. Some of the specular reflections in the background are brought into the foreground, and some of the dark twigs connecting the leaves are sent into the background, but the rendering is realistic.
Here is an example that does not work well using the object specification method. There are many colors in the background behind this smoke plume, including those of another smoke plume. By using Intelligent Scissors, we can extract the boundary and dilate it by different amounts (the big black square is one large dilation, made in order to encompass the hole in the smoke).
Finally, we have a woman whose hair is being blown about by the wind. The boundary must definitely be dilated by different amounts both to capture all the hair and to exclude the riverbank, which is of the same color as the hair.
This one appears to be a crowd favorite.
M. Ruzon and C. Tomasi, "Alpha Estimation in Natural Images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Volume I, pp. 18-25, Hilton Head Island, SC, June 2000.
[PS]
[PDF]
In the years following the publication of this paper, many researchers have improved on it. In particular, natural image matting has been extended to video. Most of the articles have been published in SIGGRAPH as opposed to CVPR. Most researchers acknowledge this work as the seminal paper on natural image matting.