Alpha Estimation in Natural Images

Mark A. Ruzon

Objects in the world always occlude something behind them, even if it is nothing more interesting than the sky.  The boundary between an object and the occluded background is often easy to model as a curve projected onto the image plane.  However, this boundary is more complicated when we look at the pixels in an image.  Since pixels are patches rather than points, they often pick up colors from both an object and its background.  For most objects in the world, this is not really a problem.  For natural objects such as trees, smoke, water, and hair, however, many pixels receive light from multiple objects, and if we were to extract such an object and place it on a different background, it would appear fake.  The tree example shown here has hundreds of pixels that contain some combination of blue and green.

Many people in diverse fields like computer graphics, film and television, and remote sensing have already noticed this problem.  In graphics, the alpha value of a pixel is often added to denote how much an object contributes to the color of a boundary pixel, which allows proper rendering of overlapping objects.  If we start from an image rather than a 3D model, however, we must estimate alpha.
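
As a reminder of the standard compositing model that this definition of alpha implies (the symbols $C$, $F$, $B$, and $\alpha$ are notation chosen here for illustration, not taken from this page), the observed color of a boundary pixel can be written as a blend of a foreground color and a background color:

$$C = \alpha F + (1 - \alpha) B, \qquad 0 \le \alpha \le 1$$

For example, with a pure green foreground $F = (0, 1, 0)$, a pure blue background $B = (0, 0, 1)$, and $\alpha = 0.25$, the pixel would be $C = (0, 0.25, 0.75)$.  Estimating alpha from an image means recovering $\alpha$ (and ideally $F$ and $B$) when only $C$ is observed.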

This page gives a brief summary of the technique and is intended to supplement the paper, published in the IEEE Conference on Computer Vision and Pattern Recognition 2000, whose images unfortunately had to be displayed without color.  Readers are referred to the paper for more information.

## Specifying Object and Boundary Regions

Because the boundary of a natural object can have arbitrary topology, we cannot always expect an edge detector or a segmentation algorithm to find a good approximation to it.  Therefore, we must specify the boundary region ourselves, although the specification is allowed to be somewhat coarse.

Often a rough boundary can be obtained from an edge detector, a segmentation algorithm, or a boundary-drawing tool such as Intelligent Scissors.  This boundary can be dilated into a ribbon of appropriate thickness, as shown here:

Of course, this boundary region does not catch all the blue in the tree, which will cause problems later.
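
The dilation step can be sketched in a few lines.  This is only an illustration under stated assumptions: the boundary is given as a 2D grid of booleans, the ribbon is grown with a square neighborhood, and the function name `dilate_boundary` and its `radius` parameter are hypothetical, not part of the paper.

```python
def dilate_boundary(boundary, radius):
    """Grow a one-pixel boundary mask into a ribbon.

    boundary: list of rows of booleans, True where the boundary curve passes.
    radius:   half-width of the ribbon in pixels (square neighborhood,
              an assumption made for this sketch).
    Returns a new mask that is True within `radius` pixels of the boundary.
    """
    height, width = len(boundary), len(boundary[0])
    ribbon = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not boundary[y][x]:
                continue
            # Mark every pixel in the square window around this boundary pixel,
            # clipped to the image bounds.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    ribbon[ny][nx] = True
    return ribbon
```

Pixels inside the ribbon would be treated as the boundary region, and the pixels on either side of it as the two object regions.  A single `radius` for the whole image is the simplest choice; dilating different parts of the boundary by different amounts would require a per-pixel radius instead.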

Another way is to use a paint program to specify which parts of the image are "pure," meaning the colors they contain belong only to one of the objects.  This is a more flexible method, but not without its drawbacks.  Here is the tree example again:

Regardless of the method used, the purpose is to segment the image into three regions so that we can segment color space into three regions.  Conceptually, mapping the tree and sky regions into color space looks like this:

We assume that all pixels along the boundary fall somewhere in the uncolored region, between the two distributions of object colors.  We also assume that the two object regions do not intersect in color space; otherwise, there will be ambiguities in the computation.
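
To make the mapping into color space concrete, here is a minimal sketch that collects the colors of one labeled region and summarizes them with a per-channel mean and variance.  It is a simplification: a single axis-aligned Gaussian per region is assumed here, and the label values and the function name `region_statistics` are hypothetical.

```python
def region_statistics(image, labels, region):
    """Per-channel mean and variance of the colors in one labeled region.

    image:  list of rows of color tuples, e.g. (r, g, b) with values in [0, 1].
    labels: list of rows of region labels, same shape as `image`.
    region: the label to summarize (label values are an assumption of this sketch).
    """
    samples = [
        image[y][x]
        for y in range(len(image))
        for x in range(len(image[0]))
        if labels[y][x] == region
    ]
    if not samples:
        raise ValueError("no pixels carry label %r" % (region,))
    count = len(samples)
    channels = len(samples[0])
    mean = [sum(s[c] for s in samples) / count for c in range(channels)]
    variance = [
        sum((s[c] - mean[c]) ** 2 for s in samples) / count
        for c in range(channels)
    ]
    return mean, variance
```

Calling this once for each of the two object regions gives two summaries in color space; boundary pixels are then expected to have colors lying between them.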

## Estimating Alpha

The two "frontiers" of the color distributions are treated as probability distributions in color space.  Each probability distribution is modeled as a set of Gaussians.  For an arbitrary color between the two distributions, we compute its alpha value through maximum likelihood estimation.  We linearly interpolate between the two distributions to find the intermediate distribution that maximizes the probability density at that color.  Since the interpolation is parameterized over [0,1], the value of the parameter gives the alpha value.
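
The following sketch illustrates the idea of the estimate in a heavily simplified setting; it is not the paper's implementation.  It assumes a single axis-aligned Gaussian per object (rather than a set of Gaussians), interpolates both the mean and the per-channel variance linearly, uses the convention that alpha = 1 means pure foreground, and finds the maximum by a plain grid search.  All names are hypothetical.

```python
import math


def log_density(color, mean, variance):
    """Log density of an axis-aligned Gaussian; variances must be positive."""
    total = 0.0
    for c, m, v in zip(color, mean, variance):
        total += -0.5 * math.log(2.0 * math.pi * v) - (c - m) ** 2 / (2.0 * v)
    return total


def interpolate(a, b, t):
    """Linear interpolation between two vectors: a at t = 0, b at t = 1."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def estimate_alpha(color, foreground, background, steps=100):
    """Return the alpha in [0, 1] whose interpolated Gaussian best explains `color`.

    foreground, background: (mean, variance) pairs for the two objects.
    steps: number of grid intervals searched (an assumption of this sketch).
    """
    best_alpha, best_score = 0.0, -math.inf
    for i in range(steps + 1):
        alpha = i / steps
        # alpha = 0 gives the background distribution, alpha = 1 the foreground.
        mean = interpolate(background[0], foreground[0], alpha)
        variance = interpolate(background[1], foreground[1], alpha)
        score = log_density(color, mean, variance)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

For instance, with a green foreground and a blue background of equal variance, a color halfway between the two means would be assigned an alpha of about 0.5.  Working with log densities avoids numerical underflow when the variances are small.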

After computing the "unmixed" colors, one from each object, we are done with that pixel.  To move an object from one image to another, we mix the background color of the new image with the estimated foreground color in the proportion specified by each pixel's alpha value.
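
Moving the object onto a new background can be sketched as a per-pixel blend.  The code below assumes images stored as lists of rows of color tuples and alpha values in [0, 1], with alpha = 1 meaning pure foreground; the function names are hypothetical.

```python
def composite_pixel(foreground, background, alpha):
    """Blend one foreground color over one background color."""
    return tuple(
        alpha * f + (1.0 - alpha) * b for f, b in zip(foreground, background)
    )


def transplant(foregrounds, alphas, new_background):
    """Place an extracted object on a new background.

    foregrounds:    estimated (unmixed) foreground color per pixel.
    alphas:         estimated alpha per pixel.
    new_background: the image to composite onto, same size as the other inputs.
    """
    return [
        [composite_pixel(f, b, a) for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foregrounds, new_background, alphas)
    ]
```

Note that the blend uses the estimated foreground color rather than the original pixel color; using the original color would carry traces of the old background into the new image.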

Of course, more details are in the paper.

## Results

Here are the results for the tree using the two different input specification methods:

As expected, the object specification method produces better results.  As a comparison, we gave (nearly) the same input to the Extractor tool in Adobe Photoshop 5.5 (TM), which addresses the same task.  Here is the result:

The background used is slightly different, but the tree is very different in appearance.

When the background consists only of sky, however, the algorithm is equivalent to blue screen matting, a problem extensively studied in the film and television industry.  The advantage of this technique is that it works against more complicated backgrounds.

Here is another tree example, but the background is also trees:

Again, the object specification method is used.  Some of the specular reflections in the background are brought into the foreground, and some of the dark twigs connecting the leaves were sent into the background, but the rendering is realistic.

Here is an example that does not work well using the object specification method.  There are many colors in the background behind this smoke plume, including those of another smoke plume.  By using Intelligent Scissors, we can extract the boundary and dilate it by different amounts (the big black square is one large dilation in order to encompass the hole in the smoke).

Finally, we have a woman whose hair is being blown about by the wind.  The boundary must definitely be dilated by different amounts both to capture all the hair and to exclude the riverbank, which is of the same color as the hair.

This one appears to be a crowd favorite.

## Conclusions

The assumptions made here are general, but they do not seem to be quite general enough.  A few artifacts are visible here and there, mainly because it is difficult to get a good separation between the colors of each object, for instance, if both contain specular reflections.  Nevertheless, the technique expands the power of image extraction tools well beyond those that assume that a boundary is a curve that happens to be a little blurred.

## Reference

M. Ruzon and C. Tomasi, "Alpha Estimation in Natural Images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Volume I, pp. 18-25, Hilton Head Island, SC, June 2000.

## Postscript

In the years following the publication of this paper, many improvements have been made by many different researchers. In particular, natural image matting has been extended to video. Most of the articles have been published in SIGGRAPH as opposed to CVPR. Most researchers acknowledge this work as the seminal paper on natural image matting.

Page maintained by mark34@cs.stanford.edu