Another important subject within computer vision is image segmentation. It is the process of dividing an image into regions based on the characteristics of its pixels, in order to identify objects or boundaries, simplify the image, and analyze it more efficiently. Segmentation impacts a number of domains, from the filmmaking industry to the field of medicine. For instance, the software behind green screens uses image segmentation to crop out the foreground and place it on a different background, for scenes that cannot be shot or would be dangerous to shoot in real life. Image segmentation is also used to track objects in a sequence of images and to classify terrains, like petroleum reserves, in satellite images. Medical applications of segmentation include the identification of injured muscle, the measurement of bone and tissue, and the detection of suspicious structures to aid radiologists (computer-aided diagnosis, or CAD).
One way to view segmentation is as clustering, where pixels sharing certain features such as color, intensity, or texture are grouped together and represented as a single entity.
Gestalt theory provides an approach to image segmentation with a basis in psychology, accounting for the factors that cause people to see things as a "unified whole" and so extract meaningful information. The fundamental viewpoint of the German Gestalt theorists can be summarized by a statement from Gestalt psychologist Kurt Koffka: "The whole is other than the sum of the parts" (often mistranslated as "The whole is greater than the sum of the parts").
They identified proximity, similarity, common fate, common region, parallelism, symmetry, continuity, and closure as the factors that make people group certain visual elements together. However, it is difficult to translate these factors into algorithms.
The goal of clustering is to group the pixels in an image into "clusters" that are similar in some respect. For example, clustering could involve choosing a certain number of cluster centers that are representative of the different pixel intensities in the image, and then assigning each pixel to the center it is closest or most similar to.
There are several clustering methods, three of which are introduced below. In the examples, "similarity" is measured by Euclidean distance.
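To make the idea of "closest center" concrete, here is a minimal sketch that assigns pixels to the nearest of a few cluster centers using Euclidean distance. The function name `closest_center` and the sample RGB values are made up for illustration.

```python
import math

def closest_center(pixel, centers):
    """Return the index of the center nearest to `pixel` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(pixel, centers[i]))

# Hypothetical RGB cluster centers: dark, mid-gray, and reddish.
centers = [(20, 20, 20), (128, 128, 128), (220, 40, 40)]
pixels = [(10, 15, 12), (200, 60, 50), (120, 130, 125)]

# One cluster label per pixel.
labels = [closest_center(p, centers) for p in pixels]
print(labels)  # [0, 2, 1]
```

The same distance works for any feature vector, for example a single intensity value or color combined with pixel position.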
Hierarchical Agglomerative Clustering (HAC): a bottom-up algorithm
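A minimal sketch of the bottom-up idea, assuming the usual agglomerative procedure: start with every pixel as its own cluster, then repeatedly merge the two closest clusters until K remain. Measuring the distance between clusters by their centroids is an assumption made here; other linkage rules (single, complete, average) are also common. The function name `hac` and the sample intensities are illustrative only.

```python
import math

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def hac(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical grayscale intensities, one feature per pixel.
intensities = [(12,), (15,), (14,), (200,), (205,), (90,)]
result = hac(intensities, 3)
print(result)
```

Recording the order of the merges gives the hierarchy of clusters. This naive version rescans every pair on each merge, so it is only practical for small inputs.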
With K = 5 clusters:
K means: a top-down algorithm
- Initialize K cluster centers, either randomly or greedily.
- Assign each pixel to the closest center.
- Update cluster centers by computing the average of the pixels in the cluster.
- Repeat the assignment and update steps until no pixel changes its cluster.
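The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming random initialization from the input pixels; the function name `kmeans`, the `seed` and `max_iter` parameters, and the sample pixels are not from the original text.

```python
import math
import random

def kmeans(pixels, k, seed=0, max_iter=100):
    """Cluster `pixels` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # initialize: pick k pixels as centers
    labels = [None] * len(pixels)
    for _ in range(max_iter):
        # Assign each pixel to the closest center.
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i])) for p in pixels
        ]
        if new_labels == labels:  # no pixel changed its cluster: converged
            break
        labels = new_labels
        # Update each center to the average of the pixels assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Hypothetical RGB pixels forming a dark group and a bright group.
pixels = [(10, 10, 10), (12, 12, 12), (11, 11, 11),
          (200, 200, 200), (202, 202, 202), (201, 201, 201)]
centers, labels = kmeans(pixels, k=2)
print(centers, labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful.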
With K = 5 clusters:
Mean Shift
- Initialize a random seed and window W.
- Calculate the center of gravity (the "mean") of W.
- Shift W to the mean.
- Repeat the previous two steps until convergence, that is, until W stops moving.
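Here is a minimal sketch of one mean shift run, assuming a flat (uniform) window of fixed radius, so that the "mean" is the plain average of the points inside W. The function name `mean_shift_mode`, the `tol` and `max_iter` parameters, and the sample data are illustrative assumptions, and the seed is passed in rather than drawn at random so the example is repeatable.

```python
import math

def mean_shift_mode(points, seed_point, radius, tol=1e-3, max_iter=100):
    """Shift a window of the given radius until it settles on a mode."""
    center = seed_point
    for _ in range(max_iter):
        # The window W: all points within `radius` of the current center.
        window = [p for p in points if math.dist(p, center) <= radius]
        if not window:
            break
        # Center of gravity (the "mean") of W.
        mean = tuple(sum(c) / len(window) for c in zip(*window))
        shift = math.dist(mean, center)
        center = mean  # shift W to the mean
        if shift < tol:  # converged: the window has stopped moving
            break
    return center

# Hypothetical grayscale intensities with two dense groups.
points = [(10,), (11,), (12,), (50,), (51,), (52,)]
mode = mean_shift_mode(points, seed_point=(12,), radius=5)
print(mode)  # (11.0,)
```

To segment an image, a run like this is typically started from many seeds (often every pixel), and pixels whose windows converge to the same mode are placed in the same cluster.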
HAC vs. K means vs. Mean Shift
- Unlike K means and mean shift, HAC provides a hierarchy of clusters. However, the clusters may be imbalanced, and you still need to specify the number of clusters, as is the case with K means.
- K means finds cluster centers that are a good representation of the data. However, it is sensitive to outliers, can get stuck in local minima, and can be slow to run. Because of this, K means is rarely used for pixel segmentation.
- Unlike K means, mean shift is robust to outliers. However, the output depends on window size, and like K means, mean shift can be computationally expensive.
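A small numeric illustration of the outlier point, using made-up intensities: a K means center is the plain average of every pixel assigned to it, so a single extreme value drags it away, while a mean shift window (assumed here to have radius 5 around intensity 11) simply never sees the outlier.

```python
# Hypothetical cluster of intensities with one outlier (250).
cluster = [10, 11, 12, 250]

# K means: the center is the mean of all assigned pixels.
kmeans_center = sum(cluster) / len(cluster)

# Mean shift: only pixels inside the window contribute to the mean.
window = [v for v in cluster if abs(v - 11) <= 5]
window_mean = sum(window) / len(window)

print(kmeans_center)  # 70.75
print(window_mean)    # 11.0
```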
Grab Cat
A successful segmentation of an image should allow us to separate objects from the background and transfer them from one image to another. Here, we can segment pictures of cats using the K means algorithm (K = 5) and transfer the adorable felines onto different backgrounds.
Picture and background:
Segmented: The pixels are partitioned into five groups, shown below. You can select the groups that form the kitten portion, discard the background (image 5, in this case), and move the segmented parts onto a separate background.
Final result: A kitty sleeping on a leaf!
Picture and background:
Segmented: Only the segmented pixels in image 1 need to be removed since the rest are a part of the kitten.
Final result: A gargantuan kitten on a beach!
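The transfer step used in both examples can be sketched as follows, assuming the picture, its per-pixel cluster labels, and the background are same-sized nested lists. The function name `transfer`, the `keep` parameter, and the tiny 2x3 "images" are hypothetical stand-ins for real image data.

```python
def transfer(foreground, labels, keep, background):
    """Copy pixels whose cluster label is in `keep` onto `background`."""
    return [
        [fg if lab in keep else bg
         for fg, lab, bg in zip(fg_row, lab_row, bg_row)]
        for fg_row, lab_row, bg_row in zip(foreground, labels, background)
    ]

# Hypothetical 2x3 grayscale picture, its K means labels, and a new background.
cat = [[200, 210, 30],
       [190, 35, 40]]
labels = [[0, 1, 4],
          [0, 4, 4]]
beach = [[1, 2, 3],
         [4, 5, 6]]

# Keep every group except the one identified as background (label 4 here).
result = transfer(cat, labels, keep={0, 1, 2, 3}, background=beach)
print(result)  # [[200, 210, 3], [190, 5, 6]]
```

Choosing which groups to keep is done by inspecting the segmented groups, exactly as in the two kitten examples.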