
Tutorial 3: Image Segmentation

Image segmentation is another important subject within computer vision. It is the process of dividing an image into regions based on the characteristics of its pixels, in order to identify objects or boundaries and thereby simplify the image and analyze it more efficiently. Segmentation affects a number of domains, from filmmaking to medicine. For instance, the software behind green screens uses image segmentation to crop out the foreground and place it on a different background, for scenes that cannot be shot, or would be dangerous to shoot, in real life. Image segmentation is also used to track objects across a sequence of images and to classify terrain, such as petroleum reserves, in satellite images. Medical applications of segmentation include identifying injured muscle, measuring bone and tissue, and detecting suspicious structures to aid radiologists (computer-aided diagnosis, or CAD).

Clustering


One way to view segmentation is as clustering, where pixels that share certain features, such as color, intensity, or texture, are grouped together and represented as a single entity.

Gestalt theory provides an approach to image segmentation grounded in psychology: it accounts for the factors that cause people to perceive things as a "unified whole" and so extract meaningful information. The fundamental viewpoint of the German Gestalt theorists is summarized by Gestalt psychologist Kurt Koffka's statement, "The whole is other than the sum of the parts" (often mistranslated as "The whole is greater than the sum of the parts").

[Figure: Gestalt Theory]


The Gestalt theorists identified proximity, similarity, common fate, common region, parallelism, symmetry, continuity, and closure as the factors that make people group certain visual elements together. However, these factors are difficult to translate into algorithms.

[Figure: Gestalt Theory]


The goal of clustering is to group the pixels in an image into "clusters" that are similar in some respect. For example, clustering could involve choosing a certain number of cluster centers that are representative of the different pixel intensities in the image, then assigning each pixel to the center it is closest or most similar to.
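The assignment step described above can be sketched as follows. This is only an illustration under stated assumptions: the pixels are grayscale intensities, "closest" means the smallest absolute intensity difference, and the function name, sample pixels, and hand-picked centers are hypothetical.

```python
def assign_to_nearest_center(pixels, centers):
    """Return, for each pixel intensity, the index of the closest center."""
    labels = []
    for value in pixels:
        # Distance here is the absolute intensity difference (an assumption).
        distances = [abs(value - center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical 8-bit grayscale intensities and three hand-picked centers.
pixels = [12, 20, 130, 125, 240, 250, 15]
centers = [16, 128, 245]
labels = assign_to_nearest_center(pixels, centers)
print(labels)  # [0, 0, 1, 1, 2, 2, 0]
```

Dark pixels end up with center 0, mid-gray pixels with center 1, and bright pixels with center 2.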

[Figure: Clustering]


There are several clustering methods, three of which are introduced below. In the examples, we are measuring "similarity" by Euclidean distance.

Hierarchical Agglomerative Clustering (HAC): a bottom-up algorithm

[Figure: HAC]


With K = 5 clusters:

[Figure: HAC with K = 5 clusters]
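As a rough sketch of the bottom-up idea: the usual HAC procedure starts with every pixel in its own cluster and repeatedly merges the two closest clusters until K remain. The code below assumes that the distance between two clusters is the Euclidean distance between their centroids (other linkage rules exist); the function names and the sample RGB pixels are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))


def hac(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # bottom-up: every point starts alone
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose centroids are closest (assumed linkage).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical RGB pixels: two dark, two bright, one reddish.
points = [(10, 10, 10), (12, 11, 9), (200, 190, 195), (205, 192, 198), (90, 20, 30)]
clusters = hac(points, 3)
for cluster in clusters:
    print(cluster)
```

The pairwise search makes this sketch slow on real images, so it is meant only to show the merge order, not to be used as is.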


K means: a top-down algorithm

  1. Initialize K cluster centers randomly, or choose them greedily.
  2. Assign each pixel to the closest center.
  3. Update each cluster center by computing the average of the pixels in its cluster.
  4. Repeat steps 2 and 3 until no pixel changes cluster.

[Figure: K means]
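
The four steps above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: it clusters grayscale intensities only, measures distance as the absolute intensity difference (the one-dimensional Euclidean distance), and the function name `kmeans`, the fixed random seed, and the sample values are assumptions made for illustration.

```python
import random


def kmeans(values, k, seed=0):
    """Cluster 1-D intensities into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # step 1: random initial centers
    labels = None
    while True:
        # Step 2: assign each pixel to the closest center.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centers[c])) for v in values
        ]
        # Step 4: stop once no pixel changes cluster.
        if new_labels == labels:
            return centers, labels
        labels = new_labels
        # Step 3: move each center to the average of its pixels.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)


# Hypothetical grayscale intensities with dark, mid, and bright pixels.
values = [10, 12, 11, 200, 205, 198, 100, 102]
centers, labels = kmeans(values, k=3)
print(centers)
print(labels)
```

Because the initial centers are random, different seeds can converge to different groupings, which is a known property of K means rather than a bug in the sketch.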


With K = 5 clusters:

[Figure: K means with K = 5 clusters]


Mean Shift

  1. Choose a random seed point and center a window W on it.
  2. Calculate the center of gravity (the "mean") of the points inside W.
  3. Shift W so that it is centered on the mean.
  4. Repeat steps 2 and 3 until convergence, that is, until W stops moving.

[Figure: Mean Shift]
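
The loop above can be sketched for a single window as follows. The sketch assumes a circular window of fixed radius and a convergence tolerance, neither of which is specified in the steps; the seed is fixed rather than random so the run is reproducible, and the function name and sample points are hypothetical.

```python
import math


def mean_shift_mode(points, seed, radius, tol=1e-6, max_iter=100):
    """Shift a window from `seed` until it settles; return the final center."""
    center = seed
    for _ in range(max_iter):
        # Step 2: the points that currently fall inside the window W.
        window = [p for p in points if math.dist(p, center) <= radius]
        if not window:
            return center  # nothing inside W, so the window cannot move
        mean = tuple(sum(coords) / len(window) for coords in zip(*window))
        # Step 4: stop when the shift is negligible.
        if math.dist(mean, center) < tol:
            return mean
        center = mean  # step 3: shift W to the mean
    return center


# Two hypothetical groups of 2-D feature points.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
mode = mean_shift_mode(points, seed=(2, 2), radius=3)
print(mode)  # (1.5, 1.5)
```

Running the same procedure from many seeds and grouping the seeds that settle on the same center is one common way to turn these modes into clusters.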




HAC vs. K means vs. Mean Shift



Grab Cat

A successful segmentation of an image should allow us to separate objects from the background and transfer them from one image to another. Here, we can segment pictures of cats using the K means algorithm (K = 5) and transfer the adorable felines onto different backgrounds.

Picture and background:

[Figures: Kitten, Background]

Segmented: The pixels are partitioned into five groups, shown below. You can select the groups that form the kitten portion, discard the background (image 5, in this case), and move the segmented parts onto a separate background.

[Figures: Kitten segments, images 1 to 5]

Final result: A kitty sleeping on a leaf!

[Figure: Kitten on the new background]
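
The select-and-transfer step can be sketched as a simple mask operation. The sketch assumes that a per-pixel cluster label is already available (for example from K means), that labels are zero-based so "image 5" corresponds to label 4, and that both pictures have the same size; the function name and the tiny text "images" are hypothetical stand-ins for real pixel data.

```python
def composite(foreground, labels, keep, background):
    """Copy foreground pixels whose cluster label is in `keep` onto background."""
    result = []
    for fg_row, label_row, bg_row in zip(foreground, labels, background):
        result.append([
            fg if label in keep else bg
            for fg, label, bg in zip(fg_row, label_row, bg_row)
        ])
    return result


# Tiny 2 x 3 stand-ins for the kitten picture, its labels, and the new background.
foreground = [["wall", "fur", "fur"],
              ["wall", "eye", "fur"]]
labels = [[4, 0, 0],
          [4, 1, 0]]
background = [["leaf", "leaf", "leaf"],
              ["leaf", "leaf", "leaf"]]

# Keep every group except label 4, the background group in this example.
result = composite(foreground, labels, keep={0, 1, 2, 3}, background=background)
for row in result:
    print(row)
```

Pixels labeled 4 are replaced by the new background, while the kitten pixels are carried over unchanged.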


Picture and background:

[Figures: Kitten, Background]

Segmented: Only the pixels in image 1 need to be removed, since the rest are part of the kitten.

[Figures: Kitten segments, images 1 to 5]

Final result: A gargantuan kitten on a beach!

[Figure: Kitten on the new background]