Choosing a representation brings us only part of the way to a successful texture segmentation. There are many more choices that must be made in the design of an algorithm. Some of these decisions are specific to the model chosen, but other decisions apply to segmentation in general, and it is these issues that will be discussed here.

One of the most fundamental choices is that between supervised texture
segmentation and unsupervised texture segmentation. The difference between
the two can be summed up in the *a priori* knowledge about the specific
task
that the algorithm is trying to address. If it is believed that the number
of different textures is small, and that the textures are distinct from each
other, then we can delineate small regions of homogeneous texture, extract
feature vectors using our model, and use these vectors as ``fixed points'' in
feature space. Now all vectors can be labeled by assigning them the label
of their closest neighbor that is a fixed point. By using neural networks or
other machine learning algorithms, the system can be told when it makes
mistakes, and it can adjust its model accordingly.

If the number of possible textures is too large, or if no assumptions can be made about the type of textures to be presented to the system, then unsupervised methods must be used. Whereas before each feature vector could be assigned to a class as it was generated, here statistical analysis must be performed on the entire distribution of vectors. The goal is to recognize clusters in the distribution and assign the same label to them all. In general, this is a much harder task to accomplish.

However, neither method will work properly in the absence of a function that can take two feature vectors and compute a number which represents the perceptual distance between the two windows. If the parameters are homogeneous it may be possible to use the simple Euclidean metric to compute distance. However, it is almost always the case that a more complicated function must be used, one that at the very least normalizes the differences between different components. And yet, some components may be more important for discrimination purposes than others, so it makes sense to weight them more highly. Often this leads to a distance function that is a heuristic based on experimentation and intuition rather than formal mathematical principles.

Up until this point we have ignored the physical geometry of the image itself. We may choose a good representation and have an intelligent distance function, but due to image noise or imperfections in our model we may wind up with an unsatisfactory segmentation because we have not yet taken into account a simple assumption: much of the time, the label at a pixel and the label of an adjacent pixel are the same. Without this assumption, we are often likely to have many small ``holes'' in a large region, or a slew of small, oddly shaped regions.

In determining whether such a segmentation is ``correct'' or ``good,'' we must ask a question: what do we expect to get from a texture segmentation algorithm? This question has no single answer for all tasks, but we can make a reasonable approximation. A segmentation should have a small number of regions (perhaps the famous constant 7 +/- 2 often mentioned in psychological literature), most of which are fairly large, fairly convex, and almost always simply connected.

There are many ways to try to satisfy such constraints. One method is called split-and-merge. The algorithm recursively splits an image region (which starts out as the entire image) by dividing it into quarters. Feature vectors are computed for each block. If the four blocks are judged to be similar, they are grouped back together and the process ends. If not, each of the blocks are recursively divided and analyzed using the same procedure. When we are finished, there will be many different sized blocks, each of which has a homogeneous texture according to our model. However, the arbitrary division of the image into rectangles (called a ``quad-tree'' decomposition) might have accidentally split up regions of homogeneous texture. The merge step, then, tries to fix this by looking at adjacent regions that were not compared to each other during the split phase, and merging them if they are similar. The resulting regions will not be rectangular, but hopefully they will give a satisfactory result.

Another method is called the voting method. It is customary to take whatever operator has been designed and apply it to as many different areas of the image as possible. When a feature vector is computed, the associated label that is derived is usually applied to the pixel at the center of the window chosen for the calculation. The voting method builds on this by noting that each pixel is used in many different windows. Therefore, each window, upon deciding on a label, can ``cast a vote'' at each pixel in its window. After all computations have been carried out, each pixel will contain a certain number of votes for each possible label, and the label with the most votes becomes the label for the pixel. This requires more computations than are used in split-and-merge, but the resulting regions have boundaries that appear to be smooth rather than blocky.

Fri Sep 5 16:40:07 PDT 1997