Laser Range Data Classification Using AMNs

In this project, we applied Associative Markov Networks to the the task of terrain classification. Terrain classification is very useful for autonomous mobile robots in real-world environments for path planning, target detection, and as a pre-processing step for other perceptual tasks.

Dataset

The Stanford Segbot Project has provided us with a laser range maps of the Stanford campus collected by a moving robot equipped with SICK2 laser sensors. The data consists of around 35 million points, represented as 3D coordinates in an absolute frame of reference. Thus, the only available information is the location of points.

Here is a snapshot of an overhead view of the Stanford Quad:

Our goal will be to classify the laser range points into four classes: ground, building, tree, and shrubbery. Since classifying ground points is trivial given their absolute z-coordinate, we classify ground points deterministically by thresholding the z coordinate at a value close to 0. After we do that, we are left with approximately 20 million non-ground points.

Features

Recall that each point is represented simply as a location in an absolute 3D coordinate system. That's why the features we use require pre-processing to infer properties of the local neighborhood of a point, such as how planar it is, or how much of the neighbors are close to the ground. The features described below are used in both flat and joint classification.

The features we use should be invariant to rotation in the x-y plane, as well as the density of the range scan, since scans tend to be sparser in regions farther from the robot.

PCA-based features: for each point we sample 100 points in a ball of radius 0.5 m. We then run PCA, to get the plane of maximum variance (spanned by the first two principal components.) We then form a 3x3x3 cube around the neighborhood of the point oriented with respect to the principal plan, and compute the percentage of points lying in the various sub-cubes. We use a number of features derived from the cube such as the percentage of points in the central column, the outside corners, the central plane, etc. These features capture the local distribution well and are especially useful in finding planes.
Vertical features: for each point we also take a cylinder of radius 0.2 m, which extends vertically to include all the points in a "column" of the range scan. We then compute what percentage of the points lie in various segments of this vertical column: e.g. between 2m and 2.5m, etc.
Threshold feature we also use an indicator feature of whether or not a point lies within 2m of the ground. This feature is especially useful in classifying shrubbery.

We use a quadratic kernel over all the features in both flat and joint classification.

Training

For training we select roughly 30,000 points that represent the classes well: a segment of a wall, a tree, some bushes. The pieces don't form a coherent segment of a scene.

Flat Model

The flat model is a multi-class SVM with a quadratic kernel over the above features. Here is a sample result:

We can see that the flat model achieves reasonable performance in a lot of places, but fails to enforce local consistency of the classification predictions. For example arches on buildings and other less planar regions are consistently confused for trees, even though they are surrounded entirely by buildings.

Voted Model

A possible improvement to the flat model is to take its predictions and smooth them using voting. For each point we look at its local neighborhood (we varied the radius to get the best possible results) and assign to the point the label of the majority of its neighbors. Here is a sample result:

We can see that the voted model performs slightly better than flat: for example, it smooths out trees and some areas of the buildings. Still it fails in areas like arches of buildings where the flat classifier has a locally consistent wrong prediction.

AMN Model

As we saw above there is a lot of room for improvement in the local coherence of predictions. To do that we use a pairwise AMN over laser scan points, with associative potentials to ensure smoothness.

Edges

Each point is connected to 6 of its neighbors: 3 of them are sampled randomly from the local neighborhood in a sphere of radius 0.5m, and the other 3 are sampled at random from the vertical cylinder column of radius 0.2m. It is important to ensure vertical consistency since a lot of the time a flat classifier is wrong in areas that are higher of the ground (due to the decrease in point density) or because objects tend to look different as we vary their z-coordinate (for example tree trunks and tree crowns look different.) While we experimented with a variety of edge features including various distances between points, we found that even using a only a bias feature performed well.

Learning

Since the training data consists of around 30,000 points (and hence 180,000 edges), we used CPLEX to learn the models (see the paper for details).

Inference

For inference, we used mincut combined with the alpha-expansion algorithm proposed by Boykov, et. al. (Y. Boykov, O. Veksler, R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts"). The ideas used in posing MAP-inference as a graph-cut problem are similar to the ones in "What Energy Functions can be Minimized via Graph Cuts?" Kolmogorov and Zabih. We also used Kolmogorov's modified mincut algorithm (using two trees to search for augmented paths).

We split up the dataset into 16 square (in the xy-plane) regions, the largest of which contains around 3.5 million points. The implementation is largely dominated by I/O time, with the actual mincut taking less than a minute even for the largest segment.

Results

Here is a sample result achieved with the AMN model:

We can see that the predictions are much smoother: for example building arches and tree trunks are predicted correctly. Here is another example:

Here is a VRML fly-through animation of the same results:

People

Vassil Chatalbashev
Dinkar Gupta
Ben Taskar
Mike Montemerlo