In this paper, we examine a method for feature subset selection based
on information theory.  We first present a framework for defining the
theoretically optimal, but computationally intractable, method for
feature subset selection.  We show that our goal should be to
eliminate a feature if it gives us little or no additional information
beyond that subsumed by the remaining features.  In particular, this will be
the case for both irrelevant and redundant features. We then give an efficient
algorithm for feature selection which computes an approximation to the
optimal feature selection criterion.  We examine the conditions under
which the approximate algorithm is successful.  Empirical results on a
number of data sets show that the algorithm effectively handles data
sets with large numbers of features.
<p>