Daphne Koller Publications

Hierarchically classifying documents using very few words (1997)

by D. Koller and M. Sahami

Abstract: The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes that ignore the hierarchical structure and treat the topics as separate classes are often inadequate in text classification, where there is a large number of classes and a huge number of relevant features needed to distinguish between them. We propose an approach that utilizes the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. As we show, each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each classifier only examines a small subset. The use of reduced feature sets allows us to utilize more complex (probabilistic) models without encountering many of the standard computational and robustness difficulties.
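
Illustrative sketch (not the paper's implementation): the decomposition described in the abstract can be pictured as one small classifier per internal node of the topic tree, each restricted to its own handful of words. The Python below is a minimal sketch under stated assumptions. The names NodeClassifier, train_hierarchy, and classify, the toy documents, and the topic names are all hypothetical. The feature score (spread of a word's smoothed presence probability across a node's children) and the Bernoulli naive Bayes model are simple stand-ins chosen for brevity, not the feature selection criterion or the probabilistic models evaluated in the paper.

```python
import math
from collections import Counter, defaultdict


class NodeClassifier:
    """Bernoulli naive Bayes over the k words that best separate one node's children.

    Assumption: the feature score is a simple stand-in, not the paper's criterion.
    """

    def __init__(self, k):
        self.k = k

    def fit(self, docs, labels):
        # docs: list of word sets; labels: the child topic each document falls under.
        self.classes = sorted(set(labels))
        n = Counter(labels)
        df = {c: Counter() for c in self.classes}  # per-class document frequency
        for words, c in zip(docs, labels):
            df[c].update(words)
        vocab = set().union(*docs)

        def p(w, c):
            # Laplace-smoothed P(word present | class).
            return (df[c][w] + 1) / (n[c] + 2)

        # Score each word by how differently it behaves across the children.
        score = {
            w: max(p(w, c) for c in self.classes) - min(p(w, c) for c in self.classes)
            for w in vocab
        }
        # Keep only the k best words; ties are broken alphabetically.
        self.features = sorted(vocab, key=lambda w: (-score[w], w))[: self.k]
        self.prior = {c: n[c] / len(labels) for c in self.classes}
        self.cond = {c: {w: p(w, c) for w in self.features} for c in self.classes}
        return self

    def predict(self, words):
        def log_posterior(c):
            lp = math.log(self.prior[c])
            for w in self.features:
                q = self.cond[c][w]
                lp += math.log(q if w in words else 1 - q)
            return lp

        return max(self.classes, key=log_posterior)


def train_hierarchy(docs, paths, k):
    """Train one NodeClassifier per internal node.

    paths[i] is the root-to-leaf topic tuple of docs[i]; the root is the empty tuple.
    """
    groups = defaultdict(lambda: ([], []))
    for words, path in zip(docs, paths):
        for depth, child in enumerate(path):
            node_docs, node_labels = groups[path[:depth]]
            node_docs.append(words)
            node_labels.append(child)
    return {node: NodeClassifier(k).fit(d, l) for node, (d, l) in groups.items()}


def classify(models, words):
    """Descend greedily from the root, consulting only each node's own features."""
    node = ()
    while node in models:
        node = node + (models[node].predict(words),)
    return node


# Hypothetical toy corpus: two top-level topics, two subtopics each.
docs = [
    {"quark", "energy", "experiment"},
    {"quark", "particle", "experiment"},
    {"cell", "gene", "experiment"},
    {"cell", "protein", "experiment"},
    {"goal", "match", "team"},
    {"goal", "league", "team"},
    {"serve", "match", "racket"},
    {"serve", "set", "racket"},
]
paths = (
    [("science", "physics")] * 2
    + [("science", "biology")] * 2
    + [("sports", "soccer")] * 2
    + [("sports", "tennis")] * 2
)

models = train_hierarchy(docs, paths, k=2)
for node, model in sorted(models.items()):
    print(node, model.features)
print(classify(models, {"goal", "team", "penalty"}))
```

In this sketch each node keeps only two words, and the words chosen at the root differ from those chosen one level down, which is the effect the abstract describes: the union of relevant features can be large even though each individual classifier examines very few.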

Download Information

D. Koller and M. Sahami (1997). "Hierarchically classifying documents using very few words." Proceedings of the 14th International Conference on Machine Learning (ICML) (pp. 170-178).

Bibtex citation

  title = {Hierarchically classifying documents using very few words},
  author = {D. Koller and M. Sahami},
  booktitle = {Proceedings of the 14th International Conference on Machine Learning (ICML)}, 
  address = {Nashville, Tennessee}, 
  month = {July},
  year = 1997, 
  pages = {170--178},
