
Daphne Koller Publications

Learning on the Test Data: Leveraging `Unseen' Features (2003)

by B. Taskar, M.-F. Wong, and D. Koller


Abstract: This paper addresses the problem of classification in situations where the data distribution is not homogeneous: data instances might come from different locations or times, and therefore are sampled from related but different distributions. In particular, features may appear in some parts of the data that are rarely or never seen in others. In most situations with non-homogeneous data, the training data is not representative of the distribution under which the classifier must operate. We propose a method, based on probabilistic graphical models, for utilizing unseen features during classification. Our method introduces, for each such unseen feature, a continuous hidden variable describing its influence on the class, that is, whether it tends to be associated with some label. We then use probabilistic inference over the test data to infer a distribution over the value of this hidden variable. Intuitively, we learn the role of this unseen feature from the test set, generalizing from those instances whose label we are fairly sure about. Our overall probabilistic model is learned from the training data. In particular, we also learn models for characterizing the role of unseen features; these models use meta-features of those features, such as words in the neighborhood of an unseen feature, to infer its role. We present results for this framework on the task of classifying news articles and web pages, showing significant improvements over models that do not use unseen features.
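
The intuition in the abstract (learn the role of an unseen feature from test instances whose label is fairly certain) can be illustrated with a deliberately simplified sketch. This is not the authors' model: the paper describes a probabilistic graphical model with continuous hidden variables and meta-features, whereas the code below substitutes a hard confidence threshold and smoothed log-odds counts. All function names, the toy documents, and the `confidence` and `smoothing` parameters are assumptions made up for illustration.

```python
import math
from collections import defaultdict


def log_odds(pos, neg, n_pos, n_neg, smoothing):
    """Smoothed log-odds weight for every feature counted in pos or neg."""
    weights = {}
    for f in set(pos) | set(neg):
        p = (pos[f] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[f] + smoothing) / (n_neg + 2 * smoothing)
        weights[f] = math.log(p / q)
    return weights


def train_weights(train_docs, smoothing=1.0):
    """Fit per-feature weights from labeled (features, label) pairs."""
    pos, neg = defaultdict(float), defaultdict(float)
    n_pos = n_neg = 0
    for features, label in train_docs:
        if label == 1:
            n_pos += 1
        else:
            n_neg += 1
        for f in set(features):
            (pos if label == 1 else neg)[f] += 1.0
    return log_odds(pos, neg, n_pos, n_neg, smoothing)


def score(features, weights):
    """Sum of known feature weights; unknown features contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(features))


def infer_unseen_weights(test_docs, weights, confidence=0.8, smoothing=1.0):
    """Estimate weights for features absent from training, using only the
    unlabeled test documents that the base model labels confidently."""
    pos, neg = defaultdict(float), defaultdict(float)
    n_pos = n_neg = 0
    for features in test_docs:
        p = 1.0 / (1.0 + math.exp(-score(features, weights)))
        if p >= confidence:
            label = 1
            n_pos += 1
        elif p <= 1.0 - confidence:
            label = 0
            n_neg += 1
        else:
            continue  # base model is unsure; skip this document
        for f in set(features):
            if f not in weights:
                (pos if label == 1 else neg)[f] += 1.0
    return log_odds(pos, neg, n_pos, n_neg, smoothing)


# Toy data (hypothetical): label 1 = sports, label 0 = politics.
train = [
    (["goal", "match"], 1),
    (["goal", "team"], 1),
    (["vote", "election"], 0),
    (["vote", "senate"], 0),
]
test = [
    ["goal", "match", "striker"],
    ["goal", "team", "striker"],
    ["vote", "election", "filibuster"],
    ["striker"],
    ["filibuster"],
]

seen = train_weights(train)
unseen = infer_unseen_weights(test, seen)
combined = {**seen, **unseen}
```

In this toy setting, a test document containing only "striker" scores zero under the training-only weights, while the combined weights push it toward the sports label, because "striker" co-occurs with confidently classified sports documents in the test set.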


Download Information

B. Taskar, M.-F. Wong, and D. Koller (2003). "Learning on the Test Data: Leveraging `Unseen' Features." Proc. Twentieth International Conference on Machine Learning (ICML).

Bibtex citation

@inproceedings{Taskar+al:ICML03,
  title = {Learning on the Test Data: Leveraging `Unseen' Features},
  author = {B. Taskar and M.-F. Wong and D. Koller},
  booktitle = {Proc. Twentieth International Conference on Machine Learning (ICML)}, 
  year = 2003,
}
