Learning an Agent's Utility Function by Observing Behavior (2001) by U. Chajewska, D. Koller, and D. Ormoneit
Abstract:
This paper considers the task of predicting the future decisions of an agent A based on his past decisions. We assume that A is rational: he uses the principle of maximum expected utility. We also assume that the probability distribution P he assigns to random events is known, so that we need only infer his utility function u to model his decision process. We consider the task of using A's previous decisions to learn about u. In particular, A's past decisions can be viewed as constraints on u. If we have a prior probability distribution p(u) over u (e.g., learned from a set of utility functions in the population), we can then condition on these constraints to obtain a posterior distribution q(u). We present an efficient Markov Chain Monte Carlo scheme to generate samples from q(u), which can be used to estimate not only a single "expected" course of action for A, but a distribution over possible courses of action. We show that this capability is particularly useful in a two-player setting where a second learning agent is trying to optimize her own payoff, which also depends on A's actions and utilities.
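The core idea of the abstract can be illustrated with a minimal sketch: each observed decision, in which the agent chose one lottery over another, imposes a linear constraint on u, and a Metropolis-style sampler then draws utility vectors from the prior restricted to the feasible region. This is an illustrative toy (the constraint matrices, the standard-normal prior, and the sampler details are assumptions for the example), not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 outcomes, utility vector u in R^3.
# Each observed decision: the agent chose lottery p_c over alternative p_a,
# so expected-utility rationality implies p_c @ u >= p_a @ u.
constraints = [
    (np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.3, 0.4])),
    (np.array([0.5, 0.4, 0.1]), np.array([0.2, 0.2, 0.6])),
]

def satisfies(u):
    # True iff u is consistent with every observed decision
    return all(pc @ u >= pa @ u for pc, pa in constraints)

def log_prior(u):
    # Standard-normal prior over utilities (an assumption for this sketch)
    return -0.5 * (u @ u)

def sample_posterior(n_samples=2000, step=0.3):
    # Metropolis sampler for the target: prior truncated to the feasible
    # region defined by the decision constraints.
    u = np.zeros(3)  # the zero vector trivially satisfies the constraints
    samples = []
    for _ in range(n_samples):
        prop = u + step * rng.normal(size=3)  # symmetric Gaussian proposal
        if satisfies(prop) and np.log(rng.uniform()) < log_prior(prop) - log_prior(u):
            u = prop
        samples.append(u.copy())
    return np.array(samples)

samples = sample_posterior()
```

Every sample respects the revealed-preference constraints, so downstream the whole collection, rather than a single point estimate, can be used to form a distribution over the agent's future actions.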
Download Information
U. Chajewska, D. Koller, and D. Ormoneit (2001). "Learning an Agent's Utility Function by Observing Behavior." Proceedings of the Eighteenth International Conference on Machine Learning (ICML).


Bibtex citation
@inproceedings{Chajewska+al:ICML01,
title = {Learning an Agent's Utility Function by Observing Behavior},
author = {U. Chajewska and D. Koller and D. Ormoneit},
booktitle = {Proceedings of the Eighteenth International Conference on Machine Learning (ICML)},
address = {Williams College, Williamstown, Massachusetts},
month = {June},
year = {2001},
}
