Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors

Dmitri A. Dolgov, and Edmund H. Durfee

In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05). Pages 1326--1332. August 2005.

Copyright © 2005 IJCAI. Online proceedings are available at http://www.ijcai.org/search.php.

Abstract
We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as an MDP with multiple discount factors in the objective function and constraints. We show that limiting search to stationary deterministic policies, coupled with a novel problem reduction to mixed integer programming, yields an algorithm for finding such policies that is computationally feasible, where no such algorithm has heretofore been identified. In the simpler case where the constrained MDP has a single discount factor, our technique provides a new way for finding an optimal deterministic policy, where previous methods could only find randomized policies. We analyze the properties of our approach and describe implementation results.

BibTex
@inproceedings{ dolgov05stationary,
   paperID   = "IJCAI-05",
   month     = "August",
   author    = "Dmitri A. Dolgov and Edmund H. Durfee",
   booktitle = "Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05)",
   address   = "Edinburgh, Scotland",
   title     = "Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors",
   pages     = "1326--1332",
   year      = "2005"
}

Download:

[pdf]

[ps.gz]