|
ABSTRACT
One-switch utility functions are an important class of nonlinear utility functions that can model human beings whose decisions change with their wealth level. We study how to maximize the expected utility for Markov decision problems with given one-switch utility functions. We first utilize the fact that one-switch utility functions are weighted sums of linear and exponential utility functions to prove that there exists an optimal policy that is both stationary and deterministic as the wealth level approaches negative infinity. We then develop a solution method, the backward-induction method, that starts with this policy and augments it for higher and higher wealth levels. Our backward-induction method determines maximal expected utilities in finite time, different from the previous functional value iteration method, that typically determines only approximately maximal expected utilities.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
J. L. Corner and P. D. Corner. Characteristics of decisions in decision analysis practice. The Journal of Operational Research Society, 46:304--314, 1995.
|
| |
5
|
|
| |
6
|
S. Koenig and R. G. Simmons. Risk-sensitive planning with probabilistic decision graphs. In Proceedings of the Fourth International Conference on Principles of Knowledge Representation and Reasoning (KR-94), pages 2301--2308, 1994.
|
| |
7
|
|
| |
8
|
Y. Liu and S. Koenig. Existence and finiteness conditions for risk-sensitive planning: Results and conjectures. In Proceedings of the Twentieth Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), 2005.
|
| |
9
|
Y. Liu and S. Koenig. Risk-sensitive planning with one-switch utility functions: Value iteration. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 993--999, 2005.
|
| |
10
|
C. McMillen and M. Veloso. Thresholded rewards: Acting optimally in timed, zero-sum games. In Proceedings of The Twenty-Second Conference on Artificial Intelligence (AAAI-07), 2007.
|
| |
11
|
Y. Nakamura. Sumex utility functions. Mathematical Social Sciences, 31:39--47, 1996.
|
| |
12
|
E. Nikolova, M. Brand, and D. R. Karger. Optimal route planning under uncertainty. In Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling (ICAPS-06), 2006.
|
| |
13
|
S. D. Patek. On terminating Markov decision processes with a risk averse objective function. Automatica, 37(9):1379--1386, 2001.
|
| |
14
|
P. Perny, O. Spanjaard, and L.-X. Storme. State space search for risk-averse agents. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2353--2358, 2007.
|
| |
15
|
|
| |
16
|
J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1st edition, 1944.
|
|