|
ABSTRACT
We present a method for transforming the infinite interactive state space of interactive POMDPs (I-POMDPs) into a finite one, thereby enabling the computation of exact solutions. I-POMDPs allow sequential decision making in multi-agent environments by modeling other agents' beliefs, capabilities, and preferences as part of the interactive state space. Since beliefs are allowed to be arbitrarily nested and are continuous, it is not possible to compute optimal solutions using value iteration as in POMDPs. We present a method that transforms the original state space into a finite one by grouping the other agents' behaviorally equivalent models into equivalence classes. This enables us to compute the complete optimal solution for the I-POMDP, which may be represented as a policy graph. We illustrate our method using the multi-agent Tiger problem and discuss features of the solution.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
D. S. Bernstein, E. A. Hansen, and S. Zilberstein. Bounded policy iteration for decentralized pomdps. In Proceedings of the The Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005.
|
| |
4
|
A. R. Cassandra, M. L. Littman, and N. L. Zhang. Incremental pruning: A simple, fast, exact method for partially observable markov decision processes. In Uncertainty in Artificial Intelligence, Rhode Island, Providence, 1997.
|
 |
5
|
|
| |
6
|
P. Doshi and P. Gmytrasiewicz. A particle filtering based approach to approximating interactive pomdps. In Proceedings of the The Twentieth National Conference on Artificial Intelligence (AAAI-05), 2005.
|
| |
7
|
|
| |
8
|
P. J. Gmytrasiewicz and P. Doshi. A framework for sequential planning in multi-agent settings. In Journal of AI Research, 2005.
|
| |
9
|
E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), 2004.
|
| |
10
|
J. C. Harsanyi. Games with incomplete information played by 'bayesian' players. In Management Science, pages 14(3):159--182, Nov 2005.
|
| |
11
|
|
| |
12
|
R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming decentralized pomdps: Towards efficient policy computation for multiagent settings. In Proceedings of International Joint Conference in Artificial Intelligence (IJCAI), 2003.
|
| |
13
|
J. M. Porta, M. T. Spaan, and N. Vlassis. Value iteration for continuous-state pomdps. In IAS Technical report IAS-UVA-04-04, 2004.
|
| |
14
|
P. Poupart and C. Boutilier. Value directed compression of pomdps. In Seventeenth Annual Conference on Neural Information Processing Systems, 2003.
|
 |
15
|
|
| |
16
|
R. Smallwood and E. Sondik. The optimal control of partially observable markov decision processes over a finite horizon. Operations Research, 21:1071--1088, 1973.
|
| |
17
|
S. Thrun. Monte carlo POMDPs. In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 1064--1070. MIT Press, 2000.
|
|