| Discounted deterministic Markov decision processes and discounted all-pairs shortest paths |
| Full text |
Pdf
(450 KB)
|
| Source
|
Symposium on Discrete Algorithms
archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms
table of contents
New York, New York
Pages 958-967
Year of Publication: 2009
|
|
Authors
|
|
| Publisher |
Society for Industrial and Applied Mathematics
Philadelphia, PA, USA
|
| Bibliometrics |
Downloads (6 Weeks): 6, Downloads (12 Months): 57, Citation Count: 0
|
|
|
ABSTRACT
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n2 log n) time, where n is the number of vertices (or states) and m is the number of edges (or actions). The second one is an adaptation of a classical algorithm of Karp for finding minimum mean weight cycles. It runs in O(mn) time. The first algorithm has a slightly slower worst-case complexity, but is faster than the first algorithm in many situations. Both algorithms improve on a recent O(mn2)-time algorithm of Andersson and Vorobyov. We also present a randomized Õ(m1/2n2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. Andersson and S. Vorobyov. Fast algorithms for monotonic discounted linear programs with two variables per inequality. TR NI06019-LAA, Isaac Newton Institute, Cambridge, United Kingdom, 2006.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
F. d'Epenoux. A probabilistic production and inventory problem. Manag. Science, 10(1):98--108, 1963.
|
| |
9
|
|
| |
10
|
A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109--113, 1979.
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
R. A. Howard. Dynamic programming and Markov processes. MIT Press, 1960.
|
| |
16
|
|
| |
17
|
R. M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23(3):309--311, 1978.
|
| |
18
|
L. G. Khachiyan. A polynomial time algorithm in linear programming. Soviet Math. Dokl., 20:191--194, 1979.
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
O. Madani. Polynomial value iteration algorithms for detrerminstic MDPs. In Proc. of the 18th UAI, pages 311--318, 2002.
|
| |
24
|
Y. Mansour and S. P. Singh. On the complexity of policy iteration. In Proc. of the 15th UAI, pages 401--408, 1999.
|
| |
25
|
M. Melekopoglou and A. Condon. On the complexity of the policy improvement algorithm for markov decision processes. ORSA Journal on Computing, 6(2):188--192, 1994.
|
| |
26
|
|
| |
27
|
M. L. Puterman. Markov decision processes. Wiley, 1994.
|
| |
28
|
L. S. Shapley. Stochastic games. Proc. Nat. Acad. Sci. U.S.A., 39:1095--1100, 1953.
|
| |
29
|
|
| |
30
|
|
| |
31
|
N. E. Young, R. E. Tarjan, and J. B. Orlin. Faster parametric shortest path and minimum-balance algorithms. Networks, 21:205--221, 1991.
|
 |
32
|
|
| |
33
|
|
|