ACM Home Page
Please provide us with feedback. Feedback
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
Full text PdfPdf (450 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 958-967  
Year of Publication: 2009
Authors
Omid Madani  AI Center, Menlo Park, CA
Mikkel Thorup  AT&T Labs - Research, Florham Park, NJ
Uri Zwick  Tel Aviv University, Tel Aviv, Israel
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 57,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n2 log n) time, where n is the number of vertices (or states) and m is the number of edges (or actions). The second one is an adaptation of a classical algorithm of Karp for finding minimum mean weight cycles. It runs in O(mn) time. The first algorithm has a slightly slower worst-case complexity, but is faster than the first algorithm in many situations. Both algorithms improve on a recent O(mn2)-time algorithm of Andersson and Vorobyov. We also present a randomized Õ(m1/2n2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. Andersson and S. Vorobyov. Fast algorithms for monotonic discounted linear programs with two variables per inequality. TR NI06019-LAA, Isaac Newton Institute, Cambridge, United Kingdom, 2006.
 
2
 
3
 
4
 
5
 
6
 
7
 
8
F. d'Epenoux. A probabilistic production and inventory problem. Manag. Science, 10(1):98--108, 1963.
 
9
 
10
A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109--113, 1979.
11
 
12
 
13
 
14
 
15
R. A. Howard. Dynamic programming and Markov processes. MIT Press, 1960.
 
16
 
17
R. M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23(3):309--311, 1978.
 
18
L. G. Khachiyan. A polynomial time algorithm in linear programming. Soviet Math. Dokl., 20:191--194, 1979.
 
19
 
20
 
21
 
22
 
23
O. Madani. Polynomial value iteration algorithms for detrerminstic MDPs. In Proc. of the 18th UAI, pages 311--318, 2002.
 
24
Y. Mansour and S. P. Singh. On the complexity of policy iteration. In Proc. of the 15th UAI, pages 401--408, 1999.
 
25
M. Melekopoglou and A. Condon. On the complexity of the policy improvement algorithm for markov decision processes. ORSA Journal on Computing, 6(2):188--192, 1994.
 
26
 
27
M. L. Puterman. Markov decision processes. Wiley, 1994.
 
28
L. S. Shapley. Stochastic games. Proc. Nat. Acad. Sci. U.S.A., 39:1095--1100, 1953.
 
29
 
30
 
31
N. E. Young, R. E. Tarjan, and J. B. Orlin. Faster parametric shortest path and minimum-balance algorithms. Networks, 21:205--221, 1991.
32
 
33

Collaborative Colleagues:
Omid Madani: colleagues
Mikkel Thorup: colleagues
Uri Zwick: colleagues