ACM Home Page
Please provide us with feedback. Feedback
Satisficing strategies for resource-limited policy search in dynamic environments
Full text PdfPdf (331 KB)
Source International Conference on Autonomous Agents archive
Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 3 table of contents
Bologna, Italy
SESSION: Session 11C: decision making table of contents
Pages: 1325 - 1332  
Year of Publication: 2002
ISBN:1-58113-480-0
Authors
Dmitri Dolgov  University of Michigan, Ann Arbor, MI
Edmund H. Durfee  University of Michigan, Ann Arbor, MI
Sponsors
ACM: Association for Computing Machinery
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 24,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/545056.545126
What is a DOI?

ABSTRACT

In this work, we examine the problem of searching for schedulable real-time control policies for resource-limited agents acting in dynamic environments. The dynamic properties of the environment and resource limitations of the agent render the problem of solving for an optimal policy infeasible. We therefore limit our search to a satisficing, rather than an optimal policy. We view the policy search as a search for a state reachability graph, with an action assigned to each of the states in the graph. In the search algorithm, we exploit properties of the reachability graphs to propagate failure conditions from inherent failure states to other states in the reachability graph, which allows us to exploit constraint satisfaction techniques to quickly remove some unacceptable policies from consideration. Our analysis and experiments show that, under certain conditions, such as when the "safe" states in the reachability graph are separated from the failure states by a relatively small set of states, we can use backtracking and memoization techniques that significantly improve the efficiency of the search algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Aström, K.J.: Optimal control of Markov decision processes with incomplete state estimation. J. Math. Anal. Appl., (1965)10, 174--205.
 
2
 
3
Boutilier C., Dean T., Hanks S.: Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, Journal of Artificial Intelligence Research 1 (1999), 1--94.
 
4
Dechter R.: Learning while searching in constraint satisfaction problems, Proceedings AAAI'86 (1986).
 
5
Ginsberg M.: Dynamic backtracking, Journal of Artificial Intelligence Research 1 (1993), 25--46.
 
6
Li H., Atkins E., Durfee E., Shin K.: Practical State Probability Approximation for a Resource-Limited Real-Time Agent. Proceedings of the IJCAI-01 Workshop on Planning with Resources, (2001).
 
7
 
8
Musliner D., Durfee E., Shin K. CIRCA: A Cooperative Intelligent Real-Time Control Architecture, IEEE Transactions on Systems, Man, and Cybernetics (1993).
 
9
 
10
Richard T., Richard B.: Nogood learning for Constraint Satisfaction, IC Parc Imperial College London.
 
11
 
12
Schiex T., Verfaillie G.: Nogood Recording for Static and Dynamic Constraint Satisfaction Problems, Proceedings of the 5th IEEE International Conference on Tools with Artificial Intelligence (1993).
 
13
Smallwood R., Sondik E.J.: The optimal control of partially observable Markov processes over a finite horizon. (1973).
 
14
Zilberstein S, Russell, S.J. Anytime sensing, planning and action: A practical model for robot control. In Proc. 19th International Joint Conference on Artificial Intelligence, Chambery, France (1993).

Collaborative Colleagues:
Dmitri Dolgov: colleagues
Edmund H. Durfee: colleagues