Policy teaching through reward function learning
Source: Proceedings of the tenth ACM conference on Electronic commerce
Stanford, California, USA
SESSION: Session 9
Pages 295-304
Year of Publication: 2009
ISBN: 978-1-60558-458-4
ABSTRACT
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
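As an illustration of the known-reward case, the following is a minimal sketch of how inducing a target policy can be posed as a linear program. It is not taken from the paper. It assumes a discounted MDP with states $S$, actions $A$, transition probabilities $P(s' \mid s, a)$, discount factor $\gamma$, a state-based agent reward $R(s)$, a state-based non-negative incentive $\Delta(s)$, a target policy $\pi_T$, and a strictness margin $\epsilon > 0$. All of these symbols, the objective, and the form of the incentive limit are assumptions made for the example; the paper's own formulation may differ.

```latex
\begin{align*}
\min_{\Delta,\, V} \quad & \sum_{s \in S} \Delta(s) \\
\text{s.t.} \quad
  & V(s) = R(s) + \Delta(s) + \gamma \sum_{s' \in S} P(s' \mid s, \pi_T(s))\, V(s')
    && \forall s \in S \\
  & V(s) - \Big[ R(s) + \Delta(s) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s') \Big] \ge \epsilon
    && \forall s \in S,\ a \ne \pi_T(s) \\
  & \Delta(s) \ge 0
    && \forall s \in S
\end{align*}
```

The equality constraints define $V$ as the value of following $\pi_T$ under the modified reward $R + \Delta$. The inequalities require every one-step deviation from $\pi_T$ to be worse by at least $\epsilon$, which makes $\pi_T$ the agent's unique optimal policy. Because $R$, $P$, and $\pi_T$ are fixed, every constraint is linear in $(\Delta, V)$. A limit on the available incentives could be added as a further linear constraint, for example an upper bound on $\sum_{s} \Delta(s)$.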