Policy teaching through reward function learning
Source
Electronic Commerce archive
Proceedings of the tenth ACM conference on Electronic commerce
Stanford, California, USA
SESSION: Session 9
Pages 295-304
Year of Publication: 2009
ISBN:978-1-60558-458-4
Authors
Haoqi Zhang, Harvard University, Cambridge, MA, USA
David C. Parkes, Harvard University, Cambridge, MA, USA
Yiling Chen, Harvard University, Cambridge, MA, USA
Sponsors
ACM: Association for Computing Machinery
SIGEcom: ACM Special Interest Group on Electronic Commerce
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1566374.1566417

ABSTRACT

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
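
The abstract states that the known-reward case is solved with a linear program. The following sketch illustrates that setting under stated assumptions and does not reproduce the paper's formulation. It assumes state-based rewards and incentives, a hypothetical deterministic two-state MDP with discount 0.9, and a budget on the expected discounted incentive paid from a start state. It also replaces the LP with a brute-force search over a small grid of incentives. The check in `induces` is the kind of condition an LP would impose as linear constraints: under the agent's reward plus the incentive, the target action must beat every one-step deviation by a margin. All names (`NEXT`, `GAMMA`, `evaluate`, `induces`, `cheapest_incentive`) are hypothetical.

```python
from itertools import product

# Hypothetical toy MDP (an assumption, not from the paper): 2 states, 2 actions.
# Action 0 = "stay" keeps the current state; action 1 = "move" switches it.
# NEXT[s][a] is the deterministic successor of state s under action a.
NEXT = [[0, 1], [1, 0]]
GAMMA = 0.9  # assumed discount factor


def evaluate(policy, reward, iters=2000):
    """Iterative policy evaluation with state-based rewards:
    V(s) = r(s) + GAMMA * V(next(s, policy[s]))."""
    v = [0.0] * len(reward)
    for _ in range(iters):
        v = [reward[s] + GAMMA * v[NEXT[s][policy[s]]] for s in range(len(reward))]
    return v


def induces(target, reward, incentive, eps=1e-6):
    """True if, under reward + incentive, the target action beats every other
    action by at least eps in every state (Q computed from V of the target).
    A strictly greedy policy is the agent's unique optimal policy."""
    r = [x + d for x, d in zip(reward, incentive)]
    v = evaluate(target, r)
    for s, a_t in enumerate(target):
        q_t = r[s] + GAMMA * v[NEXT[s][a_t]]
        for a in range(len(NEXT[s])):
            if a != a_t and r[s] + GAMMA * v[NEXT[s][a]] > q_t - eps:
                return False
    return True


def incentive_cost(target, incentive, start):
    """Expected discounted incentive paid from `start` if the agent follows `target`."""
    return evaluate(target, incentive)[start]


def cheapest_incentive(target, reward, start, budget, grid):
    """Brute-force stand-in for the LP: the cheapest grid incentive that induces
    `target` within `budget`, as (incentive, cost), or None if none exists."""
    best = None
    for incentive in product(grid, repeat=len(reward)):
        if not induces(target, reward, incentive):
            continue
        cost = incentive_cost(target, incentive, start)
        if cost <= budget and (best is None or cost < best[1]):
            best = (incentive, cost)
    return best


if __name__ == "__main__":
    reward = [1.0, 0.0]  # the agent prefers state 0
    target = [1, 0]      # desired: move out of state 0, stay in state 1
    grid = [0.0, 0.5, 1.0, 1.5, 2.0]
    print(cheapest_incentive(target, reward, start=0, budget=20.0, grid=grid))
```

Run as a script, the example should select the incentive (0.0, 1.5) at a cost of about 13.5. In this toy MDP the target policy is induced only when Δ(1) − Δ(0) > 1, and Δ(1) = 1.0 produces only a tie. The grid can be replaced by an LP because V under the target policy equals (I − γP)^{-1}(R + Δ), which is linear in Δ. So the margin conditions and the budget are linear in Δ, and a solver can minimize the cost directly.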

