Privacy-preserving reinforcement learning
Source
ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages 864-871
Year of Publication: 2008
ISBN: 978-1-60558-205-4
ABSTRACT
We consider the problem of distributed reinforcement learning (DRL) from private perceptions. In our setting, agents' perceptions, such as states, rewards, and actions, are not only distributed but must also be kept private. Conventional DRL algorithms can handle multiple agents, but they do not necessarily guarantee privacy preservation and may not guarantee optimality. In this work, we design cryptographic solutions that achieve optimal policies without requiring the agents to share their private information.