|
ABSTRACT
Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detection, intrusion detection, fraud detection, surveillance and counter-terrorism, this is far from the case: the data is actively manipulated by an adversary seeking to make the classifier produce false negatives. In these domains, the performance of a classifier can degrade rapidly after it is deployed, as the adversary learns to defeat it. Currently the only solution to this is repeated, manual, ad hoc reconstruction of the classifier. In this paper we develop a formal framework and algorithms for this problem. We view classification as a game between the classifier and the adversary, and produce a classifier that is optimal given the adversary's optimal strategy. Experiments in a spam detection domain show that this approach can greatly outperform a classifier learned in the standard way, and (within the parameters of the problem) automatically adapt the classifier to the adversary's evolving manipulations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
Robert B. Doorenbos , Oren Etzioni , Daniel S. Weld, A scalable comparison-shopping agent for the World-Wide Web, Proceedings of the first international conference on Autonomous agents, p.39-48, February 05-08, 1997, Marina del Rey, California, United States
[doi> 10.1145/267658.267666]
|
 |
4
|
|
| |
5
|
|
| |
6
|
D. Fudenberg and D. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1999.
|
| |
7
|
D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.
|
| |
8
|
|
| |
9
|
L. Guernsey. Retailers rise in Google rankings as rivals cry foul. New York Times, November 20, 2003.
|
 |
10
|
|
 |
11
|
|
| |
12
|
M. Kearns. Computational game theory. Tutorial, Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, 2002. http://www.cis.upenn.edu/~mkearns/nips02tutorial/.
|
| |
13
|
|
| |
14
|
B. Krebs. Online piracy spurs high-tech arms race. Washington Post, June 26, 2003.
|
| |
15
|
M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157--163, New Brunswick, NJ, 1994. Morgan Kaufmann.
|
| |
16
|
B. Lloyd. Been gazumped by Google? Trying to make sense of the "Florida" update. Search Engine Guide, November 25, 2003.
|
 |
17
|
|
| |
18
|
A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.
|
| |
19
|
F. Nielsen. Email data, 2003. http://www.imm.dtu.dk/~rem/datasets/.
|
| |
20
|
|
| |
21
|
J. Rennie. Ifile spam classifier, 2003. http://www.nongnu.org/ifile/.
|
| |
22
|
|
| |
23
|
M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.
|
| |
24
|
Georgios Sakkis , Ion Androutsopoulos , Georgios Paliouras , Vangelis Karkaletsis , Constantine D. Spyropoulos , Panagiotis Stamatopoulos, A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists, Information Retrieval, v.6 n.1, p.49-73, January 2003
[doi> 10.1023/A:1022948414856]
|
 |
25
|
|
| |
26
|
J. M. Smith. Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK, 1982.
|
| |
27
|
P. Turney. Cost-sensitive learning bibliography. Online bibliography, NRC Institute for Information Technology, Ottawa, Canada, 1998. http://members.rogers.com/peter.turney/bibliographies/cost-sensitive.html.
|
| |
28
|
WordNet 2.0: A lexical database for the English language, 2003. http://www.cogsci.princeton.edu/~wn/.
|
CITED BY 20
|
|
|
|
|
|
|
|
|
|
|
Marco Barreno , Blaine Nelson , Russell Sears , Anthony D. Joseph , J. D. Tygar, Can machine learning be secure?, Proceedings of the 2006 ACM Symposium on Information, computer and communications security, March 21-24, 2006, Taipei, Taiwan
|
|
|
Simone Stumpf , Vidya Rajaram , Lida Li , Margaret Burnett , Thomas Dietterich , Erin Sullivan , Russell Drummond , Jonathan Herlocker, Toward harnessing user feedback for machine learning, Proceedings of the 12th international conference on Intelligent user interfaces, January 28-31, 2007, Honolulu, Hawaii, USA
|
|
|
|
|
|
Blaine Nelson , Marco Barreno , Fuching Jack Chi , Anthony D. Joseph , Benjamin I. P. Rubinstein , Udam Saini , Charles Sutton , J. D. Tygar , Kai Xia, Exploiting machine learning to subvert your spam filter, Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, p.1-9, April 15-15, 2008, San Francisco, California
|
|
|
Marco Barreno , Peter L. Bartlett , Fuching Jack Chi , Anthony D. Joseph , Blaine Nelson , Benjamin I.P. Rubinstein , Udam Saini , J. D. Tygar, Open problems in the security of learning, Proceedings of the 1st ACM workshop on Workshop on AISec, October 27-27, 2008, Alexandria, Virginia, USA
|
|
|
|
|
|
|
|
|
Clifton Phua , Vincent Lee , Kate Smith-Miles , Ross Gayler, Adaptive communal detection in search of adversarial identity crime, Proceedings of the 2007 international workshop on Domain driven data mining, p.1-10, August 12-12, 2007, San Jose, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Simone Stumpf , Vidya Rajaram , Lida Li , Weng-Keen Wong , Margaret Burnett , Thomas Dietterich , Erin Sullivan , Jonathan Herlocker, Interacting meaningfully with machine learning systems: Three experiments, International Journal of Human-Computer Studies, v.67 n.8, p.639-662, August, 2009
|
|
|
Pranam Kolari , Akshay Java , Tim Finin , Tim Oates , Anupam Joshi, Detecting spam blogs: a machine learning approach, proceedings of the 21st national conference on Artificial intelligence, p.1351-1356, July 16-20, 2006, Boston, Massachusetts
|
|
|
|
|
|
Benjamin Liebald , Dan Roth , Neelay Shah , Vivek Srikumar, Proactive intrusion detection, Proceedings of the 23rd national conference on Artificial intelligence, p.772-777, July 13-17, 2008, Chicago, Illinois
|
|
|
|
|