|
ABSTRACT
Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial practitioners disagree on how best to filter spam. The former have advocated the use of Support Vector Machines (SVMs) for content-based filtering, as this machine learning methodology gives state-of-the-art performance for text classification. However, similar performance gains have yet to be demonstrated for online spam filtering. Additionally, practitioners cite the high cost of SVMs as reason to prefer faster (if less statistically robust) Bayesian methods. In this paper, we offer a resolution to this controversy. First, we show that online SVMs indeed give state-of-the-art classification performance on online spam filtering on large benchmark data sets. Second, we show that nearly equivalent performance may be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost. Our results are experimentally verified on email spam, blog spam, and splog detection tasks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Bratko and B. Filipic. Spam filtering using compression models. Technical Report IJS-DP-9227, Department of Intelligent Systems, Jozef Stefan Institute, L jubljana, Slovenia, 2005.
|
| |
2
|
G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In NIPS, pages 409--415, 2000.
|
| |
3
|
G. V. Cormack. TREC 2006 spam track overview. In To appear in: The Fifteenth Text REtrieval Conference (TREC 2006) Proceedings, 2006.
|
| |
4
|
G. V. Cormack and A. Bratko. Batch and on-line spam filter comparison. In Proceedings of the Third Conference on Email and Anti-Spam (CEAS), 2006.
|
| |
5
|
G. V. Cormack and T. R. Lynam. TREC 2005 spam track overview. In The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings, 2005.
|
| |
6
|
G. V. Cormack and T. R. Lynam. On-line supervised spam filter evaluation. Technical report, David R. Cheriton School of Computer Science, University of Waterloo, Canada, February 2006.
|
| |
7
|
|
 |
8
|
|
| |
9
|
H. Drucker, V. Vapnik, and D. Wu. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5):1048--1054, 1999.
|
| |
10
|
J. Goodman and W. Yin. Online discriminative spam filter training. In Proceedings of the Third Conference on Email and Anti-Spam (CEAS), 2006.
|
| |
11
|
P. Graham. A plan for spam. 2002.
|
| |
12
|
P. Graham. Better bayesian filtering. 2003.
|
| |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
J. Kivinen, A. Smola, and R. Williamson. Online learning with kernels. In Advances in Neural Information Processing Systems 14, pages 785--793. MIT Press, 2002.
|
| |
17
|
P. Kolari, T. Finin, and A. Joshi. SVMs for the blogosphere: Blog identification and splog detection. AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.
|
| |
18
|
W. Krauth and M. Mézard. Learning algorithms with optimal stability in neural networks. Journal of Physics A, 20(11):745--752, 1987.
|
 |
19
|
|
| |
20
|
V. Metsis, I. Androutsopoulos, and G. Paliouras. Spam filtering with naive bayes - which naive bayes? Third Conference on Email and Anti-Spam (CEAS), 2006.
|
| |
21
|
G. Mishne, D. Carmel, and R. Lempel. Blocking blog spam with language model disagreement. Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), May 2005.
|
| |
22
|
J. Platt. Sequenital minimal optimization: A fast algorithm for training support vector machines. In B. Scholkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.
|
| |
23
|
|
| |
24
|
G. L. Wittel and S. F. Wu. On attacking statistical spam filters. CEAS: First Conference on Email and Anti-Spam, 2004.
|
CITED BY 15
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shan-Hung Wu , Keng-Pei Lin , Chung-Min Chen , Ming-Syan Chen, Asymmetric support vector machines: low false-positive learning under the user tolerance, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|