| Bayesian clustering for email campaign detection |
| Full text |
Pdf
(619 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 385-392
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 30, Citation Count: 0
|
|
|
ABSTRACT
We discuss the problem of clustering elements according to the sources that have generated them. For elements that are characterized by independent binary attributes, a closed-form Bayesian solution exists. We derive a solution for the case of dependent attributes that is based on a transformation of the instances into a space of independent feature functions. We derive an optimization problem that produces a mapping into a space of independent binary feature vectors; the features can reflect arbitrary dependencies in the input space. This problem setting is motivated by the application of spam filtering for email service providers. Spam traps deliver a real-time stream of messages known to be spam. If elements of the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
Lau, J., & Green, P. (2007). Bayesian Model-Based Clustering Procedures. Journal of Computational and Graphical Statistics, 16, 526--558.
|
| |
4
|
Teo, C., Globerson, A., Roweis, S., & Smola, A. (2008). Convex Learning with Invariances. Advances in Neural Information Processing Systems, 20, 1489--1496.
|
| |
5
|
|
| |
6
|
Williams, C. (2000). A MCMC approach to hierarchical mixture modelling. Advances in Neural Information Processing Systems, 12, 680--686.
|
| |
7
|
|
|