Bayesian clustering for email campaign detection
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 385-392  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Peter Haider  University of Potsdam, Potsdam, Germany
Tobias Scheffer  University of Potsdam, Potsdam, Germany
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553424

ABSTRACT

We discuss the problem of clustering elements according to the sources that have generated them. For elements that are characterized by independent binary attributes, a closed-form Bayesian solution exists. We derive a solution for the case of dependent attributes that is based on a transformation of the instances into a space of independent feature functions. We derive an optimization problem that produces a mapping into a space of independent binary feature vectors; the features can reflect arbitrary dependencies in the input space. This problem setting is motivated by the application of spam filtering for email service providers. Spam traps deliver a real-time stream of messages known to be spam. If elements of the same campaign can be recognized reliably, entire spam and phishing campaigns can be contained. We present a case study that evaluates Bayesian clustering for this application.
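
The closed-form case mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes each source emits independent Bernoulli attributes with a Beta(alpha, beta) prior on every attribute probability, so that the parameters integrate out analytically. The function names, the default prior values, and the omission of any prior over partitions are assumptions made only for illustration.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(cluster, alpha=1.0, beta=1.0):
    """Log marginal likelihood of binary vectors assumed to share one source.

    Assumption (not from the paper): attributes are independent Bernoulli
    variables whose parameters have a Beta(alpha, beta) prior. Integrating
    the parameters out gives, per attribute,
    B(alpha + ones, beta + zeros) / B(alpha, beta).
    """
    if not cluster:
        return 0.0
    n = len(cluster)
    total = 0.0
    for j in range(len(cluster[0])):
        ones = sum(x[j] for x in cluster)
        total += log_beta(alpha + ones, beta + n - ones) - log_beta(alpha, beta)
    return total


def log_partition_score(partition, alpha=1.0, beta=1.0):
    """Sum of per-cluster log marginals; a prior over partitions is omitted."""
    return sum(log_marginal(c, alpha, beta) for c in partition)


if __name__ == "__main__":
    a, b = [1, 0, 1, 1], [1, 0, 1, 1]
    merged = log_partition_score([[a, b]])
    split = log_partition_score([[a], [b]])
    # Identical vectors are better explained by a single source.
    print("merged > split:", merged > split)
```

Under these assumptions, comparing the scores of two candidate partitions shows which grouping of messages the model finds more probable. The dependent-attribute case that the paper addresses is not covered by this sketch.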


