Spam filter evaluation with imprecise ground truth
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Spamming
Pages 604-611  
Year of Publication: 2009
ISBN:978-1-60558-483-6
Authors
Gordon V. Cormack  University of Waterloo, Waterloo, ON, Canada
Aleksander Kolcz  Microsoft Live Labs, Redmond, WA, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1572045

ABSTRACT

When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications. But labels acquired from user feedback or third-party adjudication exhibit higher error rates than the best filters -- even filters trained using the same source of labels. It is appropriate to use naturally occurring labels -- including errors -- as training data in evaluating spam filters. Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness. Any measurement of the filter's error rate will be augmented and perhaps masked by the label error rate. Using two natural sources of labels, we demonstrate automatic and semi-automatic methods that reduce the influence of labeling errors on evaluation, yielding substantially more precise measurements of true filter error rates.
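
The masking effect described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' method: it assumes a binary spam/ham task and, as a simplification, that filter mistakes and label mistakes occur independently. The function name `observed_error_rate` and the rates used are hypothetical, chosen only to show the scale of the effect.

```python
def observed_error_rate(filter_error: float, label_error: float) -> float:
    """Apparent filter error rate when noisy labels serve as ground truth.

    Assumption (not from the paper): binary classes, and filter mistakes
    are statistically independent of label mistakes.
    """
    for name, p in (("filter_error", filter_error), ("label_error", label_error)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    # The filter disagrees with the recorded label when exactly one of
    # the two (filter or label) is wrong.
    return filter_error * (1.0 - label_error) + label_error * (1.0 - filter_error)


if __name__ == "__main__":
    # Illustrative rates only: two filters scored against labels that are
    # wrong 1% of the time.
    for true_rate in (0.001, 0.005):
        measured = observed_error_rate(true_rate, 0.01)
        print(f"true={true_rate:.4f} measured={measured:.5f}")
```

Under these assumed rates, a filter with a true error rate of 0.1% is measured at roughly 1.1%, so the measurement mostly reflects the label noise and says little about the filter itself.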


