Evaluating cost-sensitive Unsolicited Bulk Email categorization
Source: Symposium on Applied Computing
Proceedings of the 2002 ACM Symposium on Applied Computing
Madrid, Spain
SESSION: Information access and retrieval
Pages: 615-620
Year of Publication: 2002
ISBN:1-58113-445-2
Author
José María Gómez Hidalgo  Universidad Europea CEES, 28670 Villaviciosa de Odón, Madrid, Spain
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 37,   Citation Count: 12
DOI: http://doi.acm.org/10.1145/508791.508911

ABSTRACT

In recent years, Unsolicited Bulk Email (UBE) has become an increasingly important problem, with a significant economic impact. In this paper, we discuss cost-sensitive Text Categorization methods for UBE filtering. Specifically, we have evaluated a range of Machine Learning methods for the task (C4.5, Naive Bayes, PART, Support Vector Machines and Rocchio), made cost-sensitive through several methods (Threshold Optimization, Instance Weighting, and MetaCost). We have used the Receiver Operating Characteristic Convex Hull (ROCCH) method for the evaluation, because it is best suited to classification problems in which the target conditions (class distribution and misclassification costs) are not known in advance, which is the case for UBE filtering. Our results do not show a dominant algorithm or a dominant method for making algorithms cost-sensitive, but they are the best reported on the test collection used, and they approach the accuracy of real-world hand-crafted classifiers.
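Threshold Optimization, one of the cost-sensitivity methods listed in the abstract, takes a classifier that outputs a score and picks the decision threshold that minimizes total misclassification cost on labelled data. The Python function below is a minimal, generic sketch of that idea, not the paper's implementation: the name `optimize_threshold`, the convention that a higher score means "more likely spam", and the toy scores and cost values are all hypothetical. A false positive here is a legitimate message filtered as spam, which is usually treated as the more expensive error.

```python
from typing import List, Optional, Sequence, Tuple


def optimize_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fp: float,
    cost_fn: float,
) -> Tuple[Optional[float], float]:
    """Return (threshold, total_cost) minimizing cost on a labelled set.

    Hypothetical conventions: a message is classified as spam when
    score >= threshold; labels are 1 for spam and 0 for legitimate mail.
    """
    # Every observed score is a candidate; +inf means "never predict spam".
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_t: Optional[float] = None
    best_cost = float("inf")
    for t in candidates:
        # False positive: legitimate message (label 0) filtered as spam.
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # False negative: spam (label 1) let through to the inbox.
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:  # strict '<' keeps the lowest threshold among ties
            best_t, best_cost = t, cost
    return best_t, best_cost


if __name__ == "__main__":
    # Toy scores and labels, invented for illustration only.
    scores = [0.1, 0.3, 0.45, 0.6, 0.65, 0.9]
    labels = [0, 1, 1, 0, 1, 1]
    print(optimize_threshold(scores, labels, cost_fp=1, cost_fn=1))  # (0.3, 1)
    print(optimize_threshold(scores, labels, cost_fp=9, cost_fn=1))  # (0.65, 2)
```

With equal costs the sketch accepts one blocked legitimate message to catch all spam; raising the false-positive cost to 9 moves the threshold up so that no legitimate message is blocked, at the price of two missed spam messages.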

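The ROCCH evaluation mentioned in the abstract places each classifier at its (false positive rate, true positive rate) point, keeps only the classifiers on the upper convex hull, and postpones the final choice until costs and class priors are known. For a false-positive cost C(FP), false-negative cost C(FN) and spam prior p(pos), the cheapest hull vertex is the one maximizing TPR - m * FPR with m = p(neg) * C(FP) / (p(pos) * C(FN)). The sketch below is a generic illustration under those assumptions; the function names and the three ROC points are hypothetical and are not results from the paper.

```python
from typing import List, Tuple

# (classifier name, false positive rate, true positive rate)
RocPoint = Tuple[str, float, float]


def _cross(o: RocPoint, a: RocPoint, b: RocPoint) -> float:
    """z-component of (a - o) x (b - o) in ROC space; >= 0 means no right turn."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def roc_convex_hull(points: List[RocPoint]) -> List[RocPoint]:
    """Upper convex hull of the ROC points, running from (0, 0) to (1, 1)."""
    # The trivial classifiers "everything legitimate" and "everything spam"
    # are always available, so they anchor both ends of the hull.
    anchors = {("all-negative", 0.0, 0.0), ("all-positive", 1.0, 1.0)}
    pts = sorted(set(points) | anchors, key=lambda p: (p[1], p[2]))
    hull: List[RocPoint] = []
    for p in pts:
        # Monotone-chain upper hull: drop the last vertex while it lies on or
        # below the segment from hull[-2] to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_classifier(
    hull: List[RocPoint], cost_fp: float, cost_fn: float, p_pos: float
) -> RocPoint:
    """Hull vertex with minimum expected cost for the given target conditions.

    Expected cost = (1 - p_pos) * cost_fp * FPR + p_pos * cost_fn * (1 - TPR),
    so minimizing it is the same as maximizing TPR - m * FPR with
    m = ((1 - p_pos) * cost_fp) / (p_pos * cost_fn).
    """
    if not 0.0 < p_pos < 1.0:
        raise ValueError("p_pos must be strictly between 0 and 1")
    m = ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)
    return max(hull, key=lambda p: p[2] - m * p[1])


if __name__ == "__main__":
    # Hypothetical ROC points, NOT results from the paper.
    hull = roc_convex_hull([("A", 0.1, 0.6), ("B", 0.3, 0.85), ("C", 0.2, 0.5)])
    print([p[0] for p in hull])  # ['all-negative', 'A', 'B', 'all-positive']
    for c_fp in (1, 2, 9):
        print(c_fp, best_classifier(hull, c_fp, 1, 0.5)[0])
```

In this toy example, C is dropped because it lies below the segment from A to B. With equal class priors, raising the false-positive cost from 1 to 2 to 9 moves the selected classifier from B to A to the trivial all-negative one, which is why the hull, rather than a single accuracy figure, is reported when target conditions are unknown.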
