Boosting support vector machines for text classification through parameter-free threshold relaxation
Source: Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
SESSION: Knowledge management session 3: classification
Pages: 247 - 254  
Year of Publication: 2003
ISBN:1-58113-723-0
Authors
James G. Shanahan  Clairvoyance Corporation
Norbert Roma  Clairvoyance Corporation
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956863.956911

ABSTRACT

Support vector machine (SVM) learning algorithms seek the hyperplane that maximizes the margin (the distance from the separating hyperplane to the nearest examples), since this criterion provides a good upper bound on the generalization error. When applied to text classification, these learning algorithms lead to SVMs with excellent precision but poor recall. Various relaxation approaches have been proposed to counter this problem, including asymmetric SVM learning algorithms (soft SVMs with asymmetric misclassification costs), uneven-margin-based learning, and thresholding. This paper reviews these approaches and describes a new threshold relaxation algorithm that builds on previous thresholding work based on the beta-gamma algorithm. The proposed thresholding strategy is parameter free: it relies on a process of retrofitting and cross validation to set algorithm parameters empirically, whereas our previous approach required the specification of two parameters (beta and gamma). The proposed approach is more efficient, requires no user-specified parameters and, like the parameter-based approach, boosts the performance of baseline SVMs by at least 20% on standard information retrieval measures.
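
The general idea of threshold relaxation can be illustrated with a minimal sketch. It is not the algorithm of this paper, nor the beta-gamma algorithm, neither of which the abstract specifies. It assumes only that a trained SVM yields a real-valued score per document with a default decision threshold of 0, and that a lower threshold is chosen to maximize F1 on held-out labeled scores. The function names, the use of F1 as the target measure, and the toy data are all hypothetical, and a single held-out set stands in for the cross validation mentioned above.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1 of the rule 'predict positive when score >= threshold'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relax_threshold(scores: Sequence[float],
                    labels: Sequence[int]) -> Tuple[float, float]:
    """Return the (threshold, F1) pair that maximizes F1 on held-out scores.

    Candidates are 0.0 (the default SVM decision boundary) followed by the
    midpoints between adjacent distinct scores. Ties keep the earliest
    candidate, so the default threshold wins when nothing beats it.
    """
    ordered = sorted(set(scores))
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
    return best, f1_at_threshold(scores, labels, best)


if __name__ == "__main__":
    # Hypothetical held-out SVM scores: two relevant documents fall just
    # below the default boundary, giving high precision but low recall.
    held_out_scores = [1.2, 0.4, -0.3, -0.6, -0.9, -1.5, -2.0]
    held_out_labels = [1, 1, 1, 1, 0, 0, 0]
    print("F1 at default threshold:",
          f1_at_threshold(held_out_scores, held_out_labels, 0.0))
    print("Relaxed (threshold, F1):",
          relax_threshold(held_out_scores, held_out_labels))
```

On this toy data the default boundary misses the two relevant documents with slightly negative scores. Moving the threshold below them recovers recall without admitting any non-relevant document, which is the effect thresholding aims for.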

