Supervised term weighting for automated text categorization
Source: Symposium on Applied Computing
Proceedings of the 2003 ACM Symposium on Applied Computing
Melbourne, Florida
SESSION: Information access and retrieval
Pages: 784 - 788  
Year of Publication: 2003
ISBN:1-58113-624-2
Authors
Franca Debole  Istituto di Scienza e Tecnologie dell'Informazione, Pisa (Italy)
Fabrizio Sebastiani  Istituto di Scienza e Tecnologie dell'Informazione, Pisa (Italy)
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/952532.952688

ABSTRACT

The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from training data should also affect phase (ii), i.e., that information on the membership of training documents in categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example, we propose a number of "supervised variants" of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. We present experimental results obtained on the standard Reuters-21578 benchmark with one classifier learning method (support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting.
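
The idea of replacing idf with a term selection function can be illustrated with a minimal Python sketch for a single category (the "local" case). The abstract does not give exact formulas, so several details here are assumptions rather than the authors' definitions: the tf component 1 + log(count), the cosine normalization, the textbook 2x2 contingency forms of chi-square and information gain, and every function name. Gain ratio and global weighting are omitted.

```python
import math
from collections import Counter


def contingency(docs, labels, term):
    """Count training documents by (term present?, in category?)."""
    a = b = c = d = 0
    for tokens, positive in zip(docs, labels):
        present = term in tokens
        if present and positive:
            a += 1          # term present, document in category
        elif present:
            b += 1          # term present, document not in category
        elif positive:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document not in category
    return a, b, c, d


def chi_square(a, b, c, d):
    """Textbook chi-square statistic for a 2x2 term/category table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def information_gain(a, b, c, d):
    """Mutual information (in bits) between term presence and category."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    total = 0.0
    for joint, term_marginal, cat_marginal in cells:
        if joint > 0:
            total += (joint / n) * math.log2(joint * n / (term_marginal * cat_marginal))
    return total


def supervised_weights(tokens, docs, labels, score=chi_square):
    """Weight each term of one document as tf * score instead of tf * idf.

    Assumed details (not from the abstract): tf = 1 + ln(count) and
    cosine normalization of the resulting vector.
    """
    counts = Counter(tokens)
    raw = {t: (1 + math.log(k)) * score(*contingency(docs, labels, t))
           for t, k in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm else 0.0) for t, w in raw.items()}


# Toy training set: True marks membership in a hypothetical "wheat" category.
train_docs = [["wheat", "price"], ["wheat", "export"],
              ["oil", "price"], ["oil", "export"]]
train_labels = [True, True, False, False]
weights = supervised_weights(["wheat", "price"], train_docs, train_labels)
```

In this toy set, "wheat" occurs only in positive documents and "price" is spread evenly across both classes, so the supervised weight of "price" drops to zero while "wheat" carries all of the document's weight. Plain idf would rate the two terms equally, since each appears in half of the training documents.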

