Associative text categorization exploiting negated words
Source
Symposium on Applied Computing
Proceedings of the 2006 ACM symposium on Applied computing
Dijon, France
SESSION: Data mining (DM)
Pages: 530 - 535
Year of Publication: 2006
ISBN: 1-59593-108-2
Authors
Elena Baralis, Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy
Paolo Garza, Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy
ABSTRACT
Associative classification has recently been applied to text document categorization. However, unlike in the classification of structured data, the quality of the generated classifier is rather low. This effect is mainly due to the poor precision of the generated rules.

To increase the precision of associative classifiers, we propose the use of classification rules that include negated words, i.e., words that the considered document should not contain. Rules take the form "If a document includes words A and B, but not word Z, then it belongs to class C1". Mining classification rules with negated words quickly becomes intractable as the support threshold decreases. We tackle this problem with an opportunistic approach, in which negated words are generated only to specialize rules that may wrongly classify training documents. Hence precision is increased without losing recall.

Experiments on the Reuters corpus show that our classifier based on negated words achieves good precision and recall, while yielding the easily interpretable model typical of associative classifiers.
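The rule form quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Rule` class, the `specialize` heuristic, and the toy documents are all hypothetical, chosen only to show how a negated word can exclude a wrongly covered training document while keeping the correctly covered ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: required words AND NOT negated words -> label."""
    required: frozenset  # words the document must contain
    negated: frozenset   # words the document must not contain
    label: str

    def matches(self, words):
        words = set(words)
        return self.required <= words and not (self.negated & words)


def specialize(rule, training_docs):
    """Add one negated word if the rule covers wrongly labeled documents.

    Assumed heuristic (not from the paper): pick a word that occurs in
    wrongly covered documents and in no correctly covered document, so
    the rule keeps every document it already classified correctly.
    """
    covered = [(set(w), y) for w, y in training_docs if rule.matches(w)]
    wrong = [w for w, y in covered if y != rule.label]
    if not wrong:
        return rule  # nothing to fix, so no negated word is generated
    right = [w for w, y in covered if y == rule.label]
    right_vocab = set().union(*right) if right else set()
    candidates = set().union(*wrong) - right_vocab - rule.required
    if not candidates:
        return rule
    # Prefer the word that excludes the most wrongly covered documents.
    best = max(sorted(candidates), key=lambda z: sum(z in w for w in wrong))
    return Rule(rule.required, rule.negated | {best}, rule.label)


# Toy training set: (words, class label) pairs, invented for illustration.
docs = [
    (["wheat", "export", "tonnes"], "grain"),
    (["wheat", "export", "farm"], "grain"),
    (["wheat", "export", "tariff"], "trade"),
]
rule = Rule(frozenset({"wheat", "export"}), frozenset(), "grain")
refined = specialize(rule, docs)
```

In this toy setting the refined rule reads "if a document includes wheat and export, but not tariff, then it belongs to class grain". Negated words are only considered for rules that misclassify some training document, which mirrors the opportunistic idea described in the abstract without reproducing its actual mining procedure.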