Feature selection with conditional mutual information maximin in text categorization
Source
Conference on Information and Knowledge Management
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Washington, D.C., USA
SESSION: IR-4 (information retrieval): machine learning in information retrieval
Pages: 342 - 349
Year of Publication: 2004
ISBN: 1-58113-874-1
Authors
Gang Wang
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Frederick H. Lochovsky
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
ABSTRACT
Feature selection is an important component of text categorization. It can both increase a classifier's computation speed and reduce overfitting. Several feature selection methods, such as information gain and mutual information, have been widely used. Although they greatly improve the classifier's performance, they share a common drawback: they do not consider the mutual relationships among the features. As a result, they are not very effective in situations where one feature's predictive power is weakened by others, and where the selected features tend to be biased towards major categories. In this paper, we propose a novel feature selection method for text categorization called conditional mutual information maximin (CMIM). It can select a set of individually discriminating and weakly dependent features. The experimental results show that CMIM can perform much better than traditional feature selection methods.
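The idea of picking features that are individually discriminating yet weakly dependent on each other can be illustrated with a minimal Python sketch. This is not the authors' implementation: it is a generic greedy maximin rule over empirical conditional mutual information, and the function names (cond_mutual_info, cmim_select), the dictionary-based data layout, and the toy data are all assumptions made for illustration. Each candidate feature is scored by the smallest amount of information it carries about the label once any single already-selected feature is known, and the highest-scoring candidate is added at each step.

```python
from collections import Counter
from math import log


def cond_mutual_info(x, y, z):
    """Empirical conditional mutual information I(X; Y | Z) in nats.

    x, y, z are equal-length sequences of discrete values.
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xv, yv, zv), c in n_xyz.items():
        # p(x,y,z) * log( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with counts
        total += (c / n) * log(c * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
    return total


def cmim_select(features, labels, k):
    """Greedy maximin selection (illustrative sketch).

    features: dict mapping a feature name to its per-document values.
    labels:   per-document category labels.
    Returns up to k feature names in selection order.
    """
    const = [0] * len(labels)  # conditioning on a constant gives plain I(X; Y)
    # score[f] = min over selected s of I(f; labels | s); starts at I(f; labels)
    score = {f: cond_mutual_info(v, labels, const) for f, v in features.items()}
    selected = []
    while len(selected) < k and score:
        best = max(score, key=score.get)
        selected.append(best)
        del score[best]
        for f in score:
            cmi = cond_mutual_info(features[f], labels, features[best])
            score[f] = min(score[f], cmi)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a"; "b" is weaker on its own
# but explains the one positive document that "a" misses.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
docs = {
    "a":      [1, 1, 1, 0, 0, 0, 0, 0],
    "a_copy": [1, 1, 1, 0, 0, 0, 0, 0],
    "b":      [0, 0, 0, 1, 0, 0, 0, 0],
}
print(cmim_select(docs, labels, 2))  # prints ['a', 'b']
```

In this toy case, ranking by plain mutual information alone would place "a" and "a_copy" first, even though the duplicate adds nothing once "a" is chosen. The maximin score drops the duplicate to zero and prefers "b", which is the kind of redundancy the abstract says traditional methods overlook.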
CITED BY 4
Jun Yan, Ning Liu, Benyu Zhang, Shuicheng Yan, Zheng Chen, Qiansheng Cheng, Weiguo Fan, Wei-Ying Ma, OCFS: optimal orthogonal centroid feature selection for text categorization, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Hongxing He, Huidong Jin, Jie Chen, Damien McAullay, Jiuyong Li, Tony Fallon, Analysis of breast feeding data using data mining methods, Proceedings of the fifth Australasian conference on Data mining and analytics, p.47-52, November 29-30, 2006, Sydney, Australia
REVIEW
"Luminita State : Reviewer"
The main, and very often computationally overwhelming, characteristic of text data is its extremely high dimensionality, which could prove to be a severe obstacle for any classification algorithm. One of the most frequently used ways to reduce dim...