ABSTRACT
The automated categorization (or classification) of texts into predefined categories has attracted booming interest over the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (which consists of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in expert labor, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
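As a rough illustration of the three problems named above, the following minimal sketch builds a classifier from a handful of preclassified documents. It is not taken from the survey: the bag-of-words representation, the choice of multinomial naive Bayes with add-one smoothing, the accuracy measure, the function names (`represent`, `train`, `classify`, `accuracy`), and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def represent(text):
    """Document representation: a bag of lowercase word counts."""
    return Counter(text.lower().split())


def train(labeled_docs):
    """Classifier construction: collect the counts needed by naive Bayes."""
    class_docs = Counter()               # number of training documents per category
    term_counts = defaultdict(Counter)   # term frequencies per category
    vocab = set()
    for text, label in labeled_docs:
        bag = represent(text)
        class_docs[label] += 1
        term_counts[label].update(bag)
        vocab.update(bag)
    return class_docs, term_counts, vocab


def classify(model, text):
    """Return the category with the highest log-posterior score."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for term, count in represent(text).items():
            if term not in vocab:
                continue  # terms never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (term_counts[label][term] + 1) / (total_terms + len(vocab))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_docs):
    """Classifier evaluation: fraction of documents labeled correctly."""
    hits = sum(classify(model, text) == label for text, label in labeled_docs)
    return hits / len(labeled_docs)


# Hypothetical toy data, invented for this sketch.
TRAIN = [
    ("wheat harvest grain exports", "grain"),
    ("corn and wheat prices rise", "grain"),
    ("bank raises interest rates", "finance"),
    ("central bank interest policy", "finance"),
]
TEST = [
    ("wheat grain prices", "grain"),
    ("interest rates policy", "finance"),
]

model = train(TRAIN)
print(classify(model, "wheat grain prices"))
print(accuracy(model, TEST))
```

Accuracy on a held-out set is used here only because it is the simplest measure to show; it is one possible evaluation choice among several.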