ABSTRACT
XML documents have recently become ubiquitous because they are applicable across a wide range of domains. Classification is an important problem in data mining, but current classification methods for XML documents are IR-based and treat each document as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data using frequent discriminatory substructures within XML documents. Such a technique is better able to capture the classification characteristics of documents. In addition, the technique can be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
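To make the idea of classifying by frequent discriminatory substructures concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a much simpler notion of substructure (root-to-node tag paths instead of mined subtrees), and the function names `paths`, `mine_rules`, and `classify`, the thresholds `min_support` and `min_confidence`, and the toy documents are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document.

    Assumption: tag paths stand in for the richer substructures
    (subtrees) that a real structural miner would extract.
    """
    root = ET.fromstring(xml_text)
    found = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        found.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return found


def mine_rules(labeled_docs, min_support=0.5, min_confidence=0.8):
    """Build (path, label, confidence) rules from (xml_text, label) pairs."""
    class_sizes = Counter(label for _, label in labeled_docs)
    counts = defaultdict(Counter)  # path -> label -> number of documents
    for text, label in labeled_docs:
        for p in paths(text):
            counts[p][label] += 1

    rules = []
    for p, per_class in counts.items():
        total = sum(per_class.values())
        for label, n in per_class.items():
            support = n / class_sizes[label]  # frequent within the class
            confidence = n / total            # discriminatory across classes
            if support >= min_support and confidence >= min_confidence:
                rules.append((p, label, confidence))
    # Strongest rules first; ties are broken by path for determinism.
    rules.sort(key=lambda r: (-r[2], r[0]))
    return rules


def classify(xml_text, rules, default=None):
    """Predict the label of the first (highest-confidence) matching rule."""
    doc_paths = paths(xml_text)
    for p, label, _ in rules:
        if p in doc_paths:
            return label
    return default


# Toy training set (hypothetical data, for illustration only).
train = [
    ("<order><item><sku/></item></order>", "sales"),
    ("<order><item><sku/><qty/></item></order>", "sales"),
    ("<order><return><reason/></return></order>", "support"),
    ("<order><return><reason/><refund/></return></order>", "support"),
]
rules = mine_rules(train)
print(classify("<order><return><reason/></return></order>", rules))  # prints "support"
```

In this toy data the path `/order` occurs in both classes, so its confidence is 0.5 and it yields no rule, while `/order/return` occurs only in the `support` class and becomes a rule. A bag-of-words view would not distinguish where in the document a tag appears, which is the information the structural rules are meant to exploit.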