HCX: an efficient hybrid clustering approach for XML documents
Source
Document Engineering
Proceedings of the 9th ACM symposium on Document engineering
Munich, Germany
SESSION: Document analysis (II)
Pages 94-97
Year of Publication: 2009
ISBN: 978-1-60558-575-8
Authors
Sangeetha Kutty  Queensland University of Technology, Brisbane, Queensland, Australia
Richi Nayak  Queensland University of Technology, Brisbane, Queensland, Australia
Yuefeng Li  Queensland University of Technology, Brisbane, Queensland, Australia
Sponsors
SIGDOC: ACM Special Interest Group for Design of Communications
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1600193.1600213

ABSTRACT

This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.
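
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: as a simplifying assumption it uses frequent root-to-node tag paths as a stand-in for frequent subtrees, and cosine similarity over word counts as the content measure. The function names, the `min_support` parameter, and the sample documents are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Yield (path, text) for every element; path is the tag chain from the root."""
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        yield path, (node.text or "").strip()
        for child in node:
            stack.append((child, path + "/" + child.tag))


def frequent_paths(docs, min_support):
    """Paths present in at least min_support documents.

    Assumption: tag paths approximate the frequent subtrees named in the abstract.
    """
    counts = Counter()
    for root in docs:
        counts.update({path for path, _ in tag_paths(root)})
    return {path for path, n in counts.items() if n >= min_support}


def constrained_terms(root, keep):
    """Word counts restricted to text found under the frequent paths."""
    bag = Counter()
    for path, text in tag_paths(root):
        if path in keep:
            bag.update(text.lower().split())
    return bag


def cosine(a, b):
    """Cosine similarity of two word-count bags; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical sample documents, for illustration only.
xml_docs = [
    "<article><title>xml clustering</title><body>frequent subtrees</body></article>",
    "<article><title>xml mining</title><note>draft</note></article>",
    "<book><title>cooking</title></book>",
]
docs = [ET.fromstring(s) for s in xml_docs]
keep = frequent_paths(docs, min_support=2)          # structural step
bags = [constrained_terms(d, keep) for d in docs]   # content limited by structure
print(sorted(keep))
print(cosine(bags[0], bags[1]), cosine(bags[0], bags[2]))
```

In this sketch only text under structures shared by several documents contributes to the content comparison, which shrinks the term space before any clustering algorithm is applied to the pairwise similarities.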

