XProj: a framework for projected structural clustering of XML documents
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
San Jose, California, USA
Session: Research track papers
Pages: 46-55
Year of Publication: 2007
ISBN: 978-1-59593-609-7
Authors
Charu C. Aggarwal, IBM
Na Ta, Tsinghua University
Jianyong Wang, Tsinghua University
Jianhua Feng, Tsinghua University
Mohammed Zaki, Rensselaer Polytechnic Institute
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 17; Downloads (12 months): 95; Citation count: 3
DOI: http://doi.acm.org/10.1145/1281192.1281201

ABSTRACT

XML has become a popular method of data representation, both on the web and in databases, in recent years. One reason for its popularity is its ability to encode structural information about data records. However, this structural characteristic also makes a variety of data mining problems more challenging. One such problem is clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation, making it more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using information from multiple substructures in XML documents to evaluate the quality of intermediate cluster solutions, and to guide the algorithm to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.
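The abstract describes the method only at a high level. The following is a minimal sketch of the general idea of substructure-based partitional clustering; it is not the authors' XProj algorithm. As simplifying assumptions, each document is reduced to its set of parent-child tag chains (up to `max_len` tags), each cluster is represented by its most frequent chains, and documents are reassigned to the cluster whose representative they cover best. The function names, the farthest-first seeding, and the parameters `max_len`, `min_support`, and `top_k` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text, max_len=3):
    """Set of parent-child tag chains (tuples of up to max_len tags) in a document.

    Assumption: these chains stand in for the richer substructures used in the paper.
    """
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        # Record every chain of 1..max_len tags that ends at this node.
        for k in range(1, min(len(path), max_len) + 1):
            paths.add(path[-k:])
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def representative(docs, min_support=0.5, top_k=10):
    """Most frequent chains occurring in at least min_support of a cluster's documents."""
    counts = Counter(p for d in docs for p in d)
    frequent = [(c, p) for p, c in counts.items() if c >= min_support * len(docs)]
    frequent.sort(key=lambda cp: (-cp[0], cp[1]))  # by frequency, ties by tag order
    return {p for _, p in frequent[:top_k]}


def similarity(doc, rep):
    """Fraction of a cluster representative's chains that the document contains."""
    return len(doc & rep) / len(rep) if rep else 0.0


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def seed_reps(docs, k):
    """Hypothetical farthest-first seeding: each new seed is the document
    least similar to its closest existing seed."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(docs)),
                         key=lambda i: max(jaccard(docs[i], docs[s]) for s in seeds)))
    return [docs[s] for s in seeds]


def cluster(docs, k, iters=10):
    """Alternate assignment and representative updates until assignments stop changing."""
    reps = seed_reps(docs, k)
    assign = None
    for _ in range(iters):
        new = [max(range(k), key=lambda c: similarity(d, reps[c])) for d in docs]
        if new == assign:
            break
        assign = new
        reps = [representative([d for d, a in zip(docs, assign) if a == c])
                for c in range(k)]
    return assign


docs = [label_paths(x) for x in [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<movie><director/><cast><actor/></cast></movie>",
    "<movie><director/><cast><actor/><actor/></cast></movie>",
]]
print(cluster(docs, k=2))  # [0, 0, 1, 1]
```

Scoring a document by the fraction of a cluster's representative substructures it contains, rather than by a distance over all possible substructures, keeps the comparison confined to the structures that characterize each partition, which is one way to sidestep the high implicit dimensionality the abstract mentions.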



