ACM Home Page
Please provide us with feedback. Feedback
Similarity evaluation on tree-structured data
Full text PdfPdf (501 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: graph and tree-structured data table of contents
Pages: 754 - 765  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Rui Yang  National University of Singapore
Panos Kalnis  National University of Singapore
Anthony K. H. Tung  National University of Singapore
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 43,   Downloads (12 Months): 177,   Citation Count: 10
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066243
What is a DOI?

ABSTRACT

Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1| + |T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm reduces dramatically the distance computation cost. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
Edgar Chavez and Gonzalo Navarro. Towards measuring the searching complexity of metric sapces. In Proc. of the Mexican Computing Meeting, pages 969--978, 2001.
4
 
5
6
 
7
K. Kailing, H. P. Kriegel, S. Schönauer, and T. Seidl. Efficient similarity search for hierarchical data in large databases. In EDBT, pages 676--693, March. 2004.
 
8
 
9
 
10
Nikos Mamoulis, David W. Cheung, and Wang Lian. Similarity search in sets and categorical data using the signature tree. In ICDE, pages 75--86, 2003.
 
11
 
12
13
 
14
Stanley M. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6:184--186, December 1977.
15
 
16
Dennis Shasha and Kaizhong Zhang. Approximate Tree Pattern Matching. Oxford University, 1997.
 
17
 
18
 
19
 
20
21
 
22
Kaizhong Zhang. Algorithms for the constrained editing distance between ordered labelled trees and related problems. In Pattern Regonition, volume 28, pages 463--474, 1995.
 
23

CITED BY  10
Collaborative Colleagues:
Rui Yang: colleagues
Panos Kalnis: colleagues
Anthony K. H. Tung: colleagues