|
ABSTRACT
Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1| + |T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm reduces dramatically the distance computation cost. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Charu C. Aggarwal , Joel L. Wolf , Philip S. Yu, A new method for similarity indexing of market basket data, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.407-418, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
2
|
|
| |
3
|
Edgar Chavez and Gonzalo Navarro. Towards measuring the searching complexity of metric sapces. In Proc. of the Mexican Computing Meeting, pages 969--978, 2001.
|
 |
4
|
|
| |
5
|
Luis Gravano , Panagiotis G. Ipeirotis , H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Approximate String Joins in a Database (Almost) for Free, Proceedings of the 27th International Conference on Very Large Data Bases, p.491-500, September 11-14, 2001
|
 |
6
|
|
| |
7
|
K. Kailing, H. P. Kriegel, S. Schönauer, and T. Seidl. Efficient similarity search for hierarchical data in large databases. In EDBT, pages 676--693, March. 2004.
|
| |
8
|
|
| |
9
|
|
| |
10
|
Nikos Mamoulis, David W. Cheung, and Wang Lian. Similarity search in sets and categorical data using the signature tree. In ICDE, pages 75--86, 2003.
|
| |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
Stanley M. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6:184--186, December 1977.
|
 |
15
|
Sudipto Guha , H. V. Jagadish , Nick Koudas , Divesh Srivastava , Ting Yu, Approximate XML joins, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
[doi> 10.1145/564691.564725]
|
| |
16
|
Dennis Shasha and Kaizhong Zhang. Approximate Tree Pattern Matching. Oxford University, 1997.
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
Kaizhong Zhang. Algorithms for the constrained editing distance between ordered labelled trees and related problems. In Pattern Regonition, volume 28, pages 463--474, 1995.
|
| |
23
|
|
CITED BY 10
|
|
Liping Wang , Qing Li , Na Li , Guozhu Dong , Yu Yang, Substructure similarity measurement in chinese recipes, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
Jianxin Li , Jixue Liu , Chengfei Liu , Guoren Wang , Jeffrey Xu Yu , Chi Yangt, Computing structural similarity of source XML schemas against domain XML schema, Proceedings of the nineteenth conference on Australasian database, December 03-04, 2007, Gold Coast, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|