ACM Home Page
Please provide us with feedback. Feedback
Propagation-vectors for trees (PVT): concise yet effective summaries for hierarchical data and trees
Full text PdfPdf (268 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval table of contents
Napa Valley, California, USA
SESSION: Potpourri table of contents
Pages 3-10  
Year of Publication: 2008
ISBN:978-1-60558-254-2
Authors
Venkata Snehith Cherukuri  Arizona State University, Tempe, AZ, USA
Kasim Selçuk Candan  Arizona State University, Tempe, AZ, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 32,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458469.1458481
What is a DOI?

ABSTRACT

Summarization of hierarchical data and metadata is a fundamental operation in applications in many domains. In particular, similarity search of hierarchical data, such as XML, would benefit greatly from concise and indexable summaries. This is especially true in P2P scenarios, where the search needs to be done in a distributed fashion on multiple peers. This situation requires summaries which are small, yet effective in identifying potential peers that need to be further explored. In this paper, we propose a method, called propagation-vectors for trees (PVT) which constructs very concise and accurate summaries of hierarchical data, such as XML trees. We then show how to use this summary to perform similarity search on summarized data. The proposed summarization scheme relies on a label-propagation mechanism, which constructs an n-dimensional vector from a given tree with n unique data labels. Experimental results have shown that the constructed PVT summaries capture the structure of the input trees very accurately, the representations are highly concise, and that the search based on these summaries are faster than the existing approaches.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
P. Bille. Tree edit distance, alignment distance and inclusion. IT Univ. of Copenhagen TR-2003-23, 2003
 
2
D. Buttler. A short survey of document structure similarity algorithms. ICIT 2004.
 
3
 
4
R. Cilibrasi and P. Vitanyi. Clustering by compression. IEEE Transactions on Information Theory, 2005.
 
5
 
6
 
7
 
8
F. Gelgi, S.Vadrevu, H. Davulcu. Improving web data annotations with spreading activation. WISE 2005.
 
9
 
10
R. Goldman and J. Widom. Approximate dataguides. In Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, 1999.
 
11
 
12
IBM XML Generator. http://www.alphaworks.ibm.com/tech/xmlgenerator.
 
13
Hierarchical Clustering. http://www.cs.rpi.edu/~zaki/dmcourse/notes /chapter12.pdf
14
 
15
K. Kintigh. The promise and challenge of archaeological data integration. American Antiquity, 71(3):567--578, 2006.
 
16
 
17
S. Lu. A tree-to-tree distance and its application to cluster analysis. IEEE Trans. PAMI, 1:219--224, 1979.
 
18
 
19
A. Nierman and H. V. Jagadish. Evaluating structural similarity in XML documents. WebDB 2002.
20
21
 
22
 
23
R. Richardson and A. F. Smeaton. Using WordNet in a knowledge-based approach to information retrieval. Technical Report CA-0395, Dublin, Ireland, 1995.
 
24
R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1), 1989.
25
 
26
S. Selkow. The tree-to-tree editing problem. Information Processing Letters, 6(6):184--186, 1977.
 
27
28
 
29
 
30
 
31
 
32
K. Zhang, J. T. L. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs and related problems. SoCPM 1995.

Collaborative Colleagues:
Venkata Snehith Cherukuri: colleagues
Kasim Selçuk Candan: colleagues