Measuring similarity between collection of values
Source: Workshop On Web Information And Data Management
Proceedings of the 6th annual ACM international workshop on Web information and data management
Washington DC, USA
SESSION: XML and semistructured data querying
Pages: 56 - 63  
Year of Publication: 2004
ISBN:1-58113-978-0
Authors
Carina F. Dorneles  Universidade Federal do Rio Grande do Sul(UFRGS)
Carlos A. Heuser  Universidade Federal do Rio Grande do Sul(UFRGS)
Andrei E. N. Lima  Universidade Federal do Rio Grande do Sul(UFRGS)
Altigran Soares da Silva  Universidade Federal do Amazonas
Edleno Silva de Moura  Universidade Federal do Amazonas
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 41,   Citation Count: 6
DOI: http://doi.acm.org/10.1145/1031453.1031465

ABSTRACT

In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents.

Following the data model presented in TAX algebra, we treat an XML element as a labeled ordered rooted tree. Considering that XML nodes can be either atomic, i.e., they may contain single values such as short character strings, dates, etc., or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs, for atomic nodes, and MCVs, for complex nodes. In the first case, we suggest the use of several application-domain-dependent metrics. In the second case, we define metrics for complex values that are structure dependent and can be applied separately to tuples and to collections of values. We also present experiments showing the effectiveness of our method.
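
The abstract does not state the definitions of the MAV and MCV metrics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic string similarity from Python's difflib as a stand-in for a domain-dependent atomic metric, and a symmetric best-match average as a stand-in for a structure-dependent complex metric. The function names atomic_sim, complex_sim and node_sim are hypothetical, and the sketch does not distinguish tuples from collections.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def atomic_sim(a: str, b: str) -> float:
    """Stand-in atomic metric (assumption): case-insensitive string
    similarity in [0, 1]. A real system would pick a metric per domain
    (names, dates, and so on)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_atomic(node: ET.Element) -> bool:
    """A node with no child elements is treated as atomic."""
    return len(node) == 0


def node_sim(x: ET.Element, y: ET.Element) -> float:
    """Dispatch to the atomic or the complex metric based on node kind."""
    if x.tag != y.tag:
        return 0.0  # assumption: differently labeled nodes never match
    if is_atomic(x) and is_atomic(y):
        return atomic_sim(x.text or "", y.text or "")
    if is_atomic(x) or is_atomic(y):
        return 0.0  # assumption: atomic and complex nodes are incomparable
    return complex_sim(x, y)


def complex_sim(x: ET.Element, y: ET.Element) -> float:
    """Stand-in complex metric (assumption): pair every child with its
    best-scoring counterpart on the other side, in both directions, and
    average the scores."""
    xs, ys = list(x), list(y)
    forward = sum(max(node_sim(c, d) for d in ys) for c in xs)
    backward = sum(max(node_sim(d, c) for c in xs) for d in ys)
    return (forward + backward) / (len(xs) + len(ys))


if __name__ == "__main__":
    a = ET.fromstring(
        "<author><name>Carlos Heuser</name><city>Porto Alegre</city></author>"
    )
    b = ET.fromstring(
        "<author><name>Carlos A. Heuser</name><city>Porto Alegre</city></author>"
    )
    print(round(node_sim(a, b), 3))
```

Because complex_sim calls node_sim on the children, nested structures are compared recursively, and identical trees score exactly 1.0.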

