TOSS: an extension of TAX with ontologies and similarity queries
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
Session: Research sessions: schema discovery
Pages: 719 - 730  
Year of Publication: 2004
ISBN:1-58113-859-8
Authors
Edward Hung  University of Maryland, College Park, MD
Yu Deng  University of Maryland, College Park, MD
V. S. Subrahmanian  University of Maryland, College Park, MD
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1007568.1007649

ABSTRACT

TAX is perhaps the best-known extension of the relational algebra for querying XML databases. One problem with TAX (as with many existing relational DBMSs) is that it does not take the semantics of terms in a TAX DB into account when answering queries. Thus, even though TAX answers queries with 100% precision, its recall is relatively low. Our TOSS system improves the recall of TAX via the concept of a similarity enhanced ontology (SEO). Intuitively, an ontology is a set of graphs describing relationships (such as isa, partof, etc.) between terms in a DB. An SEO also evaluates how similarities between terms (e.g., "J. Ullman", "Jeff Ullman", and "Jeffrey Ullman") affect ontologies. Finally, we show how the algebra proposed in TAX can be extended to take SEOs into account. The result is a system that provides much higher answer quality than TAX alone, where quality is defined as the square root of the product of precision and recall. We experimentally evaluate the TOSS system on the DBLP and SIGMOD bibliographic databases and show that TOSS has acceptable performance.
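
The quality measure named in the abstract is the geometric mean of precision and recall. The following is a minimal Python sketch of that formula only, assuming the returned answers and the relevant (ground-truth) answers are available as sets. The function name answer_quality and the sample data are hypothetical illustrations, not taken from the paper or from the TOSS system.

```python
import math


def answer_quality(returned, relevant):
    """Return sqrt(precision * recall) for a set of returned answers.

    `returned` and `relevant` are hypothetical inputs: the answers a
    query produced and the answers it ideally should have produced.
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return math.sqrt(precision * recall)


# Hypothetical example: four spellings refer to the same author, but an
# exact-match query finds only one of them (precision 1.0, recall 0.25).
relevant = {"J. Ullman", "Jeff Ullman", "Jeffrey Ullman", "Jeffrey D. Ullman"}
exact_match = {"Jeffrey Ullman"}
print(answer_quality(exact_match, relevant))  # 0.5
```

Under this measure, a system with perfect precision but low recall still scores poorly, which is the gap the abstract says TOSS is meant to close.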



