StatiX: making XML count
Source: International Conference on Management of Data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Madison, Wisconsin
SESSION: Research sessions: XML I
Pages: 181 - 191  
Year of Publication: 2002
ISBN:1-58113-497-5
Authors
Juliana Freire  Bell Labs
Jayant R. Haritsa  Indian Institute of Science
Maya Ramanath  Indian Institute of Science
Prasan Roy  Bell Labs
Jérôme Siméon  Bell Labs
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564691.564713

ABSTRACT

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries, to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived by regular expressions (which define elements in an XML Schema) to pinpoint places in the schema that are likely sources of structural skew. As we discuss below, this information can be used to build concise, yet accurate, statistical summaries for XML data. StatiX leverages standard XML technology for gathering statistics, notably XML Schema validators, and it uses histograms to summarize both the structure and values in an XML document. In this paper we describe the StatiX system. We develop algorithms that decompose schemas to obtain statistics at different granularities and discuss how statistics can be gathered as documents are validated. We also present an experimental evaluation which demonstrates the accuracy and scalability of our approach and show an application of these statistics to cost-based XML storage design.
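
The abstract's idea of using histograms to summarize both structure and values can be illustrated with a minimal sketch. This is not the StatiX algorithm: it ignores the schema entirely, does no validation, and simply walks a parsed document. The function names `fanout_histograms` and `value_histogram`, the sample document `DOC`, and the bucket parameters are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document, made up for this sketch.
DOC = """<imdb>
  <show><title>A</title><year>1993</year><aka>A1</aka><aka>A2</aka></show>
  <show><title>B</title><year>1999</year></show>
  <show><title>C</title><year>2001</year><aka>C1</aka></show>
</imdb>"""


def fanout_histograms(xml_text):
    """Structural summary: for each (parent tag, child tag) pair, count how
    many parents have exactly n children with that tag (n >= 1 only;
    parents with no such child are not recorded)."""
    root = ET.fromstring(xml_text)
    hists = defaultdict(Counter)
    for parent in root.iter():
        per_child = Counter(child.tag for child in parent)
        for child_tag, n in per_child.items():
            hists[(parent.tag, child_tag)][n] += 1
    return hists


def value_histogram(xml_text, tag, buckets, lo, hi):
    """Value summary: equi-width histogram over the numeric text of every
    element named `tag`. Values outside [lo, hi) are clamped into the first
    or last bucket."""
    root = ET.fromstring(xml_text)
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for el in root.iter(tag):
        idx = int((float(el.text) - lo) / width)
        counts[max(0, min(idx, buckets - 1))] += 1
    return counts


if __name__ == "__main__":
    for key, hist in fanout_histograms(DOC).items():
        print(key, dict(hist))
    print(value_histogram(DOC, "year", buckets=2, lo=1990, hi=2010))
```

In this toy document the number of `aka` children varies from one `show` to the next, which is the kind of structural skew a summary of this sort is meant to expose. A schema-aware framework, as the abstract describes, would gather such counts per schema type during validation rather than per tag name.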


