ACM Home Page
Please provide us with feedback. Feedback
Expressiveness and complexity of XML Schema
Full text PdfPdf (558 KB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 31 ,  Issue 3  (September 2006) table of contents
Pages: 770 - 813  
Year of Publication: 2006
ISSN:0362-5915
Authors
Wim Martens  Hasselt University and Transnational University of Limburg, School for Information Technology, Diepenbeek, Belgium
Frank Neven  Hasselt University and Transnational University of Limburg, School for Information Technology, Diepenbeek, Belgium
Thomas Schwentick  University of Dortmund, Dortmund, Germany
Geert Jan Bex  Hasselt University and Transnational University of Limburg, School for Information Technology, Diepenbeek, Belgium
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 24,   Downloads (12 Months): 213,   Citation Count: 12
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1166074.1166076
What is a DOI?

ABSTRACT

The common abstraction of XML Schema by unranked regular tree languages is not entirely accurate. To shed some light on the actual expressive power of XML Schema, intuitive semantical characterizations of the Element Declarations Consistent (EDC) rule are provided. In particular, it is obtained that schemas satisfying EDC can only reason about regular properties of ancestors of nodes. Hence, with respect to expressive power, XML Schema is closer to DTDs than to tree automata. These theoretical results are complemented with an investigation of the XML Schema Definitions (XSDs) occurring in practice, revealing that the extra expressiveness of XSDs over DTDs is only used to a very limited extent. As this might be due to the complexity of the XML Schema specification and the difficulty of understanding the effect of constraints on typing and validation of schemas, a simpler formalism equivalent to XSDs is proposed. It is based on contextual patterns rather than on recursive types and it might serve as a light-weight front end for XML Schema. Next, the effect of EDC on the way XML documents can be typed is discussed. It is argued that a cleaner, more robust, larger but equally feasible class is obtained by replacing EDC with the notion of 1-pass preorder typing (1PPT): schemas that allow one to determine the type of an element of a streaming document when its opening tag is met. This notion can be defined in terms of grammars with restrained competition regular expressions and there is again an equivalent syntactical formalism based on contextual patterns. Finally, algorithms for recognition, simplification, and inclusion of schemas for the various classes are given.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
 
5
Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and Yergeau, F. 2004. Extensible Markup Language (XML). Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/REC-xml/.
 
6
Brüggemann-Klein, A., Murata, M., and Wood, D. 2001. Regular tree and regular hedge languages over unranked alphabets: Version 1, April 3, 2001. Tech. rep. HKUST-TCSC-2001-0. The Hongkong University of Science and Technology, Hong Kong.
 
7
 
8
Buck, L., Goldfarb, C., and Prescod, P. 2000. Datatypes for DTDs (DT4DTD) 1.0. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/dt4dtd/.
 
9
Clark, J. 2002. Multi-format schema converter based on RELAX NG. Go online to http://www.thaiopensource.com/relaxng/trang.html.
 
10
Clark, J. and Murata, M. 2001. Relax NG specification. Go online to http://www.relaxng.org/spec-20011203.html.
11
 
12
Cover, R. 2005. The Cover Pages. Go online to http://xml.coverpages.org/.
 
13
Cristau, J., Löding, C., and Thomas, W. 2005. Deterministic automata on unranked trees. In Proceedings of the 15th International Symposium on Fundamentals of Computation Theory (FCT 2005). Springer, Berlin, Germany, 68--79.
 
14
DuCharme, B. 2002. Filling in the DTD gaps with schematron. O'Reilly xml.com, Sebastopol, CA.
 
15
Fernandez, M., Malhotra, A., Marsh, J., Nagy, M., and Walsh, N. 2005. XQuery 1.0 and XPath 2.0 data model. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/xpath-datamodel/.
 
16
Fiorello, D., Gessa, N., Marinelli, P., and Vitali, F. 2004. DTD++ 2.0: Adding support for co-constraints. In Proceedings of the Extreme Markup Languages 2004 Conference (Extreme Markup Languages 2004).
 
17
Fokoué, A. and Schloss, B. 2004. XML Schema quality checker. Go online to http://www.alphaworks.ibm.com/tech/xmlsqc.
 
18
19
 
20
 
21
Jelliffe, R. 2001. The current state of the art of schema languages for XML. Presentation at XML Asia Pacific Conference, Sidney, Australia.
 
22
Jelliffe, R. 2005. Schematron. Go online to http://xml.ascc.net/schematron/.
 
23
 
24
Koch, C. and Scherzinger, S. 2003. Attribute grammars for scalable query processing on XML streams. In Proceedings of the 9th International Workshop on Database Programming Languages (DBPL 2003). Springer, Berlin, Germany, 233--256.
25
 
26
Mani, M. 2001. Keeping chess alive---do we need 1-unambiguous content models? Presentation at Extreme Markup Languages Conference 2001.
27
 
28
 
29
Martens, W., Neven, F., and Schwentick, T. 2004. Complexity of decision problems for simple regular expressions. In Proceedings of the 29th International Symposium on Mathematical Foundations of Computer Science (MFCS 2004). Springer, Berlin, Germany, 889--900.
 
30
Martens, W., Neven, F., and Schwentick, T. 2005. Which XML schemas admit 1-pass preorder typing? In Proceedings of the 10th International Conference on Database Theory (ICDT 2005). Springer, Berlin, Germany, 68--82.
 
31
Martens, W. and Niehren, J. 2005. Minimizing tree automata for unranked trees. In Prodeedings of the 10th International Symposium on Database Programming Languages (DBPL 2005). Springer, Berlin, Germany, 232--246.
 
32
Murata, M., Lee, D., and Mani, M. 2001. Taxonomy of XML schema languages using formal language theory. In Proceedings of the Extreme Markup Languages 2001 Conference (Extreme Markup Languages 2001), Montreal, P. Q., Canada.
33
 
34
35
36
 
37
38
 
39
40
 
41
Sperberg-McQueen, C. 2003. XML Schema 1.0: A language for document grammars. Presentation at the Joint International Conference of the Association for Computers and the Humanities Association for Literary and Linguistic Computing (ACH/ALLC 2003). Go online to http://www.w3.org/People/cmsmcq/2003/achallc/achallc2003.html.
 
42
Sperberg-McQueen, C. and Thompson, H. 2005. XML Schema. Go online to http://www.w3.org/XML/Schema.
43
 
44
 
45
Thompson, H., Beech, D., Maloney, M., and Mendelsohn, N. 2004. XML Schema Part 1: Structures. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/xmlschema-1/.
 
46
 
47
Vitali, F., Amorosi, N., and Gessa, N. 2003. Datatype- and namespace-aware DTDs: A minimal extension. In Proceedings of the Extreme Markup Languages 2003 Conference (Extreme Markup Languages 2003).

CITED BY  12

Collaborative Colleagues:
Wim Martens: colleagues
Frank Neven: colleagues
Thomas Schwentick: colleagues
Geert Jan Bex: colleagues