ABSTRACT
Among the various proposals answering the shortcomings of Document Type Definitions (DTDs), XML Schema is the most widely used. Although DTDs and XML Schema Definitions (XSDs) differ syntactically, they are still quite related on an abstract level. Indeed, freed from all syntactic sugar, XML Schemas can be seen as an extension of DTDs with a restricted form of specialization. In the present paper, we inspect a number of DTDs and XSDs harvested from the web and try to answer the following questions: (1) which of the extra features/expressiveness of XML Schema not allowed by DTDs are effectively used in practice; and (2) how sophisticated are the structural properties (i.e., the nature of regular expressions) of the two formalisms. It turns out that at present real-world XSDs only sparingly use the new features introduced by XML Schema: on a structural level the vast majority of them can already be defined by DTDs. Further, we introduce a class of simple regular expressions and find that a surprisingly high fraction of the content models belongs to this class. The latter result sheds light on the justification of simplifying assumptions that sometimes have to be made in XML research.
CITED BY 20
Geert Jan Bex, Wim Martens, Frank Neven, Thomas Schwentick, Expressiveness of XSDs: from practice to theory, there and back again, Proceedings of the 14th international conference on World Wide Web, May 10-14, 2005, Chiba, Japan
Geert Jan Bex, Wouter Gelade, Wim Martens, Frank Neven, Simplifying XML schema: effortless handling of nondeterministic regular expressions, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA