Towards a semantics for XML markup
Source: Proceedings of the 2002 ACM Symposium on Document Engineering
McLean, Virginia, USA
Session: Document reuse and semantics
Pages: 119-126
Year of Publication: 2002
ISBN: 1-58113-594-7
Authors
Allen Renear  University of Illinois at Urbana-Champaign
David Dubin  University of Illinois at Urbana-Champaign
C. M. Sperberg-McQueen  MIT Laboratory for Computer Science
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/585058.585081

ABSTRACT

Although XML Document Type Definitions provide a mechanism for specifying, in machine-readable form, the syntax of an XML markup language, there is no comparable mechanism for specifying the semantics of an XML vocabulary. That is, there is no way to characterize the meaning of XML markup so that the facts and relationships represented by the occurrence of XML constructs can be explicitly, comprehensively, and mechanically identified. This has serious practical and theoretical consequences. On the positive side, XML constructs can be assigned arbitrary semantics and used in application areas not foreseen by the original designers. On the less positive side, both content developers and application engineers must rely upon prose documentation, or, worse, conjectures about the intention of the markup language designer, a process that is time-consuming, error-prone, incomplete, and unverifiable, even when the language designer properly documents the language. In addition, the lack of a substantial body of research in markup semantics means that digital document processing is undertheorized as an engineering application area. Although there are some related projects underway (XML Schema, RDF, the Semantic Web) which provide relevant results, none of these projects directly and comprehensively addresses the core problems of XML markup semantics. This paper (i) summarizes the history of the concept of markup meaning, (ii) characterizes the specific problems that motivate the need for a formal semantics for XML, and (iii) describes an ongoing research project, the BECHAMEL Markup Semantics Project, that is attempting to develop such a semantics.


