ABSTRACT
Although XML Document Type Definitions provide a mechanism for specifying, in machine-readable form, the syntax of an XML markup language, there is no comparable mechanism for specifying the semantics of an XML vocabulary. That is, there is no way to characterize the meaning of XML markup so that the facts and relationships represented by the occurrence of XML constructs can be explicitly, comprehensively, and mechanically identified. This has serious practical and theoretical consequences. On the positive side, XML constructs can be assigned arbitrary semantics and used in application areas not foreseen by the original designers. On the less positive side, both content developers and application engineers must rely upon prose documentation or, worse, conjectures about the intention of the markup language designer. This process is time-consuming, error-prone, incomplete, and unverifiable, even when the language designer properly documents the language. In addition, the lack of a substantial body of research in markup semantics means that digital document processing is undertheorized as an engineering application area. Although some related projects now underway (XML Schema, RDF, the Semantic Web) provide relevant results, none of them directly and comprehensively addresses the core problems of XML markup semantics. This paper (i) summarizes the history of the concept of markup meaning, (ii) characterizes the specific problems that motivate the need for a formal semantics for XML, and (iii) describes an ongoing research project, the BECHAMEL Markup Semantics Project, that is attempting to develop such a semantics.