ACM Home Page
Please provide us with feedback. Feedback
Linear time membership in a class of regular expressions with interleaving and counting
Full text PdfPdf (313 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: DB: stream processing table of contents
Pages 389-398  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Giorgio Ghelli  Università di Pisa, Pisa, Italy
Dario Colazzo  Université Paris Sud, Paris, France
Carlo Sartiani  Università di Pisa, Pisa, Italy
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 74,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458135
What is a DOI?

ABSTRACT

The extension of Regular Expressions (REs) with an interleaving (shuffle) operator has been proposed in many occasions, since it would be crucial to deal with unordered data. However, interleaving badly affects the complexity of basic operations, and, expecially, makes membership NP-hard [13], which is unacceptable for most uses of REs.

REs form the basis of most XML type languages, such as DTDs and XML Schema types, and XDuce types [16, 11]. In this context, the interleaving operator would be a natural addition to the language of REs, as witnessed by the presence of limited forms of interleaving in XSD (the all group), Relax-NG, and SGML, provided that the NP-hardness of membership could be avoided.

We present here a restricted class of REs with interleaving and counting which admits a linear membership algorithm, and which is expressive enough to cover the vast majority of real-world XML types.

We first present an algorithm for membership of a list of words into a RE with interleaving and counting, based on the translation of the RE into a set of constraints.

We generalize the approach in order to check membership of XML trees into a class of EDTDs with interleaving and counting, which models the crucial aspects of DTDs and XSD schemas.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
D. Barbosa, G. Leighton, and A. Smith. Efficient incremental validation of XML documents after composite updates. In XSym, volume 4156 of LNCS, pages 107--121. Springer, 2006.
 
3
4
 
5
 
6
7
 
8
B. Choi. What are real DTDs like? In WebDB, pages 43--48, 2002.
 
9
W. Gelade, W. Martens, and F. Neven. Optimizing schema languages for XML: Numerical constraints and interleaving. In ICDT, 2007.
 
10
G. Ghelli, D. Colazzo, and C. Sartiani. Efficient inclusion for a class of XML types with interleaving and counting. In DBPL, 2007.
11
 
12
 
13
 
14
M. Montazerian, P. T. Wood, and S. R. Mousavi. XPath query satisfiability is in PTIME for real-world DTDs. In XSym, volume 4704 of LNCS, pages 17--30. Springer, 2007.
 
15
 
16
H. S. Thompson, D. Beech, M. Maloney, and N. Mendelsohn. XML Schema Part 1: Structures Second Edition. Technical report, World Wide Web Consortium, Oct 2004. W3C Recommendation.
 
17


Collaborative Colleagues:
Giorgio Ghelli: colleagues
Dario Colazzo: colleagues
Carlo Sartiani: colleagues