ACM Home Page
Please provide us with feedback. Feedback
Monadic datalog and the expressive power of languages for web information extraction
Full text PdfPdf (281 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Madison, Wisconsin
SESSION: Research session 1: award winning papers table of contents
Pages: 17 - 28  
Year of Publication: 2002
ISBN:1-58113-507-6
Authors
Georg Gottlob  Technische Universität Wien, A-1040 Vienna, Austria
Christoph Koch  Technische Universität Wien, A-1040 Vienna, Austria
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 16
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/543613.543617
What is a DOI?

ABSTRACT

Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping.In this paper, we first study monadic datalog as a wrapping language (over ranked or unranked tree structures). Using previous work by Neven and Schwentick, we show that this simple language is equivalent to full monadic second order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and thus propose MSO as a yardstick for evaluating and comparing wrappers.Using the above result, we study the kernel fragment Elog- of the Elog wrapping language used in the Lixto system (a visual wrapper generator). The striking fact here is that Elog- exactly captures MSO, yet is easier to use. Indeed, programs in this language can be entirely visually specified. We also formally compare Elog to other wrapping languages proposed in the literature.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web. Morgan Kaufmann Publishers, 2000.
2
 
3
 
4
5
 
6
F. Azavant. Personal communication, Oct. 2001.
 
7
 
8
 
9
A. Brüggemann-Klein and D. Wood. "Regular Tree Languages over Non-ranked Alphabets". Unpublished manuscript, 1998.
 
10
 
11
12
 
13
W. F. Dowling and J. H. Gallier. "Linear-Time Algorithms for Testing the Satisfiability of Propositional Horn Formulae". Journal of Logic Programming,1(3):267-284, 1984.
 
14
M. Fernandez, J. Siméon, P. Wadler (eds.), S. Cluet, A. Deutsch, D. F. A. Levy, D. Maier, J. M. J. Robie, D. Suciu, and J. Widom. "XML Query Languages: Experiences and Exemplars", 1999. http://www-db.research.bell-labs.com/user/simeon/xquery.html.
 
15
 
16
 
17
 
18
 
19
 
20
L. Liu, C. Pu, and W. Han. "XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources", In Proceedings of the 16th International Conference on Data Engineering, 1998.
 
21
 
22
 
23
I. Muslea, S. Minton, and C. Knoblock. "STALKER: Learning Extraction Rules for Semistructured, Web-based Information Sources", 1998.
24
 
25
 
26
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
 
27
 
28
 
29
 
30
 
31
 
32
World Wide Web Consortium. http://www.w3c.org/tr/xpath/.
 
33

CITED BY  16

Collaborative Colleagues:
Georg Gottlob: colleagues
Christoph Koch: colleagues