Learning query languages of Web interfaces
Source: Symposium on Applied Computing
Proceedings of the 2004 ACM Symposium on Applied Computing
Nicosia, Cyprus
Session: Internet data management (IDM)
Pages: 1114-1121
Year of Publication: 2004
ISBN: 1-58113-812-1
Authors
André Bergholz, Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France
Boris Chidlovskii, Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/967900.968127

ABSTRACT

This paper studies the problem of automatic acquisition of the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize supported query operators. The automatic acquisition assumes the availability of the number of matches the resource returns for a submitted query. The match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. These classification rules are employed during the automatic probing of new providers to determine the query operators they support. We report on results of experiments with a set of real Web resources.
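
The probing idea in the abstract can be illustrated with a small sketch. It is not the authors' system: the paper learns its classification rules from match numbers, whereas the thresholds below are hand-written assumptions chosen only to show how match counts alone can hint at how a resource interprets a two-word query. The names `guess_default_operator`, `probe`, `make_resource`, and the toy collection `DOCS` are all hypothetical.

```python
from typing import Callable

def guess_default_operator(n_a: int, n_b: int, n_ab: int, n_ba: int) -> str:
    """Guess how a resource reads the query 'A B' from four match counts.

    n_a, n_b: matches for the single terms A and B
    n_ab, n_ba: matches for the two-word queries 'A B' and 'B A'
    The rules are illustrative, hand-written heuristics (an assumption).
    """
    if n_ab > max(n_a, n_b):
        return "disjunction"   # more hits than either term alone
    if n_ab <= min(n_a, n_b):
        # Conjunction ignores word order; phrase matching usually does not.
        return "conjunction" if n_ab == n_ba else "phrase"
    return "unknown"

def probe(match_count: Callable[[str], int], a: str, b: str) -> str:
    """Send four test queries to a resource and classify the answer."""
    return guess_default_operator(
        match_count(a), match_count(b),
        match_count(f"{a} {b}"), match_count(f"{b} {a}"),
    )

# Toy document collection standing in for a real Web resource (assumption).
DOCS = [
    "information retrieval systems",
    "retrieval of information",
    "information theory",
    "image retrieval",
]

def make_resource(mode: str) -> Callable[[str], int]:
    """Build a fake search interface that only reports its match count."""
    def match_count(query: str) -> int:
        words = query.split()
        if mode == "conjunction":
            return sum(all(w in d.split() for w in words) for d in DOCS)
        if mode == "disjunction":
            return sum(any(w in d.split() for w in words) for d in DOCS)
        # Phrase mode: the query words must appear adjacent and in order.
        return sum(f" {query} " in f" {d} " for d in DOCS)
    return match_count

if __name__ == "__main__":
    for mode in ("conjunction", "disjunction", "phrase"):
        print(mode, "->", probe(make_resource(mode), "information", "retrieval"))
```

A real resource would need many more test queries and tolerance for noisy or approximate match numbers, which is where learned rules, as described in the abstract, would replace the fixed comparisons used here.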


