Learning query languages of Web interfaces
Source
Symposium on Applied Computing
Proceedings of the 2004 ACM symposium on Applied computing
Nicosia, Cyprus
SESSION: Internet data management (IDM)
Pages: 1114 - 1121
Year of Publication: 2004
ISBN:1-58113-812-1
Authors
André Bergholz, Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France
Boris Chidlovskii, Xerox Research Centre Europe, 6 chemin de Maupertuis, Meylan, France
ABSTRACT
This paper studies the problem of automatically acquiring the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize the supported query operators. The approach assumes that the resource reports the number of matches it returns for a submitted query. The match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. These classification rules are then employed during the automatic probing of new providers to determine which query operators they support. We report on the results of experiments with a set of real Web resources.
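The probing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the paper trains a learning system to produce classification rules, whereas the sketch below uses one hand-written rule based on simple set bounds (a conjunction cannot match more documents than either term alone, and a disjunction cannot match fewer). The names `interpret_counts`, `probe_operator`, `fake_count_matches`, and the toy document list are all hypothetical and stand in for a real search interface that reports match numbers.

```python
from typing import Callable

def interpret_counts(n_a: int, n_b: int, n_query: int) -> str:
    """Guess how a two-term test query was interpreted, from match counts only.

    Hand-written heuristic (an assumption, not the paper's learned rules):
    AND-like results are bounded above by min(n_a, n_b);
    OR-like results lie between max(n_a, n_b) and n_a + n_b.
    """
    lo, hi = sorted((n_a, n_b))
    if lo == hi == n_query:
        return "undetermined"  # both interpretations fit these counts
    if n_query <= lo:
        return "conjunction-like"
    if hi <= n_query <= n_a + n_b:
        return "disjunction-like"
    return "undetermined"

def probe_operator(count_matches: Callable[[str], int],
                   term_a: str, term_b: str, template: str) -> str:
    """Submit the two single terms and one combined test query, then classify."""
    n_a = count_matches(term_a)
    n_b = count_matches(term_b)
    n_query = count_matches(template.format(a=term_a, b=term_b))
    return interpret_counts(n_a, n_b, n_query)

# Toy in-memory "resource" (hypothetical), so the sketch runs without a network.
DOCS = ["web query language", "web interface", "query operators", "learning systems"]

def fake_count_matches(query: str) -> int:
    """Toy stand-in: ' AND ' means conjunction; plain whitespace means disjunction."""
    if " AND " in query:
        terms = query.split(" AND ")
        return sum(all(t in d.split() for t in terms) for d in DOCS)
    terms = query.split()
    return sum(any(t in d.split() for t in terms) for d in DOCS)

and_guess = probe_operator(fake_count_matches, "web", "query", "{a} AND {b}")
space_guess = probe_operator(fake_count_matches, "web", "query", "{a} {b}")
```

Against a real resource, `count_matches` would submit the query through the search form and extract the reported match number from the returned page. A single probe pair is rarely conclusive, which is presumably why the approach in the abstract relies on a set of test queries and learned rules, not on one fixed threshold.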