ABSTRACT
An increasing number of data sources are now becoming available on the Web, but their contents are often accessible only through query interfaces. For a given domain of interest, there are often many such sources with varied coverage or querying capabilities. As an important step toward integrating these sources, we consider the integration of their query interfaces; more specifically, we focus on the crucial step of accurately matching the interfaces. While the integration of query interfaces has recently received more attention, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them consider only 1:1 mappings of fields across interfaces; (c) they all perform the integration in a black-box fashion, so the whole process must be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. We examine various types of complex mappings of fields and propose several approaches to identify these mappings effectively. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments show that our approach is highly effective.
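The abstract does not spell out the matching algorithm. As a rough illustration only, the Python sketch below models each query interface as an ordered tree, takes the tree's leaves as the interface's fields, and groups fields across interfaces with greedy average-link agglomerative clustering over a Dice word-overlap score. The `Node` class, the `dice` similarity, the 0.5 threshold, and the rule that a cluster never holds two fields of the same interface are all assumptions made for this sketch, not the authors' method. It handles only 1:1 mappings, ignores internal group labels, and fixes the threshold by hand rather than learning it interactively.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A node of an ordered tree modelling a query interface (hypothetical structure)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node, interface_id: int) -> list[tuple[int, str]]:
    """Return the interface's fields (the tree's leaves) in left-to-right order."""
    if not node.children:
        return [(interface_id, node.label)]
    out: list[tuple[int, str]] = []
    for child in node.children:
        out.extend(leaves(child, interface_id))
    return out


def dice(a: str, b: str) -> float:
    """Dice coefficient over the lower-cased word sets of two labels."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def cluster_fields(interfaces: list[Node], threshold: float = 0.5) -> list[list[tuple[int, str]]]:
    """Greedy average-link agglomerative clustering of fields across interfaces.

    Assumption: two clusters are never merged if the result would contain two
    fields of the same interface, which restricts this sketch to 1:1 mappings.
    """
    clusters = [[f] for i, root in enumerate(interfaces) for f in leaves(root, i)]

    def sim(c1, c2):
        # Average pairwise label similarity between the two clusters.
        return sum(dice(a[1], b[1]) for a in c1 for b in c2) / (len(c1) * len(c2))

    def compatible(c1, c2):
        # Clusters are compatible if they share no source interface.
        return not ({f[0] for f in c1} & {f[0] for f in c2})

    while True:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            if compatible(clusters[i], clusters[j]):
                s = sim(clusters[i], clusters[j])
                if s >= threshold and s > best:
                    best, pair = s, (i, j)
        if pair is None:
            return clusters
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
```

For instance, an interface whose "Passengers" group holds the fields "adults" and "children", next to a flat interface offering "number of adults" and "number of children", would yield clusters pairing the corresponding fields. Complex (for example 1:m) mappings would need a different merge rule than the one-field-per-interface constraint used here.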
CITED BY 29
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, and Jeffrey F. Naughton. Efficiently incorporating user feedback into information extraction and integration programs. In Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, USA, June 29-July 2, 2009.