An interactive clustering-based approach to integrating source query interfaces on the deep Web
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
Session: Research sessions: Web, XML and IR
Pages: 95-106
Year of Publication: 2004
ISBN: 1-58113-859-8
Authors
Wensheng Wu, University of Illinois at Urbana-Champaign
Clement Yu, University of Illinois at Chicago
AnHai Doan, University of Illinois at Urbana-Champaign
Weiyi Meng, SUNY at Binghamton
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1007568.1007582

ABSTRACT

A growing number of data sources are becoming available on the Web, but their contents are often accessible only through query interfaces. For a given domain of interest, there often exist many such sources with varied coverage or querying capabilities. As an important step toward integrating these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of the integration: accurately matching the interfaces. While the integration of query interfaces has received more attention recently, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them consider only 1:1 mappings of fields across interfaces; (c) they all perform the integration in a black-box fashion, so the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Varied types of complex mappings of fields are examined, and several approaches are proposed to identify these mappings effectively. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments are conducted, and the results show that our approach is highly effective.
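
To make two of the ideas in the abstract concrete, the following is a minimal sketch of an interface modeled as an ordered tree, with fields from two interfaces grouped by clustering. It is not the paper's algorithm: the Node class, the word-level Dice similarity, the greedy single-link merging, the 0.6 threshold, and the two airfare interfaces are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in an ordered tree: children keep their on-page order."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        # A leaf stands for a single input field on the interface.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def dice(a: str, b: str) -> float:
    """Dice coefficient over the lowercase word sets of two labels."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def cluster_fields(labels: List[str], threshold: float) -> List[List[str]]:
    """Greedy single-link agglomerative clustering of field labels."""
    clusters = [[label] for label in labels]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: compare the closest pair across two clusters.
                best = max(dice(a, b) for a in clusters[i] for b in clusters[j])
                if best >= threshold:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


# Two hypothetical airfare interfaces; the first nests its city fields
# under a "route" group, the second is flat.
airline_a = Node("search", [
    Node("route", [Node("departure city"), Node("arrival city")]),
    Node("passengers"),
])
airline_b = Node("search", [
    Node("departure city"),
    Node("arrival city"),
    Node("passengers count"),
])

labels = [n.label for n in airline_a.leaves() + airline_b.leaves()]
clusters = cluster_fields(labels, threshold=0.6)
```

With these assumed inputs, each resulting cluster holds the labels of fields treated as matching across the two interfaces. A flat schema would lose the grouping of the two city fields under "route", which is the kind of structure an ordered tree preserves.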


