Integrating web query results: holistic schema matching
Full text: PDF (368 KB)
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: DB: faceted search, web query results presentation
Pages 33-42  
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Shui-Lung Chuang  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Kevin Chen-Chuan Chang  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 203,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1458082.1458090

ABSTRACT

The emergence of numerous online data sources has created a pressing need for data integration techniques that are both automatic and accurate. For the data returned by querying such sources, most work focuses on extracting the embedded structured data more accurately. However, to ultimately provide integrated access to these query results, a final and equally important step is to combine the data extracted from different sources. A critical task is finding the correspondences between the data fields of the sources, a problem known as schema matching. Query results are a small, biased sample of the instances held by the sources, so the schema information they reveal is largely implicit and incomplete, which often prevents existing schema matching approaches from performing effectively. In this paper, we develop a novel framework for understanding and effectively supporting schema matching on such instance-based data, especially when integrating multiple sources. We view discovering matches as constructing a more complete domain schema that best describes the input data. With this conceptual view, we can seamlessly leverage various data instances and observed regularities in holistic, multiple-source schema matching to achieve more accurate results. Our experiments show that our framework consistently outperforms baseline pairwise and clustering-based approaches (raising F-measure from 50-89% to 89-94%) and works uniformly well across the surveyed domains.
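To make "holistic, multiple-source schema matching" and the F-measure figures more concrete, here is a minimal sketch. It is not the authors' framework; it is closer in spirit to the clustering-based baselines the abstract compares against. Assumptions (all hypothetical, not taken from the paper): each field is represented only by the set of values seen in query results, field similarity is Jaccard overlap of those sets, groups are merged greedily by average-link similarity above an arbitrary threshold of 0.2, and a group may hold at most one field per source, so each group stands in for one attribute of a domain schema. The scorer treats every pair of fields in the same group as a predicted match and computes pairwise F-measure against a gold grouping; the paper's exact evaluation protocol is not stated here. The function names and the toy book-domain data are illustrative only.

```python
from itertools import combinations

Field = tuple[str, str]  # (source name, field label); hypothetical representation


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two observed value sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def holistic_match(sources: dict[str, dict[str, set[str]]],
                   threshold: float = 0.2) -> list[set[Field]]:
    """Greedily group fields from all sources at once (simplified sketch).

    Each group plays the role of one domain-schema attribute, so two fields
    from the same source are never placed in the same group.
    """
    values = {(s, f): vals for s, fields in sources.items() for f, vals in fields.items()}
    groups: list[set[Field]] = [{field} for field in values]

    def group_sim(g1: set[Field], g2: set[Field]) -> float:
        # Average-link similarity over all cross-group field pairs.
        sims = [jaccard(values[x], values[y]) for x in g1 for y in g2]
        return sum(sims) / len(sims)

    while True:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            if {s for s, _ in groups[i]} & {s for s, _ in groups[j]}:
                continue  # merging would put two fields of one source together
            sim = group_sim(groups[i], groups[j])
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, i, j)
        if best is None:
            return groups
        _, i, j = best  # i < j, so deleting j leaves index i valid
        groups[i] |= groups[j]
        del groups[j]


def matched_pairs(groups: list[set[Field]]) -> set[frozenset[Field]]:
    """Every unordered pair of fields that share a group counts as a match."""
    return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}


def pairwise_f_measure(predicted: list[set[Field]], gold: list[set[Field]]) -> float:
    """Harmonic mean of pairwise precision and recall."""
    pred, true = matched_pairs(predicted), matched_pairs(gold)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three book sources exposing differently labeled fields.
example_sources = {
    "A": {"title": {"dune", "emma", "ulysses"}, "author": {"herbert", "austen", "joyce"}},
    "B": {"book name": {"dune", "emma", "beloved"}, "writer": {"herbert", "austen", "morrison"}},
    "C": {"name": {"ulysses", "beloved"}, "by": {"joyce", "morrison"}, "price": {"9.99", "12.50"}},
}
example_gold = [
    {("A", "title"), ("B", "book name"), ("C", "name")},
    {("A", "author"), ("B", "writer"), ("C", "by")},
    {("C", "price")},
]

if __name__ == "__main__":
    groups = holistic_match(example_sources)
    print(groups)
    print(pairwise_f_measure(groups, example_gold))
```

On this toy data the sketch groups the three title-like fields and the three author-like fields across all sources and leaves `price` on its own, which matches the gold grouping (F-measure 1.0). Source C's fields overlap only weakly with any single other source, but considering all sources together still places them; the one-field-per-source constraint is what stops further merging.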


