Bootstrapping pay-as-you-go data integration systems
Source
International Conference on Management of Data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Vancouver, Canada
SESSION: Research Session 18: Database Integration As You Go
Pages 861-874
Year of Publication: 2008
ISBN: 978-1-60558-102-6
ABSTRACT
Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront effort to create a mediated schema and semantic mappings from the data sources to the mediated schema. Many application contexts involving multiple data sources (e.g., the web, personal information management, enterprise intranets) do not require full integration in order to provide useful services, motivating a pay-as-you-go approach to integration. With that approach, a system starts with very few (or inaccurate) semantic mappings, and these mappings are improved over time as deemed necessary. This paper describes the first completely self-configuring data integration system. The goal of our work is to investigate how advanced a starting point we can provide for a pay-as-you-go system. Our system is based on the new concept of a probabilistic mediated schema that is automatically created from the data sources. We automatically create probabilistic schema mappings between the sources and the mediated schema. We describe experiments in multiple domains, with 50 to 800 data sources, and show that our system is able to produce high-quality answers with no human intervention.
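As a rough illustration of what a probabilistic schema mapping can mean at query time, the sketch below ranks answers by the total probability of the candidate mappings that produce them. It is not the paper's algorithm: the function `answer`, the attribute names (`phone`, `hPhone`, `oPhone`), the sample rows, and the probabilities are all hypothetical, and summing probabilities over candidate mappings is an assumed scoring rule chosen for simplicity.

```python
from collections import defaultdict


def answer(query_attr, rows, p_mapping):
    """Rank values for one mediated attribute under a probabilistic mapping.

    p_mapping is a list of (mapping, probability) pairs, where each mapping
    is a dict from mediated attribute to source attribute. This structure
    is an assumption made for the illustration.
    """
    scores = defaultdict(float)
    for mapping, prob in p_mapping:
        source_attr = mapping.get(query_attr)
        if source_attr is None:
            continue  # this candidate mapping does not cover the attribute
        # Under this candidate mapping, each distinct source value is an
        # answer; credit it once with the mapping's probability.
        for value in {row[source_attr] for row in rows if source_attr in row}:
            scores[value] += prob
    # Highest score first; ties broken by value for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical source table and two candidate mappings for "phone".
rows = [
    {"name": "Alice", "hPhone": "111", "oPhone": "222"},
    {"name": "Bob", "hPhone": "333", "oPhone": "222"},
]
p_mapping = [({"phone": "hPhone"}, 0.7), ({"phone": "oPhone"}, 0.3)]

ranked = answer("phone", rows, p_mapping)
for value, score in ranked:
    print(value, score)
```

Here the uncertain choice between `hPhone` and `oPhone` is never resolved up front; both readings contribute answers, and the scores reflect how much the system trusts each one. Improving the mappings over time, in the pay-as-you-go sense, would amount to revising those probabilities.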