Top-k generation of integrated schemas based on directed and weighted correspondences
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD International Conference on Management of Data
Providence, Rhode Island, USA
Session: Research session 17: data integration
Pages 641-654
Year of Publication: 2009
ISBN: 978-1-60558-551-2
Authors
Ahmed Radwan  University of Miami, Miami, FL, USA
Lucian Popa  IBM Almaden Research Center, San Jose, CA, USA
Ioana R. Stanoi  IBM Almaden Research Center, San Jose, CA, USA
Akmal Younis  University of Miami, Miami, FL, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1559913

ABSTRACT

Schema integration is the problem of creating a unified target schema from a set of existing source schemas and a set of correspondences that result from matching those schemas. Previous methods for schema integration rely on exploring, implicitly or explicitly, the multiple design choices that are possible for the integrated schema. Such exploration relies heavily on user interaction; thus, it is time-consuming and labor-intensive. Furthermore, previous methods have ignored the additional information that typically results from the schema matching process, namely the weights and, in some cases, the directions associated with the correspondences.

In this paper, we propose a more automatic approach to schema integration that is based on directed and weighted correspondences between the concepts that appear in the source schemas. A key component of our approach is a novel top-k ranking algorithm for the automatic generation of the best candidate schemas. The algorithm gives more weight to schemas that combine the concepts with higher similarity or coverage. Thus, the algorithm makes certain decisions that would otherwise likely be made by a human expert. We show that the algorithm runs in polynomial time and, moreover, performs well in practice.
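
The idea of ranking candidate schemas by the weights of the correspondences they merge can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not spell out. It is a generic best-first top-k enumeration under an assumed scoring model: each correspondence contributes its weight w if its two concepts are merged and 1 - w if they are kept apart, and correspondence directions are ignored. The function name `top_k_merge_choices`, the scoring rule, and the example concept names are all hypothetical.

```python
import heapq


def top_k_merge_choices(weights, k):
    """Return up to k (score, merged_pairs) candidates, best score first.

    weights maps a concept pair (a, b) to a similarity in [0, 1].
    Assumed scoring (not from the paper): a pair adds w when merged
    and 1 - w when kept separate.
    """
    pairs = list(weights)
    # The best candidate merges a pair exactly when its weight is >= 0.5.
    best_score = sum(max(w, 1.0 - w) for w in weights.values())
    # Flipping one decision away from the best candidate costs |2w - 1|.
    order = sorted(pairs, key=lambda p: abs(2 * weights[p] - 1))
    costs = [abs(2 * weights[p] - 1) for p in order]

    results = []
    # Heap entries: (total flip cost, increasing tuple of flipped positions).
    heap = [(0.0, ())]
    while heap and len(results) < k:
        cost, flipped = heapq.heappop(heap)
        flipped_pairs = {order[i] for i in flipped}
        # A pair ends up merged if its default decision XOR its flip says so.
        merged = sorted(
            p for p in pairs if (weights[p] >= 0.5) != (p in flipped_pairs)
        )
        results.append((best_score - cost, merged))

        nxt = flipped[-1] + 1 if flipped else 0
        if nxt < len(order):
            # Child 1: additionally flip the next cheapest decision.
            heapq.heappush(heap, (cost + costs[nxt], flipped + (nxt,)))
            if flipped:
                # Child 2: move the last flip to the next position instead.
                moved = cost - costs[flipped[-1]] + costs[nxt]
                heapq.heappush(heap, (moved, flipped[:-1] + (nxt,)))
    return results


# Hypothetical weighted correspondences between source-schema concepts.
example = {
    ("Emp", "Employee"): 0.9,
    ("Dept", "Division"): 0.6,
    ("Addr", "Location"): 0.2,
}
for score, merged in top_k_merge_choices(example, 3):
    print(round(score, 2), merged)
```

Because every set of flips has exactly one parent in the search tree and a child never costs less than its parent, candidates leave the heap in non-increasing score order. Each returned candidate pushes at most two heap entries, so the heap never holds more than about 2k entries regardless of how many candidates exist in total.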


