A theoretical framework for learning from a pool of disparate data sources
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
POSTER SESSION: Poster papers
Pages: 443 - 449
Year of Publication: 2002
ISBN: 1-58113-567-X
ABSTRACT
Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, aiming towards the design of an automated diagnostic tool for some disease, one may wish to integrate data gathered in many different hospitals. A major obstacle to such endeavors is that different data sources may vary considerably in the way they choose to represent related data. In practice, the problem is usually solved by a manual construction of semantic mappings and translations between the different sources. Recently there have been attempts to introduce automated algorithms based on machine learning tools for the construction of such translations.

In this work we propose a theoretical framework for making classification predictions from a collection of different data sources, without creating explicit translations between them. Our framework allows a precise mathematical analysis of the complexity of such tasks, and it provides a tool for the development and comparison of different learning algorithms. Our main objective, at this stage, is to demonstrate the usefulness of computational learning theory to this practically important area and to stimulate further theoretical and experimental research of questions related to this framework.