A theoretical framework for learning from a pool of disparate data sources
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Edmonton, Alberta, Canada
Session: Poster papers
Pages: 443-449
Year of Publication: 2002
ISBN: 1-58113-567-X
Authors
Shai Ben-David, Cornell University, Ithaca, NY
Johannes Gehrke, Cornell University, Ithaca, NY
Reba Schuller, Cornell University, Ithaca, NY
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775111

ABSTRACT

Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, when designing an automated diagnostic tool for a disease, one may wish to integrate data gathered at many different hospitals. A major obstacle to such endeavors is that different data sources may vary considerably in the way they represent related data. In practice, the problem is usually solved by manually constructing semantic mappings and translations between the different sources. Recently, there have been attempts to introduce automated algorithms, based on machine learning tools, for constructing such translations.

In this work we propose a theoretical framework for making classification predictions from a collection of different data sources without creating explicit translations between them. Our framework allows a precise mathematical analysis of the complexity of such tasks, and it provides a tool for developing and comparing different learning algorithms. Our main objective, at this stage, is to demonstrate the usefulness of computational learning theory in this practically important area and to stimulate further theoretical and experimental research on questions related to this framework.

