Learning to join everything
Source
Conference on Information and Knowledge Management
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
Pages 9-10  
Year of Publication: 2007
ISBN:978-1-59593-803-9
Author
Fernando Pereira, University of Pennsylvania, Philadelphia, PA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1321440.1321443

ABSTRACT

Text, speech, images, video, and DNA sequences all provide information about entities that people can recognize when looking at a particular instance. But those entities, their attributes, and their relationships are not directly accessible to queries that join across types of sources. Information extraction methods based on supervised machine learning recognize mentions of entities and relationships of predefined types in different kinds of sources, and the extracted mentions can then be used to answer some useful types of queries. However, supervised learning relies on hand-annotated training sets, which are difficult to create and which limit the types of entities and relationships that can be joined for new applications. These limitations have prompted research into unsupervised extraction methods that rely on correlations among sources rather than on hand-annotated training sets. While these methods are not yet as accurate as those based on supervised learning, they could enable a new query-by-example approach to information integration, in which seed sets of query answers are expanded into ranked lists of potential answers by learning occurrence patterns from the seed answers. I will give examples of both types of methods from our research on biomedical information extraction, leading to some ideas on a possible convergence of search and databases through machine learning.
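
The query-by-example idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the author's method, so everything here is an assumption made for illustration: the function name expand_seeds, the use of one word of left and right context as an "occurrence pattern", the count-based scoring, and the invented six-sentence corpus. The sketch only shows the general shape of the approach: learn contexts from the seed answers, then rank other terms by how often they appear in those same contexts.

```python
from collections import Counter


def token_contexts(sentence):
    """Yield (token, (left_word, right_word)) for every token in a sentence."""
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else "<s>"
        right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        yield tok, (left, right)


def expand_seeds(corpus, seeds):
    """Rank non-seed terms by how strongly they share contexts with the seeds.

    Hypothetical illustration only: a pattern is a (left, right) word pair,
    and a candidate's score is the summed seed frequency of the patterns
    it occurs in.
    """
    seeds = {s.lower() for s in seeds}

    # Step 1: learn occurrence patterns from the seed answers.
    pattern_counts = Counter()
    for sentence in corpus:
        for tok, ctx in token_contexts(sentence):
            if tok in seeds:
                pattern_counts[ctx] += 1

    # Step 2: score every other term that appears in a learned pattern.
    scores = Counter()
    for sentence in corpus:
        for tok, ctx in token_contexts(sentence):
            if tok not in seeds and ctx in pattern_counts:
                scores[tok] += pattern_counts[ctx]

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy corpus and seed set, for illustration only.
CORPUS = [
    "the gene brca1 is mutated in tumors",
    "the gene tp53 is mutated in tumors",
    "the gene kras is expressed in cells",
    "mutations of brca1 are common",
    "mutations of egfr are common",
    "the protein actin is abundant",
]
SEEDS = ["brca1", "tp53"]

ranked = expand_seeds(CORPUS, SEEDS)
print(ranked)  # [('kras', 2), ('egfr', 1)]
```

In this toy run, "kras" ranks first because it appears in the context shared by both seeds, "egfr" matches a context seen with one seed, and "actin" is never proposed because its context was not learned from any seed. A realistic system would need richer patterns, several sources, and a scoring scheme that guards against overly general contexts.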