Adopting ontologies for multisource identity resolution
Source: ACM International Conference Proceeding Series; Vol. 308
Proceedings of the first international workshop on Ontology-supported business intelligence
Karlsruhe, Germany
Article No. 6  
Year of Publication: 2008
ISBN:978-1-60558-219-1
Authors
Milena Yankova, University of Sheffield, United Kingdom
Horacio Saggion, University of Sheffield, United Kingdom
Hamish Cunningham, University of Sheffield, United Kingdom
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1452567.1452573

ABSTRACT

Identity resolution aims to identify newly presented facts and link them to their previous mentions. Our main hypothesis is that variations of one and the same fact can be recognised and duplicates removed, and that aggregating them increases the correctness of fact extraction. Our approach to the identity problem has been implemented as the Identity Resolution Framework (IdRF). The framework provides a general solution for identifying known and new facts in specific domains, and it can be used in different applications to process different types of entity. It uses an ontology as the formalism for both internal and resulting knowledge representation. The ontology contains not only the representation of the domain but also known entities and properties. Apart from extracting information from textual sources, we also exploit structured information available in databases, mapping the database schema to the ontology and populating the ontology with existing knowledge. Our main goal is not to advocate one criterion over the others but to introduce a widely applicable solution to the identity resolution problem: we present a set of customisable criteria as well as a mechanism for adding new criteria. We have carried out two series of experiments in two different business intelligence domains, company profiling and recruitment, achieving encouraging results.
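
The idea of resolving a new fact against known entities through a set of pluggable criteria can be illustrated with a minimal Python sketch. It is not the IdRF implementation: the names `attribute_match` and `Resolver`, the averaging of criterion scores, the neutral 0.5 score for missing attributes, and the threshold are all assumptions made for illustration, and a plain list of dictionaries stands in for the ontology.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Entity = Dict[str, str]
# A criterion compares two entities and returns a similarity in [0, 1].
Criterion = Callable[[Entity, Entity], float]


def attribute_match(attr: str) -> Criterion:
    """Hypothetical criterion factory: case-insensitive equality on one attribute."""
    def criterion(a: Entity, b: Entity) -> float:
        if attr not in a or attr not in b:
            return 0.5  # assumption: a missing value is neutral evidence
        return 1.0 if a[attr].strip().lower() == b[attr].strip().lower() else 0.0
    return criterion


@dataclass
class Resolver:
    criteria: List[Criterion]          # customisable; new criteria can be appended
    threshold: float = 0.7             # assumed decision threshold
    known: List[Entity] = field(default_factory=list)  # stand-in for the ontology

    def score(self, a: Entity, b: Entity) -> float:
        # Assumption: criteria are combined by a simple unweighted average.
        return sum(c(a, b) for c in self.criteria) / len(self.criteria)

    def resolve(self, candidate: Entity) -> Entity:
        """Link the candidate to a known entity, or register it as new."""
        best = max(self.known, key=lambda e: self.score(candidate, e), default=None)
        if best is not None and self.score(candidate, best) >= self.threshold:
            # Aggregate: add attributes the known entity did not have yet.
            for key, value in candidate.items():
                best.setdefault(key, value)
            return best
        new_entity = dict(candidate)
        self.known.append(new_entity)
        return new_entity
```

With the criteria `attribute_match("name")` and `attribute_match("city")`, a second mention of a company that matches on name but lacks a city would be merged into the existing record, while a mention that differs on both attributes would be stored as a new entity.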


