Effective and scalable solutions for mixed and split citation problems in digital libraries
Source: Information Quality in Informational Systems
Proceedings of the 2nd international workshop on Information quality in information systems
Baltimore, Maryland
SESSION: Paper session II: record linkage, entity resolution
Pages: 69-76
Year of Publication: 2005
ISBN:1-59593-160-0
Authors
Dongwon Lee  Penn State
Byung-Won On  Penn State
Jaewoo Kang  NCSU
Sanghyun Park  Yonsei Univ./Korea
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1077501.1077514

ABSTRACT

In this paper, we consider two important problems that commonly occur in bibliographic digital libraries and seriously degrade their data quality: the Mixed Citation (MC) problem (i.e., citations of different scholars whose names are homonyms are mixed together) and the Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate a solution that is effective yet scalable, since the citation collections in such digital libraries tend to be large. After formally defining the problems and their accompanying challenges, we present an effective solution based on a state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.
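To make the Split Citation problem concrete, the sketch below flags author-name strings that may be variants of one another by comparing their token sets with Jaccard similarity. It is an illustration only, not the sampling-based approximate join the abstract refers to: it compares all pairs naively, and the function names, the sample names, and the 0.5 threshold are all assumptions made for this example.

```python
from itertools import combinations


def name_tokens(name):
    """Lowercase a name and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return set(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_split_pairs(names, threshold=0.5):
    """Return pairs of names whose token overlap meets the threshold.

    Naive all-pairs comparison: quadratic in the number of names, so it
    only illustrates the problem and would not scale to a large library.
    """
    tokens = {name: name_tokens(name) for name in names}
    return [
        (x, y)
        for x, y in combinations(names, 2)
        if jaccard(tokens[x], tokens[y]) >= threshold
    ]


# Hypothetical author strings, invented for this example.
authors = ["Jane Q. Smith", "J. Q. Smith", "Jane Smith", "Wei Wang"]
pairs = candidate_split_pairs(authors)
for left, right in pairs:
    print(f"possible variants: {left!r} / {right!r}")
```

Token overlap treats "J." and "Jane" as unrelated tokens, so "J. Q. Smith" and "Jane Smith" fall below the threshold here; a practical matcher would need initial-aware or edit-distance comparison. The Mixed Citation problem is the converse case, in which a single string such as "Wei Wang" may cover several distinct people, and name similarity alone cannot separate them.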


