Comparative study of name disambiguation problem using a scalable blocking-based framework
Source: International Conference on Digital Libraries
Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries
Denver, CO, USA
Session: Tools & techniques track: identifying names of people and places
Pages: 344-353
Year of Publication: 2005
ISBN: 1-58113-876-8
Authors
Byung-Won On  Pennsylvania State University, University Park, PA
Dongwon Lee  Pennsylvania State University, University Park, PA
Jaewoo Kang  North Carolina State University, Raleigh, NC
Prasenjit Mitra  Pennsylvania State University, University Park, PA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1065385.1065463

ABSTRACT

In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework: step 1 substantially reduces the number of candidates via blocking, and step 2 measures the distance between two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are both scalable and effective at disambiguating author names in citations.
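
The two-step framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the blocking key (first initial plus last name token), the Jaccard distance over coauthor sets, the 0.5 threshold, and the names `block_key`, `jaccard_distance`, and `candidate_variants` are all assumptions chosen for illustration, and the coauthor data is made up. The abstract does not say which four blocking methods or seven distance measures the paper evaluates.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name):
    """Hypothetical blocking key: first initial plus last name token, lowercased."""
    tokens = name.replace(".", " ").split()
    return (tokens[0][0] + " " + tokens[-1]).lower()


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as maximally distant."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def candidate_variants(coauthors_by_name, threshold=0.5):
    """Return (name_x, name_y, distance) for likely variants of the same author."""
    # Step 1 (blocking): group names so that only names sharing a key are compared.
    blocks = defaultdict(list)
    for name in coauthors_by_name:
        blocks[block_key(name)].append(name)

    # Step 2 (distance): compare coauthor sets of names within each block.
    pairs = []
    for names in blocks.values():
        for x, y in combinations(sorted(names), 2):
            d = jaccard_distance(coauthors_by_name[x], coauthors_by_name[y])
            if d <= threshold:
                pairs.append((x, y, d))
    return pairs


# Made-up toy data: each author name maps to the set of its coauthors.
toy = {
    "Vannevar Bush": {"coauthor_a", "coauthor_b", "coauthor_c"},
    "V. Bush": {"coauthor_a", "coauthor_b"},
    "Victor Bush": {"coauthor_x"},
}
variants = candidate_variants(toy)
```

In this toy input all three names fall into the same block, but only "V. Bush" and "Vannevar Bush" share enough coauthors to fall under the threshold. Blocking keeps the number of pairwise comparisons small, which is what makes the second step affordable on large citation collections.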


