Name disambiguation in author citations using a K-way spectral clustering method
Source: International Conference on Digital Libraries
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Denver, CO, USA
SESSION: Tools & techniques track: identifying names of people and places
Pages: 334-343
Year of Publication: 2005
ISBN: 1-58113-876-8
Authors
Hui Han  Yahoo! Inc., Sunnyvale, CA and The Pennsylvania State University, University Park, PA
Hongyuan Zha  Pennsylvania State University, University Park, PA
C. Lee Giles  Pennsylvania State University, University Park, PA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1065385.1065462

ABSTRACT

An author may have multiple names, and multiple authors may share the same name, simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies. This can produce name ambiguity, which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: co-author names, paper titles, and publication venue titles. The approach is illustrated with 16 name datasets with citations collected from the DBLP database bibliography and author home pages, and shows that name disambiguation can be achieved using these citation attributes.
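
The abstract names the technique and the three citation attributes but not the details, so the following is only a minimal sketch of how K-way spectral clustering might group citations that share an ambiguous author name. The feature encoding (`citation_vector`), the cosine similarity, the orthogonal-iteration eigensolver, and the greedy cluster assignment are all assumptions chosen for illustration, not the authors' method, and the four citation records are invented toy data.

```python
import math
import random
from collections import Counter

def citation_vector(coauthors, title, venue):
    """Bag-of-features vector over the three attribute types (assumed encoding)."""
    feats = Counter()
    feats.update("coauthor:" + name.lower() for name in coauthors)
    feats.update("title:" + word for word in title.lower().split())
    feats.update("venue:" + word for word in venue.lower().split())
    return feats

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    shared = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return shared / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_eigenvectors(matrix, k, iters=200, seed=0):
    """Orthogonal iteration: orthonormal basis of the top-k eigenspace
    of a symmetric positive semidefinite matrix."""
    rng = random.Random(seed)
    n = len(matrix)
    basis = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Multiply every basis vector by the matrix.
        basis = [[dot(row, vec) for row in matrix] for vec in basis]
        # Re-orthonormalize with modified Gram-Schmidt.
        for j in range(k):
            for prev in basis[:j]:
                proj = dot(basis[j], prev)
                basis[j] = [x - proj * p for x, p in zip(basis[j], prev)]
            norm = math.sqrt(dot(basis[j], basis[j]))
            basis[j] = [x / norm for x in basis[j]]
    return basis

def spectral_cluster(similarity, k):
    """Embed each item as a row of the top-k eigenvectors, then assign clusters."""
    basis = leading_eigenvectors(similarity, k)
    rows = [[vec[i] for vec in basis] for i in range(len(similarity))]
    rows = [[x / (math.sqrt(dot(r, r)) or 1.0) for x in r] for r in rows]
    # Greedy centers (a simplification): start with item 0, then repeatedly
    # add the item most orthogonal to the centers chosen so far.
    centers = [0]
    while len(centers) < k:
        candidates = (i for i in range(len(rows)) if i not in centers)
        centers.append(min(
            candidates,
            key=lambda i: max(abs(dot(rows[i], rows[c])) for c in centers)))
    # Each item joins the center its embedded row is most aligned with.
    return [max(range(k), key=lambda c: dot(r, rows[centers[c]])) for r in rows]

# Invented toy citations, all carrying the same ambiguous author name.
citations = [
    (["A. Kumar", "B. Lee"], "spectral clustering for graphs", "machine learning conference"),
    (["A. Kumar"], "graph clustering algorithms", "machine learning conference"),
    (["C. Diaz"], "protein folding simulation", "biology journal"),
    (["C. Diaz", "D. Roy"], "protein structure simulation", "biology journal"),
]
vectors = [citation_vector(*c) for c in citations]
similarity = [[cosine(u, v) for v in vectors] for u in vectors]
labels = spectral_cluster(similarity, k=2)
print(labels)
```

On this toy input the first two records fall into one cluster and the last two into another, because the two groups share no co-authors, title words, or venue words. Real citation data would have overlapping vocabulary, so a weighting scheme and a more robust assignment step (for example k-means on the embedded rows) would likely be needed.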


