Using web information for author name disambiguation
Source
International Conference on Digital Libraries
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Austin, TX, USA
Session 2
Pages 49-58
Year of Publication: 2009
ISBN: 978-1-60558-322-8
Authors
Denilson Alves Pereira  Federal University of Minas Gerais, Belo Horizonte, Brazil
Berthier Ribeiro-Neto  Google Engineering, Belo Horizonte, Brazil
Nivio Ziviani  Federal University of Minas Gerais, Belo Horizonte, Brazil
Alberto H.F. Laender  Federal University of Minas Gerais, Belo Horizonte, Brazil
Marcos André Gonçalves  Federal University of Minas Gerais, Belo Horizonte, Brazil
Anderson A. Ferreira  Federal University of Minas Gerais, Belo Horizonte, Brazil
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1555400.1555409

ABSTRACT

In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of documents in the answer sets returned by the Web search engine, we extract useful information that can help in the disambiguation process. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations in the same document together in a bottom-up fashion. Experimental results show that our method outperforms two state-of-the-art unsupervised methods and is statistically comparable with a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
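
As an illustration only, the sketch below shows one simple way to group citations bottom-up when they appear in the same Web document, and how a pairwise F1 score can be computed against a reference clustering. It is not the authors' implementation: the function names, the input format (a mapping from citation ids to the URLs of documents in which each citation was found), and the example URLs are all assumptions made here, and the merge rule (any shared document) is a deliberate simplification of whatever criteria the paper applies.

```python
from itertools import combinations


def cluster_by_shared_documents(citation_docs):
    """Merge citations bottom-up whenever two clusters share a Web document.

    citation_docs: hypothetical input, dict of citation id -> set of URLs.
    Returns a list of sets of citation ids.
    """
    # Start with one singleton cluster per citation.
    clusters = [({cid}, set(docs)) for cid, docs in citation_docs.items()]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if clusters[i][1] & clusters[j][1]:  # at least one shared document
                ids = clusters[i][0] | clusters[j][0]
                docs = clusters[i][1] | clusters[j][1]
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append((ids, docs))
                merged = True
                break  # restart the scan over the updated cluster list
    return [ids for ids, _ in clusters]


def pairwise_f1(predicted, truth):
    """Pairwise F1: compare the sets of same-cluster citation pairs."""
    def pairs(clusters):
        return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

    pred, gold = pairs(predicted), pairs(truth)
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example: c1-c3 are linked through two pages; c5 was found nowhere.
citation_docs = {
    "c1": {"http://example.org/cv-a"},
    "c2": {"http://example.org/cv-a", "http://example.org/pubs-a"},
    "c3": {"http://example.org/pubs-a"},
    "c4": {"http://example.org/cv-b"},
    "c5": set(),
}
clusters = cluster_by_shared_documents(citation_docs)
score = pairwise_f1(clusters, [{"c1", "c2", "c3", "c5"}, {"c4"}])
```

In this toy input, citations with no retrieved document stay in singleton clusters, which lowers pairwise recall without affecting precision.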

