Using web information for author name disambiguation
Source
International Conference on Digital Libraries
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Austin, TX, USA
Pages 49-58
Year of Publication: 2009
ISBN: 978-1-60558-322-8
Authors
Denilson Alves Pereira, Federal University of Minas Gerais, Belo Horizonte, Brazil
Berthier Ribeiro-Neto, Google Engineering, Belo Horizonte, Brazil
Nivio Ziviani, Federal University of Minas Gerais, Belo Horizonte, Brazil
Alberto H.F. Laender, Federal University of Minas Gerais, Belo Horizonte, Brazil
Marcos André Gonçalves, Federal University of Minas Gerais, Belo Horizonte, Brazil
Anderson A. Ferreira, Federal University of Minas Gerais, Belo Horizonte, Brazil
ABSTRACT
In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of the documents in the answer sets returned by the Web search engine, we extract useful information that can help in the disambiguation process. Using this information, author names are disambiguated by a hierarchical clustering method that groups citations found in the same document together in a bottom-up fashion. Experimental results show that our method outperforms two state-of-the-art unsupervised methods and is statistically comparable with a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
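The abstract reports its gains in pairwise F1 but does not spell the metric out. The sketch below assumes the common definition used for clustering evaluation: precision and recall are computed over pairs of citations placed in the same cluster, and F1 is their harmonic mean. The function names and the toy labels are illustrative and are not taken from the paper.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose items share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise F1 between a reference clustering and a predicted clustering.

    Each argument gives one cluster label per citation; the label values
    themselves do not matter, only which citations share a label.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both labelings must cover the same citations")

    true_pairs = same_cluster_pairs(true_labels)
    predicted_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & predicted_pairs)

    # Convention assumed here: an empty pair set yields a score of 0.0.
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: four citations, the first three by the same real author.
reference = ["author_a", "author_a", "author_a", "author_b"]
predicted = [0, 0, 1, 1]
print(round(pairwise_f1(reference, predicted), 3))  # prints 0.4
```

In the toy example one of the two predicted pairs is correct (precision 1/2) and one of the three true pairs is recovered (recall 1/3), which gives F1 = 0.4.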