ArnetMiner: extraction and mining of academic social networks
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Las Vegas, Nevada, USA
SESSION: Industrial papers
Pages 990-998
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Jie Tang, Tsinghua University, Beijing, China
Jing Zhang, Tsinghua University, Beijing, China
Limin Yao, Tsinghua University, Beijing, China
Juanzi Li, Tsinghua University, Beijing, China
Li Zhang, IBM, Beijing, China
Zhong Su, IBM, Beijing, China
ABSTRACT
This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.