RankClus: integrating clustering with ranking for heterogeneous information network analysis
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
SESSION: Research sessions: Heterogeneous & distributed
Pages 565-576  
Year of Publication: 2009
ISBN: 978-1-60558-422-5
Authors
Yizhou Sun, University of Illinois at Urbana-Champaign
Jiawei Han, University of Illinois at Urbana-Champaign
Peixiang Zhao, University of Illinois at Urbana-Champaign
Zhijun Yin, University of Illinois at Urbana-Champaign
Hong Cheng, The Chinese University of Hong Kong
Tianyi Wu, University of Illinois at Urbana-Champaign
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516426

ABSTRACT

As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well.

In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters, in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Starting from K initial clusters, ranking is applied to each cluster separately and serves as a good measure for that cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, measured by rank distribution. Objects are then reassigned to the nearest cluster under the new measure space to improve clustering. As a result, the quality of clustering and ranking is mutually enhanced: the clusters become more accurate and the ranking becomes more meaningful. This progressive refinement iterates until little change can be made. Our experimental results show that RankClus generates more accurate clusters, and does so more efficiently, than state-of-the-art link-based clustering methods. Moreover, clustering results with ranks provide more informative views of the data than traditional clustering.
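
The loop described in the abstract (rank within each cluster, map every object to a K-dimensional vector, reassign to the nearest cluster, repeat) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the bipartite weight matrix W, the link-count ranking, and the cosine similarity to cluster centers. In particular, the mixture-model estimation is replaced by a simpler normalized score, so this is a simplified stand-in rather than the RankClus estimator itself.

```python
import math

def rank_in_cluster(W, members):
    """Link-count ranking (an assumed, simple ranking function).

    W[i][j] is the link weight between target object i and attribute
    object j. Returns a distribution over attribute objects, computed
    only from the target objects listed in `members`.
    """
    n_attr = len(W[0])
    scores = [sum(W[i][j] for i in members) for j in range(n_attr)]
    total = sum(scores)
    if total == 0:  # empty cluster: fall back to a uniform distribution
        return [1.0 / n_attr] * n_attr
    return [s / total for s in scores]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def rankclus_sketch(W, K, init_labels, max_iter=20):
    """Alternate ranking and reassignment until the labels stop changing."""
    n = len(W)
    labels = list(init_labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == k] for k in range(K)]
        ranks = [rank_in_cluster(W, members) for members in clusters]

        # K-dimensional vector per target object. Simplification: a
        # normalized rank-weighted link score, not a fitted mixture model.
        vecs = []
        for i in range(n):
            raw = [sum(w * r for w, r in zip(W[i], ranks[k])) for k in range(K)]
            total = sum(raw)
            vecs.append([x / total for x in raw] if total > 0 else [1.0 / K] * K)

        # Cluster centers: mean vector of the members (one-hot if empty).
        centers = []
        for k, members in enumerate(clusters):
            if members:
                centers.append([sum(vecs[i][d] for i in members) / len(members)
                                for d in range(K)])
            else:
                centers.append([1.0 if d == k else 0.0 for d in range(K)])

        # Reassign each object to the most similar center.
        new_labels = [max(range(K), key=lambda k: cosine(vecs[i], centers[k]))
                      for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels

    # Recompute the ranking so that it matches the final clusters.
    clusters = [[i for i in range(n) if labels[i] == k] for k in range(K)]
    return labels, [rank_in_cluster(W, members) for members in clusters]

# Toy network (hypothetical data): 4 target objects x 4 attribute objects.
W = [[5, 3, 0, 0],
     [4, 4, 1, 0],
     [0, 1, 4, 4],
     [0, 0, 3, 5]]
labels, ranks = rankclus_sketch(W, K=2, init_labels=[0, 1, 0, 1])
print(labels)  # [0, 0, 1, 1]
```

On this toy network, a deliberately poor initial partition is corrected after one reassignment, and each cluster's ranking then concentrates on the attribute objects linked to that cluster. The initial labels are passed in explicitly so that the run is deterministic.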

