ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Scalable graph clustering using stochastic flows: applications to community discovery
Full text MovMov (14:07),  PdfPdf (549 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Research track papers table of contents
Pages: 737-746  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Venu Satuluri  The Ohio State University, Columbus, OH, Uae
Srinivasan Parthasarathy  The Ohio State University, Columbus, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 57,   Downloads (12 Months): 301,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557101
What is a DOI?

ABSTRACT

Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multi-level algorithm for graph clustering using flows that delivers significant improvements in both quality and speed. The graph is first successively coarsened to a manageable size, and a small number of iterations of flow simulation is performed on the coarse graph. The graph is then successively refined, with flows from the previous graph used as initializations for brief flow simulations on each of the intermediate graphs. When we reach the final refined graph, the algorithm is run to convergence and the high-flow regions are clustered together, with regions without any flow forming the natural boundaries of the clusters. Extensive experimental results on several real and synthetic datasets demonstrate the effectiveness of our approach when compared to state-of-the-art algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
S. Brohee and J. van Helden. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7, 2006.
3
 
4
 
5
S. V. Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000.
6
7
 
8
 
9
 
10
B. Kernighan and S. Lin. An Efficient Heuristic Procedure for partitioning graphs. The Bell System Technical J., 49, 1970.
 
11
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. CoRR, abs/0810.1355, 2008.
12
 
13
L. Li, C. J. Stoeckert, and D. S. Roos. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res, 13(9):2178--2189, September 2003.
 
14
M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, Feb 2004.
 
15
V. Satuluri and S. Parthasarathy. Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery. Technical Report OSU-CISRC-4/09-TR10, The Ohio State University.
 
16
R. Sharan, I. Ulitsky, and R. Shamir. Network-based prediction of protein function. Molecular Systems Biology, 3, 2007.
 
17
 
18
The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics, 25:25--29, 2000.
19

Collaborative Colleagues:
Venu Satuluri: colleagues
Srinivasan Parthasarathy: colleagues