|
ABSTRACT
We evaluate various heuristics for hierarchical spectral clustering in large telephone call graphs. Spectral clustering without additional heuristics often produces very uneven cluster sizes or low quality clusters that may consist of several disconnected components, a fact that appears to be common for several data sources but, to our knowledge, not described in the literature. Divide-and-Merge, a recently described postfiltering procedure may be used to eliminate bad quality branches in a binary tree hierarchy. We propose an alternate solution that enables k-way cuts in each step by immediately filtering unbalanced or low quality clusters before splitting them further. Our experiments are performed on graphs with various weight and normalization built based on call detail records. We investigate a period of eight months of more than two millions of Hungarian landline telephone users. We measure clustering quality both by cluster ratio as well as by the geographic homogeneity of the clusters obtained from telephone location data. Although divide-and-merge optimizes its clusters for cluster ratio, our method produces clusters of similar ratio much faster, furthermore we give geographically much more homogeneous clusters with the size distribution of our clusters resembling to that of the settlement structure.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
William Aiello , Fan Chung , Linyuan Lu, A random graph model for massive graphs, Proceedings of the thirty-second annual ACM symposium on Theory of computing, p.171-180, May 21-23, 2000, Portland, Oregon, United States
[doi> 10.1145/335305.335326]
|
| |
2
|
C. J. Alpert and A. B. Kahng. Multiway partitioning via geometric embeddings, orderings, and dynamic programming. IEEE Trans. on CAD of Integrated Circuits and Systems, 14(11):1342--1358, 1995.
|
| |
3
|
|
 |
4
|
Charles J. Alpert , So-Zen Yao, Spectral partitioning: the more eigenvectors, the better, Proceedings of the 32nd ACM/IEEE conference on Design automation, p.195-200, June 12-16, 1995, San Francisco, California, United States
[doi> 10.1145/217474.217529]
|
| |
5
|
W.-H. Au, K. C. C. Chan, and X. Yao. A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Trans. Evolutionary Computation, 7(6):532--545, 2003.
|
| |
6
|
E. R. Barnes. An algorithm for partitioning the nodes of a graph. SIAM Journal on Algebraic and Discrete Methods, 3(4):541--550, Dec. 1982.
|
| |
7
|
A. A. Benczur, K. Csalogany, M. Kurucz, A. Lukacs, and L. Lukacs. Sociodemographic exploration of telecom communities. In NSF US-Hungarian Workshop on Large Scale Random Graphs Methods for Modeling Mesoscopic Behavior in Biological and Physical Systems, 2006.
|
| |
8
|
|
 |
9
|
Pak K. Chan , Martine D. F. Schlag , Jason Y. Zien, Spectral K-way ratio-cut partitioning and clustering, Proceedings of the 30th international conference on Design automation, p.749-754, June 14-18, 1993, Dallas, Texas, United States
[doi> 10.1145/157485.165117]
|
| |
10
|
D. Cheng, R. Kannan, S. Vempala, and G. Wang. On a recursive spectral algorithm for clustering from pairwise similarities. Technical report, MIT LCS Technical Report MIT-LCS-TR-906, 2003.
|
 |
11
|
|
| |
12
|
F. Chung and L. Lu. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences of the United States of America, 99(25):15879--15882, 2002.
|
| |
13
|
F. Chung, L. Lu, and V. Vu. Eigenvalues of random power law graphs. Annals of Combinatorics, 2003.
|
| |
14
|
F. Chung, L. Lu, and V. Vu. Spectra of random graphs with given expected degrees. Proceedings of National Academy of Sciences, 100:6313--6318, 2003.
|
| |
15
|
|
| |
16
|
|
| |
17
|
I. Derenyi, G. Palla, and T. Vicsek. Clique percolation in random networks. Physical Review Letters, 94:49--60, 2005.
|
 |
18
|
|
| |
19
|
|
| |
20
|
W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17(5):420--425, Sept. 1973.
|
| |
21
|
|
| |
22
|
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98), 1973.
|
| |
23
|
|
| |
24
|
E. Gorny. Russian livejournal: National specifics in the development of a virtual community. pdf online, May 2004.
|
| |
25
|
L. W. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on CAD of Integrated Circuits and Systems, 11(9):1074--1085, 1992.
|
| |
26
|
|
| |
27
|
G. Karypis. CLUTO: A clustering toolkit, release 2.1. Technical Report 02-017, University of Minnesota, Department of Computer Science, 2002.
|
 |
28
|
|
| |
29
|
K. Lang. Finding good nearly balanced cuts in power law graphs. 2004.
|
| |
30
|
K. Lang. Fixing two weaknesses of the spectral method. In NIPS '05: Advances in Neural Information Processing Systems, volume 18, Vancouver Canada, 2005.
|
| |
31
|
|
| |
32
|
M. Meila and J. Shi. A random walks view of spectral segmentation. In AISTATS, 2001.
|
 |
33
|
Amit A. Nanavati , Siva Gurumurthy , Gautam Das , Dipanjan Chakraborty , Koustuv Dasgupta , Sougata Mukherjea , Anupam Joshi, On the structural properties of massive telecom call graphs: findings and implications, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
[doi> 10.1145/1183614.1183678]
|
| |
34
|
J. P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A. L. Barabasi. Structure and tie strengths in mobile communication networks, Oct 2006.
|
 |
35
|
|
| |
36
|
|
 |
37
|
Tamás Sarlós , Adrás A. Benczúr , Károly Csalogány , Dániel Fogaras , Balázs Rácz, To randomize or not to randomize: space optimal summaries for hyperlink analysis, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
[doi> 10.1145/1135777.1135823]
|
| |
38
|
|
| |
39
|
U. von Luxburg, O. Bousquet, and M. Belkin. Limits of spectral clustering. pages 857--864, Cambridge, MA, 2005. MIT Press.
|
| |
40
|
C.-P. Wei and I.-T. Chiu. Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst. Appl., 23(2):103--112, 2002.
|
| |
41
|
|
| |
42
|
G. J. Wills. NicheWorks - interactive visualization of very large graphs. Journal of Computational and Graphical Statistics, 8(2):190--212, 1999.
|
| |
43
|
P. Zakharov. Structure of livejournal social network. In Proceedings of SPIE Volume 6601, Noise and Stochastics in Complex Systems and Finance, 2007.
|
| |
44
|
H. Zha, X. He, C. H. Q. Ding, M. Gu, and H. D. Simon. Spectral relaxation for k-means clustering. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, NIPS, pages 1057--1064. MIT Press, 2001.
|
CITED BY 2
|
|
Haizheng Zhang , John Yen , C. Lee Giles , Bamshad Mombaster , Myra Spiliopoulou , Jaideep Srivastava , Olfa Nasraoui , Andrew McCallum, WebKDD/SNAKDD 2007: web mining and social network analysis post-workshop report, ACM SIGKDD Explorations Newsletter, v.9 n.2, December 2007
|
|
|
|
|