| Colibri: fast mining of large static and dynamic graphs |
| Full text |
Pdf
(256 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages: 686-694
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
Hanghang Tong
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
Spiros Papadimitriou
|
IBM T.J. Watson, Hawthorne, NY, USA
|
|
Jimeng Sun
|
IBM T.J. Watson, Hawthorne, NY, USA
|
|
Philip S. Yu
|
University of Illinois at Chicago, Chicago, IL, USA
|
|
Christos Faloutsos
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 15, Downloads (12 Months): 160, Citation Count: 1
|
|
|
ABSTRACT
Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a non-orthogonal basis by sampling the columns and/or rows of the sparse matrix. However, these approaches will typically produce overcomplete bases, which wastes both space and time. In this paper we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a non-redundant basis and we prove that it has no loss of accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.
|
 |
3
|
Lars Backstrom , Dan Huttenlocher , Jon Kleinberg , Xiangyang Lan, Group formation in large social networks: membership, growth, and evolution, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150412]
|
| |
4
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking, p.309-320, June 2000, Amsterdam, The Netherlands
|
 |
5
|
Yun Chi , Xiaodan Song , Dengyong Zhou , Koji Hino , Belle L. Tseng, Evolutionary spectral clustering by incorporating temporal smoothness, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
[doi> 10.1145/1281192.1281212]
|
| |
6
|
S. Dorogovtsev and J. Mendes. Evolution of networks. Advances in Physics, 51:1079--1187, 2002.
|
| |
7
|
|
| |
8
|
|
| |
9
|
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error cur matrix decompositions. CoRR, abs/0708.3696, 2007.
|
 |
10
|
Michalis Faloutsos , Petros Faloutsos , Christos Faloutsos, On power-law relationships of the Internet topology, Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, p.251-262, August 30-September 03, 1999, Cambridge, Massachusetts, United States
[doi> 10.1145/316188.316229]
|
| |
11
|
|
 |
12
|
David Gibson , Jon Kleinberg , Prabhakar Raghavan, Inferring Web communities from link topology, Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems: links, objects, time and space---structure in hypermedia systems, p.225-234, June 20-24, 1998, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/276627.276652]
|
| |
13
|
M. Girvan and M. E. J. Newman. Community structure is social and biological networks.
|
| |
14
|
G. H. Golub and C. F. Van-Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
|
 |
15
|
|
| |
16
|
|
 |
17
|
K. V. Ravi Kanth , Divyakant Agrawal , Ambuj Singh, Dimensionality reduction for similarity searching in dynamic databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.166-176, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/276304.276320]
|
 |
18
|
|
 |
19
|
Jure Leskovec , Jon Kleinberg , Christos Faloutsos, Graphs over time: densification laws, shrinking diameters and possible explanations, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081893]
|
| |
20
|
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.
|
 |
21
|
Jia-Yu Pan , Hyung-Jeong Yang , Christos Faloutsos , Pinar Duygulu, Automatic multimedia cross-modal correlation discovery, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014135]
|
 |
22
|
Jia-Yu Pan , Hyung-Jeong Yang , Christos Faloutsos , Pinar Duygulu, Automatic multimedia cross-modal correlation discovery, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014135]
|
| |
23
|
|
| |
24
|
|
 |
25
|
Jimeng Sun , Christos Faloutsos , Spiros Papadimitriou , Philip S. Yu, GraphScope: parameter-free mining of large time-evolving graphs, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
[doi> 10.1145/1281192.1281266]
|
| |
26
|
|
 |
27
|
|
| |
28
|
J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In SDM, 2007.
|
| |
29
|
|
| |
30
|
|
|