|
ABSTRACT
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V;E) we want to estimate as accurately as possible the number of triangles incident to every node υ ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(jV j) space in main memory and performing O(log jV j) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(jEj) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.
|
| |
2
|
Ziv Bar-Yossef , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
| |
3
|
V. Batagelj and A. Mrvar. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Social Networks, 23:237--243, 2001.
|
| |
4
|
T. Bohman, C. Cooper, and A. M. Frieze. Min-wise independent linear permutations. Electr. J. Comb, 7, 2000.
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
Andrei Z. Broder , Moses Charikar , Alan M. Frieze , Michael Mitzenmacher, Min-wise independent permutations (extended abstract), Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.327-336, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276781]
|
| |
9
|
Andrei Z. Broder , Steven C. Glassman , Mark S. Manasse , Geoffrey Zweig, Syntactic clustering of the Web, Selected papers from the sixth international conference on World Wide Web, p.1157-1166, September 1997, Santa Clara, California, United States
|
 |
10
|
Luciana S. Buriol , Gereon Frahling , Stefano Leonardi , Alberto Marchetti-Spaccamela , Christian Sohler, Counting triangles in data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142388]
|
 |
11
|
Carlos Castillo , Debora Donato , Luca Becchetti , Paolo Boldi , Stefano Leonardi , Massimo Santini , Sebastiano Vigna, A reference collection for web spam, ACM SIGIR Forum, v.40 n.2, p.11-24, December 2006
[doi> 10.1145/1189702.1189703]
|
 |
12
|
Carlos Castillo , Debora Donato , Aristides Gionis , Vanessa Murdock , Fabrizio Silvestri, Know your neighbors: web spam detection using the web topology, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
[doi> 10.1145/1277741.1277814]
|
| |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.
|
| |
17
|
J. Feigenbaum, S. Kannan, M. A. Gregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In ICALP, 2004.
|
 |
18
|
Dennis Fetterly , Mark Manasse , Marc Najork, Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004, June 17-18, 2004, Paris, France
[doi> 10.1145/1017074.1017077]
|
 |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.
|
| |
23
|
|
| |
24
|
|
| |
25
|
A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal of Computing, 7(4):413--423, 1978.
|
 |
26
|
|
| |
27
|
|
 |
28
|
Jure Leskovec , Jon Kleinberg , Christos Faloutsos, Graphs over time: densification laws, shrinking diameters and possible explanations, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081893]
|
| |
29
|
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.
|
| |
30
|
T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA), 2005.
|
 |
31
|
|
| |
32
|
H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. The Journal of Social Structure, 8(2), 2007.
|
CITED BY 2
|
|
Virginia Vassilevska , Ryan Williams, Finding, minimizing, and counting weighted subgraphs, Proceedings of the 41st annual ACM symposium on Theory of computing, May 31-June 02, 2009, Bethesda, MD, USA
|
|
|
|
|