ACM Home Page
Please provide us with feedback. Feedback
Graph mining: Laws, generators, and algorithms
Full text PdfPdf (911 KB)
Source ACM Computing Surveys (CSUR) archive
Volume 38 ,  Issue 1  (2006) table of contents
Article No. 2  
Year of Publication: 2006
ISSN:0360-0300
Authors
Deepayan Chakrabarti  Yahoo! Research and Carnegie Mellon University, Sunnyvale, CA
Christos Faloutsos  Yahoo! Research and Carnegie Mellon University, Sunnyvale, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 121,   Downloads (12 Months): 1139,   Citation Count: 18
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: http://doi.acm.org/10.1145/1132952.1132954

ABSTRACT

How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Adamic, L. A. and Huberman, B. A. 2000. Power-law distribution of the World Wide Web. Science 287, 2115.
2
 
3
Adamic, L. A., Lukose, R. M., Puniyani, A. R., and Huberman, B. A. 2001. Search in power-law networks. Physical Rev. E 64, 4, 046135 1--8.
 
4
5
 
6
 
7
Albert, R. and Barabási, A.-L. 2000. Topology of evolving networks: local events and universality. Physical Rev. Lett. 85, 24, 5234--5237.
 
8
Albert, R. and Barabási, A.-L. 2002. Statistical mechanics of complex networks. Rev. Modern Physics 74, 1, 47--97.
 
9
Albert, R., Jeong, H., and Barabási, A.-L. 1999. Diameter of the World-Wide Web. Nature 401, 130--131.
 
10
Albert, R., Jeong, H., and Barabási, A.-L. 2000. Error and attack tolerance of complex networks. Nature 406, 378--381.
 
11
 
12
Alon, N., Yuster, R., and Zwick, U. 1997. Finding and counting given length cycles. Algorithmica 17, 3, 209--223.
 
13
Amaral, L. A. N., Scala, A., Barthélémy, M., and Stanley, H. E. 2000. Classes of small-world networks. Proceedings of the National Academy of Sciences 97, 21, 11149--11152.
 
14
 
15
Bailey, N. T. J. 1974. The Mathematical Theory of Infectious Diseases and its Applications 2nd Ed. Charles Griffin, London, UK.
 
16
Baker, W. E. and Faulkner, R. R. 1993. The social organization of conspiracy: Illegal networks in the Heavy Electrical Equipment industry. Amer. Sociolog. Rev. 58, 6, 837--860.
 
17
 
18
Barabási, A.-L. 2002. Linked: The New Science of Networks 1st Ed. Perseus Books Group, New York, NY.
 
19
Barabási, A.-L. and Albert, R. 1999. Emergence of scaling in random networks. Science 286, 509--512.
 
20
Barabási, A.-L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., and Vicsek, T. 2002. Evolution of the social network of scientific collaborations. Physica A 311, 590--614.
 
21
 
22
Ben-Hur, A. and Guyon, I. 2003. Detecting stable clusters using principal component analysis. In Methods in Molecular Biology. M. J. Brownstein and A. Khudorsky, Eds. Humana Press, Totowa, NJ, 159--182.
 
23
 
24
Berry, M. W. 1992. Large scale singular value computations. Inte. J. Supercomput. Applic. 6, 1, 13--49.
25
 
26
Bianconi, G. and Barabási, A.-L. 2001. Competition and multiscaling in evolving networks. Europhysics Letters 54, 4, 436--442.
 
27
Boguñá, M. and Pastor-Satorras, R. 2002. Epidemic spreading in correlated complex networks. Physical Rev. E 66, 4, 047104.
 
28
Boldi, P., Codenotti, B., Santini, M., and Vigna, S. 2002. Structural properties of the African Web. In International World Wide Web Conference. ACM Press, New York, NY.
 
29
Bollobás, B. 1985. Random Graphs. Academic Press, London, UK.
 
30
 
31
Bollobás, B. and Riordan, O. 2002. The diameter of a scale-free random graph. Combinatorica.
 
32
Bonacich, P. 1987. Power and centrality: A family of measures. Amer. J. Sociol. 92, 5 (Mar.), 1170--1182.
 
33
Borgatti, S. 2002. The key player problem. In Proceedings of the National Academy of Sciences Workshop on Terrorism. National Academy of Sciences, Washington DC.
 
34
Borgatti, S. and Everett, M. G. 1989. The class of all regular equivalences: Algebraic structure and computation. Social Networks 11, 65--88.
 
35
Borgatti, S. and Everett, M. G. 1999. Models of core/periphery structures. Social Networks 21, 275--295.
 
36
Borgatti, S., Everett, M. G., and Freeman, L. C. 1999. UCINET V User's Guide. Analytic Technologies.
37
 
38
Brandes, U., Gaertler, M., and Wagner, D. 2003. Experiments on graph clustering algorithms. In European Symposium on Algorithms. Springer Verlag, Berlin, Germany.
 
39
 
40
 
41
Bu, T. and Towsley, D. 2002. On distinguishing between Internet power law topology generators. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA.
 
42
Burt, R. S. 1992. Structural Holes. Harvard University Press, Cambridge, MA.
 
43
Burt, R. S. 2001. Structural holes versus network closure as social capital. In Social Capital: Theory and Research. N. Lin, K. S. Cook, and R. S. Burt, Eds. Aldine de Gruyter, Hawthorne, NY.
 
44
Callaway, D. S., Newman, M. E. J., Strogatz, S. H., and Watts, D. J. 2000. Network robustness and fragility: Percolation on random graphs. Physical Rev. Lett. 85, 25, 5468--5471.
 
45
Calvert, K. L., Doar, M. B., and Zegura, E. W. 1997. Modeling Internet topology. IEEE Comm. 35, 6, 160--163.
 
46
Carlson, J. M. and Doyle, J. 1999. Highly optimized tolerance: A mechanism for power laws in designed systems. Physical Rev. E 60, 2, 1412--1427.
 
47
48
 
49
Chakrabarti, D., Zhan, Y., and Faloutsos, C. 2004. R-MAT: A recursive model for graph mining. In SIAM Data Mining Conference. SIAM, Philadelphia, PA.
50
51
 
52
 
53
Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2001. The origin of power laws in Internet topologies revisited. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA.
 
54
Clauset, A., Newman, M. E. J., and Moore, C. 2004. Finding community structure of very large networks. Physical Rev. E 70, 066111.
 
55
Coleman, J. S. 1988. Social capital in the creation of human capital. Amer. J. Sociol. 94, S95--S121.
 
56
 
57
 
58
 
59
Cross, R., Borgatti, S., and Parker, A. 2002. Making invisible work visible: Using social network analysis to support strategic collaboration. CA. Manag. Rev. 44, 2, 25--46.
 
60
 
61
de Solla Price, D. J. 1976. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soci. Inform. Science 27, 292--306.
 
62
 
63
Dehaspe, L., Toivonen, H., and King, R. D. 1998. Finding frequent substructures in chemical compounds. In Conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining. ACM Press, New York, NY.
 
64
 
65
Dombroski, M., Fischbeck, P., and Carley, K. M. 2003. Estimating the shape of covert networks. In Proceedings of the 8th International Command and Control Research and Technology Symposium.
66
 
67
Dorogovtsev, S. N., Goltsev, A. V., and Mendes, J. F. 2002. Pseudofractal scale-free web. Physical Rev. E 65, 6, 066122.
 
68
Dorogovtsev, S. N. and Mendes, J. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, UK.
 
69
Dorogovtsev, S. N., Mendes, J. F., and Samukhin, A. N. 2000. Structure of growing networks with preferential linking. Physical Rev. Lett. 85, 21, 4633--4636.
 
70
Dorogovtsev, S. N., Mendes, J. F., and Samukhin, A. N. 2001. Giant strongly connected component of directed networks. Physical Rev. E 64, 025101 1--4.
 
71
Doyle, J. and Carlson, J. M. 2000. Power laws, highly optimized tolerance, and generalized source coding. Physical Rev. Lett. 84, 24 (June), 5656--5659.
 
72
 
73
 
74
Erdős, P. and Rényi, A. 1960. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Acadamy of Science 5, 17--61.
 
75
Erdős, P. and Rényi, A. 1961. On the strength of connectedness of random graphs. Acta Mathematica Scientia Hungary 12, 261--267.
 
76
Everitt, B. S. 1974. Cluster Analysis. John Wiley, New York, NY.
 
77
78
 
79
Feuerverger, A. and Hall, P. 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. Annals Statist. 27, 2, 760--781.
80
 
81
Freeman, L. C. 1977. A set of measures of centrality based on betweenness. Sociometry 40, 1, 35--41.
82
83
 
84
Girvan, M. and Newman, M. E. J. 2002. Community structure in social and biological networks. In Proceedings of the National Academy of Sciences. Vol. 99. National Academy of Sciences, Washington DC.
 
85
Glover, F. 1989. Tabu search---part 1. ORSA J. Comput. 1, 3, 190--206.
 
86
Goh, K.-I., Oh, E., Jeong, H., Kahng, B., and Kim, D. 2002. Classificaton of scale-free networks. In Proceedings of the National Academy of Sciences. Vol. 99. National Academy of Sciences, Washington DC, 12583--12588.
 
87
Goldstein, M. L., Morris, S. A., and Yen, G. G. 2004. Problems with fitting to the power-law distribution. European Physics J. B 41, 255--258.
 
88
Govindan, R. and Tangmunarunkit, H. 2000. Heuristics for Internet map discovery. In IEEE INFOCOM. IEEE Computer Society Press, Los Alamitos, CA, 1371--1380.
 
89
Granovetter, M. S. 1973. The strength of weak ties. Amer. J. Sociol. 78, 6 (May), 1360--1380.
 
90
Hanneman, R. A. and Riddle, M. 2005. Introduction to social network methods. http://faculty.ucr.edu/hanneman/nettext/.
 
91
Hill, B. M. 1975. A simple approach to inference about the tail of a distribution. Annals Statist. 3, 5, 1163--1174.
 
92
Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the SUBDUE system. In National Conference on Artificial Intelligence Workshop on Knowledge Discovery in Databases. AAAI Press, Menlo Park, CA, 169--180.
 
93
 
94
Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.-L. 2000. The large-scale organization of metabolic networks. Nature 407, 6804, 651--654.
 
95
 
96
Karypis, G. and Kumar, V. 1998. Multilevel algorithms for multi-constraint graph partitioning. Tech. Rep. 98-019, University of Minnesota.
97
98
 
99
Kephart, J. O. and White, S. R. 1991. Directed-graph epidemiological models of computer viruses. In IEEE Symposium on Research in Security and Privacy. IEEE Computer Society Press, Los Alamitos, CA.
 
100
 
101
Killworth, P. D. and Bernard, H. R. 1978. Reverse small world experiment. Social Networks 1, 2, 103--210.
102
 
103
 
104
Kleinberg, J. 2001. Small world phenomena and the dynamics of information. In Neural Information Processing Systems Conference. MIT Press, Cambridge, MA.
 
105
Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The web as a graph: Measurements, models and methods. In International Computing and Combinatorics Conference. Springer, Berlin, Germany.
 
106
Krapivsky, P. L. and Redner, S. 2001. Organization of growing random networks. Physical Rev. E 63, 6, 066123 1--14.
 
107
Krebs, V. E. 2001. Mapping networks of terrorist cells. Connections 24, 3, 43--52.
 
108
 
109
 
110
 
111
 
112
Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. 2005. Realistic, mathematically tractable graph generation and evolution, using Kronecker Multiplication. In Conference on Principles and Practice of Knowledge Discovery in Databases. Springer, Berlin, Germany.
113
 
114
 
115
McGovern, A. and Jensen, D. 2003. Identifying predictive structures in relational data using multiple instance learning. In International Conference on Machine Learning. AAAI Press, Menlo Park, CA.
 
116
Medina, A., Matta, I., and Byers, J. 2000. On the origin of power laws in Internet topologies. In Conference of the ACM Special Interest Group on Data Communications (SIGCOMM). ACM Press, New York, NY, 18--34.
 
117
 
118
Milgram, S. 1967. The small-world problem. Psychology Today 2, 60--67.
 
119
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovshii, D., and Alon, U. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 824--827.
 
120
Mitzenmacher, M. 2001. A brief history of generative models for power law and lognormal distributions. In Proceedings of the 39th Annual Allerton Conference on Communication, Control, and Computing. UIUC Press, Urbana-Champaign, IL.
 
121
 
122
Moody, J. 2001. Race, school integration, and friendship segregation in America. Amer. J. Sociol. 107, 3, 679--716.
 
123
Newman, M. E. J. 2003. The structure and function of complex networks. SIAM Rev. 45, 167--256.
 
124
Newman, M. E. J. 2005. Power laws, pareto distributions and Zipf's law. Contemp. Physics 46, 323--351.
 
125
Newman, M. E. J., Forrest, S., and Balthrop, J. 2002. Email networks and the spread of computer viruses. Physical Rev. E 66, 3, 035101 1--4.
 
126
Newman, M. E. J., Girvan, M., and Farmer, J. D. 2002. Optimal design, robustness and risk aversion. Physical Rev. Lett. 89, 2, 028301 1--4.
 
127
Newman, M. E. J., Strogatz, S. H., and Watts, D. J. 2001. Random graphs with arbitrary degree distributions and their applications. Physical Rev. E 64, 2, 026118 1--17.
 
128
Nijssen, S. and Kok, J. 2001. Faster association rules for multiple relations. In International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA.
129
 
130
Palmer, C. and Steffan, J. G. 2000. Generating network topologies that obey power laws. In IEEE Global Telecommunications Conference. IEEE Computer Society Press, Los Alamitos, CA.
 
131
 
132
Pastor-Satorras, R., Vásquez, A., and Vespignani, A. 2001. Dynamical and correlation properties of the Internet. Physical Rev. Lett. 87, 25, 258701 1--4.
 
133
Pastor-Satorras, R. and Vespignani, A. 2001b. Epidemic dynamics and endemic states in complex networks. Physical Rev. E 63, 6, 066117 1--8.
 
134
Pastor-Satorras, R. and Vespignani, A. 2001a. Epidemic spreading in scale-free networks. Physical Rev. Lett. 86, 14, 3200--3203.
 
135
Pastor-Satorras, R. and Vespignani, A. 2002a. Epidemic dynamics in finite size scale-free networks. Physical Rev. E 65, 3, 035108 1--4.
 
136
Pastor-Satorras, R. and Vespignani, A. 2002b. Immunization of complex networks. Physical Rev. E 65, 3, 036104 1--8.
 
137
Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. 2002. Winners don't take all: Characterizing the competition for links on the Web. Proceedings of the National Academy of Sciences 99, 8, 5207--5211.
 
138
 
139
Ravasz, E. and Barabási, A.-L. 2002. Hierarchical organization in complex networks. Physical Rev. E 65, 026112 1--7.
 
140
Redner, S. 1998. How popular is your paper? an empirical study of the citation distribution. European Physics J. B 4, 131--134.
141
142
 
143
Simon, H. 1955. On a class of skew distribution functions. Biometrika 42, 3/4, 425--440.
 
144
Solé, R. V. and Montoya, J. M. 2001. Complexity and fragility in ecological networks. In Proceedings of the Royal Society of London B. Vol. 268. The Royal Society, London, UK, 2039--2045.
 
145
Sparrow, M. K. 1991. The application of network analysis to criminal intelligence: An assessment of the prospects. Social Networks 13, 3, 251--274.
 
146
 
147
Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., and Willinger, W. 2001. Network topologies, power laws, and hierarchy. Tech. Rep. 01-746, University of Southern California.
148
 
149
Tauro, S. L., Palmer, C., Siganos, G., and Faloutsos, M. 2001. A simple conceptual model for the Internet topology. In Global Internet. IEEE Computer Society Press, Los Alamitos, CA.
 
150
Tibshirani, R., Walther, G., and Hastie, T. 2001. Estimating the number of clusters in a dataset via the Gap statistic. J. Royal Statist. Soc., B 63, 411--423.
 
151
Travers, J. and Milgram, S. 1969. An experimental study of the Small World problem. Sociometry 32, 4, 425--443.
 
152
Tyler, J. R., Wilkinson, D. M., and Huberman, B. A. 2003. Email as Spectroscopy: Automated Discovery of Community Structure Within Organizations. Kluwer, B. V., The Netherlands.
 
153
van Dongen, S. M. 2000. Graph clustering by flow simulation. Ph.D. thesis, Univesity of Utrecht.
 
154
 
155
Wang, Y., Chakrabarti, D., Wang, C., and Faloutsos, C. 2003. Epidemic spreading in real networks: An eigenvalue viewpoint. In Symposium on Reliable Distributed Systems. IEEE Computer Society Press, Los Alamitos, CA, 25--34.
 
156
Wasserman, S. and Faust, K. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK.
 
157
Watts, D. J. 2003. Six Degrees: The Science of a Connected Age 1st Ed. W. W. Norton and Company, New York, NY.
 
158
Watts, D. J., Dodds, P. S., and Newman, M. E. J. 2002. Identity and search in social networks. Science 296, 1302--1305.
 
159
Watts, D. J. and Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440--442.
 
160
Waxman, B. M. 1988. Routing of multipoint connections. IEEE J. Select. Areas Comm. 6, 9 (Dec.), 1617--1622.
 
161
Weeks, M. R., Clair, S., Borgatti, S., Radda, K., and Schensul, J. J. 2002. Social networks of drug users in high-risk sites: Finding the connections. AIDS Behav. 6, 2, 193--206.
 
162
Winick, J. and Jamin, S. 2002. Inet-3.0: Internet Topology Generator. Tech. Rep. CSE-TR-456-02, University of Michigan, Ann Arbor, MS.
 
163
Wu, F. and Huberman, B. A. 2004. Finding communities in linear time: A physics approach. European Physics J. B 38, 2, 331--338.
 
164
 
165
Yook, S.-H., Jeong, H., and Barabási, A.-L. 2002. Modeling the Internet's large-scale topology. Proceedings of the National Academy of Sciences 99, 21, 13382--13386.

CITED BY  18

Collaborative Colleagues:
Deepayan Chakrabarti: colleagues
Christos Faloutsos: colleagues