ACM Home Page
Please provide us with feedback. Feedback
Self-similarity in the web
Full text PdfPdf (484 KB)
Source ACM Transactions on Internet Technology (TOIT) archive
Volume 2 ,  Issue 3  (August 2002) table of contents
Pages: 205 - 223  
Year of Publication: 2002
ISSN:1533-5399
Authors
Stephen Dill  IBM Almaden Research Center, San Jose, CA
Ravi Kumar  IBM Almaden Research Center, San Jose, CA
Kevin S. Mccurley  IBM Almaden Research Center, San Jose, CA
Sridhar Rajagopalan  IBM Almaden Research Center, San Jose, CA
D. Sivakumar  IBM Almaden Research Center, San Jose, CA
Andrew Tomkins  IBM Almaden Research Center, San Jose, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 20,   Downloads (12 Months): 149,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/572326.572328
What is a DOI?

ABSTRACT

Algorithmic tools for searching and mining the Web are becoming increasingly sophisticated and vital. In this context, algorithms that use and exploit structural information about the Web perform better than generic methods in both efficiency and reliability.We present an extensive characterization of the graph structure of the Web, with a view to enabling high-performance applications that make use of this structure. In particular, we show that the Web emerges as the outcome of a number of essentially independent stochastic processes that evolve at various scales. A striking consequence of this scale invariance is that the structure of the Web is "fractal"---cohesive subregions display the same characteristics as the Web at large. An understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales.We describe potential applications of this line of research to optimized algorithm design for Web-scale data analysis.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. 1997. The Lorel query language for semistructured data. Int. J. Digital Libr. 1, 1, 68--88.
 
2
Adamic, L. and Huberman, B. 2000. The nature of markets on the world wide web. Q. J. Econ. Commerce 1, 1, 5--12.
 
3
Adamic, L. and Huberman, B. 1999. Scaling behavior on the world wide web. Technical comment on Barabasi and Albert {1999}.
 
4
5
 
6
Arasu, A., Tomkins, A., and Tomlin, J. 2001. Pagerank computation and the structure of the web: Experiments and algorithms. Manuscript, 2001.
 
7
 
8
Barabasi, A. and Albert, R. 1999. Emergence of scaling in random networks. Science 286, 509.
 
9
10
 
11
Bollobas, B. 1985. Random Graphs. Academic Press.
12
 
13
 
14
 
15
 
16
 
17
Chakrabarti, S., Dom, B., Gibson, D., Ravi Kumar, S., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1998b. Experiments in topic distillation. In SIGIR Workshop on Hypertext Information Retrieval on the Web.
 
18
 
19
 
20
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R. 1990. Indexing by latent semantic analysis. J. ASIS 41, 6, 391--407.
21
22
 
23
 
24
Harary, F. 1975. Graph Theory. Addison Wesley.
 
25
Hill, B. 1975. A simple approach to inference about the tail of a distribution. Ann. Stat. 3, 5, 1163--1174.
 
26
Huberman, B., Pirolli, P., Pitkow, J., and Lukose, R. 1998. Strong regularities in world wide web surfing. Science 280, 95--97.
27
 
28
 
29
 
30
 
31
Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 2000. On semi-automated taxonomy construction. In Proceedings of the 4th Web Data Base Conference.
32
 
33
Martindale, C. and Konopka, A. K. 1996. Oligonucleotide frequencies in DNA follow a Yule distribution. Comput. Chem. 20, 1, 35--38.
 
34
Mendelzon, A., Mihaila, G., and Milo, T. 1997. Querying the world wide web. J. Digital Libr. 1, 1, 68--88.
 
35
 
36
Palmer, E. M. 1985. Graphical Evolution. Wiley.
 
37
 
38
Pareto, V. 1897. Cours d'economie politique. Rouge, Lausanne et Paris.
39
40
 
41
Simon, H. A. 1955. On a class of skew distribution functions. Biometrika 42, 425--440.
 
42
Spertus, E. and Stein, L. 1998. A hyperlink-based recommender system written in Squeal. In Proceedings of the CIKM Workshop on Web Information and Data Management.
 
43
 
44
White H. D. and McCain, K. W. 1989. Bibliometrics. Ann. Rev. Inf. Sci. Technol., 119--186.
 
45
Yule, G. U. 1944. Statistical Study of Literary Vocabulary. Cambridge University Press.
 
46
Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.

CITED BY  13

Collaborative Colleagues:
Stephen Dill: colleagues
Ravi Kumar: colleagues
Kevin S. Mccurley: colleagues
Sridhar Rajagopalan: colleagues
D. Sivakumar: colleagues
Andrew Tomkins: colleagues