|
ABSTRACT
Algorithmic tools for searching and mining the Web are becoming increasingly sophisticated and vital. In this context, algorithms that use and exploit structural information about the Web perform better than generic methods in both efficiency and reliability.We present an extensive characterization of the graph structure of the Web, with a view to enabling high-performance applications that make use of this structure. In particular, we show that the Web emerges as the outcome of a number of essentially independent stochastic processes that evolve at various scales. A striking consequence of this scale invariance is that the structure of the Web is "fractal"---cohesive subregions display the same characteristics as the Web at large. An understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales.We describe potential applications of this line of research to optimized algorithm design for Web-scale data analysis.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. 1997. The Lorel query language for semistructured data. Int. J. Digital Libr. 1, 1, 68--88.
|
| |
2
|
Adamic, L. and Huberman, B. 2000. The nature of markets on the world wide web. Q. J. Econ. Commerce 1, 1, 5--12.
|
| |
3
|
Adamic, L. and Huberman, B. 1999. Scaling behavior on the world wide web. Technical comment on Barabasi and Albert {1999}.
|
| |
4
|
|
 |
5
|
William Aiello , Fan Chung , Linyuan Lu, A random graph model for massive graphs, Proceedings of the thirty-second annual ACM symposium on Theory of computing, p.171-180, May 21-23, 2000, Portland, Oregon, United States
[doi> 10.1145/335305.335326]
|
| |
6
|
Arasu, A., Tomkins, A., and Tomlin, J. 2001. Pagerank computation and the structure of the web: Experiments and algorithms. Manuscript, 2001.
|
| |
7
|
|
| |
8
|
Barabasi, A. and Albert, R. 1999. Emergence of scaling in random networks. Science 286, 509.
|
| |
9
|
|
 |
10
|
|
| |
11
|
Bollobas, B. 1985. Random Graphs. Academic Press.
|
 |
12
|
|
| |
13
|
|
| |
14
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking, p.309-320, June 2000, Amsterdam, The Netherlands
|
| |
15
|
|
| |
16
|
Soumen Chakrabarti , Byron Dom , Prabhakar Raghavan , Sridhar Rajagopalan , David Gibson , Jon Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia
|
| |
17
|
Chakrabarti, S., Dom, B., Gibson, D., Ravi Kumar, S., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1998b. Experiments in topic distillation. In SIGIR Workshop on Hypertext Information Retrieval on the Web.
|
| |
18
|
|
| |
19
|
|
| |
20
|
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R. 1990. Indexing by latent semantic analysis. J. ASIS 41, 6, 391--407.
|
 |
21
|
|
 |
22
|
Michalis Faloutsos , Petros Faloutsos , Christos Faloutsos, On power-law relationships of the Internet topology, Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, p.251-262, August 30-September 03, 1999, Cambridge, Massachusetts, United States
|
| |
23
|
|
| |
24
|
Harary, F. 1975. Graph Theory. Addison Wesley.
|
| |
25
|
Hill, B. 1975. A simple approach to inference about the tail of a distribution. Ann. Stat. 3, 5, 1163--1174.
|
| |
26
|
Huberman, B., Pirolli, P., Pitkow, J., and Lukose, R. 1998. Strong regularities in world wide web surfing. Science 280, 95--97.
|
 |
27
|
|
| |
28
|
R. Kumar , P. Raghavan , S. Rajagopalan , D. Sivakumar , A. Tomkins , E. Upfal, Stochastic models for the Web graph, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p.57, November 12-14, 2000
|
| |
29
|
|
| |
30
|
|
| |
31
|
Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 2000. On semi-automated taxonomy construction. In Proceedings of the 4th Web Data Base Conference.
|
 |
32
|
|
| |
33
|
Martindale, C. and Konopka, A. K. 1996. Oligonucleotide frequencies in DNA follow a Yule distribution. Comput. Chem. 20, 1, 35--38.
|
| |
34
|
Mendelzon, A., Mihaila, G., and Milo, T. 1997. Querying the world wide web. J. Digital Libr. 1, 1, 68--88.
|
| |
35
|
|
| |
36
|
Palmer, E. M. 1985. Graphical Evolution. Wiley.
|
| |
37
|
|
| |
38
|
Pareto, V. 1897. Cours d'economie politique. Rouge, Lausanne et Paris.
|
 |
39
|
Peter Pirolli , James Pitkow , Ramana Rao, Silk from a sow's ear: extracting usable structures from the Web, Proceedings of the SIGCHI conference on Human factors in computing systems: common ground, p.118-125, April 13-18, 1996, Vancouver, British Columbia, Canada
[doi> 10.1145/238386.238450]
|
 |
40
|
James Pitkow , Peter Pirolli, Life, death, and lawfulness on the electronic frontier, Proceedings of the SIGCHI conference on Human factors in computing systems, p.383-390, March 22-27, 1997, Atlanta, Georgia, United States
[doi> 10.1145/258549.258805]
|
| |
41
|
Simon, H. A. 1955. On a class of skew distribution functions. Biometrika 42, 425--440.
|
| |
42
|
Spertus, E. and Stein, L. 1998. A hyperlink-based recommender system written in Squeal. In Proceedings of the CIKM Workshop on Web Information and Data Management.
|
| |
43
|
|
| |
44
|
White H. D. and McCain, K. W. 1989. Bibliometrics. Ann. Rev. Inf. Sci. Technol., 119--186.
|
| |
45
|
Yule, G. U. 1944. Statistical Study of Literary Vocabulary. Cambridge University Press.
|
| |
46
|
Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.
|
CITED BY 13
|
|
Einat Amitay , David Carmel , Adam Darlow , Ronny Lempel , Aya Soffer, The connectivity sonar: detecting site functionality by structural patterns, Proceedings of the fourteenth ACM conference on Hypertext and hypermedia, August 26-30, 2003, Nottingham, UK
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Josiane Xavier Parreira , Debora Donato , Carlos Castillo , Gerhard Weikum, Computing trusted authority scores in peer-to-peer web search networks, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
Mark R. Meiss , Filippo Menczer , Santo Fortunato , Alessandro Flammini , Alessandro Vespignani, Ranking web sites with real user traffic, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
H.3.4
Systems and Software
Nouns:
World Wide Web (WWW)
Additional Classification:
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
H.3.3
Information Search and Retrieval
Subjects:
Information filtering
H.3.5
On-line Information Services
General Terms:
Algorithms,
Design,
Experimentation,
Measurement,
Theory,
Verification
Keywords:
Fractal,
Web-based services,
World-Wide-Web,
graph structure,
online information services,
self-similarity
|