Sic transit gloria telae: towards an understanding of the web's decay
Source: International World Wide Web Conference
Proceedings of the 13th International Conference on World Wide Web
New York, NY, USA
Session: Link analysis
Pages: 328-337
Year of Publication: 2004
ISBN: 1-58113-844-X
Authors
Ziv Bar-Yossef  IBM Almaden Research Center, San Jose, CA
Andrei Z. Broder  IBM T. J. Watson Research Center, Hawthorne, NY
Ravi Kumar  IBM Almaden Research Center, San Jose, CA
Andrew Tomkins  IBM Almaden Research Center, San Jose, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 88,   Citation Count: 18
DOI: http://doi.acm.org/10.1145/988672.988716

ABSTRACT

The rapid growth of the web has been noted and tracked extensively. Recent studies have, however, documented the dual phenomenon: web pages have small half-lives, and thus the web exhibits rapid death as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up-to-date, and many are falling behind. Beyond individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web; measuring the decay of a page purely on the basis of dead links on the page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
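
The contrast the abstract draws between counting a page's own dead links and measuring decay over its neighborhood can be illustrated with a small sketch. The abstract does not state the paper's formal definition, so everything here is an assumption made for illustration: the toy graph, the page names, the parameter sigma, and the propagation rule (a dead page scores 1, a live page scores (1 - sigma) times the mean score of the pages it links to).

```python
def naive_decay(graph, dead, page):
    """Fraction of a page's direct outlinks that point to dead pages."""
    links = graph.get(page, [])
    if not links:
        return 0.0
    return sum(1 for q in links if q in dead) / len(links)


def propagated_decay(graph, dead, sigma=0.1, iterations=100):
    """Iteratively estimate a neighborhood-aware decay score.

    Assumed rule (illustrative only): a dead page scores 1; a live page
    scores (1 - sigma) times the mean score of the pages it links to;
    a live page with no outlinks scores 0.
    """
    pages = set(graph) | dead | {q for links in graph.values() for q in links}
    score = {p: (1.0 if p in dead else 0.0) for p in pages}
    for _ in range(iterations):
        updated = {}
        for p in pages:
            if p in dead:
                updated[p] = 1.0
                continue
            links = graph.get(p, [])
            if links:
                mean = sum(score[q] for q in links) / len(links)
                updated[p] = (1 - sigma) * mean
            else:
                updated[p] = 0.0
        score = updated
    return score


# Hypothetical toy graph: "hub" links only to live pages, but those
# pages link mostly to dead ones.
graph = {
    "hub": ["a", "b"],
    "a": ["dead1", "dead2"],
    "b": ["dead3", "fresh"],
    "fresh": [],
}
dead = {"dead1", "dead2", "dead3"}

scores = propagated_decay(graph, dead)
print(naive_decay(graph, dead, "hub"))  # no dead links on the page itself
print(round(scores["hub"], 4))          # yet its neighborhood is stale
```

In this toy case the naive measure rates "hub" as perfectly healthy because none of its own links are dead, while the propagated score is well above zero because most paths leading out of it end at dead pages.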


