What's new on the web?: the evolution of the web from a search engine perspective
Source International World Wide Web Conference archive
Proceedings of the 13th international conference on World Wide Web table of contents
New York, NY, USA
SESSION: Search engineering 1 table of contents
Pages: 1 - 12  
Year of Publication: 2004
ISBN:1-58113-844-X
Authors
Alexandros Ntoulas  University of California at Los Angeles, Los Angeles, CA
Junghoo Cho  University of California at Los Angeles, Los Angeles, CA
Christopher Olston  Carnegie Mellon University, Pittsburgh, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 29,   Downloads (12 Months): 403,   Citation Count: 61
DOI: http://doi.acm.org/10.1145/988672.988674

ABSTRACT

We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this purpose we collected weekly snapshots of some 150 Web sites over the course of one year, and measured the evolution of content and link structure. Our measurements focus on aspects of potential interest to search engine designers: the evolution of link structure over time, the rate of creation of new pages and new distinct content on the Web, and the rate of change of the content of existing pages under search-centric measures of degree of change. Our findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them. For pages that persist over time we found that, perhaps surprisingly, the degree of content shift as measured using TF.IDF cosine distance does not appear to be consistently correlated with the frequency of content updating. Despite this apparent non-correlation, the rate of content shift of a given page is likely to remain consistent over time. That is, pages that change a great deal in one week will likely change by a similarly large degree in the following week. Conversely, pages that experience little change will continue to experience little change. We conclude the paper with a discussion of the potential implications of our results for the design of effective Web search engines.
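
The abstract measures the degree of content shift between weekly snapshots with TF.IDF cosine distance. As a rough illustration only, the sketch below computes such a distance between two token lists. The abstract does not state the exact term weighting or tokenization used in the paper, so the smoothed IDF formula, the whitespace tokenization, and all function names here are assumptions made for this example.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Count, for each term, how many documents in the corpus contain it."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df


def tfidf(tokens, df, n_docs):
    """Weight raw term counts by a smoothed IDF (an assumed variant)."""
    tf = Counter(tokens)
    return {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two sparse vectors."""
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 and norm_v == 0.0:
        return 0.0  # two empty pages are treated as identical
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # one empty page shares nothing with the other
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical snapshots of one page in consecutive weeks, plus an unrelated page.
week1 = "breaking news markets rally today".split()
week2 = "breaking news markets fall today".split()
other = "archive of old course notes".split()

corpus = [week1, week2, other]
df = document_frequencies(corpus)
v1, v2, v3 = (tfidf(doc, df, len(corpus)) for doc in corpus)

shift = cosine_distance(v1, v2)      # small: one word changed
unrelated = cosine_distance(v1, v3)  # maximal: no shared terms
```

A distance near 0 means the two snapshots use nearly the same weighted vocabulary, while a distance of 1 means they share no terms. Under this measure a page can be updated often and still show little content shift, which is the distinction the abstract draws between update frequency and degree of change.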


