The web changes everything: understanding the dynamics of web content
Source: Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Graph mining and web content
Pages 282-291
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
Eytan Adar  University of Washington, Seattle, WA
Jaime Teevan  Microsoft Research, Redmond, WA
Susan T. Dumais  Microsoft Research, Redmond, WA
Jonathan L. Elsas  CMU, Pittsburgh, PA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498837

ABSTRACT

The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different user visitation patterns. Although change over long intervals has been explored on random (and potentially unvisited) samples of Web pages, little is known about the nature of finer-grained changes to pages that are actively consumed by users, such as those in our sample. We describe algorithms, analyses, and models for characterizing changes in Web content, focusing on both time (by using hourly and sub-hourly crawls) and structure (by looking at page-, DOM-, and term-level changes). Change rates are higher in our behavior-based sample than those found in previous work on randomly sampled pages, with a large portion of pages changing more than hourly. Detailed content and structure analyses identify stable and dynamic content within each page. The understanding of Web change we develop in this paper has implications for tools designed to help people interact with dynamic Web content, such as search engines, advertising, and Web browsers.
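
To make the idea of term-level change concrete, the sketch below compares successive snapshots of one page's text. It is only an illustration and not the paper's method: the abstract does not name a similarity measure, so the Dice coefficient over word sets is an assumption, and the function names (terms, dice, change_profile, stable_terms) are hypothetical.

```python
import re


def terms(text):
    """Return the set of lowercased alphanumeric tokens in a text snapshot."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a, b):
    """Dice coefficient of two term sets: 1.0 if identical, 0.0 if disjoint.

    Assumption: this measure is chosen for illustration only.
    """
    if not a and not b:
        return 1.0  # two empty snapshots are treated as unchanged
    return 2 * len(a & b) / (len(a) + len(b))


def change_profile(snapshots):
    """Similarity of every later snapshot to the first snapshot."""
    base = terms(snapshots[0])
    return [dice(base, terms(s)) for s in snapshots[1:]]


def stable_terms(snapshots):
    """Terms that appear in every snapshot (the page's stable content)."""
    return set.intersection(*(terms(s) for s in snapshots))
```

Given a list of text snapshots from repeated crawls of one page, change_profile shows how quickly the page drifts away from its first version, and stable_terms separates the content that persists from the content that churns.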



