The web changes everything: understanding the dynamics of web content
Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Graph mining and web content
Pages 282-291
Year of Publication: 2009
ISBN: 978-1-60558-390-7
ABSTRACT
The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different user visitation patterns. Although change over long intervals has been explored on random (and potentially unvisited) samples of Web pages, little is known about the nature of finer-grained changes to pages that are actively consumed by users, such as those in our sample. We describe algorithms, analyses, and models for characterizing changes in Web content, focusing on both time (by using hourly and sub-hourly crawls) and structure (by looking at page-, DOM-, and term-level changes). Change rates are higher in our behavior-based sample than those found in previous work on randomly sampled pages, with a large portion of pages changing more than hourly. Detailed content and structure analyses identify stable and dynamic content within each page. The understanding of Web change we develop in this paper has implications for tools designed to help people interact with dynamic Web content, such as search engines, advertising, and Web browsers.
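To make the idea of term-level change between successive crawls concrete, the following is a minimal sketch and not the paper's implementation. It assumes each crawl snapshot is available as plain text, and it uses the Dice coefficient over term sets as one common similarity measure. The function names (`terms`, `dice_similarity`, `change_rate`) and the tokenization rule are hypothetical choices made for illustration.

```python
import re


def terms(text):
    """Return the set of lowercase alphanumeric tokens in a snapshot's text.

    Assumption: a simple regex tokenizer stands in for whatever term
    extraction a real crawler pipeline would use.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice_similarity(old_text, new_text):
    """Dice coefficient between the term sets of two snapshots.

    1.0 means the term sets are identical, 0.0 means they share no terms.
    """
    a, b = terms(old_text), terms(new_text)
    if not a and not b:
        return 1.0  # two empty pages are treated as unchanged
    return 2 * len(a & b) / (len(a) + len(b))


def change_rate(snapshots):
    """Fraction of consecutive snapshot pairs whose term sets differ.

    `snapshots` is a time-ordered list of page texts, for example one
    entry per hourly crawl.
    """
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0
    changed = sum(1 for old, new in pairs if terms(old) != terms(new))
    return changed / len(pairs)
```

For example, `change_rate` applied to a time-ordered list of hourly snapshots of one page estimates how often that page's terms changed from one crawl to the next, while `dice_similarity` indicates how large each change was.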