Modelling information persistence on the web

Source
International Conference On Web Engineering; Vol. 263
Proceedings of the 6th international conference on Web engineering
Palo Alto, California, USA
SESSION: Best paper session: best paper candidates
Pages: 193 - 200
Year of Publication: 2006
ISBN: 1-59593-352-2

Authors
Daniel Gomes, Universidade de Lisboa, Faculdade de Ciências, Portugal
Mário J. Silva, Universidade de Lisboa, Faculdade de Ciências, Portugal

ABSTRACT
Models of web data persistency are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data through the measurement of URL and content persistence across several snapshots of a national community web, collected for 3 years. We found that the lifetimes of URLs and contents are modelled by logarithmic functions. We gathered statistics on the structure of the web, identified reasons for URL death and characterized persistent URLs and contents. The lasting contents tend to be referenced by different URLs during their lifetime, while half of the contents referenced by persistent URLs do not change.
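
As an illustration only of the kind of measurement the abstract describes, the sketch below computes the fraction of URLs from a first snapshot that are still present in later snapshots, then fits a function of the form a + b * ln(age) by ordinary least squares. The abstract does not give the paper's exact procedure or the form of its logarithmic functions, so the function names (`url_persistence`, `fit_log`), the model form, and the toy snapshot data are all assumptions made for this example.

```python
import math

def url_persistence(snapshots):
    """Fraction of the first snapshot's URLs still present in each later snapshot.

    `snapshots` is a list of sets of URLs, oldest first (hypothetical input format).
    """
    first = snapshots[0]
    return [len(first & later) / len(first) for later in snapshots[1:]]

def fit_log(ages, fractions):
    """Least-squares fit of fraction ~ a + b * ln(age); ages must be > 0."""
    xs = [math.log(t) for t in ages]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(fractions) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fractions))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up toy data: four snapshots of a tiny "web", oldest first.
snapshots = [
    {"u1", "u2", "u3", "u4"},
    {"u1", "u2", "u3", "u5"},
    {"u1", "u2", "u6"},
    {"u1", "u7"},
]
ages = [1, 2, 3]  # age of each later snapshot relative to the first, arbitrary units

fractions = url_persistence(snapshots)
a, b = fit_log(ages, fractions)
print(fractions, a, b)
```

A negative fitted slope `b` indicates that the surviving fraction of URLs shrinks as the snapshot age grows; real data would need many more snapshots than this toy set.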
CITED BY 2
Yusuke Yanbe, Adam Jatowt, Satoshi Nakamura, Katsumi Tanaka, Can social bookmarking enhance search in the web?, Proceedings of the 2007 conference on Digital libraries, June 18-23, 2007, Vancouver, BC, Canada