Modelling information persistence on the web
Source: International Conference On Web Engineering; Vol. 263
Proceedings of the 6th international conference on Web engineering
Palo Alto, California, USA
SESSION: Best paper session: best paper candidates
Pages: 193 - 200  
Year of Publication: 2006
ISBN:1-59593-352-2
Authors
Daniel Gomes  Universidade de Lisboa, Faculdade de Ciências, Portugal
Mário J. Silva  Universidade de Lisboa, Faculdade de Ciências, Portugal
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1145581.1145623

ABSTRACT

Models of web data persistency are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data through the measurement of URL and content persistence across several snapshots of a national community web, collected for 3 years. We found that the lifetimes of URLs and contents are modelled by logarithmic functions. We gathered statistics on the structure of the web, identified reasons for URL death and characterized persistent URLs and contents. The lasting contents tend to be referenced by different URLs during their lifetime, while half of the contents referenced by persistent URLs do not change.



