Usage analysis of a public website reconstruction tool
Source
International Conference on Digital Libraries
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Pittsburgh, PA, USA
SESSION: Archiving and web tools for digital libraries
Pages 371-374
Year of Publication: 2008
ISBN: 978-1-59593-998-2
Authors
Frank McCown  Harding University, Searcy, AR, USA
Michael L. Nelson  Old Dominion University, Norfolk, VA, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 80,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1378889.1378955

ABSTRACT

The Web is increasingly the medium by which information is published today, but because of its ephemeral nature, web pages and sometimes entire websites are often "lost" due to server crashes, viruses, hackers, run-ins with the law, bankruptcy and loss of interest. When a website is lost and backups are unavailable, an individual or third party can use Warrick to recover the website from several search engine caches and web archives (the Web Infrastructure). In this short paper, we present Warrick usage data obtained from Brass, a queueing system for Warrick hosted at Old Dominion University and made freely available to the public. Over the last six months, 520 individuals have reconstructed more than 700 websites with 800K resources from the Web Infrastructure. Sixty-two percent of the static web pages and 41% of all website resources were recovered. The Internet Archive was the largest contributor of recovered resources (78%).



