Usage analysis of a public website reconstruction tool
International Conference on Digital Libraries
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Pittsburgh, PA, USA
SESSION: Archiving and web tools for digital libraries
Pages 371-374
Year of Publication: 2008
ISBN: 978-1-59593-998-2
ABSTRACT
The Web is increasingly the medium by which information is published today, but because of its ephemeral nature, web pages and sometimes entire websites are often "lost" to server crashes, viruses, hackers, run-ins with the law, bankruptcy, and loss of interest. When a website is lost and backups are unavailable, an individual or third party can use Warrick to recover the website from several search engine caches and web archives (the Web Infrastructure). In this short paper, we present Warrick usage data obtained from Brass, a queueing system for Warrick hosted at Old Dominion University and made available to the public for free. Over the last six months, 520 individuals have reconstructed more than 700 websites with 800K resources from the Web Infrastructure. Sixty-two percent of the static web pages were recovered, and 41% of all website resources were recovered. The Internet Archive was the largest contributor of recovered resources (78%).