A framework for describing web repositories
Source: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries (International Conference on Digital Libraries)
Austin, TX, USA
Pages 341-344
Year of Publication: 2009
ISBN: 978-1-60558-322-8
ABSTRACT
In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive's Wayback Machine can be used to "lazily preserve" websites and reconstruct them when they are lost. We use the term "web repositories" for collections of automatically refreshed and migrated content, and collectively we refer to these repositories as the "web infrastructure". In this paper we present a framework for describing web repositories and the status of web resources in them. This includes an abstract API for web repository interaction, the concepts of deep vs. flat and light/dark/grey repositories, and terminology for describing the recoverability of a web resource. Our API may serve as a foundation for future web repository interfaces.