ABSTRACT
The management of electronic document collections is fundamentally different from the management of paper documents. The ephemeral nature of some electronic documents means that the document address (i.e., reference details of the document) can become incorrect some time after coming into use, resulting in references, such as index entries and hypertext links, failing to correctly address the document they describe. A classic case of invalidated references is on the World Wide Web: links that point to a named resource fail when the domain name, file name, or any other aspect of the addressed resource is changed, resulting in the well-known Error 404. Additionally, there are other errors which arise from changes to document collections. This paper surveys the strategies used both in World Wide Web software and other hypertext systems for managing the integrity of references and hence the integrity of links. Some strategies are preventative, not permitting errors to occur; others are corrective, discovering reference errors and sometimes attempting to correct them; while the last strategy is adaptive, because references are calculated on a just-in-time basis, according to the current state of the document collection.
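As a rough illustration of the corrective strategy mentioned in the abstract, the sketch below scans a small in-memory collection for references whose target address no longer exists. It is not taken from the paper: the function name `find_dangling_links`, the dictionary layout, and the file names are all assumptions made for the example, and a real system would check addresses against a live store or server rather than a dictionary.

```python
from typing import Dict, List, Set, Tuple


def find_dangling_links(collection: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose target is not in the collection.

    `collection` maps a document address to the addresses it links to.
    This layout is a hypothetical stand-in for a real document store.
    """
    known: Set[str] = set(collection)
    return [
        (source, target)
        for source, targets in collection.items()
        for target in targets
        if target not in known
    ]


# Hypothetical collection: "b.html" has been renamed or removed,
# so the link from "index.html" no longer resolves.
docs = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["index.html"],
}
dangling = find_dangling_links(docs)
print(dangling)
```

A check like this only discovers the broken reference; deciding whether to repair it, remove it, or report it is a separate step, which is where the corrective strategies surveyed in the paper differ from one another.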
CITED BY 9
Zubin Dalal, Suvendu Dash, Pratik Dave, Luis Francisco-Revilla, Richard Furuta, Unmil Karadkar, Frank Shipman, Managing distributed collections: evaluating web page changes, movement, and replacement, Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, June 07-11, 2004, Tucson, AZ, USA
Paul Logasa Bogen, II, Joshua Johnston, Unmil P. Karadkar, Richard Furuta, Frank Shipman, Application of Kalman filters to identify unexpected change in blogs, Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries, June 16-20, 2008, Pittsburgh, PA, USA
Dong Zhou, Mark Truran, Tim Brailsford, Helen Ashman, Amir Pourabdollah, Llama-b: automatic hyperlink authoring in the blogosphere, Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, June 19-21, 2008, Pittsburgh, PA, USA
Atsuyuki Morishima, Akiyoshi Nakamizo, Toshinari Iida, Shigeo Sugimoto, Hiroyuki Kitagawa, Bringing your dead links back to life: a comprehensive approach and lessons learned, Proceedings of the 20th ACM conference on Hypertext and hypermedia, June 29-July 01, 2009, Torino, Italy
REVIEW
"Claudiu Popescu : Reviewer"
This article analyzes the problem of the integrity of
electronic documents, in particular, of Web sites.
The main problem is that hyperlinks are frequently changed,
producing the well-known Error 404. Based on the stunning fact
that the avera...