Bringing your dead links back to life: a comprehensive approach and lessons learned
Source
Conference on Hypertext and Hypermedia
Proceedings of the 20th ACM conference on Hypertext and hypermedia
Torino, Italy
SESSION: Hypertext structure and usage
Pages 15-24
Year of Publication: 2009
ISBN: 978-1-60558-486-7
Authors
Atsuyuki Morishima  University of Tsukuba, Tsukuba, Japan
Akiyoshi Nakamizo  Shibaura Institute of Technology, Tokyo, Japan
Toshinari Iida  University of Tsukuba, Tsukuba, Japan
Shigeo Sugimoto  University of Tsukuba, Tsukuba, Japan
Hiroyuki Kitagawa  University of Tsukuba, Tsukuba, Japan
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557914.1557921

ABSTRACT

This paper presents an experimental study of the automatic correction of broken (dead) Web links, focusing in particular on links broken by the relocation of Web pages. Our first contribution is an algorithm that incorporates a comprehensive set of heuristics, some of which are novel, in a single unified framework. Our second contribution is a relatively large-scale experiment; analysis of its results revealed the characteristics of the problem of finding moved Web pages. We demonstrated empirically that the problem of searching for moved pages differs from typical information retrieval problems. First, it is impossible to identify the final destination until the page is moved, so the index-server approach is not necessarily effective. Second, there is a strong bias in where the new address is likely to be, so crawler-based solutions can be implemented effectively without searching the entire Web. We analyzed the experimental results in detail to show how important each heuristic is in real Web settings, and conducted statistical analyses to show that our algorithm succeeds in correctly finding new links for more than 70% of broken links at the 95% confidence level.
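
To make the closing claim concrete, the sketch below shows one common way to check a statement of the form "the success rate exceeds 70% at the 95% confidence level": compute a one-sided lower confidence bound on a binomial proportion and compare it with 0.70. The abstract does not say which statistical procedure the authors used, so the Wilson score bound here is an assumption chosen for illustration. The function name `wilson_lower_bound` and the sample counts are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(successes: int, trials: int, confidence: float = 0.95) -> float:
    """One-sided Wilson score lower bound for a binomial success rate.

    Illustrative only: the abstract does not state which test the authors applied.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    # z-quantile for a one-sided bound (about 1.645 when confidence is 0.95).
    z = NormalDist().inv_cdf(confidence)
    p_hat = successes / trials

    denom = 1 + z * z / trials
    center = p_hat + z * z / (2 * trials)
    margin = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom


if __name__ == "__main__":
    # Hypothetical counts, NOT the paper's data: 160 links fixed out of 200 sampled.
    bound = wilson_lower_bound(successes=160, trials=200)
    print(f"95% lower bound on success rate: {bound:.3f}")
    print("Supports a '> 70%' claim:", bound > 0.70)
```

A lower bound above 0.70 means the sample supports the claim at the chosen confidence level. A bound below 0.70 would not show that the claim is false, only that the sample is too small or the observed rate too low to support it.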

