Looking at both the present and the past to efficiently update replicas of web content
Source: Workshop On Web Information And Data Management
Proceedings of the 7th annual ACM international workshop on Web information and data management
Bremen, Germany
SESSION: Web Clustering, filtering and applications
Pages: 75-80
Year of Publication: 2005
ISBN: 1-59593-194-5
Authors
Luciano Barbosa, University of Utah
Ana Carolina Salgado, Universidade Federal de Pernambuco
Francisco de Carvalho, Universidade Federal de Pernambuco
Jacques Robin, Universidade Federal de Pernambuco
Juliana Freire, University of Utah
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1097047.1097062

ABSTRACT

Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Since this is a resource-intensive task, in order to keep the copies up-to-date, it is important to devise efficient update schedules that adapt to the change rate of the pages and avoid visiting pages not modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based both on the static features and change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.
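
As a rough illustration of the scheduling idea described in the abstract, the Python sketch below detects changes between visits by comparing content digests, estimates a per-page change rate from that history, and derives a revisit interval from the rate. It is not the authors' method, whose details are in the full text. All function names, the smoothing scheme, the `prior` value standing in for a static-feature prediction, and the interval bounds are assumptions made for this example.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint page content so two visits can be compared cheaply."""
    return hashlib.md5(content).hexdigest()


def change_history(snapshots: list[bytes]) -> list[bool]:
    """For each pair of consecutive visits, record whether the page changed."""
    digests = [digest(s) for s in snapshots]
    return [a != b for a, b in zip(digests, digests[1:])]


def estimated_change_rate(history: list[bool],
                          prior: float = 0.5,
                          weight: float = 2.0) -> float:
    """Smoothed fraction of visits that found a change.

    `prior` is a hypothetical stand-in for a prediction made from static
    page features; it dominates when little or no history is available.
    """
    return (sum(history) + prior * weight) / (len(history) + weight)


def revisit_interval(rate: float,
                     base_days: float = 1.0,
                     max_days: float = 30.0) -> float:
    """Poll fast-changing pages often and slow-changing pages rarely."""
    return min(max_days, base_days / max(rate, 1e-9))


if __name__ == "__main__":
    # Five visits to one page; the content changed on the 3rd and 5th visit.
    visits = [b"v1", b"v1", b"v2", b"v2", b"v3"]
    rate = estimated_change_rate(change_history(visits))
    print(rate, revisit_interval(rate))
```

With no history at all, the estimate falls back to `prior`, which is where a prediction based on static features could plug in; as visits accumulate, the observed history increasingly outweighs it.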



