Estimating frequency of change
Source: ACM Transactions on Internet Technology (TOIT)
Volume 3, Issue 3 (August 2003)
Pages: 256-290
Year of Publication: 2003
ISSN: 1533-5399
Authors
Junghoo Cho, University of California, Los Angeles, CA
Hector Garcia-Molina, Stanford University, Stanford, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/857166.857170

ABSTRACT

Many online data sources are updated autonomously and independently. In this article, we make the case for estimating the change frequency of data to improve Web crawlers and Web caches and to help data mining. We first identify various scenarios in which different applications have different requirements on the accuracy of the estimated frequency. We then develop several "frequency estimators" for the identified scenarios, showing analytically and experimentally how precise they are. In many cases, our proposed estimators predict change frequencies much more accurately and improve the effectiveness of applications. For example, a Web crawler could achieve a 35% improvement in "freshness" simply by adopting our proposed estimator.
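
To make the idea of a "frequency estimator" concrete, the following is a minimal sketch and not the article's definitive method. It assumes that changes follow a Poisson process, that the source is visited at regular intervals, and that each visit only reveals whether the data changed since the previous visit. The function names and the `smoothing` parameter are hypothetical; the default of 0.5 is an assumed bias-reduction constant that the abstract does not state.

```python
import math


def _check(n_visits: int, n_changed: int) -> None:
    # Shared validation for both estimators.
    if n_visits <= 0 or not 0 <= n_changed <= n_visits:
        raise ValueError("need n_visits > 0 and 0 <= n_changed <= n_visits")


def naive_rate(n_visits: int, n_changed: int) -> float:
    """Changes per visit interval, counting at most one change per visit."""
    _check(n_visits, n_changed)
    return n_changed / n_visits


def log_corrected_rate(n_visits: int, n_changed: int,
                       smoothing: float = 0.5) -> float:
    """Rate estimate that accounts for changes missed between visits.

    Under the assumed Poisson model, P(no change in one interval) equals
    exp(-rate), so rate = -log(fraction of visits with no change).
    A positive `smoothing` (an assumption) keeps the logarithm finite
    when every visit detected a change.
    """
    _check(n_visits, n_changed)
    unchanged = n_visits - n_changed
    return -math.log((unchanged + smoothing) / (n_visits + smoothing))


if __name__ == "__main__":
    # Hypothetical observation: 10 visits, a change detected on 5 of them.
    print(naive_rate(10, 5))
    print(log_corrected_rate(10, 5))
```

The naive ratio can never exceed one change per interval, because several changes between two visits are observed as a single change. The logarithmic form corrects for this under the stated assumptions, so it gives a higher estimate whenever at least one change was detected.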

