ABSTRACT
In this paper we study how to refresh a local copy of an autonomous data source to keep the copy up to date. As the size of the data grows, it becomes more difficult to keep the copy "fresh," making it crucial to synchronize the copy effectively. We define two freshness metrics, change models of the underlying data, and synchronization policies. We analytically study how effective the various policies are. We also experimentally verify our analysis, based on data collected from 270 web sites over more than 4 months, and we show that our new policy improves "freshness" very significantly compared to the policies currently in use.