ABSTRACT
Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. For a web site to use fragment-based content generation, however, it needs good methods for dividing its pages into fragments. Manual fragmentation of web pages is expensive, error-prone, and does not scale. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in web sites serving dynamic content. We consider a fragment interesting if it is shared among multiple documents or if its lifetime or personalization characteristics differ from those of the surrounding content. Our approach has three unique features. First, we propose a hierarchical, fragment-aware model of dynamic web pages, together with a data structure that is compact and effective for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that detects fragments based on their lifetime and personalization characteristics. We evaluate the proposed scheme through a series of experiments that show the benefits and costs of the algorithms. We also study how adopting the fragments detected by our system affects disk space utilization and network bandwidth consumption.
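The abstract does not spell out the data structure or the detection algorithm, so the following is only an illustrative sketch of the general idea of finding maximal shared fragments, not the method of the paper. It assumes each page is already parsed into a tree of `(label, children)` tuples, fingerprints every subtree with SHA-1, and reports subtrees that occur in at least `min_docs` pages and are not contained in a larger subtree shared by the same pages. The names `index_subtrees`, `shared_fragments`, and `min_docs` are hypothetical.

```python
import hashlib
from collections import defaultdict


def index_subtrees(node, records):
    """Post-order walk that fingerprints every subtree of a page.

    A node is a (label, children) tuple. Appends (digest, label, child_digests)
    to `records` for each subtree and returns the digest of `node`.
    """
    label, children = node
    child_digests = [index_subtrees(child, records) for child in children]
    h = hashlib.sha1(label.encode("utf-8"))
    for d in child_digests:
        # Separator keeps the label and the child digests unambiguous.
        h.update(b"\x00" + d.encode("ascii"))
    digest = h.hexdigest()
    records.append((digest, label, child_digests))
    return digest


def shared_fragments(pages, min_docs=2):
    """Return sorted (root_label, page_ids) pairs for maximal shared subtrees.

    `pages` maps a page id to its root node. This is an assumed, simplified
    criterion: identical subtrees only, no lifetime or personalization checks.
    """
    pages_with = defaultdict(set)   # digest -> ids of pages containing it
    label_of, children_of = {}, {}
    for page_id, root in pages.items():
        records = []
        index_subtrees(root, records)
        for digest, label, kids in records:
            pages_with[digest].add(page_id)
            label_of[digest] = label
            children_of[digest] = kids
    shared = {d for d, ids in pages_with.items() if len(ids) >= min_docs}
    # A shared subtree is not maximal if its parent is shared by the same pages.
    dominated = {kid for d in shared for kid in children_of[d]
                 if pages_with[kid] == pages_with[d]}
    return sorted((label_of[d], sorted(pages_with[d])) for d in shared - dominated)


header = ("div#header", [("text:Site title", []), ("text:Nav", [])])
pages = {
    "a": ("body", [header, ("text:Story A", [])]),
    "b": ("body", [header, ("text:Story B", [])]),
}
print(shared_fragments(pages))  # [('div#header', ['a', 'b'])]
```

In this toy input the header subtree is reported once as a whole, while its two text children are suppressed because they are shared by exactly the same pages as their parent.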