Automatic detection of fragments in dynamically generated web pages
Source: International World Wide Web Conference
Proceedings of the 13th International Conference on World Wide Web
New York, NY, USA
SESSION: Versioning and fragmentation
Pages: 443 - 454  
Year of Publication: 2004
ISBN: 1-58113-844-X
Authors
Lakshmish Ramaswamy  Georgia Tech, Atlanta, GA
Arun Iyengar  IBM T.J. Watson Research Center, Yorktown Heights, NY
Ling Liu  Georgia Tech, Atlanta, GA
Fred Douglis  IBM T.J. Watson Research Center, Yorktown Heights, NY
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/988672.988732

ABSTRACT

Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. For a web site to use fragment-based content generation, however, good methods are needed for dividing web pages into fragments. Manual fragmentation of web pages is expensive, error-prone, and unscalable. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in web sites serving dynamic content. We consider a fragment interesting if it is shared among multiple documents or if its lifetime or personalization characteristics differ from those of the content around it. Our approach has three unique features. First, we propose a hierarchical and fragment-aware model of dynamic web pages and a data structure that is compact and effective for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. We evaluate the proposed scheme through a series of experiments, showing the benefits and costs of the algorithms. We also study the impact of adopting the fragments detected by our system on disk space utilization and network bandwidth consumption.
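
To make the notion of a maximal shared fragment concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each page has already been parsed into a tree of `(tag, children)` tuples with text leaves as plain strings, and it treats two subtrees as the same fragment only when they are byte-for-byte identical. The function names `index_subtrees` and `maximal_shared_fragments` are hypothetical, chosen for illustration.

```python
import hashlib
from collections import defaultdict


def index_subtrees(node, doc_id, docs_by_sig, parents_of, node_by_sig):
    """Hash a subtree bottom-up and record which documents contain it."""
    if isinstance(node, str):
        child_sigs = []
        payload = "text:" + node
    else:
        tag, children = node
        child_sigs = [
            index_subtrees(c, doc_id, docs_by_sig, parents_of, node_by_sig)
            for c in children
        ]
        payload = "tag:" + tag + "(" + ",".join(child_sigs) + ")"
    sig = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    docs_by_sig[sig].add(doc_id)
    node_by_sig.setdefault(sig, node)  # keep one representative subtree
    for child in child_sigs:
        parents_of[child].add(sig)
    return sig


def maximal_shared_fragments(pages):
    """Return (subtree, sorted doc ids) for shared subtrees that have no
    parent subtree shared by exactly the same set of documents."""
    docs_by_sig = defaultdict(set)
    parents_of = defaultdict(set)
    node_by_sig = {}
    for doc_id, root in pages.items():
        index_subtrees(root, doc_id, docs_by_sig, parents_of, node_by_sig)

    result = []
    for sig, docs in docs_by_sig.items():
        if len(docs) < 2:
            continue  # not shared
        if any(docs_by_sig[p] == docs for p in parents_of.get(sig, ())):
            continue  # a larger enclosing fragment covers the same documents
        result.append((node_by_sig[sig], sorted(docs)))
    return result


# Two hypothetical pages that share a header but differ in their body text.
header = ("div", [("h1", ["Site"]), ("a", ["Home"])])
pages = {
    "a.html": ("body", [header, ("p", ["Story A"])]),
    "b.html": ("body", [header, ("p", ["Story B"])]),
}
shared = maximal_shared_fragments(pages)
```

In this example only the whole header `div` is reported. Its `h1` and `a` children are also shared, but they are dropped because their parent is shared by the same two pages. Exact hashing is a simplification: a sketch like this would miss fragments that differ only slightly between pages, and it does not address the lifetime or personalization criteria mentioned in the abstract.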

