ABSTRACT
Aliasing occurs in Web transactions when requests containing different URLs elicit replies containing identical data payloads. Conventional caches associate stored data with URLs and can therefore suffer redundant payload transfers due to aliasing and other causes. Existing research literature, however, says little about the prevalence of aliasing in user-initiated transactions, or about redundant payload transfers in conventional Web cache hierarchies.

This paper quantifies the extent of aliasing and the performance impact of URL-indexed cache management using a large client trace from WebTV Networks. Fewer than 5% of reply payloads are aliased (referenced via multiple URLs) but over 54% of successful transactions involve aliased payloads. Aliased payloads account for under 3.1% of the trace's "working set size" (sum of payload sizes) but over 36% of bytes transferred. For the WebTV workload, roughly 10% of payload transfers to browser caches and 23% of payload transfers to a shared proxy are redundant, assuming infinite-capacity conventional caches. Our analysis of a large proxy trace from Compaq Corporation yields similar results.

URL-indexed caching does not entirely explain the large number of redundant proxy-to-browser payload transfers previously reported in the WebTV system. We consider other possible causes of redundant transfers (e.g., reply metadata and browser cache management policies) and discuss a simple hop-by-hop protocol extension that completely eliminates all redundant transfers, regardless of cause.
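The two measurements described above can be illustrated with a minimal sketch. It is not the paper's analysis code: the trace format (a sequence of URL and payload pairs), the function names, the use of SHA-1 as the payload digest, and the simplification that a cached entry is always known to be current are all assumptions made for illustration. The first function counts aliased payloads (digests reached through more than one URL); the second replays the trace against an infinite-capacity URL-indexed cache and counts payload transfers whose bytes had already been received under some URL.

```python
import hashlib
from collections import defaultdict


def digest(payload: bytes) -> str:
    """Identify a payload by content (SHA-1 is an assumption for this sketch)."""
    return hashlib.sha1(payload).hexdigest()


def aliasing_stats(trace):
    """Return (fraction of distinct payloads that are aliased,
    fraction of transactions that involve an aliased payload).

    `trace` is a hypothetical list of (url, payload_bytes) pairs.
    """
    urls_by_digest = defaultdict(set)
    digests = []
    for url, payload in trace:
        d = digest(payload)
        urls_by_digest[d].add(url)
        digests.append(d)
    # A payload is aliased if more than one URL refers to it.
    aliased = {d for d, urls in urls_by_digest.items() if len(urls) > 1}
    payload_fraction = len(aliased) / len(urls_by_digest)
    transaction_fraction = sum(d in aliased for d in digests) / len(digests)
    return payload_fraction, transaction_fraction


def redundant_transfers(trace):
    """Replay `trace` through an infinite URL-indexed cache.

    Returns (payload transfers, redundant payload transfers). A transfer is
    counted as redundant when the same bytes were already received earlier,
    under any URL. Assumes the cache always knows whether its copy is current.
    """
    cache = {}     # url -> digest of the payload stored under that URL
    seen = set()   # digests of every payload received so far
    transfers = redundant = 0
    for url, payload in trace:
        d = digest(payload)
        if cache.get(url) == d:
            continue  # hit: the stored copy matches, nothing is transferred
        transfers += 1
        if d in seen:
            redundant += 1  # same bytes already held, but under another key
        cache[url] = d
        seen.add(d)
    return transfers, redundant


if __name__ == "__main__":
    example = [
        ("http://a.example/logo.gif", b"logo-bytes"),
        ("http://b.example/logo.gif", b"logo-bytes"),  # alias of the first
        ("http://a.example/logo.gif", b"logo-bytes"),  # ordinary cache hit
        ("http://a.example/index.html", b"page-bytes"),
    ]
    print(aliasing_stats(example))
    print(redundant_transfers(example))
```

A content-indexed cache, by contrast, would look up the digest rather than the URL, so the second request in the example would not require a payload transfer.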
INDEX TERMS
Primary Classification:
C. Computer Systems Organization
C.2 COMPUTER-COMMUNICATION NETWORKS
C.2.2 Network Protocols
Subjects: Applications (SMTP, FTP, etc.)
Additional Classification:
C. Computer Systems Organization
C.2 COMPUTER-COMMUNICATION NETWORKS
C.2.4 Distributed Systems
Subjects: Client/server
C.4 PERFORMANCE OF SYSTEMS
Subjects: Measurement techniques
General Terms: Design, Management, Measurement, Performance
Keywords: DTD, HTTP, WWW, Zipf's law, aliasing, cache hierarchies, caching, duplicate suppression, duplicate transfer detection, hypertext transfer protocol, performance analysis, redundant transfers, resource modification, world wide web