SCAN-Lite: enterprise-wide analysis on the cheap
Source
European Conference on Computer Systems
Proceedings of the 4th ACM European conference on Computer systems
Nuremberg, Germany
SESSION: Handling data
Pages 117-130
Year of Publication: 2009
ISBN: 978-1-60558-482-9

Authors
Craig A.N. Soules (HP Labs, Palo Alto, CA, USA)
Kimberly Keeton (HP Labs, Palo Alto, CA, USA)
Charles B. Morrey, III (HP Labs, Palo Alto, CA, USA)

Bibliometrics
Downloads (6 Weeks): 14, Downloads (12 Months): 82, Citation Count: 0

ABSTRACT
Background data analysis due to virus scanning, backup, and desktop search is increasingly prevalent on client systems. As the number of tools and their resource requirements grow, their impact on foreground workloads can be prohibitive. This creates a tension between users' foreground work and the background work that makes information management possible. We present a system called SCAN-Lite that addresses this tension. SCAN-Lite exploits the fact that data in an enterprise is often replicated to efficiently schedule background data analyses. It uses content hashing to identify duplicate content, and scans each unique piece of content only once. It delays scheduling these scans to increase the likelihood that the content will be replicated on multiple machines, thus providing more choices for where to perform the scan. Furthermore, it prioritizes machines to maximize use of idle time and minimize the impact on foreground activities. We evaluate SCAN-Lite using measurements of enterprise replication behavior. We find that SCAN-Lite significantly improves scanning performance over the naive approach, and that it effectively exploits replication to reduce total work done and the impact on client foreground activity.
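
The core idea in the abstract (hash content, scan each unique piece once, and choose where to scan based on idleness) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the in-memory inventory layout, the use of SHA-256, and the `idle_score` metric are all assumptions made for illustration, and the delayed-scheduling step is left out for brevity.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Identify content by a digest of its bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def plan_scans(files_by_machine, idle_score):
    """Pick one machine to scan each unique piece of content.

    files_by_machine: {machine: {path: bytes}}   (hypothetical inventory)
    idle_score: {machine: float}, higher = more idle (hypothetical metric)
    Returns {digest: (machine, path)}.
    """
    # Group every replica of the same content under its digest.
    replicas = defaultdict(list)
    for machine, files in files_by_machine.items():
        for path, data in files.items():
            replicas[content_hash(data)].append((machine, path))

    plan = {}
    for digest, holders in replicas.items():
        # Scan on the most idle holder; machine name and path break ties
        # so the result is deterministic.
        plan[digest] = max(holders, key=lambda h: (idle_score[h[0]], h[0], h[1]))
    return plan


# Three files on two machines, but only two unique pieces of content.
inventory = {
    "laptop-a": {"/docs/report.pdf": b"quarterly report"},
    "desktop-b": {
        "/home/u/report-copy.pdf": b"quarterly report",
        "/home/u/notes.txt": b"notes",
    },
}
idle = {"laptop-a": 0.2, "desktop-b": 0.9}
plan = plan_scans(inventory, idle)
```

In this example the duplicated report is scanned once, on the more idle `desktop-b`, so `laptop-a` does no scanning work at all. A fuller sketch would also hold back newly seen content for a while before planning, giving replicas time to appear on other machines.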