SCAN-Lite: enterprise-wide analysis on the cheap
Source
European Conference on Computer Systems
Proceedings of the 4th ACM European Conference on Computer Systems
Nuremberg, Germany
SESSION: Handling data
Pages 117-130
Year of Publication: 2009
ISBN: 978-1-60558-482-9
Authors
Craig A.N. Soules  HP Labs, Palo Alto, CA, USA
Kimberly Keeton  HP Labs, Palo Alto, CA, USA
Charles B. Morrey, III  HP Labs, Palo Alto, CA, USA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1519065.1519079

ABSTRACT

Background data analysis due to virus scanning, backup, and desktop search is increasingly prevalent on client systems. As the number of tools and their resource requirements grow, their impact on foreground workloads can be prohibitive. This creates a tension between users' foreground work and the background work that makes information management possible. We present a system called SCAN-Lite that addresses this tension. SCAN-Lite exploits the fact that data in an enterprise is often replicated to efficiently schedule background data analyses. It uses content hashing to identify duplicate content, and scans each unique piece of content only once. It delays scheduling these scans to increase the likelihood that the content will be replicated on multiple machines, thus providing more choices for where to perform the scan. Furthermore, it prioritizes machines to maximize use of idle time and minimize the impact on foreground activities. We evaluate SCAN-Lite using measurements of enterprise replication behavior. We find that SCAN-Lite significantly improves scanning performance over the naive approach, and that it effectively exploits replication to reduce total work done and the impact on client foreground activity.
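
The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tuple layout of the observations, the `idle_score` table, and the fixed `delay` threshold are all assumptions made for illustration. The sketch groups files by content hash, waits until a piece of content has been known for at least `delay` time units, and then assigns one scan per unique content to the replica holder with the highest idle score.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Identify content by a digest of its bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_scans(observations, idle_score, now, delay):
    """Pick one (machine, path) to scan for each unique piece of content.

    observations: iterable of (machine, path, data, seen_at) tuples (hypothetical layout)
    idle_score:   dict mapping machine name to an assumed idleness score; higher is idler
    now, delay:   content first seen less than `delay` time units ago is deferred
    """
    replicas = defaultdict(list)  # digest -> list of (machine, path) holding it
    first_seen = {}               # digest -> earliest time the content was observed
    for machine, path, data, seen_at in observations:
        digest = content_hash(data)
        replicas[digest].append((machine, path))
        first_seen[digest] = min(seen_at, first_seen.get(digest, seen_at))

    plan = {}
    for digest, holders in replicas.items():
        if now - first_seen[digest] < delay:
            continue  # defer: more replicas may still appear on other machines
        # Scan the content once, on the holder with the most idle time.
        plan[digest] = max(holders, key=lambda holder: idle_score[holder[0]])
    return plan


observations = [
    ("a", "/x", b"report", 0),
    ("b", "/y", b"report", 5),   # duplicate of the content on machine "a"
    ("a", "/z", b"fresh", 95),   # too recent to schedule at time 100
]
plan = plan_scans(observations, {"a": 0.1, "b": 0.9}, now=100, delay=10)
```

In this example the duplicated content is scanned once, on machine "b", because "b" has the higher assumed idle score; the recently created file is left for a later scheduling pass.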

