The design of a similarity based deduplication system
Source: ACM International Conference Proceeding Series
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Haifa, Israel
Session: Deduplication
Article No. 6
Year of Publication: 2009
ISBN: 978-1-60558-623-6
Authors
Lior Aronovich, IBM Corp.
Ron Asher, IBM Corp.
Eitan Bachmat, Ben-Gurion U.
Haim Bitner, Marvell Corp.
Michael Hirsch, IBM Corp.
Shmuel T. Klein, Bar-Ilan U.
Sponsors
Mellanox Technologies
Hebrew University of Jerusalem
IBM
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1534530.1534539

ABSTRACT

We describe some of the design choices made during the development of a fast, scalable, inline deduplication device, and present the system's design goals and how they were achieved. This is the first deduplication device that uses similarity matching. The paper makes the following original research contributions: we show how similarity signatures can serve in a deduplication scheme; we present a novel type of similarity signature and explain its advantages with respect to deduplication requirements; and we show how to combine similarity matching with byte-by-byte comparison or hash-based identity schemes.
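The abstract does not define the signature itself, so the following is only a minimal sketch of the general pattern it names: a similarity signature nominates a previously stored chunk as a candidate, and a byte comparison then confirms which regions are actually identical. It assumes a generic max-hash sketch (the k largest hashes of fixed-length windows), not the paper's novel signature. The names SEED_LEN, K, MIN_MATCH, window_hashes, signature, SimilarityIndex and dedupe are hypothetical, the parameter values are illustrative, and difflib.SequenceMatcher stands in for the byte-by-byte comparison step.

```python
import difflib
import hashlib
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical parameters chosen for illustration only; they are not the paper's values.
SEED_LEN = 64    # length in bytes of each hashed window ("seed")
K = 4            # number of hash values kept in a signature
MIN_MATCH = 64   # shortest identical run worth recording as a match


def window_hashes(data: bytes, seed_len: int = SEED_LEN) -> List[int]:
    """Hash every overlapping seed_len-byte window (recomputed per window, not rolled)."""
    return [
        int.from_bytes(hashlib.blake2b(data[i:i + seed_len], digest_size=8).digest(), "big")
        for i in range(len(data) - seed_len + 1)
    ]


def signature(data: bytes, k: int = K) -> FrozenSet[int]:
    """Assumed similarity signature: the k largest distinct window hashes (max-hash sketch)."""
    return frozenset(heapq.nlargest(k, set(window_hashes(data))))


class SimilarityIndex:
    """Maps signature hash values to the id of the first stored chunk that produced them."""

    def __init__(self) -> None:
        self.by_hash: Dict[int, int] = {}
        self.chunks: List[bytes] = []

    def find_similar(self, data: bytes) -> Optional[int]:
        # Each signature value shared with a stored chunk counts as one vote for that chunk.
        votes: Dict[int, int] = {}
        for h in signature(data):
            cid = self.by_hash.get(h)
            if cid is not None:
                votes[cid] = votes.get(cid, 0) + 1
        return max(votes, key=votes.get) if votes else None

    def add(self, data: bytes) -> int:
        cid = len(self.chunks)
        self.chunks.append(data)
        for h in signature(data):
            self.by_hash.setdefault(h, cid)
        return cid


def dedupe(index: SimilarityIndex, data: bytes) -> Tuple[Optional[int], int]:
    """Return (similar chunk id or None, bytes of data confirmed identical to it)."""
    ref_id = index.find_similar(data)
    matched = 0
    if ref_id is not None:
        # Similarity only nominates a candidate; identity is confirmed by comparing bytes.
        # difflib stands in here for the byte-by-byte comparison step.
        sm = difflib.SequenceMatcher(None, index.chunks[ref_id], data, autojunk=False)
        matched = sum(m.size for m in sm.get_matching_blocks() if m.size >= MIN_MATCH)
    index.add(data)
    return ref_id, matched
```

With an empty index, the first chunk finds no candidate. A lightly edited copy of it shares almost all of its window hashes, so it is routed to the original and nearly all of its bytes are confirmed identical. A production system would use a rolling hash and compare only the neighbourhood of matching seeds rather than diffing whole chunks.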
