ACM Home Page
Please provide us with feedback. Feedback
Online maintenance of very large random samples on flash storage
Full text PdfPdf (597 KB)
Source
Proceedings of the VLDB Endowment archive
Volume 1 ,  Issue 1  (August 2008) table of contents
SESSION: Systems B table of contents
Pages 970-983  
Year of Publication: 2008
ISSN:2150-8097
Authors
Suman Nath  Microsoft Research
Phillip B. Gibbons  Intel Research Pittsburgh
Publisher
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 173,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1453856.1453961
What is a DOI?

ABSTRACT

Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDA's, laptops, and even servers. However, flash media has many unique characteristics that make existing data management/analytics algorithms designed for magnetic disks perform poorly with flash storage. For example, while random (page) reads are as fast as sequential reads, random (page) writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large (100 MBs or more) random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and geometric file are not readily adapted to flash. Second, we propose B-FILE, an energy-efficient abstraction for flash media to store self-expiring items, and show how a B-FILE can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small (RAM) memory footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-FILE and to query the large sample stored in a B-FILE for a subsample of an arbitrary size. Finally, we present an evaluation with flash media that shows our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that "semi-random" writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
Y. Diao, D. Ganesan, G. Mathur, and P. Shenoy. Rethinking data management for storage-centric sensor networks. In CIDR, 2007.
 
6
7
 
8
M. Hachman. New Samsung notebook replaces hard drive with flash. http://www.extremetech.com/article2/0,1558,1966644,00.asp, May 2006.
 
9
Intel-Corporation. Understanding the Flash Translation Layer (FTL) specification. www.embeddedfreebsd.org/Documents/Intel-FTL.pdf, 1998.
 
10
11
 
12
13
14
15
 
16
P. Miller. SimpleTech announces 512GB and 256GB 3.5-inch SSD drives. http://www.engadget.com/2007/04/18/, April 2007.
17
18
 
19
20
 
21
D. Reinsel and J. Janukowicz. Datacenter SSDs: Solid footing for growth. Samsung white paper. www.samsung.com/global/business/semiconductor/products/flash/ssd/pdf/datacenter_ssds.pdf, January 2008.
 
22
SyCard. CF extend 180 CompactFlash Flexible Extender Card. http://www.sycard.com/cfextl 80.html, 2008.
23
24
25
26
 
27
Yahoo!-Finance. Zeus-IOPS solid state drives surge to 512GB. http://biz.yahoo.com/pz/070418/117663.html, April 2007.
 
28


Collaborative Colleagues:
Suman Nath: colleagues
Phillip B. Gibbons: colleagues