ABSTRACT
Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash media has many unique characteristics that make existing data management and analytics algorithms designed for magnetic disks perform poorly with flash storage. For example, while random (page) reads are as fast as sequential reads, random (page) writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large (100 MB or more) random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and the geometric file are not readily adapted to flash. Second, we propose B-FILE, an energy-efficient abstraction for flash media to store self-expiring items, and show how a B-FILE can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small RAM footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-FILE and to query the large sample stored in a B-FILE for a subsample of arbitrary size. Finally, we present an evaluation with flash media showing that our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and the geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that "semi-random" writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.
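To make the flash problem concrete, here is a minimal Python sketch of classic reservoir sampling (often called Algorithm R), one of the baselines the abstract says does not adapt readily to flash. It is a generic textbook version, not the paper's B-FILE and not its flash-friendly variant. The function name `reservoir_sample` and the `overwrites` log are hypothetical additions for illustration. The log shows the core difficulty: once the reservoir is full, every accepted item overwrites a uniformly random slot, which on flash becomes a random in-place write.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: random.Random
) -> Tuple[List[T], List[int]]:
    """Keep a uniform random sample of size k from a stream of unknown length.

    Returns the sample plus the slot indices overwritten after the reservoir
    filled (hypothetical instrumentation): these are the in-place updates a
    flash-resident reservoir would have to perform.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    reservoir: List[T] = []
    overwrites: List[int] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: plain appends, i.e., sequential writes.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                # Evict a uniformly random slot: a random in-place write.
                reservoir[j] = item
                overwrites.append(j)
    return reservoir, overwrites


if __name__ == "__main__":
    sample, slots = reservoir_sample(range(100_000), 1_000, random.Random(42))
    print(len(sample), "items kept,", len(slots), "random in-place overwrites")
```

The expected number of overwrites is the sum of k/i for i from k+1 to n, roughly k ln(n/k), and each one lands on an unpredictable slot. For a sample of 100 MB or more that must live on flash rather than in RAM, those scattered updates are exactly the slow random writes the abstract describes.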