|
ABSTRACT
Matching regular expressions (regexps) is a very common work-load. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and represents the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators like flex haven't changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input: a remarkable performance, comparable to dedicated hardware solutions. Also, our kernels show speedups of 57-81× over flex on the Cell. Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
comScore Releases 2007 U.S. Internet Year in Review, Press Release, Jan. 2008.
|
 |
2
|
|
| |
3
|
S. Antonatos, K. Anagnostakis, M. Polychronakis, and E. Markatos. Performance analysis of content matching intrusion detection systems. In 4th IEEE/IPSJ Symp. on Applications and the Internet (SAINT 2004)., 2004.
|
 |
4
|
|
| |
5
|
L. Bu and J. A. Chandy. A CAM-based keyword match processor architecture. Microelectronics Jnl., 37(8):828--836, 2006.
|
 |
6
|
|
 |
7
|
|
| |
8
|
R. D. Cameron. Method and apparatus for processing character streams, U.S. Patent 7400271, July 2008.
|
| |
9
|
|
| |
10
|
J. Degener. ANSI C grammar, lex specification, http://www.lysator.liu.se/c/ANSI-C-grammar-l.html.
|
| |
11
|
|
| |
12
|
N. Firasta, M. Buxton, P. Jinbo, K. Nasri, and S. Kuo. Intel AVX: New Frontiers in Performance Improvements and Energy Efficiency, Mar. 2008.
|
| |
13
|
T. A. S. Foundation. Lucene, http://lucene.apache.org.
|
| |
14
|
N. Goyal, J. Ormont, R. Smith, K. Sankaralingam, and C. Estan. Signature matching in network processing using SIMD/GPU architectures. Technical Report 1628, University of Wisconsin at Madison, Jan. 2008.
|
| |
15
|
|
| |
16
|
Michael Gschwind , H. Peter Hofstee , Brian Flachs , Martin Hopkins , Yukio Watanabe , Takeshi Yamazaki, Synergistic Processing in Cell's Multicore Architecture, IEEE Micro, v.26 n.2, p.10-24, March 2006
[doi> 10.1109/MM.2006.41]
|
| |
17
|
|
| |
18
|
|
| |
19
|
IDC Corporation. The expanding digital universe. White Paper, March 2007.
|
| |
20
|
Intel Corp. Tera-scale research prototype - connecting 80 simple cores on a single test chip, October 2006.
|
| |
21
|
Intel Corp. Intel SSE4 Programming Reference, Reference Number: D91561-001, Apr. 2007.
|
| |
22
|
F. Iorio and J. V. Lunteren. Fast pattern matching on the Cell Broadband Engine. In 2008 Workshop on Cell Systems and Applications (WCSA), affiliated with the 2008 Intl. Symp. on Computer Architecture (ISCA'08), Beijing, China, June 2008.
|
| |
23
|
J. A. Kahle , M. N. Day , H. P. Hofstee , C. R. Johns , T. R. Maeurer , D. Shippy, Introduction to the cell multiprocessor, IBM Journal of Research and Development, v.49 n.4/5, p.589-604, July 2005
|
| |
24
|
|
 |
25
|
Janghaeng Lee , Sung Ho Hwang , Neungsoo Park , Seong-Won Lee , Sunglk Jun , Young Soo Kim, A high performance NIDS using FPGA-based regular expression matching, Proceedings of the 2007 ACM symposium on Applied computing, March 11-15, 2007, Seoul, Korea
[doi> 10.1145/1244002.1244259]
|
 |
26
|
John W. Lockwood , Naji Naufel , Jon S. Turner , David E. Taylor, Reprogrammable network packet processing on the field programmable port extender (FPX), Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays, p.87-93, February 2001, Monterey, California, United States
[doi> 10.1145/360276.360304]
|
| |
27
|
|
 |
28
|
|
 |
29
|
|
| |
30
|
V. Paxson. flex - a fast lexical analyzer generator, 1988.
|
| |
31
|
E. Perkins, M. Kostoulas, A. Heifets, M. Matsa, and N. Mendelsohn. Performance analysis of XML APIs. In XML 2005 Conf. and Exposition, Nov. 2005.
|
| |
32
|
D.P. Scarpazza, O. Villa, and F. Petrini. Peak-performance DFA-based string matching on the Cell processor. In Third Intl. Workshop on System Management Techniques, Processes, and Services (SMTPS), held in conjunction with IPDPS, Mar. 2007.
|
| |
33
|
D.P. Scarpazza, O. Villa, and F. Petrini. High-Speed String Searching against Large Dictionaries on the Cell/B.E. Processor. In 22nd IEEE Intl. Parallel & Distributed Processing Symp. (IPDPS'08), Miami, Florida, Apr. 2008.
|
 |
34
|
Larry Seiler , Doug Carmean , Eric Sprangle , Tom Forsyth , Michael Abrash , Pradeep Dubey , Stephen Junkins , Adam Lake , Jeremy Sugerman , Robert Cavin , Roger Espasa , Ed Grochowski , Toni Juan , Pat Hanrahan, Larrabee: a many-core x86 architecture for visual computing, ACM SIGGRAPH 2008 papers, August 11-15, 2008, Los Angeles, California
|
| |
35
|
I. Sourdis and D. Pnevmatikatos. Fast, large-scale string match for a 10 Gbps FPGA-based network intrusion. In 13th Conf. on Field Programmable Logic and Applications (FPL'03)., September 2003.
|
| |
36
|
I. Sourdis and D. Pnevmatikatos. Pre-decoded CAMs for efficient and high-speed NIDS pattern matching, 2004.
|
| |
37
|
D. C. Suresh, Z. Guo, B. Buyukkurt, and W. A. Najjar. Automatic compilation framework for bloom filter based intrusion detection. In Second Intl. Workshop on Reconfigurable Computing: Architectures and Applications (ARC'06), pages 413--418, 2006.
|
| |
38
|
A. D. Thurston. Parsing computer languages with an automaton compiled from a single regular expression. In O. H. Ibarra and H.-C. Yen, editors, CIAA, volume 4094 of Lecture Notes in Computer Science, pages 285--286. Springer, 2006.
|
| |
39
|
J. van Lunteren. High-performance pattern-matching for intrusion detection. In 25th IEEE Intl. Conf. on Computer Communications (INFOCOM 2006), pages 1--13, Apr. 2006.
|
| |
40
|
O. Villa, D. Chavarria, and K. Maschhoff. Input-independent, scalable and fast string matching on the Cray XMT. In 23nd IEEE Intl. Parallel & Distributed Processing Symp. (IPDPS'09), 2009.
|
|