ACM Home Page
Please provide us with feedback. Feedback
High-performance regular expression scanning on the Cell/B.E. processor
Full text PdfPdf (888 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 23rd international conference on Supercomputing table of contents
Yorktown Heights, NY, USA
SESSION: Applications of the cell processor table of contents
Pages 14-25  
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Daniele Paolo Scarpazza  IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Gregory F. Russell  IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 30,   Downloads (12 Months): 114,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1542275.1542284
What is a DOI?

ABSTRACT

Matching regular expressions (regexps) is a very common work-load. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and represents the first stage of any programming language compiler.

Despite the multi-core revolution, regexp scanner generators like flex haven't changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer.

We present an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization.

As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input: a remarkable performance, comparable to dedicated hardware solutions. Also, our kernels show speedups of 57-81× over flex on the Cell.

Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
comScore Releases 2007 U.S. Internet Year in Review, Press Release, Jan. 2008.
2
 
3
S. Antonatos, K. Anagnostakis, M. Polychronakis, and E. Markatos. Performance analysis of content matching intrusion detection systems. In 4th IEEE/IPSJ Symp. on Applications and the Internet (SAINT 2004)., 2004.
4
 
5
L. Bu and J. A. Chandy. A CAM-based keyword match processor architecture. Microelectronics Jnl., 37(8):828--836, 2006.
6
7
 
8
R. D. Cameron. Method and apparatus for processing character streams, U.S. Patent 7400271, July 2008.
 
9
 
10
J. Degener. ANSI C grammar, lex specification, http://www.lysator.liu.se/c/ANSI-C-grammar-l.html.
 
11
 
12
N. Firasta, M. Buxton, P. Jinbo, K. Nasri, and S. Kuo. Intel AVX: New Frontiers in Performance Improvements and Energy Efficiency, Mar. 2008.
 
13
T. A. S. Foundation. Lucene, http://lucene.apache.org.
 
14
N. Goyal, J. Ormont, R. Smith, K. Sankaralingam, and C. Estan. Signature matching in network processing using SIMD/GPU architectures. Technical Report 1628, University of Wisconsin at Madison, Jan. 2008.
 
15
 
16
 
17
 
18
 
19
IDC Corporation. The expanding digital universe. White Paper, March 2007.
 
20
Intel Corp. Tera-scale research prototype - connecting 80 simple cores on a single test chip, October 2006.
 
21
Intel Corp. Intel SSE4 Programming Reference, Reference Number: D91561-001, Apr. 2007.
 
22
F. Iorio and J. V. Lunteren. Fast pattern matching on the Cell Broadband Engine. In 2008 Workshop on Cell Systems and Applications (WCSA), affiliated with the 2008 Intl. Symp. on Computer Architecture (ISCA'08), Beijing, China, June 2008.
 
23
 
24
25
26
 
27
28
29
 
30
V. Paxson. flex - a fast lexical analyzer generator, 1988.
 
31
E. Perkins, M. Kostoulas, A. Heifets, M. Matsa, and N. Mendelsohn. Performance analysis of XML APIs. In XML 2005 Conf. and Exposition, Nov. 2005.
 
32
D.P. Scarpazza, O. Villa, and F. Petrini. Peak-performance DFA-based string matching on the Cell processor. In Third Intl. Workshop on System Management Techniques, Processes, and Services (SMTPS), held in conjunction with IPDPS, Mar. 2007.
 
33
D.P. Scarpazza, O. Villa, and F. Petrini. High-Speed String Searching against Large Dictionaries on the Cell/B.E. Processor. In 22nd IEEE Intl. Parallel & Distributed Processing Symp. (IPDPS'08), Miami, Florida, Apr. 2008.
34
 
35
I. Sourdis and D. Pnevmatikatos. Fast, large-scale string match for a 10 Gbps FPGA-based network intrusion. In 13th Conf. on Field Programmable Logic and Applications (FPL'03)., September 2003.
 
36
I. Sourdis and D. Pnevmatikatos. Pre-decoded CAMs for efficient and high-speed NIDS pattern matching, 2004.
 
37
D. C. Suresh, Z. Guo, B. Buyukkurt, and W. A. Najjar. Automatic compilation framework for bloom filter based intrusion detection. In Second Intl. Workshop on Reconfigurable Computing: Architectures and Applications (ARC'06), pages 413--418, 2006.
 
38
A. D. Thurston. Parsing computer languages with an automaton compiled from a single regular expression. In O. H. Ibarra and H.-C. Yen, editors, CIAA, volume 4094 of Lecture Notes in Computer Science, pages 285--286. Springer, 2006.
 
39
J. van Lunteren. High-performance pattern-matching for intrusion detection. In 25th IEEE Intl. Conf. on Computer Communications (INFOCOM 2006), pages 1--13, Apr. 2006.
 
40
O. Villa, D. Chavarria, and K. Maschhoff. Input-independent, scalable and fast string matching on the Cray XMT. In 23nd IEEE Intl. Parallel & Distributed Processing Symp. (IPDPS'09), 2009.

Collaborative Colleagues:
Daniele Paolo Scarpazza: colleagues
Gregory F. Russell: colleagues