ACM Home Page
Please provide us with feedback. Feedback
Scalable regular expression matching on data streams
Full text PdfPdf (304 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 4: Streaming Filters table of contents
Pages 161-172  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Anirban Majumder  Bell Labs Research India, India
Rajeev Rastogi  Yahoo!, Bangalore, India
Sriram Vanama  Indian Institute of Technology, Madras, Chennai, India
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 51,   Downloads (12 Months): 530,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376635
What is a DOI?

ABSTRACT

Regular Expression (RE) matching has important applications in the areas of XML content distribution and network security. In this paper, we present the end-to-end design of a high performance RE matching system. Our system combines the processing efficiency of Deterministic Finite Automata (DFA) with the space efficiency of Non-deterministic Finite Automata (NFA) to scale to hundreds of REs. In experiments with real-life RE data on data streams, we found that a bulk of the DFA transitions are concentrated around a few DFA states. We exploit this fact to cache only the frequent core of each DFA in memory as opposed to the entire DFA (which may be exponential in size). Further, we cluster REs such that REs whose interactions cause an exponential increase in the number of states are assigned to separate groups -- this helps to improve cache hits by controlling the overall DFA size.

To the best of our knowledge, ours is the first end-to-end system capable of matching REs at high speeds and in their full generality. Through a clever combination of RE grouping, and static and dynamic caching, it is able to perform RE matching at high speeds, even in the presence of limited memory. Through experiments with real-life data sets, we show that our RE matching system convincingly outperforms a state-of-the-art Network Intrusion Detection tool with support for efficient RE matching.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
 
5
Y. Diao, P. Fischer, et al. Yfilter: Efficient and scalable filtering of XML documents. In phICDE. 2002.
 
6
7
 
8
 
9
 
10
 
11
 
12
T. Johnson, S. Muthukrishnan, et al. Monitoring regular expressions on out-of-order streams. phICDE, 2007.
13
 
14
J. Levandoski, E.Sommer, et al. Application layer packet classifier for linux. http://l7-filter.sourceforge.net/.
15
16
 
17
S. Wu and U. Manber. A fast algorithm for multi-pattern searching. Tech. Rep. TR-94-17, 1994.
 
18
www.bro ids.org. Bro network intrusion detection system.
 
19
www.cisco.com. Cisco ios ips deployment guide.
20

Collaborative Colleagues:
Anirban Majumder: colleagues
Rajeev Rastogi: colleagues
Sriram Vanama: colleagues