|
ABSTRACT
Efficient matching of incoming events to persistent queries is fundamental to event pattern matching, complex event processing, and publish/subscribe systems. Recent processing engines based on non-deterministic finite automata (NFAs) have demonstrated scalability in the number of queries that can be efficiently executed on a single machine. However, existing NFA based systems are limited to processing events on a single machine. Consequently, their event processing capacity cannot be increased by adding more machines. In this paper, we present an experimental evaluation of different methods for distributing an event processing system that is based on NFAs across multiple machines in a cluster. Our results show that careful input stream partitioning gives close to linear performance scaleup for CPU bound workloads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. B. Zdonik. The design of the borealis stream processing engine. In CIDR, pages 277--289, 2005.
|
| |
2
|
J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In Proceedings of the 2008 ACM SIGMOD, pages 147--160, New York, NY, USA, 2008. ACM.
|
| |
3
|
M. Akdere, U. Çetintemel, and N. Tatbul. Plan-based complex event detection across distributed sources. Proc. VLDB Endow., 1(1):66--77, 2008.
|
| |
4
|
L. Amini, H. Andrade, R. Bhagwan, F. Eskesen, R. King, P. Selo, Y. Park, and C. Venkatramani. Spc: a distributed, scalable platform for data mining. In Proc. of DMSSP '06, pages 27--37, New York, NY, USA, 2006. ACM.
|
| |
5
|
Y. Amir, C. Danilov, and J. R. Stanton. A low latency, loss tolerant architecture and protocol for wide area group communication. In Proceedings of DSN '00, pages 327--336, Washington, DC, USA, 2000. IEEE Computer Society.
|
| |
6
|
A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. S. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts. Linear road: a stream data management benchmark. In Proc. of VLDB '04, pages 480--491. VLDB Endowment, 2004.
|
| |
7
|
M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-based load management in federated distributed systems. In Proceedings of NSDI'04, pages 15--15, Berkeley, CA, USA, 2004. USENIX Association.
|
| |
8
|
Cayuga System (Accessed 11/2008). http://www.cs.cornell.edu/bigreddata/cayuga/.
|
| |
9
|
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. Zdonik. Scalable distributed stream processing. In CIDR'03, Asilomar, California, 2003.
|
| |
10
|
C. Cranor, T. Johnson, O. Spataschek, and V. Shkapenyuk. Gigascope: a stream database for network applications. In Proc. of SIGMOD, pages 647--651, New York, NY, USA, 2003. ACM.
|
| |
11
|
A. J. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. M. White. Towards expressive publish/subscribe systems. In Proc. of EDBT, pages 627--644, 2006.
|
| |
12
|
A. J. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. M. White. Cayuga: A general purpose event monitoring system. In CIDR, pages 412--422, 2007.
|
| |
13
|
D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85--98, 1992.
|
| |
14
|
F. Fabret, H.-A. Jacobsen, F. Llirbat, J. Pereira, K. A. Ross, and D. Shasha. Filtering algorithms and implementation for very fast publish/subscribe. In Proc. SIGMOD, pages 115--126, 2001.
|
| |
15
|
R. Friedman and E. Hadad. Adaptive batching for replicated servers. In Proc. of SRDS '06, pages 311--320, Washington, DC, USA, 2006. IEEE Computer Society.
|
| |
16
|
B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. Spade: the systems declarative stream processing engine. In Proceedings of the 2008 ACM SIGMOD, pages 1123--1134, New York, NY, USA, 2008. ACM.
|
| |
17
|
J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation, Second Edition. Addison Wesley, 2000. 2nd edition.
|
| |
18
|
T. Johnson, M. S. Muthukrishnan, V. Shkapenyuk, and O. Spatscheck. Query-aware partitioning for monitoring massive network data streams. In Proc. of the 2008 ACM SIGMOD, pages 1135--1146, New York, NY, USA, 2008.
|
| |
19
|
T. Johnson, S. Muthukrishnan, V. Shkapenyuk, and O. Spatscheck. A heartbeat mechanism and its application in Gigascope. In Proc. of VLDB '05, pages 1079--1088, 2005.
|
| |
20
|
M. Li, M. Liu, L. Ding, E. A. Rundensteiner, and M. Mani. Event stream processing with out-of-order data arrival. In In Proc. of ICDCS '07 Workshops, page 67, Washington, DC, USA, 2007. IEEE Computer Society.
|
| |
21
|
NASDAQ Performance Statistics (Accessed 11/2008). http://www.nasdaqtrader.com/trader.aspx?id=marketshare.
|
| |
22
|
C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the 2003 ACM SIGMOD, pages 563--574, New York, NY, USA, 2003. ACM.
|
| |
23
|
P. R. Pietzuch, B. Shand, and J. Bacon. A framework for event composition in distributed systems. In Proc. of the 2003 Intl. Conf. on Middleware, pages 62--82, New York, NY, USA, 2003. Springer-Verlag New York, Inc.
|
| |
24
|
M. A. Shah, J. M. Hellerstein, and E. Brewer. Highly available, fault-tolerant, parallel dataflows. In Proc. of the ACM SIGMOD, pages 827--838, New York, NY, USA, 2004.
|
| |
25
|
Spread Concepts LLC (Accessed 11/2008). http://www.spread.org.
|
| |
26
|
U. Srivastava and J. Widom. Flexible time management in data stream systems. In Proc. of PODS '04, pages 263--274, New York, NY, USA, 2004. ACM.
|
| |
27
|
A. S. Tanenbaum and M. van Steen. Distributed Systems: Principles and Paradigms (2nd Edition), chapter 13, pages 603--607. Prentice-Hall, Inc. NJ, USA, 2006.
|
| |
28
|
W. White, M. Riedewald, J. Gehrke, and A. Demers. What is "next" in event processing? In Proc. of PODS '07, pages 263--272, New York, NY, USA, 2007. ACM.
|
|