|
ABSTRACT
In this paper, we present Spade - the System S declarative stream processing engine. System S is a large-scale, distributed data stream processing middleware under development at IBM T. J. Watson Research Center. As a front-end for rapid application development for System S, Spade provides (1) an intermediate language for flexible composition of parallel and distributed data-flow graphs, (2) a toolkit of type-generic, built-in stream processing operators, that support scalar as well as vectorized processing and can seamlessly inter-operate with user-defined operators, and (3) a rich set of stream adapters to ingest/publish data from/to outside sources. More importantly, Spade automatically brings performance optimization and scalability to System S applications. To that end, Spade employs a code generation framework to create highly-optimized applications that run natively on the Stream Processing Core (SPC), the execution and communication substrate of System S, and take full advantage of other System S services. Spade allows developers to construct their applications with fine granular stream operators without worrying about the performance implications that might exist, even in a distributed system. Spade's optimizing compiler automatically maps applications into appropriately sized execution units in order to minimize communication overhead, while at the same time exploiting available parallelism. By virtue of the scalability of the System S runtime and Spade's effective code generation and optimization, we can scale applications to a large number of nodes. Currently, we can run Spade jobs on ≈ 500 processors within more than 100 physical nodes in a tightly connected cluster environment. Spade has been in use at IBM Research to create real-world streaming applications, ranging from monitoring financial market feeds to radio telescopes to semiconductor fabrication lines.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the Borealis stream processing engine. In Proceedings of the Conference on Innovative Data Systems Research, CIDR, 2005.
|
 |
2
|
Lisa Amini , Henrique Andrade , Ranjita Bhagwan , Frank Eskesen , Richard King , Philippe Selo , Yoonho Park , Chitra Venkatramani, SPC: a distributed, scalable platform for data mining, Proceedings of the 4th international workshop on Data mining standards, services and platforms, p.27-37, August 20-20, 2006, Philadelphia, Pennsylvania
[doi> 10.1145/1289612.1289615]
|
| |
3
|
H. Andrade, B. Gedik, K.-L. Wu, and P. S. Yu. On optimizing aggregations and joins for high-performance data stream processing. In to be submitted to International Conference on Supercomputing, ACM ICS, 2008.
|
| |
4
|
A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom. STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26, 2003.
|
| |
5
|
A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution. Technical report, InfoLab ? Stanford University, October 2003.
|
| |
6
|
Hari Balakrishnan , Magdalena Balazinska , Don Carney , Uğur Çetintemel , Mitch Cherniack , Christian Convey , Eddie Galvez , Jon Salz , Michael Stonebraker , Nesime Tatbul , Richard Tibbetts , Stan Zdonik, Retrospective on Aurora, The VLDB Journal — The International Journal on Very Large Data Bases, v.13 n.4, p.370-383, December 2004
[doi> 10.1007/s00778-004-0133-5]
|
| |
7
|
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, V. Raman, F. Reiss, and M. A. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In Proceedings of the Conference on Innovative Data Systems Research, CIDR, 2003.
|
| |
8
|
Coral8, inc. http://www.coral8.com/, May 2007.
|
| |
9
|
IBM DB2. http://www.ibm.com/db2, October 2007.
|
| |
10
|
|
| |
11
|
|
| |
12
|
L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, and S. Madden. XStream: A signal-oriented data stream management system. In Proceedings of the International Conference on Data Engineering, IEEE ICDE, 2008.
|
| |
13
|
IBM general parallel file system. http://www.ibm.com/systems/clusters/software/gpfs, October 2007.
|
| |
14
|
G. Hulten and P. Domingos. VFML ? a toolkit for mining high-speed time-changing data streams. http://www.cs.washington.edu/dm/vfml/, 2003.
|
| |
15
|
IBM. Cell Broadband Engine architecture. Technical Report Version 1.0, IBM Systems and Technology Group, 2005.
|
| |
16
|
Intel. IXP2400 network processor hardware reference manual. Technical report, Intel Corporation, May 2003.
|
 |
17
|
Navendu Jain , Lisa Amini , Henrique Andrade , Richard King , Yoonho Park , Philippe Selo , Chitra Venkatramani, Design, implementation, and evaluation of the linear road bnchmark on the stream processing core, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142522]
|
| |
18
|
|
| |
19
|
Z. Liu, A. Ranganathan, and A. V. Riabov. Use of OWL for describing stream processing components to enable automatic composition. In OWL: Experiences and Directions, OWLED, 2007.
|
| |
20
|
Mathworks MATLAB. http://www.mathworks.com/, October 2007.
|
| |
21
|
StreamBase Systems. http://www.streambase.com/, May 2007.
|
| |
22
|
IBM UIMA. http://www.research.ibm.com/UIMA/, Aug 2007.
|
| |
23
|
J. D. Ullman. Database and Knowledge-Base Systems. Computer Science Press, 1988.
|
| |
24
|
IBM WebSphere front office for financial markets. http://www.ibm.com/software/integration/wfo, October 2007.
|
| |
25
|
Kun-Lung Wu , Kirsten W. Hildrum , Wei Fan , Philip S. Yu , Charu C. Aggarwal , David A. George , Buǧra Gedik , Eric Bouillet , Xiaohui Gu , Gang Luo , Haixun Wang, Challenges and experience in prototyping a multi-modal stream analytic and monitoring application on System S, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
CITED BY 4
|
|
Deepak S. Turaga , Brian Foo , Olivier Verscheure , Rong Yan, Configuring topologies of distributed semantic concept classifiers for continuous multimedia stream processing, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
|
|
|
|
|
|
G. M. O'Hare , M. J. O'Grady , R. Tynan , C. Muldoon , H. R. Kolar , A. G. Ruzzelli , D. Diamond , E. Sweeney, Embedding intelligent decision making within complex dynamic environments, Artificial Intelligence Review, v.27 n.2-3, p.189-201, March 2007
|
|
|
Huayong Wang , Henrique Andrade , Bugra Gedik , Kun-Lung Wu, Auto-vectorization through code generation for stream processing applications, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|