SPC: a distributed, scalable platform for data mining
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 4th international workshop on Data mining standards, services and platforms
Philadelphia, Pennsylvania
Pages: 27-37
Year of Publication: 2006
ISBN: 1-59593-443-X
Authors
Lisa Amini, IBM T. J. Watson Research Center, Hawthorne, NY
Henrique Andrade, IBM T. J. Watson Research Center, Hawthorne, NY
Ranjita Bhagwan, IBM T. J. Watson Research Center, Hawthorne, NY
Frank Eskesen, IBM T. J. Watson Research Center, Hawthorne, NY
Richard King, IBM T. J. Watson Research Center, Hawthorne, NY
Philippe Selo, IBM T. J. Watson Research Center, Hawthorne, NY
Yoonho Park, IBM T. J. Watson Research Center, Hawthorne, NY
Chitra Venkatramani, IBM T. J. Watson Research Center, Hawthorne, NY
ABSTRACT
The Stream Processing Core (SPC) is distributed stream processing middleware designed to support applications that extract information from a large number of digital data streams. In this paper, we describe the SPC programming model which, to the best of our knowledge, is the first to support stream-mining applications using a subscription-like model for specifying stream connections and to provide support for non-relational operators. This enables stream-mining applications to tap into, analyze, and track an ever-changing array of data streams that may contain information relevant to the streaming queries placed on it. We describe the design, implementation, and experimental evaluation of the SPC distributed middleware, which deploys applications onto the running system incrementally, making stream connections as required. Using micro-benchmarks and a representative large-scale synthetic stream-mining application, we evaluate the performance of the control and data paths of the SPC middleware.