Designing an inductive data stream management system: the stream mill experience
Source: SSPS; Vol. 301
Proceedings of the 2nd international workshop on Scalable stream processing system
Nantes, France
SESSION: Scheduling, indexing and systems
Pages 79-88  
Year of Publication: 2008
ISBN: 978-1-59593-963-0
Authors
Hetal Thakkar  University of California at Los Angeles
Barzan Mozafari  University of California at Los Angeles
Carlo Zaniolo  University of California at Los Angeles
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1379272.1379286

ABSTRACT

There has been much recent interest in on-line data mining. Existing mining algorithms designed for stored data are either not applicable or not effective on data streams, where real-time response is often needed and data characteristics change frequently. Researchers have therefore focused on designing new and improved algorithms for on-line mining tasks such as classification, clustering, frequent itemset mining, and pattern matching. Relatively little attention has been paid to designing data stream management systems (DSMSs) that facilitate and integrate the task of mining data streams, i.e., stream systems that provide inductive functionalities analogous to those provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS: a system that, besides providing a rich library of inter-operable functions to support the whole mining process, also supports the essentials of a DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease of use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language in which mining methods are expressed via aggregates for generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and to introduce new ones.
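
The idea of expressing a mining method as an aggregate over a window can be illustrated with a minimal Python sketch. This is not Stream Mill's query language or API; the class `FrequentItemsAggregate`, its methods `iterate`, `expire`, and `result`, the helper `run`, and the count-based sliding window are all hypothetical names and assumptions chosen for illustration. The sketch keeps exact item counts over the last `size` tuples and reports items whose count meets a relative support threshold.

```python
from collections import Counter, deque


class FrequentItemsAggregate:
    """Hypothetical windowed aggregate (an assumption, not Stream Mill code).

    Maintains exact item counts over the last `size` tuples of a stream.
    """

    def __init__(self, size, min_support):
        self.size = size                # window length in tuples (assumed count-based)
        self.min_support = min_support  # relative support threshold in [0, 1]
        self.window = deque()           # tuples currently inside the window
        self.counts = Counter()         # per-item counts for the window contents

    def iterate(self, item):
        """Consume one arriving tuple; expire the oldest tuple if the window overflows."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            self.expire(self.window.popleft())

    def expire(self, item):
        """Undo the contribution of a tuple that has left the window."""
        self.counts[item] -= 1
        if self.counts[item] == 0:
            del self.counts[item]

    def result(self):
        """Return the items whose count meets the support threshold, sorted."""
        threshold = self.min_support * len(self.window)
        return sorted(i for i, c in self.counts.items() if c >= threshold)


def run(stream, aggregate):
    """Feed a stream through the aggregate, yielding one result per arriving tuple."""
    for item in stream:
        aggregate.iterate(item)
        yield aggregate.result()


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b"]
    for frequent in run(stream, FrequentItemsAggregate(size=4, min_support=0.5)):
        print(frequent)
```

Separating `iterate` from `expire` is one way to let the same aggregate logic be reused over different window sizes, since only the bookkeeping for tuples entering and leaving the window changes.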


