New ensemble methods for evolving data streams
Source
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
Session: Research track papers
Pages 139-148
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Albert Bifet, Universitat Politècnica de Catalunya, Barcelona, Spain
Geoff Holmes, University of Waikato, Hamilton, New Zealand
Bernhard Pfahringer, University of Waikato, Hamilton, New Zealand
Richard Kirkby, University of Waikato, Hamilton, New Zealand
Ricard Gavaldà, Universitat Politècnica de Catalunya, Barcelona, Spain
ABSTRACT
Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining of data streams that evolve over time, that is, streams whose concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single-classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and they therefore usually generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
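The idea of an ensemble that adapts by pruning under-performing members can be illustrated with a small sketch. This is not the paper's ADWIN Bagging or ASHT Bagging: it assumes online bagging with Poisson(1) example weights, and it swaps in a fixed-size sliding-window error rate as a crude drift signal in place of an adaptive window. The class names, the `window` and `error_threshold` parameters, and the toy majority-class learner (standing in for a Hoeffding tree) are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque


def poisson1(rng):
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a Hoeffding tree)."""

    def __init__(self):
        self.counts = {}

    def learn(self, x, y, weight=1):
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        if not self.counts:
            return None  # untrained member abstains
        return max(self.counts, key=self.counts.get)


class DriftAwareOnlineBagging:
    """Online bagging that replaces its worst member when errors pile up.

    Assumption: a full sliding window with error rate above
    `error_threshold` is treated as evidence of drift.
    """

    def __init__(self, n_members=5, window=50, error_threshold=0.5, seed=0):
        self.rng = random.Random(seed)
        self.window = window
        self.error_threshold = error_threshold
        self.members = [MajorityClassLearner() for _ in range(n_members)]
        self.errors = [deque(maxlen=window) for _ in range(n_members)]
        self.replacements = 0

    def predict(self, x):
        # Unweighted majority vote over members that have seen data.
        votes = {}
        for member in self.members:
            label = member.predict(x)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for i, member in enumerate(self.members):
            # Test-then-train: record the error before updating the member.
            self.errors[i].append(0 if member.predict(x) == y else 1)
            k = poisson1(self.rng)  # simulates bootstrap resampling online
            if k > 0:
                member.learn(x, y, weight=k)
        self._replace_worst_if_drifting()

    def _replace_worst_if_drifting(self):
        # Only members with a full window are eligible for replacement.
        rates = [sum(e) / len(e) if len(e) == self.window else 0.0
                 for e in self.errors]
        worst = max(range(len(rates)), key=rates.__getitem__)
        if rates[worst] > self.error_threshold:
            self.members[worst] = MajorityClassLearner()
            self.errors[worst] = deque(maxlen=self.window)
            self.replacements += 1


# Synthetic stream with an abrupt concept change: label 0, then label 1.
stream = [(None, 0)] * 300 + [(None, 1)] * 300
ens = DriftAwareOnlineBagging()
correct_tail = 0
for t, (x, y) in enumerate(stream):
    if t >= 500 and ens.predict(x) == y:
        correct_tail += 1  # prequential accuracy on the last 100 examples
    ens.learn(x, y)
```

In this toy run, the members trained on the first concept keep voting for the old label until their windowed error rate crosses the threshold, at which point they are replaced one per step by fresh learners that see only the new concept. A fixed window forces a trade-off between reacting quickly and avoiding false alarms, which is the kind of limitation an adaptive window is meant to address.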