ABSTRACT
Classification of streaming data faces three basic challenges: it must cope with huge amounts of data, it must make the best possible use of the varying time between two stream data items (anytime classification), and it must incrementally learn additional training data (anytime learning) so that the classifier can be applied consistently to fast data streams. In this work, we propose a novel index-based technique that handles all three challenges using the established Bayes classifier on effective kernel density estimators. Our novel Bayes tree automatically generates a hierarchy of mixture densities, adapted efficiently to the individual object to be classified, that represent kernel density estimators at successively coarser levels. Our probability density queries, together with novel classification improvement strategies, provide the information needed for very effective classification at any point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree.
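As background for the "Bayes classifier on kernel density estimators" named above, the following is a minimal sketch of that baseline only, not of the Bayes tree itself. It assumes isotropic Gaussian kernels with a single fixed bandwidth; the function names, the bandwidth value, and the toy data are hypothetical and chosen purely for illustration.

```python
import math


def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian kernel centered on one training point, evaluated at x."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm


def kernel_density(x, points, bandwidth):
    """Kernel density estimate at x: the average of one kernel per training point."""
    return sum(gaussian_kernel(x, p, bandwidth) for p in points) / len(points)


def bayes_classify(x, training, bandwidth=0.5):
    """Return the label that maximizes prior(label) * density(x | label).

    `training` maps each class label to its list of training points.
    The bandwidth default is an arbitrary illustrative choice.
    """
    total = sum(len(points) for points in training.values())
    scores = {
        label: (len(points) / total) * kernel_density(x, points, bandwidth)
        for label, points in training.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: two well-separated classes in two dimensions.
training = {
    "a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)],
    "b": [(3.0, 3.0), (2.8, 3.1), (3.2, 2.9)],
}
label_near_a = bayes_classify((0.1, 0.1), training)
label_near_b = bayes_classify((3.0, 3.0), training)
```

This baseline evaluates every kernel for every query, so its cost grows with the amount of training data. The abstract's hierarchy of coarser mixture densities addresses exactly that cost when the time available per stream item varies.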