|
ABSTRACT
A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Baum, L.E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41, 164-171.
|
| |
2
|
Bengio, Y., & Frasconi, P. (in press). Credit assignment through time: Alternatives to backpropagation. Neural Information Processing Systems 6. San Marco, CA: Morgan Kaufmann.
|
| |
3
|
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth International Group.
|
| |
4
|
Cacciatore, T & Nowlan, S. (in press). Mixtures of controllers for jump linear and non-linear plants. Neural Information Processing Systems 6. San Mateo, CA: Morgan Kaufmann.
|
| |
5
|
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39, 1-38.
|
| |
6
|
Draper, N. R., & Smith, H. (t981). Applied Regression Analysis. New York: John Wiley.
|
| |
7
|
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York: John Wiley.
|
| |
8
|
Ghahramani, Z., & Jordan, M. I. (in press). Supervised learning from incomplete data via the EM approach. Neural Information Processing Systems 6. San Mateo, CA: Morgan Kaufmann.
|
| |
9
|
|
| |
10
|
|
| |
11
|
McCullagh, P. & Nelder, J.A. (1983). Generalized Linear Models. London: Chapman and Hall.
|
| |
12
|
Meila, M. P., & Jordan, M. I. (1994). Learning the parameters of HMMs with auxiliary input. MIT Computational Cognitive Science Tech. Rep. 9401, Cambridge, MA.
|
| |
13
|
Murthy, S. K., Kasif, S., & Salzberg, S. (1993). OCI.' A randomized algorithm for building oblique decision trees. Technical Report, Department of Computer Science, Johns Hopkins University.
|
| |
14
|
Neal, R., & Hinton, G. E. (1993). A new view of the EM algorithm that justifies incremental and other variants. Submitted to Biometrika.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
Xu, L., Jordan, M. I., & Hinton, G. E. (1994). An alternative mixture of experts model. MIT Computational Cognitive Science Tech. Rep. 9402, Cambridge, MA.
|
|