ABSTRACT
Bayesian networks are a powerful probabilistic representation, and their use for classification has received considerable attention. However, they tend to perform poorly when learned in the standard way. This is attributable to a mismatch between the objective function used (likelihood or a function thereof) and the goal of classification (maximizing accuracy or conditional likelihood). Unfortunately, the computational cost of optimizing structure and parameters for conditional likelihood is prohibitive. In this paper we show that a simple approximation (choosing structures by maximizing conditional likelihood while setting parameters by maximum likelihood) yields good results. On a large suite of benchmark datasets, this approach produces better class probability estimates than naive Bayes, TAN, and generatively trained Bayesian networks.
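The idea of scoring structures by conditional likelihood while fitting parameters from counts can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the toy XOR-style dataset, the two hand-picked candidate structures, and the Laplace smoothing constant `alpha` are all assumptions introduced here for illustration. Smoothing is used only to avoid taking the log of zero, and a real structure search would consider many more candidates.

```python
import math
from collections import Counter

def fit_cpts(data, parents, arity, alpha=1.0):
    """Frequency estimates of P(x_i | parents(x_i)); alpha is an assumed smoothing constant."""
    counts = {i: Counter() for i in parents}
    parent_counts = {i: Counter() for i in parents}
    for row in data:
        for i, pa in parents.items():
            pa_vals = tuple(row[j] for j in pa)
            counts[i][(row[i], pa_vals)] += 1
            parent_counts[i][pa_vals] += 1

    def prob(i, value, pa_vals):
        num = counts[i][(value, pa_vals)] + alpha
        den = parent_counts[i][pa_vals] + alpha * arity[i]
        return num / den

    return prob

def log_joint(row, parents, prob):
    """log P(row) under the network: sum of log conditional probabilities."""
    return sum(
        math.log(prob(i, row[i], tuple(row[j] for j in pa)))
        for i, pa in parents.items()
    )

def conditional_log_likelihood(data, parents, prob, class_index, n_classes):
    """Sum over rows of log P(class | attributes)."""
    total = 0.0
    for row in data:
        logs = []
        for c in range(n_classes):
            alt = list(row)
            alt[class_index] = c  # same attributes, hypothetical class c
            logs.append(log_joint(alt, parents, prob))
        m = max(logs)
        log_evidence = m + math.log(sum(math.exp(v - m) for v in logs))
        total += logs[row[class_index]] - log_evidence
    return total

# Toy data (assumed): columns are (class, x1, x2) with class = x1 XOR x2.
data = [(0, 0, 0), (1, 0, 1), (1, 1, 0), (0, 1, 1)] * 5
arity = {0: 2, 1: 2, 2: 2}

# Candidate structures: variable index -> tuple of parent indices.
naive_bayes = {0: (), 1: (0,), 2: (0,)}
augmented = {0: (), 1: (0,), 2: (0, 1)}  # adds the edge x1 -> x2

scores = {}
for name, structure in [("naive_bayes", naive_bayes), ("augmented", augmented)]:
    prob = fit_cpts(data, structure, arity)
    scores[name] = conditional_log_likelihood(data, structure, prob, 0, 2)

cll_nb = scores["naive_bayes"]
cll_aug = scores["augmented"]
best = max(scores, key=scores.get)
print(best, scores)
```

On this toy data naive Bayes cannot separate the classes, so the structure with the extra edge receives the higher conditional log-likelihood and would be the one selected by a conditional-likelihood score.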