Learning Bayesian network classifiers by maximizing conditional likelihood
Full text: PDF (187 KB)
Source: ACM International Conference Proceeding Series, Vol. 69
Proceedings of the Twenty-First International Conference on Machine Learning
Banff, Alberta, Canada
Page: 46
Year of Publication: 2004
ISBN: 1-58113-828-5
Authors
Daniel Grossman, University of Washington, Seattle, WA
Pedro Domingos, University of Washington, Seattle, WA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 123,   Citation Count: 18
DOI: http://doi.acm.org/10.1145/1015330.1015339

ABSTRACT

Bayesian networks are a powerful probabilistic representation, and their use for classification has received considerable attention. However, they tend to perform poorly when learned in the standard way. This is attributable to a mismatch between the objective function used (likelihood or a function thereof) and the goal of classification (maximizing accuracy or conditional likelihood). Unfortunately, the computational cost of optimizing structure and parameters for conditional likelihood is prohibitive. In this paper we show that a simple approximation (choosing structures by maximizing conditional likelihood while setting parameters by maximum likelihood) yields good results. On a large suite of benchmark datasets, this approach produces better class probability estimates than naive Bayes, TAN, and generatively trained Bayesian networks.
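To make the split between structure score and parameter estimation concrete, here is a minimal Python sketch; it is not the authors' implementation. Each candidate structure is scored by the conditional log-likelihood of the training data, CLL = Σ log P(c | x), while that candidate's parameters are refit by counting, i.e. by (smoothed) maximum likelihood. For brevity it assumes discrete attributes coded 0..k-1, restricts the search space to naive Bayes plus at most one extra attribute parent per attribute, and greedily adds arcs. The function names, the pseudo-count `alpha`, and the restricted search space are illustrative assumptions.

```python
import math
from collections import Counter


def fit_params(X, y, parents, n_vals, n_classes, alpha=1.0):
    """Frequency (maximum-likelihood) estimates for a class-rooted network.
    parents[i] is None (class is the only parent) or the index of one extra
    attribute parent. alpha is an assumed pseudo-count to avoid log(0)."""
    n = len(y)
    class_counts = Counter(y)
    log_prior = [math.log((class_counts[c] + alpha) / (n + alpha * n_classes))
                 for c in range(n_classes)]
    tables = []
    for i, pa in enumerate(parents):
        joint, ctx = Counter(), Counter()
        for x, c in zip(X, y):
            key = (c, x[pa] if pa is not None else None)
            joint[key + (x[i],)] += 1
            ctx[key] += 1
        tables.append((joint, ctx))

    def log_cond(i, v, c, x):
        # log P(x_i = v | class c, attribute parent of i as observed in x)
        joint, ctx = tables[i]
        pa = parents[i]
        key = (c, x[pa] if pa is not None else None)
        return math.log((joint[key + (v,)] + alpha) / (ctx[key] + alpha * n_vals[i]))

    return log_prior, log_cond


def conditional_log_likelihood(X, y, parents, n_vals, n_classes, alpha=1.0):
    """Structure score: sum over examples of log P(c | x), with the
    parameters refit by counting for this particular structure."""
    log_prior, log_cond = fit_params(X, y, parents, n_vals, n_classes, alpha)
    total = 0.0
    for x, c in zip(X, y):
        scores = [log_prior[k] + sum(log_cond(i, x[i], k, x) for i in range(len(x)))
                  for k in range(n_classes)]
        m = max(scores)  # log-sum-exp normalizes over classes
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[c] - log_z
    return total


def creates_cycle(parents, child, parent):
    """Adding parent -> child closes a cycle iff child is parent or one of its ancestors."""
    node = parent
    while node is not None:
        if node == child:
            return True
        node = parents[node]
    return False


def greedy_cll_search(X, y, n_vals, n_classes, alpha=1.0):
    """Start from naive Bayes; repeatedly add the attribute arc that most
    improves training-set CLL, stopping when no arc helps."""
    d = len(n_vals)
    parents = [None] * d
    best = conditional_log_likelihood(X, y, parents, n_vals, n_classes, alpha)
    while True:
        best_move = None
        for child in range(d):
            if parents[child] is not None:
                continue  # at most one extra parent per attribute
            for parent in range(d):
                if creates_cycle(parents, child, parent):
                    continue
                parents[child] = parent
                score = conditional_log_likelihood(X, y, parents, n_vals, n_classes, alpha)
                parents[child] = None
                if score > best + 1e-9:
                    best, best_move = score, (child, parent)
        if best_move is None:
            return parents, best
        parents[best_move[0]] = best_move[1]


if __name__ == "__main__":
    # XOR labels: naive Bayes cannot separate the classes, one arc can.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
    y = [a ^ b for a, b in X]
    print(greedy_cll_search(X, y, n_vals=[2, 2], n_classes=2))
```

On XOR-labeled data naive Bayes assigns probability 0.5 to each class, so adding a single attribute-to-attribute arc raises the CLL. Because this sketch scores structures on the training data only, it may favor extra arcs; a holdout set or a complexity penalty would likely be needed in practice.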


