Incorporating domain knowledge into topic modeling via Dirichlet Forest priors
Source
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
ACM International Conference Proceeding Series; Vol. 382
Pages 25-32
Year of Publication: 2009
ISBN: 978-1-60558-516-1
ABSTRACT
Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.
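As a rough illustration of the building block named in the abstract, the sketch below draws one word distribution from a single Dirichlet tree: each internal node draws a Dirichlet over its outgoing edges, and a word's probability is the product of the branch probabilities on its path from the root. The tree shape, the edge weights, the word list, and the function names are all hypothetical choices made for this example; they are not taken from the paper, and the sketch covers neither the mixture over trees (the forest) nor the collapsed Gibbs sampler.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(node, rng, mass=1.0, out=None):
    """Recursively sample leaf (word) probabilities from a Dirichlet tree.

    `node` is either a leaf label (str) or a list of (edge_weight, child)
    pairs describing an internal node. `mass` is the probability that has
    reached this node from the root.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        out[node] = mass  # leaf: record the accumulated path probability
        return out
    weights = [w for w, _ in node]
    branch_probs = sample_dirichlet(weights, rng)
    for p, (_, child) in zip(branch_probs, node):
        sample_dirichlet_tree(child, rng, mass * p, out)
    return out


# Hypothetical tree (assumed weights): "school" and "college" share a
# subtree whose large, equal edge weights make the subtree's mass split
# close to evenly between them, so the two words tend to be likely or
# unlikely together in the sampled distribution.
tree = [
    (2.0, [(50.0, "school"), (50.0, "college")]),
    (1.0, "cancer"),
    (1.0, "market"),
]

rng = random.Random(0)
phi = sample_dirichlet_tree(tree, rng)
for word, prob in sorted(phi.items()):
    print(f"{word}: {prob:.4f}")
```

A flat Dirichlet cannot express this kind of coupling between two words, which is the motivation for using tree-structured priors when encoding preferences about word pairs.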