Incorporating domain knowledge into topic modeling via Dirichlet Forest priors
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 25-32
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
David Andrzejewski  University of Wisconsin-Madison, Madison, WI
Xiaojin Zhu  University of Wisconsin-Madison, Madison, WI
Mark Craven  University of Wisconsin-Madison, Madison, WI
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553378

ABSTRACT

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.
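To make the phrase "Dirichlet tree distribution" concrete, the following is a minimal sketch of drawing one word distribution from a small Dirichlet tree: each internal node draws a Dirichlet over its outgoing edges, and a leaf's probability is the product of the branch probabilities on its path from the root. The tree shape, the three-word vocabulary, the weights `beta` and `eta`, and the function names are all illustrative assumptions; the abstract does not specify the paper's actual tree structures, and this sketch covers neither the mixture over trees nor the collapsed Gibbs sampler.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(tree, rng):
    """Return {leaf: probability} for one draw from a Dirichlet tree.

    `tree` is a list of (edge_weight, child) pairs. A child is either a
    leaf label (str) or another list of the same form (an internal node).
    """
    probs = {}

    def descend(node, mass):
        # Split the probability mass reaching this node among its children.
        branch = sample_dirichlet([weight for weight, _ in node], rng)
        for p, (_, child) in zip(branch, node):
            if isinstance(child, str):
                probs[child] = mass * p
            else:
                descend(child, mass * p)

    descend(tree, 1.0)
    return probs


# Hypothetical example: words "A" and "B" share an internal node whose
# large edge weights (eta * beta) pull their probabilities toward each
# other, while "C" hangs directly off the root.
beta, eta = 1.0, 50.0  # illustrative values, not taken from the paper
example_tree = [
    (2 * beta, [(eta * beta, "A"), (eta * beta, "B")]),
    (beta, "C"),
]

rng = random.Random(0)
phi = sample_dirichlet_tree(example_tree, rng)
print(phi)
```

With a large `eta` in this toy tree, "A" and "B" tend to receive similar shares of whatever mass their shared subtree gets, which is one way a tree-structured prior can express that two words should rise and fall together within a topic.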


