Automatic labeling of multinomial topic models
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
San Jose, California, USA
SESSION: Research track papers
Pages: 490-499
Year of Publication: 2007
ISBN: 978-1-59593-609-7
Authors
Qiaozhu Mei  University of Illinois at Urbana-Champaign
Xuehua Shen  University of Illinois at Urbana-Champaign
ChengXiang Zhai  University of Illinois at Urbana-Champaign
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281246

ABSTRACT

Multinomial distributions over words are frequently used to model topics in text collections. A major challenge common to applying such topic models in any text mining problem is labeling a multinomial topic model accurately, so that a user can interpret the discovered topic. So far, such labels have been generated manually and subjectively. In this paper, we propose probabilistic approaches to labeling multinomial topic models automatically and objectively. We cast the labeling problem as an optimization problem that involves minimizing the Kullback-Leibler divergence between word distributions and maximizing the mutual information between a label and a topic model. Experiments with a user study were carried out on two text data sets of different genres. The results show that the proposed labeling methods are effective at generating labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models, such as PLSA, LDA, and their variations.
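
To illustrate the idea of choosing a label by minimizing Kullback-Leibler divergence between word distributions, here is a minimal sketch. It is not the authors' implementation: the function names `kl_divergence` and `rank_labels`, the toy vocabulary, the probabilities, and the `eps` smoothing constant are all assumptions made for illustration. It assumes each candidate label already comes with a word distribution (for example, estimated from the contexts in which the label occurs) and ranks the candidates by how closely that distribution matches the topic.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two word -> probability dicts.

    eps is an assumed floor that avoids division by zero when a word
    in p has no mass in q; it is not taken from the paper.
    """
    total = 0.0
    for word, p_w in p.items():
        if p_w > 0.0:
            q_w = max(q.get(word, 0.0), eps)
            total += p_w * math.log(p_w / q_w)
    return total


def rank_labels(topic, label_dists):
    """Order candidate labels from best to worst (lowest divergence first)."""
    scored = [(kl_divergence(topic, dist), label)
              for label, dist in label_dists.items()]
    return [label for _, label in sorted(scored)]


# Toy topic: a multinomial distribution over a four-word vocabulary.
topic = {"cluster": 0.4, "data": 0.3, "algorithm": 0.2, "tree": 0.1}

# Hypothetical word distributions associated with two candidate labels.
label_dists = {
    "clustering algorithm": {"cluster": 0.45, "data": 0.25,
                             "algorithm": 0.25, "tree": 0.05},
    "decision tree": {"cluster": 0.05, "data": 0.25,
                      "algorithm": 0.20, "tree": 0.50},
}

ranked = rank_labels(topic, label_dists)
print(ranked)
```

In this toy setup the candidate whose word distribution lies closest to the topic is ranked first. A fuller treatment would also need the mutual information component mentioned in the abstract, which this sketch leaves out.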



