Topic models and a revisit of text-related applications
Source
Conference on Information and Knowledge Management
Proceeding of the 2nd PhD workshop on Information and knowledge management
Napa Valley, California, USA
SESSION: Session 2
Pages 25-32
Year of Publication: 2008
ISBN: 978-1-60558-257-3
Authors
Viet Ha-Thuc, The University of Iowa, Iowa City, IA, USA
Padmini Srinivasan, The University of Iowa, Iowa City, IA, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458550.1458556

ABSTRACT

Topic models such as the aspect model or LDA have been shown to be a promising approach for text modeling. Unlike many previous models that restrict each document to a single topic, topic models support the important idea that each document could be relevant to multiple topics. This makes topic models significantly more expressive in modeling text documents. However, we observe two limitations in topic models. The first is scalability: it is extremely expensive to run the models on large corpora. The second is the inability to model the key concept of relevance. This prevents the models from being directly applied to tasks such as text classification and relevance feedback for query modification, in which items relevant to topics (classes and queries) are provided upfront. The first aim of this paper is to sketch solutions for these limitations. To alleviate the scalability problem, we introduce a one-scan topic model requiring only a single pass over a corpus for inference. To overcome the second limitation, we propose relevance-based topic models that have the advantages of previous models while taking the concept of relevance into account. The second aim, based on the proposed models, is to revisit a wide range of well-known but still open text-related tasks, and to outline our vision of how approaches to these tasks could be improved by topic models.
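
The contrast the abstract draws between single-topic models and topic models can be illustrated with a toy generative sketch. This is not the authors' one-scan or relevance-based model; the topic names, word probabilities, and function names below are hypothetical and chosen only for illustration. In the single-topic case one topic is drawn for the whole document, while in the multi-topic case a topic is drawn again at every word position.

```python
import random

# Hypothetical toy topics: each maps words to probabilities (illustrative values only).
TOPICS = {
    "sports": {"game": 0.5, "team": 0.3, "score": 0.2},
    "finance": {"stock": 0.5, "market": 0.3, "bank": 0.2},
}


def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def single_topic_document(topic_prior, length, rng):
    """Single-topic assumption: one topic is drawn for the whole document."""
    topic = draw(topic_prior, rng)
    return [draw(TOPICS[topic], rng) for _ in range(length)]


def multi_topic_document(topic_proportions, length, rng):
    """Topic-model assumption: a topic is drawn anew for every word position."""
    words = []
    for _ in range(length):
        topic = draw(topic_proportions, rng)
        words.append(draw(TOPICS[topic], rng))
    return words


rng = random.Random(0)  # fixed seed so repeated runs give the same documents
proportions = {"sports": 0.5, "finance": 0.5}
doc_a = single_topic_document(proportions, 20, rng)
doc_b = multi_topic_document(proportions, 20, rng)
```

Because the two toy vocabularies are disjoint, every word in `doc_a` comes from a single topic, whereas `doc_b` may mix words from both. Inference in a real topic model runs in the opposite direction, recovering the topics and proportions from observed words, which is where the scalability cost mentioned in the abstract arises.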

