Mining concepts from code with probabilistic topic models
Source
Automated Software Engineering
Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering
Atlanta, Georgia, USA
POSTER SESSION: Posters
Pages 461-464  
Year of Publication: 2007
ISBN:978-1-59593-882-4
Authors
Erik Linstead  University of California, Irvine, Irvine, CA
Paul Rigor  University of California, Irvine, Irvine, CA
Sushil Bajracharya  University of California, Irvine, Irvine, CA
Cristina Lopes  University of California, Irvine, Irvine, CA
Pierre Baldi  University of California, Irvine, Irvine, CA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1321631.1321709

ABSTRACT

We develop and apply statistical topic models to software as a means of extracting concepts from source code. The effectiveness of the technique is demonstrated on 1,555 projects from SourceForge and Apache, consisting of 113,000 files and 19 million lines of code. In addition to providing an automated, unsupervised solution to the problem of summarizing program functionality, the approach provides a probabilistic framework with which to analyze and visualize source file similarity. Finally, we introduce an information-theoretic approach for computing tangling and scattering of extracted concepts, and present preliminary results.
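
The abstract does not spell out how tangling and scattering are computed, so the following Python sketch is only one plausible reading of an "information-theoretic approach": it assumes that a topic model has already produced a file-by-topic weight matrix, and it measures scattering as the Shannon entropy of a topic's weight across files and tangling as the Shannon entropy of a file's weight across topics. The function names, the matrix layout, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def scattering(doc_topic, topic):
    """Assumed definition: entropy of one topic's weight spread over all files."""
    return entropy([row[topic] for row in doc_topic])


def tangling(doc_topic, doc):
    """Assumed definition: entropy of one file's weight spread over all topics."""
    return entropy(doc_topic[doc])


# Hypothetical file-by-topic weights: rows are files, columns are topics.
doc_topic = [
    [4, 0],
    [4, 0],
    [4, 8],
    [4, 0],
]

print(scattering(doc_topic, 0))  # topic 0 is spread evenly over four files
print(scattering(doc_topic, 1))  # topic 1 lives in a single file
print(tangling(doc_topic, 2))    # file 2 mixes both topics
```

Under these assumptions, a topic confined to one file has zero scattering and a file devoted to one topic has zero tangling; both values grow as the weight is spread more evenly.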


