Classification of source code archives
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Toronto, Canada
POSTER SESSION: Posters
Pages: 425 - 426  
Year of Publication: 2003
ISBN: 1-58113-646-3
Authors
Robert Krovetz  NEC Laboratories America, Princeton, NJ
Secil Ugurel  NEC Laboratories America, Princeton, NJ
C. Lee Giles  Pennsylvania State University, University Park, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860533

ABSTRACT

The World Wide Web contains a number of source code archives. Programs in these archives are usually classified into categories by hand. We report on experiments in automatically classifying source code into these categories, and we examine a number of factors that affect classification accuracy. Weighting features by expected entropy loss significantly improves classification accuracy. We show that a Support Vector Machine (SVM) can be trained to classify source code with a high degree of accuracy. We believe these results are promising for software reuse.
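The abstract names two ingredients: weighting features by expected entropy loss, and training an SVM on the weighted features. The following is a minimal Python sketch of both, under stated assumptions rather than a description of the authors' system. Each program is reduced to a set of tokens, the category is binary (in the category or not), and expected entropy loss is taken in its usual form H(C) - [P(f) H(C|f) + P(not f) H(C|not f)], i.e. the information gain of a binary feature f. Using the loss value directly as the feature value, the toy token sets and "systems" label, and all function names are hypothetical choices. The small Pegasos-style linear SVM (stochastic subgradient descent on the regularized hinge loss) stands in for a full SVM library such as LIBSVM.

```python
import math
import random
from typing import Dict, List, Sequence, Set

Vector = Dict[str, float]  # sparse feature vector: feature name -> value


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-valued variable with P(true) = p (0 log 0 = 0)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_entropy_loss(docs: Sequence[Set[str]], labels: Sequence[bool], feature: str) -> float:
    """H(C) - [P(f) H(C|f) + P(not f) H(C|not f)] for one binary feature f."""
    n = len(docs)
    with_f = [lab for d, lab in zip(docs, labels) if feature in d]
    without_f = [lab for d, lab in zip(docs, labels) if feature not in d]

    def cond_entropy(subset: List[bool]) -> float:
        # Class entropy among the documents in `subset`; an empty subset contributes 0.
        return binary_entropy(sum(subset) / len(subset)) if subset else 0.0

    prior = binary_entropy(sum(labels) / n)  # H(C) before looking at f
    p_f = len(with_f) / n
    posterior = p_f * cond_entropy(with_f) + (1.0 - p_f) * cond_entropy(without_f)
    return prior - posterior


def weighted_vector(doc: Set[str], weights: Dict[str, float]) -> Vector:
    # Hypothetical scheme: a present feature takes its entropy-loss weight as its value.
    return {f: weights[f] for f in doc if f in weights}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs: Sequence[Vector], labels: Sequence[bool],
                     lam: float = 0.1, iterations: int = 1000, seed: int = 0) -> Vector:
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss (no bias term)."""
    rng = random.Random(seed)
    w: Vector = {}
    for t in range(1, iterations + 1):
        i = rng.randrange(len(xs))
        x, y = xs[i], (1.0 if labels[i] else -1.0)
        eta = 1.0 / (lam * t)
        violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
        w = {f: v * (1.0 - eta * lam) for f, v in w.items()}  # regularizer (shrink) step
        if violated:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w: Vector, x: Vector) -> bool:
    return dot(w, x) >= 0.0


# Hypothetical toy data: token sets from four programs; True = "systems" category.
docs = [{"malloc", "printf"}, {"malloc"}, {"window", "button"}, {"button"}]
labels = [True, True, False, False]

vocab = sorted(set().union(*docs))
weights = {f: expected_entropy_loss(docs, labels, f) for f in vocab}
xs = [weighted_vector(d, weights) for d in docs]
w = train_linear_svm(xs, labels)
print([predict(w, x) for x in xs])  # → [True, True, False, False]
```

Here "malloc" and "button" each split the toy classes perfectly and receive a loss of 1 bit, while "printf" and "window" receive less. In practice the weights would be computed on training programs only and accuracy measured on held-out ones; selecting the top-scoring features instead of scaling by the loss is another plausible use of the same scores.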



