Evolving similarity functions for code plagiarism detection
Source
Genetic and Evolutionary Computation Conference
Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation
Atlanta, GA, USA
SESSION: Real-world application papers
Pages 1453-1460  
Year of Publication: 2008
ISBN: 978-1-60558-130-9
Authors
Vic Ciesielski  RMIT University, Melbourne, Australia
Nelson Wu  RMIT University, Melbourne, Australia
Seyed Tahaghoghi  RMIT University, Melbourne, Australia
Sponsors
ACM: Association for Computing Machinery
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1389095.1389380

ABSTRACT

Detecting whether computer program code is a student's original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but require appropriate similarity functions for good performance. We have used particle swarm optimization and genetic programming to evolve similarity functions that are suited to computer program code. Using a training set of plagiarised and non-plagiarised programs, we have evolved better parameter values for the previously published Okapi BM25 similarity function. We have then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure. We found that the evolved similarity functions outperformed the human-developed Okapi BM25 function. We also found that a detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today, and scaled much better to large collections of files. The evolutionary computing techniques have been extremely useful in finding similarity functions that advance the state of the art in code plagiarism detection.
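
To make the role of the tunable parameters concrete, the following is a minimal sketch of an Okapi BM25-style similarity score between two tokenised programs. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `bm25_score`, the token-list representation, the non-negative IDF variant, and the default values `k1=1.2` and `b=0.75` are all assumptions. The parameters `k1` (term-frequency saturation) and `b` (length normalisation) are the kind of values an optimiser such as particle swarm optimization could search over.

```python
import math
from collections import Counter


def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score doc_tokens against query_tokens with a BM25-style function.

    corpus is a list of token lists; it supplies document frequencies and
    the average document length. k1 and b are the tunable parameters
    (default values here are common choices, not values from the paper).
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term-frequency saturation with document-length normalisation.
        norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score
```

In a detection setting, each submission would be scored against every other indexed submission, and the highest-scoring pairs would be flagged for manual review.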



