A natural language processing approach to automatic plagiarism detection
Source
Conference On Information Technology Education (formerly CITC)
Proceedings of the 8th ACM SIGITE conference on Information technology education
Destin, Florida, USA
SESSION: Ethics and service learning
Pages 213-218
Year of Publication: 2007
ISBN: 978-1-59593-920-3
Authors
Chi-Hong Leung  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Yuen-Yan Chan  The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1324302.1324348

ABSTRACT

Plagiarism is a long-standing problem, and advances in information technology have made it worse because electronic versions of published materials are now available to everyone. The Web is an important and common source of plagiarized material. Plagiarism detection programs such as Turnitin were developed to deal with this problem. To determine whether an article is copied from the Web or another electronic source, a detection program must calculate the similarity between two articles. However, it is often difficult to detect plagiarism accurately once the copied content has been modified. For example, a plagiarist can simply replace a word with a synonym (e.g. "program" with "software") and change the entire sentence structure. Most plagiarism detection programs only check whether two words are lexically identical and count how many matched words a paper contains. Thus, if the copied material is deliberately modified, plagiarism becomes difficult to detect.
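
The limitation of purely lexical matching can be illustrated with a minimal sketch. It is not the algorithm of Turnitin or of any other tool named here; the function names and the example sentences are assumptions made for illustration, and Jaccard overlap stands in for whatever word-matching score a real detector uses.

```python
import re

def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_overlap(a, b):
    """Jaccard similarity over the sets of exact word forms in a and b."""
    words_a, words_b = set(tokens(a)), set(tokens(b))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical example sentences (not taken from the paper).
original = "The program checks the document for errors"
copied = "The program checks the document for errors"
reworded = "The software examines the paper for mistakes"

print(lexical_overlap(original, copied))    # verbatim copy
print(lexical_overlap(original, reworded))  # synonyms substituted
```

The verbatim copy scores 1.0, while the reworded sentence shares only "the" and "for" with the original and scores 0.2, even though it says the same thing.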

Natural language processing can help to resolve this kind of problem. The underlying syntactic structure and semantic meaning of two sentences can be compared to reveal their similarity. The matching procedure has several steps. First, a thesaurus (or lexical hierarchy) is consulted to find the synonyms, broader terms and narrower terms of the words used in the paper being checked; WordNet is a typical example of a thesaurus that can be used for this purpose. Then, the paper is compared with the documents in the database. If the paper is suspected of containing content from the database, its sentences may be parsed to construct parse trees and semantic representations for more detailed comparison. In the system, context-free grammar and case grammar are used to represent the syntactic structure and the semantic meaning of sentences, respectively. This approach is found to identify plagiarism that traditional methods cannot detect.
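
The thesaurus step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: `TOY_THESAURUS` is a hypothetical hand-written table standing in for a real resource such as WordNet, it covers synonyms only (not broader or narrower terms), and the parse-tree and case-grammar comparison stages are omitted.

```python
import re

# Hypothetical toy thesaurus: each word form maps to a canonical concept label.
# A real system would look these relations up in a resource such as WordNet.
TOY_THESAURUS = {
    "program": "software", "software": "software",
    "checks": "examine", "examines": "examine",
    "document": "paper", "paper": "paper",
    "errors": "mistake", "mistakes": "mistake",
}

def concepts(text):
    """Map each word to its concept label; unknown words map to themselves."""
    words = re.findall(r"[a-z]+", text.lower())
    return {TOY_THESAURUS.get(word, word) for word in words}

def concept_overlap(a, b):
    """Jaccard similarity over concept labels rather than exact word forms."""
    concepts_a, concepts_b = concepts(a), concepts(b)
    if not concepts_a or not concepts_b:
        return 0.0
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)

# Hypothetical example sentences (not taken from the paper).
original = "The program checks the document for errors"
reworded = "The software examines the paper for mistakes"
unrelated = "A cat sat on the mat"

print(concept_overlap(original, reworded))   # synonyms collapse to the same concepts
print(concept_overlap(original, unrelated))  # only "the" is shared
```

With synonyms collapsed to shared labels, the reworded sentence matches the original completely, while the unrelated sentence still scores low. A sentence pair flagged at this stage would then be a candidate for the more detailed syntactic and semantic comparison described above.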

