Building a scalable and accurate copy detection mechanism
Source: International Conference on Digital Libraries
Proceedings of the First ACM International Conference on Digital Libraries
Bethesda, Maryland, United States
Pages: 160-168
Year of Publication: 1996
ISBN: 0-89791-830-4
Authors
Narayanan Shivakumar, Department of Computer Science, Stanford, CA
Hector Garcia-Molina, Department of Computer Science, Stanford, CA
Sponsors
SIGBIO: ACM Special Interest Group on Biomedical Computing
SIGCAPH: ACM SIGCAPH Computers and the Physically Handicapped
SIGGROUP: ACM Special Interest Group on Supporting Group Work
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGADA: ACM Special Interest Group on Ada Programming Language
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGCUE: ACM Special Interest Group on Computer Uses In Education
SIGCOMM: ACM Special Interest Group on Data Communication
SIGIR: ACM Special Interest Group on Information Retrieval
SIGLINK: Hypertext, Hypermedia, and Web
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/226931.226961

ABSTRACT

Publishers are often reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A copy detection mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as Usenet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, and response times for registration and for querying. We also contrast performance with the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
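
The register-then-query workflow described in the abstract can be illustrated with a minimal sketch. This is not SCAM's implementation, and it is not necessarily one of the mechanisms the paper evaluates. It assumes a simple scheme in which each document is split into overlapping k-word windows, each window is hashed, and a query reports the fraction of a registered document's windows that reappear in the queried text. The names `chunks`, `CopyDetectionServer`, `register`, `query`, and the parameters `k` and `threshold` are hypothetical and chosen only for illustration.

```python
import hashlib
import re
from collections import defaultdict


def chunks(text, k=5):
    """Return the set of hashes of all overlapping k-word windows in text.

    Assumption for this sketch: words are lowercased runs of letters/digits.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        windows = [" ".join(words)]
    else:
        windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha1(w.encode("utf-8")).hexdigest() for w in windows}


class CopyDetectionServer:
    """Toy in-memory server: an inverted index from chunk hash to document ids."""

    def __init__(self, k=5):
        self.k = k
        self.index = defaultdict(set)  # chunk hash -> ids of registered documents
        self.sizes = {}                # document id -> number of distinct chunks

    def register(self, doc_id, text):
        """Store the chunk hashes of a publisher's document."""
        doc_chunks = chunks(text, self.k)
        self.sizes[doc_id] = len(doc_chunks)
        for h in doc_chunks:
            self.index[h].add(doc_id)

    def query(self, text, threshold=0.5):
        """Return {doc_id: overlap} for registered documents whose overlap
        (fraction of their chunks found in text) is at least threshold."""
        hits = defaultdict(int)
        for h in chunks(text, self.k):
            for doc_id in self.index.get(h, ()):
                hits[doc_id] += 1
        result = {}
        for doc_id, n in hits.items():
            overlap = n / self.sizes[doc_id]
            if overlap >= threshold:
                result[doc_id] = overlap
        return result
```

In this sketch an exact copy yields an overlap of 1.0, while a partial copy yields a smaller fraction. The window size `k` trades index size against sensitivity to small edits, which is the kind of performance-versus-accuracy trade-off the abstract refers to.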


