Efficient techniques for document sanitization
Source
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: DB: security and privacy
Pages 843-852
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Venkatesan T. Chakaravarthy  IBM India Research Lab, New Delhi, India
Himanshu Gupta  IBM India Research Lab, New Delhi, India
Prasan Roy  Aster Data Systems, Redwood City, CA, USA
Mukesh K. Mohania  IBM India Research Lab, New Delhi, India
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458194

ABSTRACT

Sanitization of a document involves removing sensitive information from the document, so that it may be distributed to a broader audience. Such sanitization is needed while declassifying documents involving sensitive or confidential information such as corporate emails, intelligence reports, medical records, etc. In this paper, we present the ERASE framework for performing document sanitization in an automated manner. ERASE can be used to sanitize a document dynamically, so that different users get different views of the same document based on what they are authorized to know. We formalize the problem and present algorithms used in ERASE for finding the appropriate terms to remove from the document. Our preliminary experimental study demonstrates the efficiency and efficacy of the proposed algorithms.
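
To make the idea of per-user views concrete, here is a minimal sketch in Python. It is not the ERASE algorithm, which the abstract does not specify; it only assumes a naive scheme in which each protected entity comes with a known set of revealing terms, and every term tied to an entity the reader is not authorized to know is masked. The names `PROTECTED_TERMS`, `sanitize`, the entity labels, and the `[REDACTED]` mask are all hypothetical.

```python
import re

# Hypothetical knowledge base (an assumption, not from the paper):
# each protected entity maps to the terms that could reveal it.
PROTECTED_TERMS = {
    "project_x": {"falcon", "prototype"},
    "patient_record": {"diabetes", "insulin"},
}


def sanitize(text, authorized, protected_terms=PROTECTED_TERMS, mask="[REDACTED]"):
    """Return a view of `text` with terms of unauthorized entities masked.

    `authorized` is the set of entity names the reader may know about.
    """
    # Collect every term linked to an entity the reader may not see.
    blocked = set()
    for entity, terms in protected_terms.items():
        if entity not in authorized:
            blocked |= {term.lower() for term in terms}

    def replace(match):
        word = match.group(0)
        return mask if word.lower() in blocked else word

    # Scan word by word so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z0-9_]+", replace, text)


doc = "The Falcon prototype ships in May."
public_view = sanitize(doc, authorized=set())
insider_view = sanitize(doc, authorized={"project_x"})
```

Under these assumptions, the same document yields a masked view for a reader with no authorizations and the original text for a reader cleared for `project_x`. A real system would also have to decide which terms are revealing in the first place, which is the problem the paper formalizes.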



