Efficient techniques for document sanitization
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: DB: security and privacy
Pages 843-852
Year of Publication: 2008
ISBN: 978-1-59593-991-3
ABSTRACT
Sanitization of a document involves removing sensitive information from the document, so that it may be distributed to a broader audience. Such sanitization is needed while declassifying documents involving sensitive or confidential information such as corporate emails, intelligence reports, medical records, etc. In this paper, we present the ERASE framework for performing document sanitization in an automated manner. ERASE can be used to sanitize a document dynamically, so that different users get different views of the same document based on what they are authorized to know. We formalize the problem and present algorithms used in ERASE for finding the appropriate terms to remove from the document. Our preliminary experimental study demonstrates the efficiency and efficacy of the proposed algorithms.