ABSTRACT
For privacy reasons, sensitive content may be revised before it is released. The revision often consists of redaction, that is, the "blacking out" of sensitive words and phrases. Redaction has the side effect of reducing the utility of the content, often so much that the content is no longer useful. Consequently, government agencies and others are increasingly exploring the revision of sensitive content as an alternative to redaction, one that preserves more content utility. We call this practice sanitization. In a sanitized document, names might be replaced with pseudonyms, and sensitive attributes might be replaced with hypernyms (more general terms). Beyond what redaction requires, sanitization adds the challenge of determining which replacement words and phrases actually reduce the sensitivity of the content. We have designed and developed a tool to assist users in sanitizing sensitive content. Our tool leverages the Web to automatically identify sensitive words and phrases and to quickly evaluate revisions for sensitivity. The tool, however, does not identify all sensitive terms, and it mistakenly marks some innocuous terms as sensitive. This is unavoidable given the difficulty of the underlying inference problem, and it is the main reason we have designed a sanitization assistant rather than a fully automated tool. We have conducted a small study in which users sanitize biographies of celebrities, both with and without our tool, to hide each celebrity's identity. The study suggests that the tool is very valuable in encouraging users to preserve content utility and can preserve privacy, but that this usefulness and apparent authoritativeness may lead to a "slippery slope" in which users neglect their own judgment in favor of the tool's.
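The contrast between redaction and sanitization can be illustrated with a minimal Python sketch. Everything in it is an illustrative assumption: the hypothetical person "Jane Doe", the mappings `PSEUDONYMS` and `HYPERNYMS`, and the helpers `redact` and `sanitize` are not part of the tool described here, which additionally uses the Web to flag sensitive terms and to evaluate revisions. The sketch assumes the sensitive terms and their replacements have already been chosen.

```python
import re

# Hypothetical sensitive terms and their replacements (illustrative only;
# not output of the sanitization assistant described in the abstract).
PSEUDONYMS = {"Jane Doe": "Person A"}                   # names -> pseudonyms
HYPERNYMS = {"cellist": "musician", "Boston": "a US city"}  # attributes -> more general terms


def _pattern(terms):
    """Build a regex matching any term as a whole word, longest terms first."""
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")


def redact(text, terms):
    """Classic redaction: black out each sensitive term character by character."""
    return _pattern(terms).sub(lambda m: "\u2588" * len(m.group(0)), text)


def sanitize(text, replacements):
    """Sanitization: swap each sensitive term for a less identifying substitute."""
    return _pattern(replacements).sub(lambda m: replacements[m.group(0)], text)


if __name__ == "__main__":
    text = "Jane Doe, a cellist, was born in Boston."
    print(redact(text, [*PSEUDONYMS, *HYPERNYMS]))
    # prints "████████, a ███████, was born in ██████."
    print(sanitize(text, {**PSEUDONYMS, **HYPERNYMS}))
    # prints "Person A, a musician, was born in a US city."
```

The redacted sentence retains almost nothing, whereas the sanitized one still tells a reader that the subject is a musician from a US city. Whether such hypernyms are general enough to hide the identity is the kind of judgment the assistant is meant to support rather than replace.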