Quantifying information leakage in document redaction
Source
Conference on Information and Knowledge Management
Proceedings of the 1st ACM workshop on Hardcopy document processing
Washington, DC, USA
Pages: 63-69
Year of Publication: 2004
ISBN: 1-58113-976-4
ABSTRACT
In this paper, we examine ways in which sensitive information might leak through the process of redaction. Attacks that exploit these leaks apply known methods from document image analysis and natural language processing to recover text thought to have been obliterated for the purposes of public release. Systematically identifying and testing these weaknesses is a first step towards designing effective countermeasures. We describe our development of a prototype semi-automated system that accepts a redacted document as input and gives the user feedback on whether the document might suffer from such leaks.