Quantifying information leakage in document redaction
Source: Conference on Information and Knowledge Management
Proceedings of the 1st ACM Workshop on Hardcopy Document Processing
Washington, DC, USA
Pages: 63 - 69  
Year of Publication: 2004
ISBN: 1-58113-976-4
Authors
Daniel Lopresti, Lehigh University, Bethlehem, PA
A. Lawrence Spitz, DocRec Ltd., Atawhai, Nelson, New Zealand
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1031442.1031452

ABSTRACT

In this paper, we examine ways in which sensitive information might leak through the process of redaction. Attacks that exploit such leaks apply known methods from document image analysis and natural language processing to recover text thought to have been obliterated for the purposes of public release. Systematically identifying and testing these weaknesses is a first step towards designing effective countermeasures. We describe our development of a prototype semi-automated system that accepts a redacted document as input and gives the user feedback on whether the document might suffer from such leaks.



