HIDE: heterogeneous information DE-identification
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
DEMONSTRATION SESSION: Demonstrations: Demo group 1
Pages 1116-1119  
Year of Publication: 2009
ISBN: 978-1-60558-422-5
Authors
James Gardner, Emory University, Atlanta, GA
Li Xiong, Emory University, Atlanta, GA
Kanwei Li, Emory University, Atlanta, GA
James J. Lu, Emory University, Atlanta, GA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516491

ABSTRACT

While there is an increasing need to share data that may contain personal information, such data sharing must preserve individual privacy without disclosing any identifiable information. A considerable amount of research in the data privacy community has been devoted to formalizing the notion of identifiability with many techniques for anonymization, but is focused exclusively on structured data. On the other hand, efforts on de-identifying medical text documents in the medical informatics community are highly specialized for specific document types or a subset of identifiers. In addition, they rely on simple identifier removal or grouping techniques and do not take advantage of the research developments in the data privacy community. We developed an integrated system, HIDE, for Heterogeneous Information DE-identification including structured and unstructured data utilizing existing anonymization techniques. We demonstrate a prototype of our system and show the effectiveness of our approach through a set of real data augmented with synthesized data.

