Detecting privacy leaks using corpus-based association rules
Source
International Conference on Knowledge Discovery and Data Mining
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Las Vegas, Nevada, USA
SESSION: Industrial papers
Pages 893-901
Year of Publication: 2008
ISBN: 978-1-60558-193-4
ABSTRACT
Detecting inferences in documents is critical for ensuring privacy when sharing information. In this paper, we propose a refined and practical model of inference detection using a reference corpus. Our model is inspired by association rule mining: inferences are based on word co-occurrences. Using the model and taking the Web as the reference corpus, we can find inferences and measure their strength through web-mining algorithms that leverage search engines such as Google or Yahoo!. Our model also includes the important case of private corpora, to model inference detection in enterprise settings in which there is a large private document repository. We find inferences in private corpora by using analogues of our Web-mining algorithms, relying on an index for the corpus rather than a Web search engine. We present results from two experiments. The first experiment demonstrates the performance of our techniques in identifying all the keywords that allow for inference of a particular topic (e.g. "HIV") with confidence above a certain threshold. The second experiment uses the public Enron e-mail dataset. We postulate a sensitive topic and use the Enron corpus and the Web together to find inferences for the topic. These experiments demonstrate that our techniques are practical, and that our model of inference based on word co-occurrence is well-suited to efficient inference detection.
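As an illustration of the co-occurrence idea described in the abstract, the following is a minimal sketch, not the authors' algorithm. It assumes the standard association-rule confidence measure, that is, the number of documents containing both the keyword and the topic divided by the number containing the keyword. The names `tokenize`, `hit_count`, `confidence`, and `inferring_keywords`, and the toy corpus, are all hypothetical; in the setting the abstract describes, `hit_count` would be replaced by a Web search engine's hit count or by a lookup in an index over a private corpus.

```python
from typing import Iterable


def tokenize(text: str) -> set[str]:
    """Lowercase a document and return its set of distinct words."""
    return {w.strip('.,;:!?"()').lower() for w in text.split()} - {""}


def hit_count(docs: list[set[str]], terms: Iterable[str]) -> int:
    """Count documents containing every term.

    Hypothetical stand-in for a search-engine hit count or an index lookup.
    """
    wanted = {t.lower() for t in terms}
    return sum(1 for d in docs if wanted <= d)


def confidence(docs: list[set[str]], keyword: str, topic: str) -> float:
    """Confidence of the rule keyword -> topic (assumed measure):
    hits(keyword AND topic) / hits(keyword)."""
    base = hit_count(docs, [keyword])
    return hit_count(docs, [keyword, topic]) / base if base else 0.0


def inferring_keywords(docs: list[set[str]], candidates: Iterable[str],
                       topic: str, threshold: float) -> list[str]:
    """Keywords whose confidence for the topic is at or above the threshold."""
    return sorted(k for k in candidates
                  if confidence(docs, k, topic) >= threshold)


# Hypothetical toy corpus standing in for the reference corpus.
CORPUS = [
    "Patients on antiretroviral therapy for HIV need regular monitoring.",
    "New antiretroviral drugs lower HIV viral load.",
    "The clinic offers regular monitoring for diabetes.",
    "Regular exercise helps lower blood pressure.",
]
DOCS = [tokenize(d) for d in CORPUS]

if __name__ == "__main__":
    for word in ["antiretroviral", "monitoring", "regular"]:
        print(word, confidence(DOCS, word, "hiv"))
    print(inferring_keywords(DOCS, ["antiretroviral", "monitoring", "regular"],
                             "hiv", threshold=0.6))
```

In this toy corpus, a word that appears only alongside the topic gets a high confidence, while a common word such as "regular" gets a low one, which is the intuition behind flagging only keywords above a chosen threshold.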