ACM Home Page
Please provide us with feedback. Feedback
Releasing search queries and clicks privately
Full text PdfPdf (929 KB)
Source
International World Wide Web Conference archive
Proceedings of the 18th international conference on World wide web table of contents
Madrid, Spain
SESSION: Data mining/session: web mining table of contents
Pages 171-180  
Year of Publication: 2009
ISBN:978-1-60558-487-4
Authors
Aleksandra Korolova  Stanford University, Stanford, CA, USA
Krishnaram Kenthapadi  Microsoft Research, Mountain View, CA, USA
Nina Mishra  Microsoft Research, Mountain View, CA, USA
Alexandros Ntoulas  Microsoft Research, Mountain View, CA, USA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 37,   Downloads (12 Months): 152,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1526709.1526733
What is a DOI?

ABSTRACT

The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published via a collection of experiments on a real search log. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
E. Adar. User 4xxxxx9: Anonymizing query logs. In Query Log Analysis: Social And Technological Challenges Workshop at WWW, 2007.
 
2
M. Arrington. AOL proudly releases massive amounts of private data. August 2006.
3
 
4
J. Bar-Ilan. Access to query logs -- an academic researcher's point of view. In Query Log Analysis: Social And Technological Challenges Workshop at WWW, 2007.
 
5
M. Barbaro and T. Zeller. A face is exposed for AOL searcher No. 4417749. New York Times, Aug 2006.
6
 
7
K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, volume 4117, pages 198--213, 2006.
8
 
9
C. Dwork. An ad omnia approach to defining and achieving private data analysis. In Lecture Notes in Computer Science, volume 4890, pages 1--13. Springer, 2008.
 
10
C. Dwork, K. . Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, volume 4004, pages 486--503, 2006.
 
11
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265--284, 2006.
 
12
D. Fallows. Search engine users. Pew Internet and American Life Project, 2005.
13
 
14
A. Horowitz, D. Jacobson, T. McNichol, and O. Thomas. 101 dumbest moments in business, the year's biggest boors, buffoons, and blunderers. In CNN Money, 2007.
15
16
17
 
18
R. Kessler, M. Stein, and P. Berglund. Social Phobia Subtypes in the National Comorbidity Survey. Am J Psychiatry, 155(5):613--619, 1998.
19
 
20
 
21
F. McSherry and K. Talwar. Private communication. 2008.
 
22
 
23
K. Nissim. Private data analysis via output perturbation. In Privacy-Preserving Data Mining: Models and Algorithms, pages 383--414. Springer, 2008.
 
24
B. Tancer. Click: What Millions of People Are Doing Online and Why it Matters. Hyperion, 2008.
 
25
L. Xiong and E. Agichtein. Towards privacy-preserving query log publishing. In Query Log Analysis: Social And Technological Challenges Workshop in WWW, 2007.

Collaborative Colleagues:
Aleksandra Korolova: colleagues
Krishnaram Kenthapadi: colleagues
Nina Mishra: colleagues
Alexandros Ntoulas: colleagues