| Releasing search queries and clicks privately |
| Full text |
Pdf
(929 KB)
|
Source
|
International World Wide Web Conference
archive
Proceedings of the 18th international conference on World wide web
table of contents
Madrid, Spain
SESSION: Data mining/session: web mining
table of contents
Pages 171-180
Year of Publication: 2009
ISBN:978-1-60558-487-4
|
|
Authors
|
|
Aleksandra Korolova
|
Stanford University, Stanford, CA, USA
|
|
Krishnaram Kenthapadi
|
Microsoft Research, Mountain View, CA, USA
|
|
Nina Mishra
|
Microsoft Research, Mountain View, CA, USA
|
|
Alexandros Ntoulas
|
Microsoft Research, Mountain View, CA, USA
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 37, Downloads (12 Months): 152, Citation Count: 0
|
|
|
ABSTRACT
The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published via a collection of experiments on a real search log. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
E. Adar. User 4xxxxx9: Anonymizing query logs. In Query Log Analysis: Social And Technological Challenges Workshop at WWW, 2007.
|
| |
2
|
M. Arrington. AOL proudly releases massive amounts of private data. August 2006.
|
 |
3
|
|
| |
4
|
J. Bar-Ilan. Access to query logs -- an academic researcher's point of view. In Query Log Analysis: Social And Technological Challenges Workshop at WWW, 2007.
|
| |
5
|
M. Barbaro and T. Zeller. A face is exposed for AOL searcher No. 4417749. New York Times, Aug 2006.
|
 |
6
|
|
| |
7
|
K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, volume 4117, pages 198--213, 2006.
|
 |
8
|
|
| |
9
|
C. Dwork. An ad omnia approach to defining and achieving private data analysis. In Lecture Notes in Computer Science, volume 4890, pages 1--13. Springer, 2008.
|
| |
10
|
C. Dwork, K. . Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, volume 4004, pages 486--503, 2006.
|
| |
11
|
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265--284, 2006.
|
| |
12
|
D. Fallows. Search engine users. Pew Internet and American Life Project, 2005.
|
 |
13
|
|
| |
14
|
A. Horowitz, D. Jacobson, T. McNichol, and O. Thomas. 101 dumbest moments in business, the year's biggest boors, buffoons, and blunderers. In CNN Money, 2007.
|
 |
15
|
Thorsten Joachims , Laura Granka , Bing Pan , Helene Hembrooke , Geri Gay, Accurately interpreting clickthrough data as implicit feedback, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
[doi> 10.1145/1076034.1076063]
|
 |
16
|
|
 |
17
|
Rosie Jones , Ravi Kumar , Bo Pang , Andrew Tomkins, Vanity fair: privacy in querylog bundles, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
[doi> 10.1145/1458082.1458195]
|
| |
18
|
R. Kessler, M. Stein, and P. Berglund. Social Phobia Subtypes in the National Comorbidity Survey. Am J Psychiatry, 155(5):613--619, 1998.
|
 |
19
|
Ravi Kumar , Jasmine Novak , Bo Pang , Andrew Tomkins, On anonymizing query logs via token-based hashing, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242657]
|
| |
20
|
|
| |
21
|
F. McSherry and K. Talwar. Private communication. 2008.
|
| |
22
|
|
| |
23
|
K. Nissim. Private data analysis via output perturbation. In Privacy-Preserving Data Mining: Models and Algorithms, pages 383--414. Springer, 2008.
|
| |
24
|
B. Tancer. Click: What Millions of People Are Doing Online and Why it Matters. Hyperion, 2008.
|
| |
25
|
L. Xiong and E. Agichtein. Towards privacy-preserving query log publishing. In Query Log Analysis: Social And Technological Challenges Workshop in WWW, 2007.
|
INDEX TERMS
Primary Classification:
H.
Information Systems
H.2
DATABASE MANAGEMENT
H.2.0
General
Subjects:
Security, integrity, and protection**
Additional Classification:
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
General Terms:
Algorithms,
Experimentation,
Human Factors,
Legal Aspects,
Measurement,
Performance,
Security,
Theory
Keywords:
data release,
differential privacy,
query click graph,
search logs
|