Vanity fair: privacy in querylog bundles
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: DB: security and privacy
Pages 853-862
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Rosie Jones, Yahoo! Research, Sunnyvale, CA, USA
Ravi Kumar, Yahoo! Research, Sunnyvale, CA, USA
Bo Pang, Yahoo! Research, Sunnyvale, CA, USA
Andrew Tomkins, Yahoo! Research, Sunnyvale, CA, USA
ABSTRACT
A recently proposed approach to address privacy concerns in storing web search querylogs is bundling logs of multiple users together. In this work we investigate privacy leaks that are possible even when querylogs from multiple users are bundled together, without any user or session identifiers. We begin by quantifying users' propensity to issue own-name vanity queries and geographically revealing queries. We show that these propensities interact badly with two forms of vulnerabilities in the bundling scheme. First, structural vulnerabilities arise due to properties of the heavy tail of the user search frequency distribution, or the distribution of locations that appear within a user's queries. These heavy tails may cause a user to appear visibly different from other users in the same bundle. Second, we demonstrate analytical vulnerabilities based on the ability to separate the queries in a bundle into threads corresponding to individual users. These vulnerabilities raise privacy issues suggesting that bundling must be handled with great care.
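The structural vulnerability described in the abstract can be illustrated with a small sketch. It is not the authors' method: the function name `dominant_share` and the per-user query counts are hypothetical, chosen only to show how one heavy-tail user could account for most of a bundle's queries and so stand out from the other users in it.

```python
def dominant_share(query_counts):
    """Return the fraction of a bundle's queries issued by its most active user.

    query_counts: list of per-user query counts for one bundle (hypothetical data).
    """
    total = sum(query_counts)
    if total == 0:
        # An empty bundle has no dominant user.
        return 0.0
    return max(query_counts) / total


# Hypothetical bundle: four light users and one heavy-tail user.
bundle = [3, 5, 2, 4, 186]
print(f"{dominant_share(bundle):.2f}")  # prints 0.93
```

In this made-up bundle, a single user contributes 93% of the queries, so removing user identifiers does little to hide that user's activity.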