Vanity fair: privacy in querylog bundles
Source
Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008)
Napa Valley, California, USA
Session: DB: security and privacy
Pages 853-862
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Rosie Jones (Yahoo! Research, Sunnyvale, CA, USA)
Ravi Kumar (Yahoo! Research, Sunnyvale, CA, USA)
Bo Pang (Yahoo! Research, Sunnyvale, CA, USA)
Andrew Tomkins (Yahoo! Research, Sunnyvale, CA, USA)
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: 10.1145/1458082.1458195 (http://doi.acm.org/10.1145/1458082.1458195)

ABSTRACT

A recently proposed approach to addressing privacy concerns in storing web search querylogs is to bundle the logs of multiple users together. In this work we investigate privacy leaks that remain possible even when querylogs from multiple users are bundled together without any user or session identifiers. We begin by quantifying users' propensity to issue own-name vanity queries and geographically revealing queries. We show that these propensities interact badly with two forms of vulnerability in the bundling scheme. First, structural vulnerabilities arise from the heavy tail of either the user search frequency distribution or the distribution of locations that appear within a user's queries. These heavy tails may cause a user to appear visibly different from other users in the same bundle. Second, we demonstrate analytical vulnerabilities based on the ability to separate the queries in a bundle into threads corresponding to individual users. These vulnerabilities raise privacy issues, suggesting that bundling must be handled with great care.
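The structural vulnerability can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' analysis or data: it assumes bundles are formed from k consecutive users, draws hypothetical per-user query counts from a Pareto distribution (alpha = 1.2, an arbitrary choice) alongside a light-tailed baseline of 1 to 10 queries per user, and reports how often a single user issues more than half of a bundle's queries. In such a bundle, a query picked at random most likely belongs to that one user, so bundling offers that user little cover. The function names `dominant_share`, `make_bundles`, and `fraction_dominated` are hypothetical.

```python
import random
from typing import List, Sequence


def dominant_share(bundle_counts: Sequence[int]) -> float:
    """Fraction of a bundle's queries issued by its most active user.

    If an observer picks a query uniformly at random from the bundle, this is
    the probability that it was issued by that single heaviest user.
    """
    total = sum(bundle_counts)
    if total == 0:
        return 0.0
    return max(bundle_counts) / total


def make_bundles(user_counts: Sequence[int], k: int) -> List[List[int]]:
    """Group per-user query counts into consecutive bundles of k users.

    Assumption (hypothetical scheme): users are bundled in arrival order;
    the last bundle may hold fewer than k users.
    """
    if k <= 0:
        raise ValueError("bundle size k must be positive")
    return [list(user_counts[i:i + k]) for i in range(0, len(user_counts), k)]


def fraction_dominated(user_counts: Sequence[int], k: int,
                       threshold: float = 0.5) -> float:
    """Fraction of bundles in which one user contributes more than `threshold` of the queries."""
    bundles = make_bundles(user_counts, k)
    if not bundles:
        return 0.0
    dominated = sum(1 for b in bundles if dominant_share(b) > threshold)
    return dominated / len(bundles)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical heavy-tailed activity: Pareto-distributed query counts (alpha = 1.2).
    heavy = [int(rng.paretovariate(1.2)) for _ in range(10_000)]
    # Hypothetical light-tailed baseline: every user issues 1 to 10 queries.
    light = [rng.randint(1, 10) for _ in range(10_000)]
    for k in (5, 10, 50):
        print(f"k={k}: heavy-tailed {fraction_dominated(heavy, k):.3f}, "
              f"light-tailed {fraction_dominated(light, k):.3f}")
```

Larger bundles quickly dilute the light-tailed baseline, while under the assumed heavy tail some bundles remain dominated by one very active user even at larger k; the exact fractions depend entirely on the assumed distribution and bundling order.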

