"I know what you did last summer": query logs and user privacy
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Lisbon, Portugal
POSTER SESSION: Poster session
Pages: 909-914
Year of Publication: 2007
ISBN: 978-1-59593-803-9
ABSTRACT
We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers to map a sequence of queries into the gender, age, and location of the user issuing the queries. We then show how these classifiers may be carefully combined at multiple granularities to map a sequence of queries into a set of candidate users that is 300-600 times smaller than random chance would allow. We show that this approach remains accurate even after removing personally identifiable information such as names/numbers or limiting the size of the query log. We also present a new attack in which a real-world acquaintance of a user attempts to identify that user in a large query log, using personal information. We show that combinations of small pieces of information about terms a user would probably search for can be highly effective in identifying the sessions of that user. We conclude that known schemes to release even heavily scrubbed query logs that contain session information have significant privacy risks.
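The candidate-set idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the classifiers or how they are combined, so the `User` record, the attribute names (`gender`, `age_bucket`, `zip3`), and the toy population below are all hypothetical. The sketch also assumes each attribute prediction is exact, whereas real classifiers are noisy and would need a tolerance for errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical user record; field names are illustrative assumptions.
    user_id: str
    gender: str      # e.g. "f" or "m"
    age_bucket: str  # e.g. "18-24"
    zip3: str        # coarse location, e.g. first three digits of a ZIP code


def candidate_set(users, predicted):
    """Keep only users whose attributes match every predicted attribute."""
    return [
        u for u in users
        if all(getattr(u, name) == value for name, value in predicted.items())
    ]


def reduction_factor(users, candidates):
    """How many times smaller the candidate set is than the full population."""
    if not candidates:
        raise ValueError("no candidates match the predicted attributes")
    return len(users) / len(candidates)


# Toy population (hypothetical): 2 genders x 3 age buckets x 4 locations,
# with 5 users in each combination.
users = [
    User(f"u-{g}-{a}-{z}-{i}", g, a, z)
    for g in ("f", "m")
    for a in ("18-24", "25-34", "35-44")
    for z in ("021", "100", "606", "941")
    for i in range(5)
]

# Attributes that classifiers might predict from one sequence of queries.
predicted = {"gender": "f", "age_bucket": "25-34", "zip3": "941"}
candidates = candidate_set(users, predicted)
print(len(users), len(candidates), reduction_factor(users, candidates))
```

In this toy setting each additional attribute shrinks the candidate set multiplicatively, which is the intuition behind combining several weak demographic cues; the 300-600 times reduction reported in the abstract comes from the paper's own data and classifiers, not from anything shown here.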
CITED BY 6
Rosie Jones, Ravi Kumar, Bo Pang, Andrew Tomkins, Vanity fair: privacy in querylog bundles, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA