You are what you say: privacy risks of public mentions
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Web IR: current topics
Pages: 565-572
Year of Publication: 2006
ISBN:1-59593-369-7
Authors
Dan Frankowski  University of Minnesota, Minneapolis, MN
Dan Cosley  University of Minnesota, Minneapolis, MN
Shilad Sen  University of Minnesota, Minneapolis, MN
Loren Terveen  University of Minnesota, Minneapolis, MN
John Riedl  University of Minnesota, Minneapolis, MN
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148267

ABSTRACT

In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.

This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
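
The re-identification idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a deliberately strict matcher that keeps only those private users who have rated every item a public user mentions. The function name `candidates`, the user ids, and the movie titles are all hypothetical, chosen only for illustration.

```python
def candidates(mentions, ratings):
    """Return private user ids whose rated items include every mentioned item.

    mentions: iterable of item names mentioned publicly by one forum user.
    ratings:  dict mapping private user id -> set of items that user rated.
    (Strict set-inclusion matching is an assumption made for this sketch.)
    """
    mentioned = set(mentions)
    return sorted(user for user, rated in ratings.items() if mentioned <= set(rated))


# Toy private ratings dataset (hypothetical users and titles).
ratings = {
    "u1": {"Star Wars", "Obscure Film A", "Obscure Film B"},
    "u2": {"Star Wars", "Titanic", "Obscure Film A"},
    "u3": {"Star Wars", "Titanic"},
}

# A popular item alone matches everyone, so it identifies no one.
print(candidates({"Star Wars"}, ratings))  # ['u1', 'u2', 'u3']

# Two rarely rated items single out one private user.
print(candidates({"Obscure Film A", "Obscure Film B"}, ratings))  # ['u1']

# Misdirection: also mentioning a popular item u1 never rated
# removes u1 from the candidates of this strict matcher.
print(candidates({"Obscure Film A", "Obscure Film B", "Titanic"}, ratings))  # []
```

In this toy setting, sparsity is what makes the second query succeed: few users share the rarely rated items. A more tolerant matcher that scores partial overlap might not be misled by the third query, so the sketch says nothing about how well misdirection works against the algorithms the paper actually evaluates.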


REVIEW

Reviewer: Anthony L. Clapes

Identification data for most Internet users exists in numerous data sets on numerous servers in their home countries and often in foreign countries as well. Sometimes, the identifying information is explicit and provided by the user: name, address ...
