ACM Home Page
Please provide us with feedback. Feedback
Challenges in mining social network data: processes, privacy, and paradoxes
Full text MovMov (67:46),  PdfPdf (65 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Jose, California, USA
Pages: 4 - 5  
Year of Publication: 2007
ISBN:978-1-59593-609-7
Author
Jon M. Kleinberg  Cornell University
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 203,   Downloads (12 Months): 1276,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1281192.1281195
What is a DOI?

ABSTRACT

The profileration of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure on their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences.

We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks. While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication [1, 22], community formation [2, 8, 16, 23], information-seeking and collective problem-solving [20, 21, 18], marketing [12, 19, 24, 28], the spread of news [3, 17], and the dynamics of popularity [29]. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages (see e.g. [29]), the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.

The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization - the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information has been suppressed.

In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy [7]. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory [6, 14]. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information [9, 26, 27, 30], these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining (see e.g. [4, 5, 10, 11, 13, 15, 25] and the references therein) can inform how we think about the protection of eventhe most skeletal social network data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Lada A. Adamic and Eytan Adar. How to search a social network. Social Networks, 27(3):187--203, 2005.
 
2
Lada A. Adamic, Orkut Buyukkokten, and Eytan Adar. A social network caught in the web. First Monday, 8(6), 2003.
 
3
Eytan Adar, Li Zhang, Lada A. Adamic, and Rajan M. Lukose. Implicit structure and the dynamics of blogspace. In Workshop on the Weblogging Ecosystem, 2004.
4
5
 
6
Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley & Sons, second edition, 2000.
7
8
 
9
Michael Barbaro and Tom Zeller Jr. A face is exposed for aol searcher no. 4417749. New York Times, 9 August 2006.
10
11
12
 
13
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proc. 3rd International Conference on Very Large Data Bases, pages 265--284, 2006.
 
14
Paul Erdös. Some remarks on the theory of graphs. Bulletin of the AMS, 53:292--294, 1947.
15
 
16
Scott A. Golder, Dennis Wilkinson, and Bernardo A. Huberman. Rhythms of social interaction: Messaging within a massive online network. In Proc. 3rd International Conference on Communities and Technologies, 2007.
17
 
18
Michael Kearns, Siddharth Suri, and Nick Monfort. An experimental study of the coloring problem on human subject networks. Science, 313(5788):824--827, 2006.
19
 
20
Jon Kleinberg. Complex networks and decentralized search algorithms. In Proc. International Congress of Mathematicians, 2006.
 
21
 
22
Gueorgi Kossinets and Duncan Watts. Empirical analysis of an evolving social network. Science, 311:88--90, 2006.
23
24
25
 
26
Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the netflix prize dataset, October 2006. arxiv cs/0610105.
27
28
 
29
Matthew Salganik, Peter Dodds, and Duncan Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311:854--856, 2006.
 
30
Latanya Sweeney. Weaving technology and policy together to maintain confidentiality. J. Law Med. Ethics, 25, 1997.