ABSTRACT
It is increasingly common for users to interact with the web using a number of different aliases. This trend is a double-edged sword. On one hand, it is a fundamental building block in approaches to online privacy. On the other hand, there are economic and social consequences to allowing each user an arbitrary number of free aliases. Thus, there is great interest in understanding the fundamental issues in obscuring the identities behind aliases. However, most work in the area has focused on linking aliases through analysis of lower-level properties of interactions, such as network routes. We show that aliases that actively post text on the web can be linked together through analysis of that text. We study a large number of users posting on bulletin boards and develop algorithms to anti-alias those users: we can identify, with a high degree of success, when two aliases belong to the same individual. Our results show that such techniques are surprisingly effective, leading us to conclude that guaranteeing privacy among aliases that post actively requires mechanisms that do not yet exist.