Style mining of electronic messages for multiple authorship discrimination: first results
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
POSTER SESSION: Research track
Pages: 475-480
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
Shlomo Argamon, Illinois Institute of Technology, Chicago, IL
Marin Šarić, Illinois Institute of Technology, Chicago, IL
Sterling S. Stein, Illinois Institute of Technology, Chicago, IL
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956805


ABSTRACT

This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 different classes (authors). Effective stylistic characterization of text is potentially useful for a variety of tasks, as language style contains cues regarding the authorship, purpose, and mood of the text, all of which would be useful adjuncts to information retrieval or knowledge-management tasks. We focus here on the problem of determining the author of an anonymous message, based only on the message text. Several multiclass variants of the Winnow algorithm were applied to a vector representation of the message texts to learn models for discriminating different authors. We present results comparing the classification accuracy of the different approaches. The results show that stylistic models can be accurately learned to determine an author's identity.
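
As a rough illustration of the kind of learner the abstract describes, the sketch below trains one Balanced Winnow classifier per author (a one-vs-all scheme) over binary feature vectors and predicts the author with the highest score. It is only a minimal sketch under stated assumptions, not the specific multiclass variants evaluated in the paper: the one-vs-all scheme, the tiny vocabulary, the toy messages, and the parameter values (alpha, beta, theta, initial weights) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def featurize(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Binary presence vector over a fixed vocabulary (hypothetical feature set)."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocabulary]


class OneVsAllBalancedWinnow:
    """One Balanced Winnow classifier per author; assumed parameters throughout."""

    def __init__(self, n_features: int, authors: Sequence[str],
                 alpha: float = 2.0, beta: float = 0.5, theta: float = 1.0):
        self.authors = list(authors)
        self.alpha = alpha    # promotion factor (> 1)
        self.beta = beta      # demotion factor (between 0 and 1)
        self.theta = theta    # decision threshold
        # Each author keeps a positive and a negative weight per feature.
        self.pos: Dict[str, List[float]] = {a: [2.0] * n_features for a in self.authors}
        self.neg: Dict[str, List[float]] = {a: [1.0] * n_features for a in self.authors}

    def score(self, author: str, x: Sequence[float]) -> float:
        """Linear score using the difference of positive and negative weights."""
        return sum((p - n) * xi
                   for p, n, xi in zip(self.pos[author], self.neg[author], x))

    def predict(self, x: Sequence[float]) -> str:
        """Return the author whose classifier gives the highest score."""
        return max(self.authors, key=lambda a: self.score(a, x))

    def update(self, x: Sequence[float], true_author: str) -> None:
        """Mistake-driven multiplicative update of each author's classifier."""
        for a in self.authors:
            predicted_positive = self.score(a, x) > self.theta
            is_positive = (a == true_author)
            if predicted_positive == is_positive:
                continue  # no mistake, so no update
            # Promote on a missed positive, demote on a false positive.
            up, down = (self.alpha, self.beta) if is_positive else (self.beta, self.alpha)
            for i, xi in enumerate(x):
                if xi > 0:  # only active features are changed
                    self.pos[a][i] *= up
                    self.neg[a][i] *= down

    def fit(self, examples: Sequence[Tuple[Sequence[float], str]],
            epochs: int = 5) -> "OneVsAllBalancedWinnow":
        for _ in range(epochs):
            for x, author in examples:
                self.update(x, author)
        return self


# Toy data: two hypothetical authors with different habitual words.
VOCAB = ["hey", "regards", "lol", "cheers"]
TRAIN = [("hey there lol", "A"), ("regards and cheers", "B"),
         ("hey", "A"), ("cheers", "B")]

model = OneVsAllBalancedWinnow(len(VOCAB), ["A", "B"])
model.fit([(featurize(text, VOCAB), author) for text, author in TRAIN])
print(model.predict(featurize("hey lol cheers", VOCAB)))
```

Because Winnow changes weights only on mistakes and only for active features, updates stay cheap even with large, sparse feature vectors, which is one reason this family of algorithms is commonly applied to text.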


