ABSTRACT
This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 different classes (authors). Effective stylistic characterization of text is potentially useful for a variety of tasks, as language style contains cues regarding the authorship, purpose, and mood of the text, all of which would be useful adjuncts to information retrieval or knowledge-management tasks. We focus here on the problem of determining the author of an anonymous message, based only on the message text. Several multiclass variants of the Winnow algorithm were applied to a vector representation of the message texts to learn models for discriminating different authors. We present results comparing the classification accuracy of the different approaches. The results show that stylistic models can be accurately learned to determine an author's identity.
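The abstract does not say which multiclass variants of Winnow were used, so the following is only a minimal sketch of one common construction, assumed here for illustration: a separate Winnow weight vector per author, trained one-vs-rest with mistake-driven multiplicative updates, and a winner-take-all prediction. The class name `MulticlassWinnow`, the parameters `alpha` and `theta`, the binary "feature present" representation, and the toy data are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


class MulticlassWinnow:
    """One-vs-rest Winnow over binary features (illustrative sketch only)."""

    def __init__(self, classes, alpha=2.0, theta=4.0):
        self.classes = list(classes)
        self.alpha = alpha  # multiplicative update factor, must be > 1
        self.theta = theta  # activation threshold shared by all classes
        # One weight vector per class; unseen features start at weight 1.0.
        self.weights = {c: defaultdict(lambda: 1.0) for c in self.classes}

    def score(self, label, features):
        """Sum of this class's weights over the features present in the text."""
        w = self.weights[label]
        return sum(w[f] for f in features)

    def update(self, features, true_label):
        """Mistake-driven update: weights change only when a class errs."""
        for c in self.classes:
            fired = self.score(c, features) >= self.theta
            if c == true_label and not fired:
                # Missed the true author: promote the active features.
                for f in features:
                    self.weights[c][f] *= self.alpha
            elif c != true_label and fired:
                # Fired for the wrong author: demote the active features.
                for f in features:
                    self.weights[c][f] /= self.alpha

    def fit(self, examples, epochs=5):
        for _ in range(epochs):
            for features, label in examples:
                self.update(features, label)
        return self

    def predict(self, features):
        """Winner-take-all: the author whose weight vector scores highest."""
        return max(self.classes, key=lambda c: self.score(c, features))


# Hypothetical toy data: each message is reduced to a set of function words.
TOY_MESSAGES = [
    ({"the", "upon", "whilst"}, "author_a"),
    ({"the", "while", "on"}, "author_b"),
    ({"upon", "of", "whilst"}, "author_a"),
    ({"on", "of", "while"}, "author_b"),
]

if __name__ == "__main__":
    model = MulticlassWinnow(["author_a", "author_b"]).fit(TOY_MESSAGES)
    print(model.predict({"upon", "whilst"}))
    print(model.predict({"while", "on"}))
```

Because updates are multiplicative and happen only on mistakes, features that do not help separate authors are driven toward small weights quickly, which is one reason Winnow-style learners are often paired with large, sparse text feature vectors.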
CITED BY 9
Daniel Pavelec, Edson Justino, Leonardo V. Batista, Luiz S. Oliveira, Author identification using writer-dependent and writer-independent strategies, Proceedings of the 2008 ACM Symposium on Applied Computing, March 16-20, 2008, Fortaleza, Ceara, Brazil