ABSTRACT
In this work we discuss author identification for documents written in Portuguese. Two approaches are compared. The first is the writer-independent model, which reduces the pattern recognition problem to a single model and two classes and hence makes it possible to build a robust system even when few genuine samples per writer are available. The second is the personal model, which very often performs better but needs a larger number of samples per writer. We also introduce a stylometric feature set based on the conjunctions and adverbs of the Portuguese language. Experiments on a database of short articles from 30 different authors, using a Support Vector Machine (SVM) as the classifier, demonstrate that the proposed strategy can produce results comparable to those reported in the literature.
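To make the two-class idea behind the writer-independent model more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that documents are compared in pairs and that each pair is described by the element-wise absolute difference of two feature vectors, labelled "same author" or "different authors"; the abstract does not state the actual dissimilarity measure. The word list `FUNCTION_WORDS` and the helper names `stylometric_vector`, `dissimilarity`, and `make_pairs` are hypothetical and chosen only for illustration; the paper's real feature set of conjunctions and adverbs is not given here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of Portuguese conjunctions and adverbs.
# The paper's actual feature set is not listed in the abstract.
FUNCTION_WORDS = ["e", "mas", "porque", "quando", "também", "não", "já", "muito"]


def stylometric_vector(text):
    """Return the relative frequency of each listed word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def dissimilarity(vec_a, vec_b):
    """Element-wise absolute difference (an assumed dissimilarity measure)."""
    return [abs(a - b) for a, b in zip(vec_a, vec_b)]


def make_pairs(docs):
    """Turn (author, text) documents into two-class training pairs.

    Each pair is (dissimilarity_vector, label), where label is 1 when both
    documents share an author and 0 otherwise.
    """
    vecs = [(author, stylometric_vector(text)) for author, text in docs]
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            label = 1 if vecs[i][0] == vecs[j][0] else 0
            pairs.append((dissimilarity(vecs[i][1], vecs[j][1]), label))
    return pairs
```

Under these assumptions, the pairs returned by `make_pairs` could be passed to a single binary classifier such as an SVM, regardless of how many authors are in the database. A personal model would instead train one classifier per author directly on the output of `stylometric_vector`, which is why it needs more samples from each writer.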