Author identification using writer-dependent and writer-independent strategies
Source: Symposium on Applied Computing
Proceedings of the 2008 ACM Symposium on Applied Computing
Fortaleza, Ceará, Brazil
Session: Document engineering
Pages: 414-418  
Year of Publication: 2008
ISBN:978-1-59593-753-7
Authors
Daniel Pavelec  Pontifícia Universidade Católica do Paraná (PUCPR)
Edson Justino  Pontifícia Universidade Católica do Paraná (PUCPR)
Leonardo V. Batista  Federal University of Paraíba (UFPB)
Luiz S. Oliveira  Pontifícia Universidade Católica do Paraná (PUCPR)
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1363686.1363788

ABSTRACT

In this work we discuss author identification for documents written in Portuguese. Two different approaches are compared. The first is the writer-independent model, which reduces the pattern recognition problem to a single model and two classes and hence makes it possible to build a robust system even when few genuine samples per writer are available. The second is the personal (writer-dependent) model, which very often performs better but needs a larger number of samples per writer. We also introduce a stylometric feature set based on the conjunctions and adverbs of the Portuguese language. Experiments on a database of short articles from 30 different authors, using a Support Vector Machine (SVM) as the classifier, demonstrate that the proposed strategy can produce results comparable to the literature.
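
The two-class, writer-independent idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each document is represented by the relative frequencies of a few Portuguese conjunctions and adverbs, and that pairs of documents are turned into difference vectors labelled "same author" or "different author". The word list, the function names, and the absolute-difference representation are all hypothetical choices made for illustration; the abstract does not specify them. The classifier itself (an SVM in the paper) is left out, so the sketch only shows how the training pairs for a single two-class model could be built.

```python
import re
from itertools import combinations

# Hypothetical, very small word list (conjunctions and adverbs).
# The feature set actually used in the paper is not given in the abstract.
FUNCTION_WORDS = ["e", "mas", "porém", "quando", "muito", "sempre"]


def stylometric_vector(text):
    """Relative frequency of each listed word among the tokens of `text`."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [tokens.count(word) / total for word in FUNCTION_WORDS]


def dissimilarity(u, v):
    """Component-wise absolute difference (an assumed pair representation)."""
    return [abs(a - b) for a, b in zip(u, v)]


def writer_independent_pairs(samples):
    """Turn (author, text) samples into (difference_vector, label) pairs.

    label is 1 when both documents share an author and 0 otherwise, so a
    single two-class model can be trained regardless of the number of authors.
    """
    vectors = [(author, stylometric_vector(text)) for author, text in samples]
    pairs = []
    for (author_a, vec_a), (author_b, vec_b) in combinations(vectors, 2):
        label = 1 if author_a == author_b else 0
        pairs.append((dissimilarity(vec_a, vec_b), label))
    return pairs


# Toy data, invented for this example only.
samples = [
    ("A", "Ele veio, mas sempre chega tarde e cansado."),
    ("A", "Ela canta e dança, mas sempre sorri."),
    ("B", "Quando chove muito, fico em casa."),
]
pairs = writer_independent_pairs(samples)
```

Under these assumptions, the pairs would then be passed to a two-class classifier. A personal (writer-dependent) model would instead train one classifier per author directly on that author's feature vectors, which is why it needs more samples per writer.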


