ABSTRACT
This paper describes and evaluates several general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test collections, we demonstrate that light stemmers perform well for French, Portuguese and Hungarian, and reasonably well for German. We also evaluate the variations in mean average precision among the different stemming approaches and find that they are sometimes statistically significant.