Light stemming approaches for the French, Portuguese, German and Hungarian languages
Source: Symposium on Applied Computing
Proceedings of the 2006 ACM symposium on Applied computing
Dijon, France
SESSION: Information access and retrieval (IAR)
Pages: 1031 - 1035  
Year of Publication: 2006
ISBN:1-59593-108-2
Author
Jacques Savoy  University of Neuchatel, Neuchâtel, Switzerland
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1141277.1141523

ABSTRACT

This paper describes and evaluates various general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test collections, we demonstrate that light stemmers perform well for the French, Portuguese and Hungarian languages, and reasonably well for the German language. Variations in mean average precision among the different stemming approaches are also evaluated, and they are sometimes found to be statistically significant.
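
The abstract compares stemmers by mean average precision (MAP). As a minimal sketch of how that measure is commonly computed, and not the paper's own evaluation code, the Python below averages per-query average precision over a set of queries. The function names, document ids and toy relevance judgments are hypothetical and are not taken from the CLEF collections.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list for one query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document found.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: two queries, one ranked list per query.
toy_runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

Under this sketch, two stemming approaches would be compared by producing one `runs` dictionary per stemmer over the same queries and relevance judgments, then comparing the resulting MAP values.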

