A novel method for stemmer generation based on hidden Markov models
Source: Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
SESSION: Information retrieval session 3: cross language retrieval
Pages: 131 - 138  
Year of Publication: 2003
ISBN:1-58113-723-0
Authors
Massimo Melucci  University of Padova, Padova, Italy
Nicola Orio  University of Padova, Padova, Italy
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956863.956889

ABSTRACT

In this paper, we present a method based on Hidden Markov Models (HMMs) to generate statistical stemmers. Using a list of words as a training set, the method estimates the HMM parameters, which are then used to calculate the most probable stem for an arbitrary word. Stemming is performed by computing the most probable path through the HMM states corresponding to the input word. Neither linguistic knowledge nor a training set of manually stemmed words is required. We describe the method and the results of experiments carried out using standard test collections for five different languages.
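
The idea of stemming as a most-probable-path computation can be illustrated with a minimal Python sketch. It is not the authors' model: the two-state topology (`stem` followed by `suffix`), the hand-picked probabilities, and the names `viterbi_stem`, `emit`, `TRANS`, and `SUFFIX_LETTERS` are all assumptions made for illustration. The paper estimates the HMM parameters from a word list, whereas here they are fixed by hand so that the Viterbi step can be read in isolation. Input is assumed to be a lowercase word over a-z.

```python
import math

STATES = ("stem", "suffix")

# Hypothetical, hand-picked parameters (the paper learns them from a word list).
START = {"stem": 1.0, "suffix": 0.0}           # every word begins in the stem
TRANS = {
    "stem": {"stem": 0.8, "suffix": 0.2},      # the stem may continue or end
    "suffix": {"stem": 0.0, "suffix": 1.0},    # no return from suffix to stem
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Letters the toy suffix state prefers; the remaining letters share 0.1.
SUFFIX_LETTERS = {"i": 0.3, "n": 0.3, "g": 0.3}


def emit(state, ch):
    """Toy emission probability of letter ch in the given state."""
    if state == "stem":
        return 1.0 / len(ALPHABET)             # uniform over the alphabet
    if ch in SUFFIX_LETTERS:
        return SUFFIX_LETTERS[ch]
    return 0.1 / (len(ALPHABET) - len(SUFFIX_LETTERS))


def log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_stem(word):
    """Return (stem, state_path) for the most probable state path of word."""
    if not word:
        return "", []
    # best[s]: log-probability of the best path that ends in state s.
    best = {s: log(START[s]) + log(emit(s, word[0])) for s in STATES}
    back = []                                  # one back-pointer dict per later letter
    for ch in word[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + log(TRANS[p][s]))
            new_best[s] = best[prev] + log(TRANS[prev][s]) + log(emit(s, ch))
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    stem = "".join(ch for ch, s in zip(word, path) if s == "stem")
    return stem, path
```

With these toy parameters, `viterbi_stem("walking")[0]` yields `"walk"`, while `viterbi_stem("cat")[0]` leaves the word unchanged. Log-probabilities are used so that long words do not underflow; a trained model would replace the fixed tables with estimated ones.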



