A novel method for stemmer generation based on Hidden Markov Models
Source
Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
SESSION: Information retrieval session 3: cross language retrieval
Pages: 131-138
Year of Publication: 2003
ISBN: 1-58113-723-0
ABSTRACT
In this paper, we present a method based on Hidden Markov Models (HMMs) to generate statistical stemmers. Using a list of words as a training set, the method estimates the HMM parameters, which are then used to calculate the most probable stem for an arbitrary word. Stemming is performed by computing the most probable path through the HMM states corresponding to the input word. Neither linguistic knowledge nor a training set of manually stemmed words is required. We describe the method and the results of experiments carried out using standard test collections for five different languages.
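The idea of reading a stem off the most probable state path can be illustrated with a small Viterbi sketch. The abstract does not specify the HMM topology or its parameters, so everything below is an assumption made for illustration: a hypothetical two-state model (one state emitting stem letters, one emitting suffix letters), hand-set toy probabilities in place of parameters estimated from a word list, and the helper name `viterbi_stem`. The stem is taken to be the letters emitted while the path is in the stem state.

```python
import math

# Hypothetical two-state model (an assumption, not the paper's topology):
# state 0 emits stem letters, state 1 emits suffix letters.
STEM, SUFFIX = 0, 1
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SUFFIX_LETTERS = "degins"  # toy assumption: letters common in English endings


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


# Hand-set toy parameters; the method in the abstract estimates these from a word list.
START = [log(1.0), log(0.0)]                      # a word always begins in the stem state
TRANS = [[log(0.7), log(0.3)],                    # stem -> stem, stem -> suffix
         [log(0.0), log(1.0)]]                    # suffix never returns to stem
EMIT = [
    {c: log(1 / 26) for c in ALPHABET},           # stem: uniform over letters
    {c: log(0.15 if c in SUFFIX_LETTERS else 0.005) for c in ALPHABET},
]


def viterbi_stem(word):
    """Return the letters emitted by the stem state on the most probable path."""
    word = word.lower()
    if not word or any(c not in ALPHABET for c in word):
        raise ValueError("word must be a non-empty string of letters a-z")

    # Log-probability of the best path ending in each state after the first letter.
    score = [START[s] + EMIT[s][word[0]] for s in (STEM, SUFFIX)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1

    for c in word[1:]:
        new_score, pointers = [], []
        for s in (STEM, SUFFIX):
            prev = max((STEM, SUFFIX), key=lambda p: score[p] + TRANS[p][s])
            new_score.append(score[prev] + TRANS[prev][s] + EMIT[s][c])
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state to recover the full state path.
    state = max((STEM, SUFFIX), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    return "".join(c for c, s in zip(word, path) if s == STEM)


print(viterbi_stem("walking"))
print(viterbi_stem("cats"))
```

With these toy parameters the path for "walking" stays in the stem state for "walk" and switches to the suffix state for "ing". The split point depends entirely on the assumed probabilities, so a model trained on real word lists would behave differently.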