|
||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
ABSTRACT
Speech Retrieval systems utilize automatic speech recognition (ASR) to generate textual data for indexing. However, automatic transcriptions include errors, either because of out-of-vocabulary (OOV) words or due to ASR inaccuracy. In this work, we address spoken information retrieval in Turkish, a morphologically rich language where OOV rates are high. We apply several techniques, such as using subword units and indexing alternative hypotheses, to cope with the OOV problem and ASR inaccuracy. Experiments are performed on our Turkish Broadcast News (BN) Corpus which also incorporates a spoken IR collection. Results indicate that word segmentation is quite useful but the efficiency of indexing alternative hypotheses depends on retrieval type. REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
INDEX TERMS
Primary Classification:
Additional Classification:
General Terms:
Keywords:
|
||||||||||||||||||||||||||||||||||||||||