Spoken information retrieval for Turkish broadcast news
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Boston, MA, USA
Poster session: Posters
Pages: 782-783
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Siddika Parlak, Rutgers University, Piscataway, NJ, USA
Murat Saraclar, Bogazici University, Istanbul, Turkey
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: 10.1145/1571941.1572126 (http://doi.acm.org/10.1145/1571941.1572126)

ABSTRACT

Speech retrieval systems use automatic speech recognition (ASR) to generate text for indexing. However, automatic transcripts contain errors, caused either by out-of-vocabulary (OOV) words or by ASR inaccuracy. In this work, we address spoken information retrieval in Turkish, a morphologically rich language with high OOV rates. We apply several techniques, such as indexing subword units and indexing alternative ASR hypotheses, to cope with the OOV problem and with ASR inaccuracy.
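A minimal sketch of why subword units help with OOV words: the query form "evlerimiz" ("our houses") never appears verbatim in the toy transcript, so a word-level index misses it, while an index over morph-like units still finds the document. The morph inventory, the greedy longest-match segmenter, the toy documents, and the function names (`segment`, `whole_word`, `build_index`, `search`) are all hypothetical illustrations, not the authors' segmentation method, which the abstract does not specify; real systems usually learn their units from text data.

```python
from collections import defaultdict

# Hypothetical, hand-picked morph inventory for illustration only.
MORPHS = {"ev", "ler", "lar", "imiz", "de", "den", "kitap"}


def segment(word, morphs=MORPHS):
    """Split a word into known morphs by greedy longest match.

    If the word cannot be fully covered, return it whole so that no
    material is silently dropped from the index.
    """
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in morphs:
                units.append(word[i:j])
                i = j
                break
        else:  # no known morph starts at position i
            return [word]
    return units


def whole_word(word):
    """Word-level baseline: each word is its own indexing unit."""
    return [word]


def build_index(docs, tokenize):
    """Inverted index mapping each unit to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            for unit in tokenize(word):
                index[unit].add(doc_id)
    return index


def search(index, query, tokenize):
    """Boolean AND retrieval: documents containing every query unit."""
    units = [u for word in query.split() for u in tokenize(word)]
    if not units:
        return set()
    hits = set(index.get(units[0], set()))  # copy, so the index is untouched
    for unit in units[1:]:
        hits &= index.get(unit, set())
    return hits


# Toy "ASR transcripts" (hypothetical data).
docs = {1: "evlerimizde oturuyoruz", 2: "kitaplar masada"}
word_index = build_index(docs, whole_word)
subword_index = build_index(docs, segment)

print(search(word_index, "evlerimiz", whole_word))  # set(): query word is OOV
print(search(subword_index, "evlerimiz", segment))  # {1}
```

Matching units independently raises recall, but it can also match units that come from different words in the same document; a fuller system would typically score or constrain unit adjacency. Greedy longest match can also fail on words that a backtracking segmenter would split correctly.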

Experiments are performed on our Turkish Broadcast News (BN) corpus, which also includes a spoken IR collection. Results indicate that word segmentation is quite useful, but the efficiency of indexing alternative hypotheses depends on the retrieval type.
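A minimal sketch of one common way to index alternative hypotheses: instead of counting terms only in the single best transcript, each term is weighted by the posterior probability of the hypotheses that contain it (an expected count). The 3-best list, its scores, and the function names `expected_counts` and `one_best_counts` are hypothetical; the abstract does not say whether the authors index N-best lists, lattices, or another representation, and lattice-based indexing is common in practice.

```python
from collections import Counter, defaultdict


def expected_counts(nbest):
    """Posterior-weighted (expected) term counts from an N-best list.

    nbest: list of (hypothesis_text, score) pairs. Scores are normalized
    to sum to 1 so they can be read as hypothesis posteriors.
    """
    total = sum(score for _, score in nbest)
    counts = defaultdict(float)
    for text, score in nbest:
        for term, c in Counter(text.split()).items():
            counts[term] += (score / total) * c
    return dict(counts)


def one_best_counts(nbest):
    """Baseline: term counts from the top-scoring hypothesis only."""
    best_text, _ = max(nbest, key=lambda pair: pair[1])
    return dict(Counter(best_text.split()))


# Hypothetical 3-best list for one utterance whose reference is
# "meclis karar verdi" ("parliament made a decision").
nbest = [
    ("meclis kara verdi", 0.5),
    ("meclis karar verdi", 0.3),
    ("mecliste karar verdi", 0.2),
]

print(one_best_counts(nbest).get("karar", 0))    # 0: missed by the 1-best
print(round(expected_counts(nbest)["karar"], 2))  # 0.5
print(round(expected_counts(nbest)["kara"], 2))   # 0.5: spurious term indexed too
```

The toy example shows a trade-off: the correct term "karar" gains weight it would not have in a 1-best index, but the misrecognized "kara" is indexed with equal weight. How recovered terms and spurious matches balance out is one plausible reason, not one the abstract states, why the benefit could vary with retrieval type.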


