| Effects of out of vocabulary words in spoken document retrieval (poster session) |
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 372-374
Year of Publication: 2000
ISBN: 1-58113-226-3
|
|
Authors
|
|
P. C. Woodland
|
Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK
|
|
S. E. Johnson
|
Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK
|
|
P. Jourlin
|
Cambridge University, Computer Laboratory, Pembroke Street, Cambridge, CB2 3QG, UK
|
|
K. Spärck Jones
|
Cambridge University, Computer Laboratory, Pembroke Street, Cambridge, CB2 3QG, UK
|
|
ABSTRACT
The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system with varying vocabulary sizes and OOV rates, and the relative retrieval performance was measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and with this data set, good retrieval performance can be achieved even for fairly high OOV rates.
REFERENCES
| |
1
|
J S Garofolo, C G P Auzanne & E M Voorhees. 1999 TREC-8 Spoken Document Retrieval Track: Overview, Results and Analyses. To appear Proc. TREC-8, 2000
|
| |
2
|
S E Johnson, P Jourlin, K Spärck Jones & P C Woodland. Spoken Document Retrieval for TREC-8 at Cambridge University. To appear Proc. TREC-8, 2000
|
 |
3
|
|
| |
4
|
S E Robertson & K Spärck Jones. Simple, Proven Approaches to Text Retrieval. Technical Report TR356, Cambridge University Computer Laboratory, May 1997
|
CITED BY 9
|
|
James W. Cooper, Mahesh Viswanathan, Donna Byron, Margaret Chan, Building searchable collections of enterprise speech data, Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries, p.226-234, January 2001, Roanoke, Virginia, United States
|
Edward C. Kaiser, Paulo Barthelmess, Candice Erdmann, Phil Cohen, Multimodal redundancy across handwriting and speech during computer mediated human-human interactions, Proceedings of the SIGCHI conference on Human factors in computing systems, April 28-May 03, 2007, San Jose, California, USA
|
Dogan Can, Erica Cooper, Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Bhuvana Ramabhadran, Michael Riley, Murat Saraclar, Abhinav Sethy, Morgan Ulinski, Christopher White, Web derived pronunciations for spoken term detection, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|