Web derived pronunciations for spoken term detection
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA
Session: Speech and linguistic processing
Pages 83-90
Year of publication: 2009
ISBN: 978-1-60558-483-6
Authors
Dogan Can (Bogazici University, Istanbul, Turkey)
Erica Cooper (MIT, Cambridge, MA, USA)
Arnab Ghoshal (Johns Hopkins University, Baltimore, MD, USA)
Martin Jansche (Google, Inc., New York, NY, USA)
Sanjeev Khudanpur (Johns Hopkins University, Baltimore, MD, USA)
Bhuvana Ramabhadran (IBM T. J. Watson Research Center, Yorktown Heights, NY, USA)
Michael Riley (Google, Inc., New York, NY, USA)
Murat Saraclar (Bogazici University, Istanbul, Turkey)
Abhinav Sethy (IBM T. J. Watson Research Center, Yorktown Heights, NY, USA)
Morgan Ulinski (Cornell University, Ithaca, NY, USA)
Christopher White (Johns Hopkins University, Baltimore, MD, USA)
ABSTRACT
Indexing and retrieval of speech content in forms such as broadcast news, customer care data and on-line media has attracted considerable interest for applications ranging from customer analytics to on-line media search. For most retrieval applications, the speech content is first converted to a lexical or phonetic representation using automatic speech recognition (ASR). The first step in searching indexes built on these representations is generating pronunciations for named entities and foreign-language query terms. This paper summarizes the work of the Multilingual Spoken Term Detection team at the 2008 JHU Summer Workshop on mining the Web for pronunciations and analyzing their impact on spoken term detection. We first present methods for exploiting the vast amount of pronunciation information available on the Web in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We then analyze how using these pronunciations to represent out-of-vocabulary (OOV) query terms affects the performance of a spoken term detection (STD) system. We compare Web pronunciations against pronunciations produced by automatic generation techniques and by human experts. Our results cover a range of speech indexes based on lattices, confusion networks and one-best transcriptions, at both the word and word-fragment levels.
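The extraction, normalization and filtering steps listed in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the workshop's actual method: the extraction pattern `CANDIDATE_RE` (a capitalized word followed by a transcription in slashes or brackets), the symbol table `SYMBOL_MAP`, the length-ratio filter `plausible`, and the exact-match search `detect` are all hypothetical simplifications. In particular, `detect` treats each Unicode code point as one phone and searches a single one-best phone string, whereas the paper's indexes also cover lattices and confusion networks.

```python
import re
import unicodedata

# Hypothetical extraction pattern: a capitalized word (a likely named entity)
# immediately followed by a transcription in /slashes/ or [brackets],
# e.g. "Gdansk /ɡəˈdænsk/". Real Web pages use many other layouts.
CANDIDATE_RE = re.compile(r"\b([A-Z][\w'-]+)\s*[/\[]([^/\[\]]+)[/\]]")

# Hypothetical normalization table: map common ASCII stand-ins onto the
# corresponding IPA symbols.
SYMBOL_MAP = {
    "g": "\u0261",  # ASCII g -> IPA script g (voiced velar plosive)
    ":": "\u02d0",  # ASCII colon -> IPA length mark
}
# Marks dropped during normalization: IPA primary/secondary stress, an ASCII
# apostrophe used as a stress stand-in, syllable breaks and spaces.
DROP = {"\u02c8", "\u02cc", "'", ".", " "}


def normalize_ipa(ipa: str) -> str:
    """Map a raw transcription onto the (assumed) canonical symbol set."""
    ipa = unicodedata.normalize("NFC", ipa)  # compose base + diacritic pairs
    return "".join(SYMBOL_MAP.get(ch, ch) for ch in ipa if ch not in DROP)


def plausible(word: str, ipa: str, lo: float = 0.4, hi: float = 2.0) -> bool:
    """Assumed filter: reject candidates with an implausible phone/letter
    length ratio or with digits (a sign of a badly extracted span)."""
    ratio = len(ipa) / max(len(word), 1)
    return lo <= ratio <= hi and not any(ch.isdigit() for ch in ipa)


def extract_pronunciations(text: str) -> dict[str, set[str]]:
    """Collect normalized, filtered candidate pronunciations per word."""
    prons: dict[str, set[str]] = {}
    for word, raw in CANDIDATE_RE.findall(text):
        ipa = normalize_ipa(raw)
        if plausible(word, ipa):
            prons.setdefault(word.lower(), set()).add(ipa)
    return prons


def detect(query_pron: str, phone_transcript: str) -> list[int]:
    """Return every start offset where the query pronunciation occurs
    exactly in a one-best phone string (one code point per phone)."""
    if not query_pron:
        return []
    hits = []
    start = phone_transcript.find(query_pron)
    while start != -1:
        hits.append(start)
        start = phone_transcript.find(query_pron, start + 1)
    return hits


if __name__ == "__main__":
    page = ("The port city of Gdansk /\u0261ə\u02c8dænsk/ lies on the Baltic. "
            "Another site writes Gdansk [gə'dansk].")
    prons = extract_pronunciations(page)
    print(prons)
    # Search a toy one-best phone transcript for each OOV pronunciation.
    for pron in sorted(prons["gdansk"]):
        print(pron, detect(pron, "ɪn\u0261ədænskðə"))
```

Keeping several pronunciation variants per word, rather than a single one, lets the search try each variant; exact matching on one-best output still misses a term whenever the recognizer makes even one phone error, which is one motivation for searching lattices and confusion networks instead.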