ABSTRACT
Combined word-based and phonetic indexes have been used to improve the performance of spoken document retrieval systems, primarily by addressing the out-of-vocabulary retrieval problem. However, a known limitation of phonetic recognition is its lower accuracy compared with word-level recognition. We propose a novel method for phonetic retrieval in the CueVideo system, based on a probabilistic formulation of term weighting that uses phone confusion data in a Bayesian framework. We evaluate this method against word-based retrieval for the search levels identified in a realistic video-based distributed learning setting. On our test data, we achieved an average recall of 0.88 with an average precision of 0.69 for retrieval of out-of-vocabulary words on phonetic transcripts with a 35% word error rate. For in-vocabulary words, we achieved a 17% improvement in recall over word-based retrieval with a 17% loss in precision, for word error rates ranging from 35% to 65%.
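To make the idea of confusion-based term weighting concrete, here is a minimal Python sketch. It is not the CueVideo implementation and does not reproduce the Bayesian formulation described above. It assumes a substitution-only model in which each recognized phone depends only on the spoken phone at the same position, so insertions and deletions are ignored. The confusion table, the phone symbols, and the helper names `match_prob`, `expected_tf`, and `okapi_tf` are all hypothetical. The expected term frequency of a query phone sequence is the sum, over every equal-length window of the transcript, of the probability that the window was produced by the query. That soft count is then passed through an Okapi-style saturating tf function with an assumed k1 = 1.2.

```python
from math import prod

# Hypothetical confusion table: CONFUSION[spoken][recognized] = P(recognized | spoken).
# The numbers are invented for illustration; a real system would estimate them
# from recognizer output aligned against reference phone transcripts.
CONFUSION = {
    "k": {"k": 0.8, "g": 0.2},
    "ae": {"ae": 0.7, "eh": 0.3},
    "t": {"t": 0.9, "d": 0.1},
}


def match_prob(query, window, confusion):
    """P(window | query) under an assumed substitution-only, independent-error model."""
    if len(query) != len(window):
        return 0.0
    return prod(confusion.get(q, {}).get(r, 0.0) for q, r in zip(query, window))


def expected_tf(query, transcript, confusion):
    """Soft term count: sum of match probabilities over every window of len(query) phones."""
    n = len(query)
    return sum(
        match_prob(query, transcript[i:i + n], confusion)
        for i in range(len(transcript) - n + 1)
    )


def okapi_tf(tf, k1=1.2):
    """Okapi-style saturating term-frequency component; k1 = 1.2 is an assumed default."""
    return tf * (k1 + 1) / (tf + k1)


if __name__ == "__main__":
    query = ["k", "ae", "t"]                            # hypothetical baseform for "cat"
    transcript = ["g", "ae", "t", "s", "k", "eh", "d"]  # hypothetical recognizer output
    tf = expected_tf(query, transcript, CONFUSION)
    print(round(tf, 3), round(okapi_tf(tf), 3))
```

In this toy transcript the query never appears verbatim, yet it receives a soft count of 0.126 + 0.024 = 0.15 from two misrecognized windows. This is the kind of tolerance to recognition errors that a confusion-aware weighting aims for. A fuller treatment would align query and transcript with an edit-distance model so that insertions and deletions are also scored.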