| Spoken content metadata and MPEG-7 |
| Source
|
International Multimedia Conference
Proceedings of the 2000 ACM workshops on Multimedia
Los Angeles, California, United States
Pages: 81 - 84
Year of Publication: 2000
ISBN: 1-58113-311-1
|
|
Authors
|
|
J. P. A. Charlesworth
|
Canon Research Centre Europe, 1 Occam Ct, Surrey Research Park, Guildford GU2 5YJ, England
|
|
P. N. Garner
|
Canon Research Centre Europe, 1 Occam Ct, Surrey Research Park, Guildford GU2 5YJ, England
|
|
|
ABSTRACT
The words spoken in an audio stream form an obvious descriptor essential to most audio-visual metadata standards. When derived using automatic speech recognition systems, the spoken content fits into neither low-level (representative) nor high-level (semantic) metadata categories. This makes it difficult to create a representation that supports interoperability between different extraction and application utilities while retaining robustness to the limitations of the extraction process. In this paper, we discuss the issues encountered in the design of the MPEG-7 spoken content descriptor and their applicability to other metadata standards.
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
Additional Classification:
D. Software
D.2 SOFTWARE ENGINEERING
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Retrieval models
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Voice I/O
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.7 Natural Language Processing
Subjects: Speech recognition and synthesis
K. Computing Milieux
K.1 THE COMPUTER INDUSTRY
Subjects: Standards
General Terms: Design, Experimentation, Human Factors, Languages, Management, Measurement, Performance, Standardization, Theory
Keywords: MPEG-7, automatic speech recognition, interoperability, robust retrieval, spoken content, spoken document retrieval
|