Spoken content metadata and MPEG-7
Source: International Multimedia Conference
Proceedings of the 2000 ACM workshops on Multimedia
Los Angeles, California, United States
Pages: 81 - 84  
Year of Publication: 2000
ISBN:1-58113-311-1
Authors
J. P. A. Charlesworth  Canon Research Centre Europe, 1 Occam Ct, Surrey Research Park, Guildford GU2 5YJ, England
P. N. Garner  Canon Research Centre Europe, 1 Occam Ct, Surrey Research Park, Guildford GU2 5YJ, England
Sponsors
SIGOPS: ACM Special Interest Group on Operating Systems
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
SIGCOMM: ACM Special Interest Group on Data Communication
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/357744.357880

ABSTRACT

The words spoken in an audio stream form an obvious descriptor essential to most audio-visual metadata standards. When derived using automatic speech recognition systems, the spoken content fits neither the low-level (representative) nor the high-level (semantic) metadata category. This makes it difficult to create a representation that supports interoperability between different extraction and application utilities while remaining robust to the limitations of the extraction process. In this paper, we discuss the issues encountered in the design of the MPEG-7 spoken content descriptor and their applicability to other metadata standards.


REFERENCES


 
1. See, e.g., www.mpeg-7.com
2. See, e.g., www.digitalimaging.org
3. For a comprehensive treatment of ASR techniques see Rabiner, L. and Juang, B., Fundamentals of Speech Recognition, Wiley (1997).
4. Johnson, S.E., et al., "Spoken document retrieval for TREC-7 at Cambridge University", Proc. 7th Text Retrieval Conf., NIST special publication 500-242, p. 191 (1998).
5. Siegler, M., et al., "Experiments in Spoken Document Retrieval at CMU", Proc. 7th Text Retrieval Conf., NIST special publication 500-242, p. 319 (1998).
6. Ng, K., "Information fusion for spoken document retrieval", Proc. ICASSP 4, p. 2405 (2000).
7. Wechsler, M., "Spoken document retrieval based on phoneme recognition", PhD thesis, Swiss Federal Institute of Technology, Zurich (1998).
8. Charlesworth, J.P.A., Garner, P.N., Srinivasan, S., "Output of an of automatic speech recognition", ISO/IEC/JTC1/SC29/WG11 MPEG99/4458 (1999).
9. The Seventh Text REtrieval Conference, NIST special publication 500-242 (1998).
10. Charlesworth, J.P.A., Garner, P.N., Srinivasan, S., "Results of CE of automatic speech recognition", ISO/IEC/JTC1/SC29/WG11 MPEG99/5106 (1999).
