ABSTRACT
For a computer to recognize objects, persons, situations or actions in multimedia, it needs to have learned models of each thing beforehand. At present, no large, general collection of training examples exists for the wide variety of things that we would want to recognize automatically in multimedia, video and still images. We believe that the WWW and current technology allow us to build such a resource automatically. This paper describes a methodology for constructing a grounded, general-purpose multimedia ontology that is instantiated through web processing. In this hierarchically organized ontology, concepts corresponding to concrete objects, persons, situations and actions are linked with still images, videos and sounds that represent exemplars of each concept. These examples are necessary resources for computing discriminating signatures for the recognition of the concepts in still images or videos. Since images retrieved using existing image search engines contain much noise and are not always representative, we also present our methodology for finding good representatives for each concept.