ABSTRACT
Semantic multimedia management is necessary for the effective and widespread use of multimedia repositories and for realizing the potential that lies untapped in their rich multimodal content. This challenge has driven researchers to devise new algorithms and systems that enable automatic or semi-automatic tagging of large-scale multimedia content with rich semantics. An emerging research area is the detection of a predetermined set of semantic concepts that can act as semantic filters and aid in search and manipulation. The NIST TRECVID benchmark has responded by creating a task that evaluates the performance of concept detection. Within the scope of this benchmark task, this paper studies trends in emerging concept detection systems, architectures, and algorithms. It also analyzes strategies that have yielded reasonable success, as well as the challenges and gaps that lie ahead.