ABSTRACT
Among the various types of semantic concepts modeled, events pose the greatest challenge, both in the computational power needed to represent them and in the accuracy that can be achieved in modeling them. We introduce a novel low-level visual feature that summarizes motion in a shot. This feature leverages motion vectors from MPEG-encoded video and aggregates local motion vectors over time in a matrix, which we refer to as a motion image. The resulting motion image is representative of the overall motion in a video shot, having compressed the temporal dimension while preserving spatial ordering. Building motion models using this feature permits us to combine the power of discriminant modeling with the dynamics of the motion in video shots, which cannot be accomplished by building generative models over a time series of motion features from multiple frames in the video shot. Evaluation of models built using several motion image features on the TRECVID 2005 dataset shows that use of this novel motion feature results in an average improvement in concept detection performance of 140% over existing motion features. Furthermore, experiments also reveal that when this motion feature is combined with static feature representations of a single keyframe from the shot, such as color and texture features, the fused detection results in an improvement of 4% to 12% over fusion across the static features alone.
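The aggregation step described above can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so the choice made here (summing per-block motion magnitudes over all frames of a shot) is an assumption, as are the function name `motion_image` and the input layout of one (dx, dy) vector per macroblock per frame.

```python
import numpy as np

def motion_image(vectors):
    """Collapse per-frame motion vectors into a single 2-D matrix.

    vectors: array of shape (T, H, W, 2) holding an assumed (dx, dy)
    motion vector for each of H x W macroblocks in each of T frames.
    Returns an (H, W) matrix; the temporal axis is summed away while
    the spatial layout of the blocks is preserved.
    """
    vectors = np.asarray(vectors, dtype=float)
    magnitudes = np.linalg.norm(vectors, axis=-1)  # shape (T, H, W)
    return magnitudes.sum(axis=0)                  # shape (H, W)

# Toy shot: 3 frames, 2 x 2 macroblocks, motion only in block (0, 1).
shot = np.zeros((3, 2, 2, 2))
shot[:, 0, 1] = (3.0, 4.0)  # magnitude 5 in every frame
image = motion_image(shot)
```

Because the result is a fixed-size matrix regardless of shot length, it could be flattened and passed to a discriminant classifier in the same way as a static keyframe feature.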