ABSTRACT
This paper describes a novel methodology for implementing video search functions such as retrieval of near-duplicate videos and recognition of actions in surveillance video. Videos are divided into half-second clips whose stacked frames produce 3D space-time volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these 3D volumes by a threshold-free hierarchical space-time segmentation technique. Each region is then described by a high-dimensional point whose components represent the position, motion and, when possible, color of the region. In the indexing phase for a video database, these points are assigned labels that specify their video clip of origin. All the labeled points for all the clips are stored in a single binary tree for efficient $k$-nearest neighbor retrieval. The retrieval phase uses video segments as queries. Half-second clips of these queries are again segmented to produce sets of points, and for each point the labels of its nearest neighbors are retrieved. The labels that receive the largest numbers of votes correspond to the database clips that are the most similar to the query video segment. We illustrate this approach for video indexing and retrieval and for action recognition. First, we describe retrieval experiments for dynamic logos, and for video queries that differ from the indexed broadcasts by the addition of large overlays. Then we describe experiments in which office actions (such as pulling and closing drawers, taking and storing items, picking up and putting down a phone) are recognized. In these action recognition experiments, color information is ignored to ensure independence from people's appearance. One distinct advantage of using this approach for action recognition is that there is no need for detection or recognition of body parts.
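The indexing and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a brute-force nearest-neighbor search stands in for the binary tree, the function names (`build_index`, `k_nearest_labels`, `rank_clips`) are hypothetical, and the three-component points are made-up stand-ins for the position, motion and color descriptors produced by the segmentation.

```python
from collections import Counter
import heapq
import math


def build_index(clips):
    """Flatten {clip_label: [point, ...]} into a list of (point, label) pairs.

    Assumption: a flat list replaces the paper's binary tree for brevity.
    """
    return [(tuple(p), label) for label, points in clips.items() for p in points]


def k_nearest_labels(index, query_point, k):
    """Return the labels of the k indexed points closest to query_point."""
    nearest = heapq.nsmallest(
        k, index, key=lambda entry: math.dist(entry[0], query_point)
    )
    return [label for _, label in nearest]


def rank_clips(index, query_points, k=3):
    """Let every query point vote for the clips of its k nearest neighbors."""
    votes = Counter()
    for q in query_points:
        votes.update(k_nearest_labels(index, q, k))
    # Clips with the most votes come first.
    return votes.most_common()


# Hypothetical region descriptors for two indexed half-second clips.
database = {
    "clip_A": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 0.9)],
    "clip_B": [(5.0, 5.0, -1.0), (5.1, 5.0, -1.0), (5.0, 5.2, -0.9)],
}
index = build_index(database)

# Descriptors extracted from a query clip that resembles clip_A.
query = [(0.05, 0.0, 1.0), (0.0, 0.05, 0.95)]
ranking = rank_clips(index, query, k=2)
```

Here `ranking[0]` names the database clip that collected the most votes. For a large database, the linear scan in `k_nearest_labels` would be replaced by a tree-based search, which is the role the binary tree plays in the abstract.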