Video retrieval using spatio-temporal descriptors
Source: International Multimedia Conference
Proceedings of the eleventh ACM international conference on Multimedia
Berkeley, CA, USA
SESSION: Surveillance
Pages: 508-517
Year of Publication: 2003
ISBN:1-58113-722-2
Authors
Daniel DeMenthon  University of Maryland, College Park, MD
David Doermann  University of Maryland, College Park, MD
Sponsors
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
SIGCOMM: ACM Special Interest Group on Data Communication
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/957013.957124

ABSTRACT

This paper describes a novel methodology for implementing video search functions such as retrieval of near-duplicate videos and recognition of actions in surveillance video. Videos are divided into half-second clips whose stacked frames produce 3D space-time volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these 3D volumes by a threshold-free hierarchical space-time segmentation technique. Each region is then described by a high-dimensional point whose components represent the position, motion and, when possible, color of the region. In the indexing phase for a video database, these points are assigned labels that specify their video clip of origin. All the labeled points for all the clips are stored in a single binary tree for efficient $k$-nearest neighbor retrieval. The retrieval phase uses video segments as queries. Half-second clips of these queries are again segmented to produce sets of points, and for each point the labels of its nearest neighbors are retrieved. The labels that receive the largest numbers of votes correspond to the database clips that are the most similar to the query video segment. We illustrate this approach for video indexing and retrieval and for action recognition. First, we describe retrieval experiments for dynamic logos, and for video queries that differ from the indexed broadcasts by the addition of large overlays. Then we describe experiments in which office actions (such as pulling and closing drawers, taking and storing items, picking up and putting down a phone) are recognized. In these action recognition experiments, color information is ignored to ensure independence from people's appearance. One of the distinct advantages of using this approach for action recognition is that there is no need for detection or recognition of body parts.
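
The indexing and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_index`, `vote_for_clips`), the three-component descriptors, and the clip labels are hypothetical, and a brute-force scan stands in for the binary tree that the paper uses for efficient $k$-nearest neighbor retrieval. The segmentation step that produces the region descriptors is assumed to have already run.

```python
from collections import Counter
import heapq
import math


def build_index(clips):
    """Flatten {clip_label: [descriptor, ...]} into a list of (point, label) pairs.

    Assumption: each descriptor is a tuple of floats describing one
    space-time region (position, motion, optionally color).
    """
    index = []
    for label, points in clips.items():
        for point in points:
            index.append((tuple(point), label))
    return index


def vote_for_clips(index, query_points, k=3):
    """Rank database clips by the votes cast by each query point's k nearest neighbors.

    Brute-force search is used here for clarity; a tree-based index
    would replace the linear scan in a real system.
    """
    votes = Counter()
    for q in query_points:
        nearest = heapq.nsmallest(
            k, index, key=lambda entry: math.dist(entry[0], q)
        )
        for _, label in nearest:
            votes[label] += 1
    # Clips with the most votes are the most similar to the query.
    return votes.most_common()


# Hypothetical toy database: two clips, two region descriptors each.
clips = {
    "clip_A": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
    "clip_B": [(5.0, 5.0, -1.0), (5.1, 5.0, -1.0)],
}
index = build_index(clips)

# Descriptors from a query clip that resembles clip_A.
query = [(0.05, 0.0, 1.0), (0.0, 0.1, 1.0)]
ranked = vote_for_clips(index, query, k=2)
print(ranked)
```

Because every query point votes independently, a query that shares only some regions with an indexed clip can still accumulate enough votes to rank that clip first, which is consistent with the abstract's account of retrieval under large overlays.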


