Localizing volumetric motion for action recognition in realistic videos
Source
International Multimedia Conference archive
Proceedings of the 17th ACM International Conference on Multimedia
Beijing, China
SESSION: Short papers session 1: content analysis
Pages 505-508  
Year of Publication: 2009
ISBN:978-1-60558-608-3
Authors
Xiao Wu  Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chong-Wah Ngo  Department of Computer Science, City University of Hong Kong, Hong Kong, China
Jintao Li  Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yongdong Zhang  Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Sponsor
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1631272.1631342

ABSTRACT

This paper presents a novel motion localization approach for recognizing actions and events in realistic videos, such as StandUp and Kiss in Hollywood movies. The challenge stems from the large visual and motion variations that realistic action poses introduce. Previous work mainly learns from descriptors of cuboids around space-time interest points (STIP) to characterize actions. The size, shape, and space-time position of these cuboids are fixed without regard to the underlying motion dynamics. This often yields a large set of fragmented cuboids that fail to capture the long-term dynamic properties of realistic actions. This paper proposes detecting spatio-temporal motion volumes, called Volumes of Interest (VOIs), whose scale and position adapt to the motion, in order to localize actions. First, motions are described as bags of point trajectories by tracking keypoints along the time dimension. VOIs are then adaptively extracted by clustering trajectories on the motion manifold. The resulting VOIs, which vary in scale and are centered at arbitrary positions depending on the motion dynamics, are finally described by SIFT and 3D gradient features for action recognition. Compared with fixed-size cuboids, VOIs allow comprehensive modeling of long-term motion and better capture the contextual information associated with motion dynamics. Experiments on a realistic Hollywood movie dataset show that the proposed approach achieves a 20% relative improvement over the state-of-the-art STIP-based algorithm.
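
The trajectory-clustering step described in the abstract can be illustrated with a small sketch. The abstract does not specify the trajectory features, the motion manifold, or the clustering algorithm, so everything below is an assumption made for illustration: each trajectory is summarized by its mean position and a weighted mean velocity, flat-kernel mean shift stands in for the clustering step, and each VOI is taken to be the axis-aligned (x, y, t) bounding volume of a cluster. The names `describe`, `mean_shift_modes`, `label_modes`, `extract_vois`, `bandwidth`, and `velocity_weight` are hypothetical and do not come from the paper.

```python
from math import dist
from statistics import fmean

# A trajectory is a list of (x, y, t) samples for one tracked keypoint.


def describe(traj, velocity_weight):
    """Assumed feature: (mean x, mean y, weighted mean vx, weighted mean vy)."""
    (x0, y0, t0), (x1, y1, t1) = traj[0], traj[-1]
    span = max(t1 - t0, 1)  # avoid division by zero for one-sample tracks
    return (
        fmean(p[0] for p in traj),
        fmean(p[1] for p in traj),
        velocity_weight * (x1 - x0) / span,
        velocity_weight * (y1 - y0) / span,
    )


def mean_shift_modes(points, bandwidth, iters=50):
    """Flat-kernel mean shift: move each point to the mean of its neighbours."""
    modes = []
    for p in points:
        m = p
        for _ in range(iters):
            near = [q for q in points if dist(m, q) <= bandwidth]
            if not near:
                break
            new = tuple(fmean(c) for c in zip(*near))
            if dist(new, m) < 1e-6:
                break
            m = new
        modes.append(m)
    return modes


def label_modes(modes, tol):
    """Give the same label to modes that converged within `tol` of each other."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if dist(m, c) <= tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def extract_vois(trajs, bandwidth=20.0, velocity_weight=10.0):
    """Cluster trajectories and return one (x, y, t) bounding volume per cluster.

    Each VOI is ((xmin, xmax), (ymin, ymax), (tmin, tmax)).
    """
    feats = [describe(t, velocity_weight) for t in trajs]
    labels = label_modes(mean_shift_modes(feats, bandwidth), bandwidth / 2)
    samples = {}
    for traj, k in zip(trajs, labels):
        samples.setdefault(k, []).extend(traj)
    return [tuple((min(c), max(c)) for c in zip(*pts)) for pts in samples.values()]
```

Calling `extract_vois` on trajectories from two spatially separate motions should yield two volumes whose extents follow the tracked points, in contrast to fixed-size cuboids. In practice the trajectories would come from a keypoint tracker, and the bandwidth and velocity weight would need tuning per dataset.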

