Object tracking: A survey
Source: ACM Computing Surveys (CSUR)
Volume 38, Issue 4 (2006)
Article No. 13
Year of Publication: 2006
ISSN: 0360-0300
Authors:
Alper Yilmaz, Ohio State University
Omar Javed, ObjectVideo, Inc., Reston, VA
Mubarak Shah, University of Central Florida
Publisher:
ACM, New York, NY, USA
Bibliometrics:
Downloads (6 Weeks): 565, Downloads (12 Months): 5731, Citation Count: 15
DOI: 10.1145/1177352.1177355 (http://doi.acm.org/10.1145/1177352.1177355)

ABSTRACT

The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
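The abstract names the selection of a motion model as one of the central design issues in tracking. As a hedged illustration only, not a method taken from the survey, the sketch below tracks a single point (for example, an object centroid) with a Kalman filter under an assumed constant-velocity motion model. It assumes exactly one detection per frame and performs no data association, so occlusion and multi-object correspondence are out of scope. The names `make_cv_model`, `kalman_step`, and `track`, and the noise settings `q` and `r`, are hypothetical choices made for this example.

```python
import numpy as np


def make_cv_model(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity model for one 2-D point; state is [x, y, vx, vy]."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])  # position advances by velocity * dt
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])  # only the position is measured
    Q = q * np.eye(4)  # assumed isotropic process noise (a simplification)
    R = r * np.eye(2)  # assumed isotropic measurement noise
    return F, H, Q, R


def kalman_step(x, P, z, F, H, Q, R):
    """One predict/correct cycle given a position measurement z = [x, y]."""
    # Predict the state and its covariance with the motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correct the prediction with the measurement.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


def track(detections, dt=1.0):
    """Filter a list of per-frame (x, y) detections of a single object."""
    F, H, Q, R = make_cv_model(dt)
    # Start at the first detection with an unknown (zero) velocity guess.
    x = np.array([detections[0][0], detections[0][1], 0.0, 0.0])
    P = 10.0 * np.eye(4)  # large initial uncertainty
    estimates = [x[:2].copy()]
    for z in detections[1:]:
        x, P = kalman_step(x, P, np.asarray(z, dtype=float), F, H, Q, R)
        estimates.append(x[:2].copy())
    return x, estimates


if __name__ == "__main__":
    # Synthetic object moving 2 px/frame right and 1 px/frame down.
    dets = [(2.0 * t, 5.0 + 1.0 * t) for t in range(50)]
    state, _ = track(dets)
    print("estimated velocity:", state[2:])
```

Raising `r` relative to `q` makes the filter trust the motion model more and smooth jittery detections; raising `q` makes it follow the detections more closely. A different motion model (for instance, constant acceleration) would be expressed by changing `F`, `H`, and `Q`.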



CITED BY 15


REVIEW

Reviewer: Sebastien Lefevre

Object tracking is one of the major steps toward understanding video content. Indeed, its goal is to give object positions in the successive frames of a video sequence. This spatio-temporal information can then be used to analyze the actions or be …
