ABSTRACT
The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
REVIEW
Reviewer: Sebastien Lefevre
Object tracking is one of the major steps toward understanding video content. Indeed, its goal is to give object positions in the successive frames of a video sequence. This spatio-temporal information can then be used to analyze the actions or be ...