ABSTRACT
Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods. Our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms. We have also produced several new multi-frame stereo data sets with ground truth and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today's best-performing stereo algorithms.
REFERENCES
[1] Anandan, P. 1989. A computational framework and an algorithm for the measurement of visual motion. IJCV, 2(3):283-310.
[2] Arnold, R.D. 1983. Automated stereo perception. Technical Report AIM-351, Artificial Intelligence Laboratory, Stanford University.
[3] Baker, H.H. 1980. Edge based stereo correlation. In Image Understanding Workshop, L.S. Baumann (Ed.). Science Applications International Corporation, pp. 168-175.
[4] Baker, H. and Binford, T. 1981. Depth from edge and intensity based stereo. In IJCAI, pp. 631-636.
[6] Barnard, S.T. 1989. Stochastic stereo matching over scale. IJCV, 3(1):17-32.
[10] Belhumeur, P.N. and Mumford, D. 1992. A Bayesian treatment of the stereo correspondence problem using half-occluded regions. In CVPR, pp. 506-512.
[14] Birchfield, S. and Tomasi, C. 1999. Multiway cut for stereo and motion with slanted surfaces. In ICCV, pp. 489-495.
[15] Black, M.J. and Anandan, P. 1993. A framework for the robust estimation of optical flow. In ICCV, pp. 231-236.
[19] Bolles, R.C., Baker, H.H., and Hannah, M.J. 1993. The JISCT stereo evaluation. In DARPA Image Understanding Workshop, pp. 263-274.
[20] Bolles, R.C., Baker, H.H., and Marimont, D.H. 1987. Epipolar-plane image analysis: An approach to determining structure from motion. IJCV, 1:7-55.
[24] Broadhurst, A., Drummond, T., and Cipolla, R. 2001. A probabilistic framework for space carving. In ICCV, Vol. I, pp. 388-393.
[26] Burt, P.J. and Adelson, E.H. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, COM-31(4):532-540.
[34] De Bonet, J.S. and Viola, P. 1999. Poxels: Probabilistic voxelized volume reconstruction. In ICCV, pp. 418-425.
[36] Dev, P. 1974. Segmentation processes in visual perception: A co-operative neural model. University of Massachusetts at Amherst, COINS Technical Report 74C-5.
[37] Dhond, U.R. and Aggarwal, J.K. 1989. Structure from stereo--a review. IEEE Trans. on Systems, Man, and Cybern., 19(6):1489-1510.
[38] Faugeras, O. and Keriven, R. 1998. Variational principles, surface evolution, PDE's, level set methods, and the stereo problem. IEEE Trans. Image Proc., 7(3):336-344.
[42] Fua, P. 1993. A parallel stereo algorithm that produces dense depth maps and preserves image features. Machine Vision and Applications, 6:35-49.
[47] Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE TPAMI, 6(6):721-741.
[48] Gennert, M.A. 1988. Brightness-based stereo matching. In ICCV, pp. 139-143.
[50] Grimson, W.E.L. 1985. Computational experiments with a feature based stereo algorithm. IEEE TPAMI, 7(1):17-34.
[58] Kanade, T. 1994. Development of a video-rate stereo machine. In Image Understanding Workshop, Monterey, CA, 1994. Morgan Kaufmann Publishers: San Mateo, CA, pp. 549-557.
[61] Kang, S.B., Szeliski, R., and Chai, J. 2001. Handling occlusions in dense multi-view stereo. In CVPR, pp. 103-110.
[63] Kass, M. 1988. Linear image features in stereopsis. IJCV, 1(4):357-368.
[64] Kimura, R. et al. 1999. A convolver-based real-time stereo machine (SAZAN). In CVPR, Vol. 1, pp. 457-463.
[65] Kolmogorov, V. and Zabih, R. 2001. Computing visual correspondence with occlusions using graph cuts. In ICCV, Vol. II, pp. 508-515.
[69] Lin, M. and Tomasi, C. Surfaces with occlusions from layered stereo. Technical report, Stanford University. In preparation.
[70] Loop, C. and Zhang, Z. 1999. Computing rectifying homographies for stereo vision. In CVPR, Vol. I, pp. 125-131.
[71] Lucas, B.D. and Kanade, T. 1981. An iterative image registration technique with an application in stereo vision. In IJCAI, pp. 674-679.
[72] Marr, D. 1982. Vision. Freeman: New York.
[73] Marr, D. and Poggio, T. 1976. Cooperative computation of stereo disparity. Science, 194:283-287.
[74] Marr, D.C. and Poggio, T. 1979. A computational theory of human stereo vision. Proceedings of the Royal Society of London, B 204:301-328.
[75] Marroquin, J.L. 1983. Design of cooperative networks. AI Lab, MIT, Working Paper 253.
[76] Marroquin, J., Mitter, S., and Poggio, T. 1987. Probabilistic solution of ill-posed problems in computational vision. Journal of the American Statistical Association, 82(397):76-89.
[77] Matthies, L., Szeliski, R., and Kanade, T. 1989. Kalman filter-based algorithms for estimating depth from image sequences. IJCV, 3:209-236.
[80] Mulligan, J., Isler, V., and Daniilidis, K. 2001. Performance evaluation of stereo for tele-presence. In ICCV, Vol. II, pp. 558-565.
[82] Nishihara, H.K. 1984. Practical real-time imaging stereo matcher. Optical Engineering, 23(5):536-545.
[83] Ohta, Y. and Kanade, T. 1985. Stereo by intra- and interscanline search using dynamic programming. IEEE TPAMI, 7(2):139-154.
[88] Pollard, S.B., Mayhew, J.E.W., and Frisby, J.P. 1985. PMF: A stereo correspondence algorithm using a disparity gradient limit. Perception, 14:449-470.
[89] Prazdny, K. 1985. Detection of binocular disparities. Biological Cybernetics, 52(2):93-99.
[90] Quam, L.H. 1984. Hierarchical warp stereo. In Image Understanding Workshop, New Orleans, Louisiana, 1984. Science Applications International Corporation, pp. 149-155.
[93] Ryan, T.W., Gray, R.T., and Hunt, B.R. 1980. Prediction of correlation errors in stereo-pair images. Optical Engineering, 19(3):312-322.
[94] Saito, H. and Kanade, T. 1999. Shape reconstruction in projective grid space from large number of images. In CVPR, Vol. 2, pp. 49-54.
[95] Scharstein, D. 1994. Matching images by comparing their gradient fields. In ICPR, Vol. 1, pp. 572-575.
[98] Scharstein, D. and Szeliski, R. 2001. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Microsoft Research, Technical Report MSR-TR-2001-81.
[100] Seitz, P. 1989. Using local orientation information as image primitive for robust object recognition. In SPIE Visual Communications and Image Processing IV, Vol. 1199, pp. 1630-1639.
[103] Shah, J. 1993. A nonlinear diffusion model for discontinuous disparity and half-occlusion in stereo. In CVPR, pp. 34-40.
[105] Shimizu, M. and Okutomi, M. 2001. Precise sub-pixel estimation on area-based matching. In ICCV, Vol. I, pp. 90-97.
[106] Shum, H.-Y. and Szeliski, R. 1999. Stereo reconstruction from multiperspective panoramas. In ICCV, pp. 14-21.
[107] Simoncelli, E.P., Adelson, E.H., and Heeger, D.J. 1991. Probability distributions of optic flow. In CVPR, pp. 310-315.
[114] Szeliski, R. and Hinton, G. 1985. Solving random-dot stereograms using the heat equation. In CVPR, pp. 284-288.
[117] Tao, H., Sawhney, H., and Kumar, R. 2001. A global matching framework for stereo computation. In ICCV, Vol. 1, pp. 532-539.
[120] Terzopoulos, D. and Fleischer, K. 1988. Deformable models. The Visual Computer, 4(6):306-331.
[124] Veksler, O. 2001. Stereo matching by compact windows via minimum ratio cycle. In ICCV, Vol. I, pp. 540-547.
[125] Wang, J.Y.A. and Adelson, E.H. 1993. Layered representation for motion analysis. In CVPR, pp. 361-366.
[126] Witkin, A., Terzopoulos, D., and Kass, M. 1987. Signal matching through scale space. IJCV, 1:133-144.
[127] Yang, Y., Yuille, A., and Lu, J. 1993. Local, global, and multilevel stereo matching. In CVPR, pp. 274-279.
[128] Yuille, A.L. and Poggio, T. 1984. A generalized ordering constraint for stereo correspondence. AI Lab, MIT, A.I. Memo 777.
CITED BY 171
Kiran S. Bhat, Christopher D. Twigg, Jessica K. Hodgins, Pradeep K. Khosla, Zoran Popović, Steven M. Seitz, Estimating cloth simulation parameters from video, Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation, July 26-27, 2003, San Diego, California
David Koller, Michael Turitzin, Marc Levoy, Marco Tarini, Giuseppe Croccia, Paolo Cignoni, Roberto Scopigno, Protected interactive 3D graphics via remote rendering, ACM Transactions on Graphics (TOG), v.23 n.3, August 2004
Marc Pollefeys, Luc Van Gool, Maarten Vergauwen, Frank Verbiest, Kurt Cornelis, Jan Tops, Reinhard Koch, Visual Modeling with a Hand-Held Camera, International Journal of Computer Vision, v.59 n.3, p.207-232, September-October 2004
M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell, C. Salmi, S. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewénius, R. Yang, G. Welch, H. Towles, Detailed Real-Time Urban 3D Reconstruction from Video, International Journal of Computer Vision, v.78 n.2-3, p.143-167, July 2008
Evren İmre, Sebastian Knorr, Burak Özkalaycı, Uğur Topay, A. Aydın Alatan, Thomas Sikora, Towards 3-D scene reconstruction from broadcast video, Image Communication, v.22 n.2, p.108-126, February, 2007
Ramesh Raskar, Kar-Han Tan, Rogerio Feris, Jingyi Yu, Matthew Turk, Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging, ACM SIGGRAPH 2005 Courses, July 31-August 04, 2005, Los Angeles, California
Wenbo Zhang, Xinting Gao, Eric Sung, Farook Sattar, Ronda Venkateswarlu, A feature-based matching scheme: MPCD and robust matching strategy, Pattern Recognition Letters, v.28 n.10, p.1222-1231, July, 2007
Neil A. Thacker, Adrian F. Clark, John L. Barron, J. Ross Beveridge, Patrick Courtney, William R. Crum, Visvanathan Ramesh, Christine Clark, Performance characterization in computer vision: A guide to best practices, Computer Vision and Image Understanding, v.109 n.3, p.305-334, March, 2008
Christos Georgoulas, Leonidas Kotoulas, Georgios Ch. Sirakoulis, Ioannis Andreadis, Antonios Gasteratos, Real-time disparity map computation module, Microprocessors & Microsystems, v.32 n.3, p.159-170, May, 2008
Ta-Te Lin, Yuan-Kai Hsiung, Guo-Long Hong, Hung-Kuo Chang, Fu-Ming Lu, Development of a virtual reality GIS using stereo vision, Computers and Electronics in Agriculture, v.63 n.1, p.38-48, August, 2008
Zheng Gu, Xianyu Su, Yuankun Liu, Qican Zhang, Local stereo matching with adaptive support-weight, rank transform and disparity calibration, Pattern Recognition Letters, v.29 n.9, p.1230-1235, July, 2008
Ke Colin Zheng, Alex Colburn, Aseem Agarwala, Maneesh Agrawala, David Salesin, Brian Curless, Michael F. Cohen, Parallax photography: creating 3D cinematic effects from stills, Proceedings of Graphics Interface 2009, May 25-27, 2009, Kelowna, British Columbia, Canada
Pasquale Foggia, Jean-Michel Jolion, Alessandro Limongiello, Mario Vento, A new approach for stereo matching in autonomous mobile robot applications, Proceedings of the 20th international joint conference on Artificial intelligence, p.2103-2108, January 06-12, 2007, Hyderabad, India
Ke Zhang, Jiangbo Lu, Gauthier Lafruit, Rudy Lauwereins, Luc Van Gool, Accurate and efficient stereo matching with robust piecewise voting, Proceedings of the 2009 IEEE international conference on Multimedia and Expo, p.93-96, June 28-July 03, 2009, New York, NY, USA
Tristrom Cooke, Robert Whatmough, Nicholas J. Redding, Gary Ewing, Edwin El-Mahassni, On the extraction of 3D models from airborne video sensors for geolocation, Digital Signal Processing, v.19 n.6, p.934-941, December, 2009