ABSTRACT
Large-scale distributed video surveillance systems pose new scalability challenges. Because such systems have a large number of video sources, the bandwidth required to transmit video streams for monitoring often strains the capacity of the network. At the same time, large-scale surveillance systems often rely on computer vision algorithms to automate surveillance tasks. We observe that these tasks present an opportunity to trade off the accuracy of the tasks against the bit rate of the video being sent. This paper shows that there exists a sweet spot, which we term the critical video quality, at which the video bit rate can be reduced without significantly affecting the accuracy of the surveillance tasks. We demonstrate this point by running extensive experiments on standard face detection and face tracking algorithms. Our experiments show that face detection works equally well even if the quality of compression is significantly reduced, and that face tracking still works even if the frame rate is reduced to 6 frames per second. We further develop a prototype video surveillance system to demonstrate this idea. Our evaluation shows that we can achieve up to a 29-fold reduction in video bit rate when detecting faces and a 16-fold reduction when tracking faces. This paper also proposes a formal rate-accuracy optimization framework that can be used to determine appropriate encoding parameters in distributed video surveillance systems that are subject to either bandwidth constraints or accuracy constraints.
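The abstract does not spell out the rate-accuracy framework, so the following is only a minimal sketch of the general idea, assuming that bit rate and task accuracy have already been measured for a small set of candidate encoding settings. The `OperatingPoint` type, both function names, and every number in `PLACEHOLDER_POINTS` are hypothetical illustrations, not the paper's formulation or results. The sketch covers the two cases the abstract names: an accuracy constraint (minimize bit rate) and a bandwidth constraint (maximize accuracy).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate encoding setting with its measured rate and accuracy.

    All field names are assumptions made for this illustration.
    """
    quality: int          # compression quality setting (hypothetical scale)
    frame_rate: float     # frames per second
    bit_rate_kbps: float  # measured video bit rate
    accuracy: float       # measured task accuracy in [0, 1]


def min_rate_for_accuracy(
    points: Iterable[OperatingPoint], min_accuracy: float
) -> Optional[OperatingPoint]:
    """Accuracy-constrained case: cheapest setting that is accurate enough."""
    feasible = [p for p in points if p.accuracy >= min_accuracy]
    return min(feasible, key=lambda p: p.bit_rate_kbps, default=None)


def max_accuracy_for_rate(
    points: Iterable[OperatingPoint], max_bit_rate_kbps: float
) -> Optional[OperatingPoint]:
    """Bandwidth-constrained case: most accurate setting within the budget."""
    feasible = [p for p in points if p.bit_rate_kbps <= max_bit_rate_kbps]
    return max(feasible, key=lambda p: p.accuracy, default=None)


# Placeholder values for illustration only; these are NOT measurements
# reported in the paper.
PLACEHOLDER_POINTS = [
    OperatingPoint(quality=90, frame_rate=30, bit_rate_kbps=2000, accuracy=0.95),
    OperatingPoint(quality=50, frame_rate=30, bit_rate_kbps=800, accuracy=0.94),
    OperatingPoint(quality=20, frame_rate=30, bit_rate_kbps=300, accuracy=0.93),
    OperatingPoint(quality=20, frame_rate=6, bit_rate_kbps=100, accuracy=0.90),
    OperatingPoint(quality=5, frame_rate=6, bit_rate_kbps=40, accuracy=0.60),
]

if __name__ == "__main__":
    print(min_rate_for_accuracy(PLACEHOLDER_POINTS, min_accuracy=0.9))
    print(max_accuracy_for_rate(PLACEHOLDER_POINTS, max_bit_rate_kbps=500))
```

An exhaustive search over a table is used here only because the candidate set is small. Both functions return `None` when no setting satisfies the constraint, which leaves the fallback policy to the caller.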
CITED BY 4
Andreas Girgensohn, Frank Shipman, Anthony Dunnigan, Thea Turner, Lynn Wilcox. Support for effective use of multiple video streams in security. Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, October 27, 2006, Santa Barbara, California, USA.
Andreas Girgensohn, Frank Shipman, Thea Turner, Lynn Wilcox. Effects of presenting geographic context on tracking activity between cameras. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 28-May 3, 2007, San Jose, California, USA.
Andreas Girgensohn, Don Kimber, Jim Vaughan, Tao Yang, Frank Shipman, Thea Turner, Eleanor Rieffel, Lynn Wilcox, Francine Chen, Tony Dunnigan. DOTS: support for effective video surveillance. Proceedings of the 15th International Conference on Multimedia, September 25-29, 2007, Augsburg, Germany.