ABSTRACT
The number of available television channels is increasing rapidly, while the time viewers have to watch them remains the same or is decreasing. Users want to watch programs time-shifted (on demand), to watch just the highlights, or both, in order to save time. In this paper we explore how to provide the latter capability: extracting highlights automatically so that viewing time can be reduced.
We focus on baseball as our initial target: it is a very popular sport, a whole game is quite long, and the exciting portions are few. We detect highlights using audio-track features alone, without relying on expensive-to-compute video-track features. We use a combination of generic sports features and baseball-specific features to obtain our results, but we believe that many other sports offer the same opportunity and that the techniques presented here will apply to them. We present details on the relative performance of various learning algorithms, and a probabilistic framework for combining multiple sources of information. We compare the output of our algorithms against human-selected highlights for a diverse collection of baseball games, with very encouraging results.
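As a rough illustration of what combining multiple audio-derived sources of information can look like, the following Python sketch fuses two per-window probability estimates and then picks the highest-scoring windows within a time budget. It is not the paper's framework: the conditional-independence (naive Bayes) fusion rule, the shared prior, the fixed window length, and the names `fuse_probabilities` and `select_highlights` are all assumptions made for this example.

```python
from typing import List, Tuple


def fuse_probabilities(p_generic: float, p_specific: float, prior: float = 0.5) -> float:
    """Fuse two estimates of P(highlight | cue) into one probability.

    Assumption (not from the paper): the two cues are conditionally
    independent given the highlight label and share the same prior.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in (p_generic, p_specific):
        # Clamp to avoid division by zero at exactly 0 or 1.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        # Each cue contributes its likelihood ratio: posterior odds / prior odds.
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


def select_highlights(
    scores: List[float], window_sec: float, budget_sec: float
) -> List[Tuple[float, float]]:
    """Return (start, end) times of the top-scoring windows that fit the budget.

    Windows are assumed to be contiguous, non-overlapping, and of equal length.
    """
    n_windows = int(budget_sec // window_sec)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n_windows])  # restore chronological order
    return [(i * window_sec, (i + 1) * window_sec) for i in chosen]


if __name__ == "__main__":
    # Hypothetical per-window outputs of a generic cue and a baseball-specific cue.
    generic = [0.2, 0.9, 0.4, 0.7]
    specific = [0.1, 0.8, 0.3, 0.9]
    fused = [fuse_probabilities(g, s) for g, s in zip(generic, specific)]
    print(select_highlights(fused, window_sec=10.0, budget_sec=20.0))
```

Under the independence assumption, two cues that each weakly favor a highlight reinforce each other, while a window supported by only one cue is pulled back toward the prior. A different fusion rule would change the ranking.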
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.1 Multimedia Information Systems
Subjects: Video (e.g., tape, disk, DVI)
Additional Classification:
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.7 Natural Language Processing
Subjects: Speech recognition and synthesis
I.4 IMAGE PROCESSING AND COMPUTER VISION
J. Computer Applications
General Terms: Algorithms, Human Factors, Languages, Measurement
Keywords: audio, baseball, highlights, summarization, television, video