ABSTRACT
Content-based retrieval must support the query-by-example paradigm. Most video retrieval systems support queries using image sequences only. We present an algorithm for matching multimodal (audio-visual) patterns for content-based video retrieval. Our approach differs from the state of the art in content-based retrieval in two respects: it uses the information content of multiple media, and it places strong emphasis on temporal similarity. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. Because it couples audio with video, the algorithm can also be applied to grouping shots by audio-visual similarity. We also support relevance feedback: the user provides feedback by choosing the clips that are closer to the desired target, and the system then automatically adjusts the relative weights (relevance) of the media and fetches a different set of target clips accordingly. We observe that a few iterations of such feedback are generally sufficient to retrieve the desired video clips.
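The abstract does not spell out the dynamic programming formulation, so the following is only an illustrative sketch of one common choice, dynamic time warping over a weighted audio-visual frame distance. The function name `multimodal_dtw`, the per-frame feature arrays, the Euclidean distance, and the weights `w_video` and `w_audio` are all assumptions made for this example, not the authors' method. It also assumes audio and video features are sampled at the same rate, so each clip has one audio vector per video frame.

```python
import numpy as np


def multimodal_dtw(query_v, query_a, target_v, target_a,
                   w_video=0.5, w_audio=0.5):
    """Dynamic-time-warping cost between two audio-visual clips.

    Each argument is a 2-D array of shape (frames, feature_dim).
    The weights are hypothetical stand-ins for the per-media
    relevance that feedback would adjust.
    """
    n, m = len(query_v), len(target_v)

    # Pairwise Euclidean distances between every query frame and
    # every target frame, computed separately per medium.
    dist_v = np.linalg.norm(query_v[:, None, :] - target_v[None, :, :], axis=2)
    dist_a = np.linalg.norm(query_a[:, None, :] - target_a[None, :, :], axis=2)

    # Combined local cost: weighted sum of the two media.
    local = w_video * dist_v + w_audio * dist_a

    # acc[i, j] = cheapest cost of aligning the first i query frames
    # with the first j target frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j],      # query frame repeated
                            acc[i, j - 1],      # target frame repeated
                            acc[i - 1, j - 1])  # both advance
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]
```

In a sketch like this, a clip and a time-stretched copy of it align at zero cost, which is one way to read the emphasis on temporal similarity. Relevance feedback would act on `w_video` and `w_audio`: setting `w_audio=0.0`, for instance, ranks targets by visual similarity alone.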