ABSTRACT
Existing video search engines have not taken advantage of video content analysis and semantic understanding. In academia, video search uses semantic annotation to approach content-based indexing, which we argue is a promising direction for enabling real content-based video search. However, owing to the complexity of both video data and semantic concepts, existing automatic video annotation techniques still cannot handle large-scale video sets and large-scale concept sets, in terms of either annotation accuracy or computational cost. To address this problem, we propose a scalable framework for annotation-based video search, together with a novel approach to large-scale semantic concept annotation: online multi-label active learning. The framework scales along both the video sample dimension and the concept label dimension. Large-scale unlabeled video samples are assumed to arrive consecutively in batches, and a preliminary multi-label classifier is built from an initial pre-labeled training set. For each arriving batch, a multi-label active learning engine automatically selects a set of unlabeled sample-label pairs, which are then manually annotated. An online learner then updates the existing classifier by taking the newly labeled sample-label pairs into account. This process repeats until all data have arrived. During the process, new labels, even those without any pre-labeled training samples, can be incorporated at any time. Experiments on the TRECVID dataset demonstrate the effectiveness and efficiency of the proposed framework.
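The batch-wise loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection criterion or the online learner, so everything concrete here is an assumption made for illustration: independent per-label logistic models updated by stochastic gradient steps stand in for the online learner, plain uncertainty sampling (probability closest to 0.5) stands in for the multi-label active learning engine, and the names `OnlineMultiLabelLearner`, `select_pairs`, `process_stream`, and `toy_oracle` are hypothetical.

```python
import math
import random


class OnlineMultiLabelLearner:
    """Independent per-label logistic models updated by SGD (a stand-in learner)."""

    def __init__(self, n_features, labels, lr=0.5):
        self.n_features = n_features
        self.lr = lr
        self.weights = {}
        for label in labels:
            self.add_label(label)

    def add_label(self, label):
        # A new concept starts from zero weights; no pre-labeled samples are needed.
        self.weights.setdefault(label, [0.0] * (self.n_features + 1))

    def predict_proba(self, x, label):
        w = self.weights[label]  # last entry of w is the bias term
        z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label, y):
        # One SGD step on the logistic loss for a single sample-label pair.
        err = y - self.predict_proba(x, label)
        w = self.weights[label]
        for i, xi in enumerate(x):
            w[i] += self.lr * err * xi
        w[-1] += self.lr * err


def select_pairs(model, batch, budget):
    """Pick up to `budget` sample-label pairs whose predictions are least certain."""
    scored = [
        (abs(model.predict_proba(x, label) - 0.5), idx, label)
        for idx, x in enumerate(batch)
        for label in model.weights
    ]
    scored.sort()
    return [(idx, label) for _, idx, label in scored[:budget]]


def process_stream(model, batches, oracle, budget):
    """For each arriving batch: select pairs, query the annotator, update online."""
    n_queries = 0
    for batch in batches:
        for idx, label in select_pairs(model, batch, budget):
            y = oracle(batch[idx], label)  # manual annotation in a real system
            model.update(batch[idx], label, y)
            n_queries += 1
    return n_queries


def toy_oracle(x, label):
    # Hypothetical ground truth: "outdoor" depends on feature 0, anything else on feature 1.
    return 1 if (x[0] if label == "outdoor" else x[1]) > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    model = OnlineMultiLabelLearner(n_features=2, labels=["outdoor"])
    model.add_label("person")  # a label introduced with no pre-labeled data
    stream = [[[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
              for _ in range(10)]
    print(process_stream(model, stream, toy_oracle, budget=5), "pairs annotated")
```

Only the selected sample-label pairs are sent to the annotator, so the labeling cost per batch is bounded by `budget` rather than by the number of samples times the number of labels. Each update touches a single label's weight vector, so adding a concept mid-stream does not require retraining the others.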