ABSTRACT
As a consequence of the semantic gap, visual similarity does not guarantee semantic similarity, which in general conflicts with the inherent assumption of many generative image annotation methods. While discriminative learning approaches have often been used to classify images into different semantic classes, their efficiency is often impaired by the problems of multi-labeling and the large-scale concept space typically encountered in practical image annotation tasks. In this paper, we explore solutions to two problems: learning in a large-scale concept space, and the mismatch between the semantic and visual spaces. To tackle the first problem, we explore the use of a higher-level semantic space of lower dimension, obtained by clustering correlated keywords into topics in the local neighborhood. The topics are used as the lexicon for assigning multiple labels to unlabeled images. To tackle the semantic gap, we aim to reduce the bias between the visual and semantic spaces by finding optimal margins in both spaces. In particular, we propose an iterative solution that alternately maximizes the sum of the margins to reduce the gap between visual similarity and semantic similarity. Experimental results on the ECCV2002 benchmark show that our method outperforms the state-of-the-art generative annotation method MBRM and the discriminative method ASVM-MIL by 9% and 11%, respectively, in terms of F1 measure.
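To make the idea of clustering correlated keywords into topics concrete, the following is a minimal sketch and not the method of the paper. It assumes that correlation is measured by simple co-occurrence counts within a local neighborhood of annotated images, and that a topic is a connected component of the thresholded co-occurrence graph. The function name `cluster_keywords`, the parameter `min_cooccur`, and the toy annotations are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_keywords(annotations, min_cooccur=2):
    """Group keywords into topics (illustrative assumption, not the paper's algorithm).

    Two keywords are linked when they co-occur on at least `min_cooccur`
    images; each connected component of the resulting graph is one topic.
    """
    # Count how often each keyword pair appears on the same image.
    pair_counts = defaultdict(int)
    for keywords in annotations:
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1

    # Union-find structure over keywords.
    parent = {}

    def find(k):
        parent.setdefault(k, k)
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Register every keyword, so rare ones become singleton topics.
    for keywords in annotations:
        for k in keywords:
            find(k)

    # Merge keywords that co-occur often enough.
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            parent[find(a)] = find(b)

    # Collect the members of each component.
    topics = defaultdict(set)
    for k in list(parent):
        topics[find(k)].add(k)
    return sorted(sorted(members) for members in topics.values())


# Hypothetical keyword sets of the images in one local neighborhood.
neighborhood = [
    ["sky", "sea", "beach"],
    ["sky", "sea"],
    ["tiger", "grass"],
    ["tiger", "grass", "sky"],
]
topics = cluster_keywords(neighborhood)
print(topics)
```

In this toy neighborhood, the 5 keywords collapse into 3 topics, which illustrates how a topic-level space can have lower dimension than the keyword space. The threshold and the co-occurrence measure are placeholders for whatever correlation criterion is actually used.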