ABSTRACT
Personal albums are becoming increasingly popular due to the explosive growth of digital image capturing devices. An effective automatic annotation system for personal albums is desired for both efficient browsing and search. Existing research on image annotation has evolved through two stages: learning-based methods and web-based methods. Learning-based methods attempt to learn classifiers or joint probabilities between images and concepts, but they have difficulty handling large-scale concept sets because of the lack of training data. Web-based methods leverage web image data to learn relevant annotations, which greatly expands the scale of concepts. However, they still suffer from two problems: the query image lacks prior knowledge, and the annotations are often noisy and incoherent. To address these issues, we propose a web-based approach that annotates a collection of photos simultaneously, rather than independently, by leveraging the abundant correlations among the photos. A multi-graph similarity propagation based semi-supervised learning (MGSP-SSL) algorithm is proposed to suppress the noise in the initial annotations from the Web. Experiments on real personal albums show that the proposed approach outperforms existing annotation methods.