ABSTRACT
Photo community sites such as Flickr and Picasa Web Albums host a massive number of personal photos, with millions of new photos uploaded every month. These photos constitute an overwhelming source of images that requires effective management, and the need to annotate these web images is increasingly pressing. This paper addresses the problem by considering two kinds of annotation: semantic annotation and geographic annotation. Both are useful for image search and retrieval and for facilitating communities and social networks. We propose a novel method, Logistic Canonical Correlation Regression (LCCR), for the annotation task. The model exploits the canonical correlation between heterogeneous features and an annotation lexicon of interest, and builds a generalized annotation engine on these canonical correlations to produce enhanced annotations for web images. We validate the effectiveness of our algorithm on a dataset of over 380,000 images tagged with GPS coordinates.