ABSTRACT
Semantic event recognition based only on vision cues is a challenging problem. This problem is particularly acute when the application domain is unconstrained still images available on the Internet or in personal repositories. In recent years, it has been shown that metadata captured with pictures can provide valuable contextual cues complementary to the image content and can be used to improve classification performance. With the recent geotagging phenomenon, an important piece of metadata available with many geotagged pictures now on the World Wide Web is GPS information. In this study, we obtain satellite images corresponding to picture location data and investigate their novel use to recognize the picture-taking environment, as if through a third eye above the object. Additionally, we combine this inference with classical vision-based event detection methods and study the synergistic fusion of the two approaches. We employ both color- and structure-based visual vocabularies for characterizing ground and satellite images, respectively. Training of satellite image classifiers is done using a multiclass AdaBoost engine while the ground image classifiers are trained using SVMs. Modeling and prediction involve some of the most interesting semantic event-activity classes encountered in consumer pictures, including those that occur in residential areas, commercial areas, beaches, sports venues, and parks. The powerful fusion of the complementary views achieves significant performance improvement over the ground view baseline. With integrated GPS-capable cameras on the horizon, we believe that our line of research can revolutionize event recognition and media annotation in years to come.
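As an illustration of how two complementary views might be combined, the following minimal sketch applies a weighted late fusion to per-class scores from a ground-image classifier and a satellite-image classifier. The abstract does not state the fusion rule actually used, so the weighted average, the function names (`fuse_scores`, `predict`), the `weight_ground` parameter, and all score values here are assumptions made for illustration only. The class labels follow the five environments named above.

```python
from typing import Dict

# Event-environment classes named in the abstract.
CLASSES = ["residential", "commercial", "beach", "sports_venue", "park"]


def fuse_scores(
    ground: Dict[str, float],
    satellite: Dict[str, float],
    weight_ground: float = 0.5,
) -> Dict[str, float]:
    """Weighted late fusion of two per-class score dictionaries.

    Hypothetical rule: a convex combination of the two views,
    renormalized so the fused scores sum to 1.
    """
    if not 0.0 <= weight_ground <= 1.0:
        raise ValueError("weight_ground must lie in [0, 1]")
    fused = {
        c: weight_ground * ground[c] + (1.0 - weight_ground) * satellite[c]
        for c in CLASSES
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("scores must have a positive sum")
    return {c: s / total for c, s in fused.items()}


def predict(fused: Dict[str, float]) -> str:
    """Return the class with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores: the ground view cannot separate 'residential' from 'park',
# while the satellite view clearly favors 'park'.
ground_scores = {
    "residential": 0.30, "commercial": 0.10, "beach": 0.25,
    "sports_venue": 0.05, "park": 0.30,
}
satellite_scores = {
    "residential": 0.10, "commercial": 0.05, "beach": 0.10,
    "sports_venue": 0.05, "park": 0.70,
}

fused = fuse_scores(ground_scores, satellite_scores)
print(predict(fused))
```

In this toy case the ground view alone is ambiguous, and the satellite view breaks the tie. In practice the weight would presumably be tuned on held-out data, and the inputs would come from the trained SVM and AdaBoost classifiers rather than hand-written dictionaries.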