ABSTRACT
The use of contextual information in building concept detectors for digital media has caught the attention of the multimedia community in recent years. Generally speaking, any information that is extracted from image headers or tags, or from large collections of related images, and used at classification time can be considered contextual. Such information is discriminative in its own right and, when combined with pure content-based detection systems that use pixel information, can significantly improve overall recognition performance. In this paper, we describe a framework for probabilistically modeling geographical information using a Geographical Information Systems (GIS) database for event and activity recognition in general-purpose consumer images, such as those obtained from Flickr. The proposed framework discriminatively models the statistical saliency of geo-tags in describing an activity or event. Our work leverages the inherent patterns of association between events and their geographical venues. We use descriptions of small local neighborhoods to form bags of geo-tags as our representation. Statistical coherence is observed in such descriptions across a wide range of event classes and across many different users. To test our approach, we identify certain classes of activities and events in which people commonly participate and take pictures. Images and corresponding metadata for the identified events and activities are obtained from Flickr. We employ visual detectors obtained from Columbia University (Columbia 374), which perform pure visual event and activity recognition. In our experiments, we demonstrate the performance advantage obtained by combining contextual GPS information with pixel-based detection systems.
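As a rough illustration of the two ideas named in the abstract (a bag-of-geo-tags representation and the combination of geographical context with a visual detector), the following minimal Python sketch may help. It is not the method of the paper: the function names `bag_of_geotags` and `fuse_scores`, the toy vocabulary, the whitespace tokenization, and the weighted-average fusion rule are all assumptions made for illustration only.

```python
from collections import Counter


def bag_of_geotags(place_descriptions, vocabulary):
    """Build a normalized term-frequency vector over `vocabulary` from the
    textual descriptions of places near a photo (hypothetical helper)."""
    counts = Counter(
        token
        for description in place_descriptions
        for token in description.lower().split()
    )
    total = sum(counts[term] for term in vocabulary)
    if total == 0:
        # No vocabulary term occurs nearby: return an all-zero vector.
        return [0.0] * len(vocabulary)
    return [counts[term] / total for term in vocabulary]


def fuse_scores(visual_score, geo_score, geo_weight=0.5):
    """Late fusion as a convex combination of two scores in [0, 1].
    The weighting scheme is an assumption, not taken from the paper."""
    if not 0.0 <= geo_weight <= 1.0:
        raise ValueError("geo_weight must be in [0, 1]")
    return (1.0 - geo_weight) * visual_score + geo_weight * geo_score


# Toy data: names of places assumed to lie near the photo's GPS position.
vocabulary = ["beach", "park", "stadium", "church"]
nearby = ["Ocean Beach", "Beach Park"]
features = bag_of_geotags(nearby, vocabulary)
fused = fuse_scores(visual_score=0.4, geo_score=0.8, geo_weight=0.25)
```

In a fuller pipeline, `features` would presumably be fed to a discriminative classifier per event class, and its output would take the place of `geo_score`; the sketch leaves that step out because the abstract does not specify it.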
REFERENCES
4. Shih-Fu Chang, Dan Ellis, Wei Jiang, Keansub Lee, Akira Yanagawa, Alexander C. Loui, Jiebo Luo, Large-scale multimodal semantic concept detection for consumer video, Proceedings of the international workshop on multimedia information retrieval, September 24-29, 2007, Augsburg, Bavaria, Germany. doi:10.1145/1290082.1290118
5. Y. Chen, X. Y. Chen, F. Y. Rao, X. L. Yu, Y. Li, D. Liu, LORE: an infrastructure to support location-aware services, IBM Journal of Research and Development, v.48 n.5/6, p.601-615, September/November 2004
7. Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang, Image retrieval: Ideas, influences, and trends of the new age, ACM Computing Surveys (CSUR), v.40 n.2, p.1-60, April 2008. doi:10.1145/1348246.1348248
8. Micah Dubinko, Ravi Kumar, Joseph Magnani, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins, Visualizing tags over time, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland. doi:10.1145/1135777.1135810
9. Hinze, A. and Voisard, A. 2003. Location and time-based information delivery in tourism. Advances in Spatial and Temporal Databases, Lecture Notes in Computer Science, 2750 (2003) 489-507.
10. Alexandar Jaffe, Mor Naaman, Tamir Tassa, Marc Davis, Generating summaries and visualization for large collections of geo-referenced photographs, Proceedings of the 8th ACM international workshop on Multimedia information retrieval, October 26-27, 2006, Santa Barbara, California, USA. doi:10.1145/1178677.1178692
12. Lyndon Kennedy, Mor Naaman, Shane Ahern, Rahul Nair, Tye Rattenbury, How flickr helps us make sense of the world: context and content in community-contributed media collections, Proceedings of the 15th international conference on Multimedia, September 25-29, 2007, Augsburg, Germany. doi:10.1145/1291233.1291384
14. Michael S. Lew, Nicu Sebe, Chabane Djeraba, Ramesh Jain, Content-based multimedia information retrieval: State of the art and challenges, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), v.2 n.1, p.1-19, February 2006. doi:10.1145/1126004.1126005
17. Luo, J., Boutell, M., and Brown, C. 2006. Pictures are not taken in a vacuum: An overview of exploiting context for semantic scene content understanding. IEEE Signal Process. Mag. 23(2) (March 2006) 101-114.
18. Alexander Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, Keansub Lee, Akira Yanagawa, Kodak's consumer video benchmark data set: concept definition and annotation, Proceedings of the international workshop on multimedia information retrieval, September 24-29, 2007, Augsburg, Bavaria, Germany. doi:10.1145/1290082.1290117
23. Yanagawa, A., Chang, S.-F., Kennedy, L., and Hsu, W. 2007. Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts. Columbia University ADVENT Technical Report, 2007.
CITED BY 2
Jiebo Luo, Jie Yu, Dhiraj Joshi, Wei Hao, Event recognition: viewing the world with a third eye, Proceedings of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada