Inferring generic activities and events from image content and bags of geo-tags
Source
Conference on Image and Video Retrieval
Proceedings of the 2008 international conference on Content-based image and video retrieval
Niagara Falls, Canada
Session: Tagging, training and classification
Pages 37-46
Year of Publication: 2008
ISBN: 978-1-60558-070-8
Authors
Dhiraj Joshi  Eastman Kodak Company, Rochester, NY, USA
Jiebo Luo  Eastman Kodak Company, Rochester, NY, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 20,   Downloads (12 Months): 192,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1386352.1386361

ABSTRACT

The use of contextual information in building concept detectors for digital media has attracted the attention of the multimedia community in recent years. Broadly, any information extracted from image headers or tags, or from large collections of related images, and used at classification time can be considered contextual. Such information is discriminative in its own right, and combining it with purely content-based detection systems that use pixel information can significantly improve overall recognition performance. In this paper, we describe a framework for probabilistically modeling geographical information using a Geographical Information Systems (GIS) database for event and activity recognition in general-purpose consumer images, such as those found on Flickr. The proposed framework discriminatively models the statistical saliency of geo-tags in describing an activity or event, leveraging the inherent patterns of association between events and their geographical venues. As our representation, we use descriptions of small local neighborhoods to form bags of geo-tags. These descriptions are statistically coherent across a wide range of event classes and across many different users. To test our approach, we identify classes of activities and events in which people commonly participate and take pictures, and we obtain images and their corresponding metadata for these classes from Flickr. For pure visual event and activity recognition, we employ the visual detectors from Columbia University (Columbia 374). Our experiments show the performance advantage obtained by combining contextual GPS information with pixel-based detection systems.
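The abstract describes representing each photo by a bag of geo-tags drawn from a GIS description of its local neighborhood, scoring event classes from that bag, and combining the result with pixel-based detector scores. The following is a minimal Python sketch of that pipeline under stated assumptions: the place-type tags, event labels, training bags, and visual scores are invented for illustration; a smoothed multinomial naive Bayes model stands in for the paper's discriminative saliency model (it is a swapped-in technique, not the authors' method); and fusion is a plain weighted average with a hypothetical weight w, not necessarily the combination rule used in the paper.

```python
import math
from collections import Counter

# Hypothetical training data: (event label, GIS place types found near the
# photo's GPS location). Invented for illustration only.
TRAIN = [
    ("beach_fun", ["beach", "parking", "restaurant", "beach"]),
    ("beach_fun", ["beach", "marina", "park"]),
    ("hiking", ["park", "trail", "campground"]),
    ("hiking", ["trail", "park", "parking"]),
    ("wedding", ["church", "restaurant", "hotel"]),
]


def train_geo_model(examples, alpha=1.0):
    """Fit per-class multinomial geo-tag distributions with Laplace smoothing.

    Stand-in model (naive Bayes), not the paper's discriminative approach.
    """
    vocab = {tag for _, bag in examples for tag in bag}
    counts, priors = {}, Counter()
    for label, bag in examples:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(bag)
    model = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        model[label] = {t: (c[t] + alpha) / total for t in vocab}
    n = sum(priors.values())
    return model, {k: v / n for k, v in priors.items()}, vocab


def geo_posteriors(bag, model, priors, vocab):
    """Return P(class | bag of geo-tags); tags unseen in training are ignored."""
    log_scores = {}
    for label, dist in model.items():
        s = math.log(priors[label])
        for tag in bag:
            if tag in vocab:
                s += math.log(dist[tag])
        log_scores[label] = s
    # Normalize in log space (subtract the max) to avoid underflow.
    m = max(log_scores.values())
    exp = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def fuse(geo_p, visual_p, w=0.5):
    """Late fusion (assumed rule): weighted average of geo and visual scores."""
    labels = geo_p.keys() | visual_p.keys()
    return {k: w * geo_p.get(k, 0.0) + (1 - w) * visual_p.get(k, 0.0)
            for k in labels}


model, priors, vocab = train_geo_model(TRAIN)
geo = geo_posteriors(["beach", "parking"], model, priors, vocab)
# Hypothetical output of a pixel-based detector for the same photo.
visual = {"beach_fun": 0.40, "hiking": 0.35, "wedding": 0.25}
fused = fuse(geo, visual)
print(max(fused, key=fused.get))  # → beach_fun
```

Here the visual scores alone only weakly favor beach_fun, while the geo-tag bag (a beach and a parking lot nearby) strengthens that hypothesis; this is the kind of complementarity between context and pixels that the abstract describes. Averaging calibrated probabilities is the simplest fusion choice; the weight w would normally be tuned on held-out data.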


REFERENCES


[9] Hinze, A. and Voisard, A. 2003. Location and time-based information delivery in tourism. In Advances in Spatial and Temporal Databases, Lecture Notes in Computer Science 2750, 489-507.
[17] Luo, J., Boutell, M., and Brown, C. 2006. Pictures are not taken in a vacuum: An overview of exploiting context for semantic scene content understanding. IEEE Signal Processing Magazine 23(2), March 2006, 101-114.
[23] Yanagawa, A., Chang, S.-F., Kennedy, L., and Hsu, W. 2007. Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts. Columbia University ADVENT Technical Report, 2007.