Event recognition: viewing the world with a third eye
Source
International Multimedia Conference
Proceedings of the 16th ACM International Conference on Multimedia
Vancouver, British Columbia, Canada
Session: Brave new topics
Pages 1071-1080
Year of Publication: 2008
ISBN: 978-1-60558-303-7
Authors
Jiebo Luo, Eastman Kodak Company, Rochester, NY, USA
Jie Yu, Eastman Kodak Company, Rochester, NY, USA
Dhiraj Joshi, Eastman Kodak Company, Rochester, NY, USA
Wei Hao, Eastman Kodak Company, Rochester, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1459359.1459574

ABSTRACT

Semantic event recognition based only on visual cues is a challenging problem, particularly when the application domain is unconstrained still images available on the Internet or in personal repositories. Recent work has shown that metadata captured with pictures can provide valuable contextual cues that complement the image content and can be used to improve classification performance. With the recent rise of geotagging, many pictures on the World Wide Web now carry an important piece of such metadata: GPS information. In this study, we obtain satellite images corresponding to picture location data and investigate their novel use to recognize the picture-taking environment, as if through a third eye above the object. We then combine this inference with classical vision-based event detection methods and study the synergistic fusion of the two approaches. We employ color- and structure-based visual vocabularies to characterize ground and satellite images, respectively. Satellite image classifiers are trained with a multiclass AdaBoost engine, and ground image classifiers with SVMs. Modeling and prediction involve some of the most interesting semantic event-activity classes encountered in consumer pictures, including those that occur in residential areas, commercial areas, beaches, sports venues, and parks. Fusing the two complementary views achieves a significant performance improvement over the ground-view baseline. With integrated GPS-capable cameras on the horizon, we believe that our line of research can revolutionize event recognition and media annotation in years to come.
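The abstract does not state how the ground-view and satellite-view predictions are combined. Purely as an illustration, the following is a minimal sketch of one common option, weighted late fusion of per-class probabilities, assuming each view outputs one raw score per event class. The class names, the softmax normalization, the helper names `softmax` and `fuse`, and the `w_ground` weight are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical event classes, named after the environments listed in the abstract.
CLASSES = ["residential", "commercial", "beach", "sports venue", "park"]


def softmax(scores):
    """Map raw per-class scores to a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(ground_scores, satellite_scores, w_ground=0.5):
    """Weighted late fusion of two views (an assumed scheme, not the paper's).

    ground_scores: per-class scores from the ground-image classifier (e.g. SVM margins).
    satellite_scores: per-class scores from the satellite-image classifier.
    w_ground: hypothetical weight on the ground view, in [0, 1].
    Returns the fused distribution and the index of the winning class.
    """
    if len(ground_scores) != len(satellite_scores):
        raise ValueError("both views must score the same classes")
    if not 0.0 <= w_ground <= 1.0:
        raise ValueError("w_ground must lie in [0, 1]")
    p_ground = softmax(ground_scores)
    p_sat = softmax(satellite_scores)
    # Convex combination of two distributions is itself a distribution.
    fused = [w_ground * g + (1.0 - w_ground) * s for g, s in zip(p_ground, p_sat)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


if __name__ == "__main__":
    # Made-up scores: the ground view is torn between "beach" and "park",
    # while the satellite view clearly favors "beach".
    ground = [0.1, 0.0, 1.0, -0.5, 1.2]
    satellite = [-1.0, -0.8, 2.0, -0.5, 0.2]
    _, ground_only = fuse(ground, satellite, w_ground=1.0)
    fused, best = fuse(ground, satellite)
    print("ground only:", CLASSES[ground_only])
    print("fused:", CLASSES[best])
```

With these made-up scores, the ground view alone slightly prefers "park", while the equally weighted fusion picks "beach". Averaging normalized probabilities rather than raw scores keeps two differently trained classifiers on a common scale; in practice the weight would be tuned on held-out data.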


