Language modeling for bag-of-visual words image categorization
Source
Conference On Image And Video Retrieval
Proceedings of the 2008 international conference on Content-based image and video retrieval
Niagara Falls, Canada
POSTER SESSION: Poster/reception
Pages 249-258
Year of Publication: 2008
ISBN: 978-1-60558-070-8
Authors
Pierre Tirilly  CNRS, Rennes, France
Vincent Claveau  CNRS, Rennes, France
Patrick Gros  INRIA, Rennes, France
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1386352.1386388

ABSTRACT

In this paper, we propose two ways of improving image classification based on the bag-of-words representation [25]. This representation has two shortcomings: it loses the spatial information of visual words, and it contains noisy visual words because the vocabulary-building process is coarse. On the one hand, we propose a new representation of images that takes the analogy with textual data further: visual sentences, which allow us to "read" visual words in a certain order, as in the case of text. We can therefore consider simple spatial relations between words. We also present a new image classification scheme that exploits these relations. It is based on language models, a very popular tool in the speech and text analysis communities. On the other hand, we propose new techniques to eliminate useless words, one based on geometric properties of the keypoints, the other on probabilistic Latent Semantic Analysis (pLSA). Experiments show that our techniques can significantly improve image classification compared to a classical Support Vector Machine-based classification.
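
The idea of classifying images with language models over visual sentences can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the n-gram order, the smoothing method, or how visual sentences are built, so the bigram order, the add-one smoothing, the toy class labels, and all function names below are assumptions made for illustration. Each image is assumed to be already converted into an ordered list of visual-word ids.

```python
import math
from collections import Counter


def train_bigram_model(sentences, vocab_size):
    """Count contexts and bigrams over visual sentences (lists of visual-word ids)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        for prev, cur in zip(sentence, sentence[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab_size


def log_likelihood(sentence, model):
    """Log-probability of a visual sentence under a bigram model.

    Add-one (Laplace) smoothing is an assumption of this sketch,
    chosen only so that unseen bigrams keep a non-zero probability.
    """
    context_counts, bigram_counts, vocab_size = model
    total = 0.0
    for prev, cur in zip(sentence, sentence[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def classify(sentence, models):
    """Return the class whose language model gives the sentence the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(sentence, models[label]))


# Hypothetical toy data: two classes, a vocabulary of 4 visual words (ids 0..3).
training = {
    "beach": [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2]],
    "city": [[3, 2, 3, 2, 3], [2, 3, 2, 3]],
}
models = {
    label: train_bigram_model(sentences, vocab_size=4)
    for label, sentences in training.items()
}
prediction = classify([0, 1, 2, 0], models)
```

In this sketch one model is trained per class, and an unseen image is assigned to the class whose model scores its visual sentence highest. Because the model conditions each word on its predecessor, the order in which visual words are read affects the score, which an unordered bag-of-words histogram cannot capture.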


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
J. Bai and J.-Y. Nie. Using language models for text classification. In Proceedings of the Asia Information Retrieval Symposium, Beijing, China, Oct 2004.
 
2
A. Bosch, A. Zisserman, and X. Munoz. Scene classification via pLSA. In Proceedings of ECCV, 2006.
 
3
A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In Proceedings of ICCV, 2007.
 
4
G. Carneiro and A. Jepson. Flexible spatial models for grouping local image features. In Proceedings of CVPR, volume 2, pages II-747--II-754, Jun-Jul 2004.
 
5
W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. In Proceedings of the Symposium on Document Analysis and Information Retrieval, pages 161--175, Las Vegas, US, 1994.
 
6
S. F. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Technical report, Cambridge, MA, August 1998.
 
7
P. Clarkson and R. Rosenfeld. Statistical language modeling using the CMU-Cambridge toolkit. In Proceedings of the Eurospeech Conference, pages 2707--2710, Rhodes, Greece, 1997.
 
8
G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV: Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, May 2004.
 
9
 
10
S. Gao, D.-H. Wang, and C.-H. Lee. Automatic image annotation through multi-topic text categorization. volume 2, pages II--II, 2006.
 
11
D. Gokalp and S. Aksoy. Scene classification using bag-of-regions representations. In Proceedings of CVPR, pages 1--8, 2007.
 
12
 
13
M. Jamieson, A. Fazly, S. Dickinson, S. Stevenson, and S. Wachsmuth. Learning structured appearance models from captioned images of cluttered scenes. Oct 2007.
 
14
I. Jolliffe. Principal Component Analysis. Springer, 2002.
 
15
D. Larlus, G. Dorkó, and F. Jurie. Création de vocabulaires visuels efficaces pour la catégorisation d'images. In Reconnaissance des Formes et Intelligence Artificielle, 2006.
 
16
D. Larlus and F. Jurie. Latent mixture vocabularies for object categorization. In Proceedings of the British Machine Vision Conference, 2006.
 
17
K. Mc Donald. Discrete Language Models for Video Retrieval. PhD thesis, School of Computing, Dublin City University, September 2005.
 
18
 
19
 
20
F. Moosmann, B. Triggs, and F. Jurie. Randomized clustering forests for building fast and discriminative visual vocabularies. In Neural Information Processing Systems, Nov 2006.
 
21
22
 
23
 
24
 
25
26
27
 
28
J. Yuan, Y. Wu, and M. Yang. Discovery of collocation patterns: from visual words to visual phrases. In Proceedings of CVPR, pages 1--8, Jun 2007.
 
29
30
31
