ABSTRACT
In this paper, we propose two ways of improving image classification based on the bag-of-words representation [25]. This representation has two shortcomings: it loses the spatial information of visual words, and it contains noisy visual words because the vocabulary building process is coarse. On the one hand, we propose a new representation of images that takes the analogy with textual data further: visual sentences, which allow us to "read" visual words in a certain order, as in text. We can therefore consider simple spatial relations between words. We also present a new image classification scheme that exploits these relations. It is based on language models, a very popular tool in the speech and text analysis communities. On the other hand, we propose new techniques to eliminate useless words, one based on geometric properties of the keypoints, the other on probabilistic Latent Semantic Analysis (pLSA). Experiments show that our techniques can significantly improve image classification compared to a classical Support Vector Machine-based classification.
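As a rough illustration of classifying with language models over ordered visual words, the sketch below trains one bigram model per class and assigns a new visual sentence to the class whose model gives it the highest log-likelihood. The abstract does not state the n-gram order, the smoothing method, or how keypoints are ordered into a sentence, so the bigram order, the additive smoothing, and all names here (`train_bigram_model`, `log_likelihood`, `classify`, the toy labels and word ids) are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def train_bigram_model(sentences, vocab_size, alpha=1.0):
    """Count bigrams and their left contexts over visual sentences.

    Each sentence is a list of integer visual-word ids, already ordered
    (the ordering step itself is assumed to happen elsewhere).
    """
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return {"ctx": contexts, "bi": bigrams, "V": vocab_size, "alpha": alpha}


def log_likelihood(model, sent):
    """Log-probability of a sentence under an additively smoothed bigram model."""
    total = 0.0
    for prev, cur in zip(sent, sent[1:]):
        num = model["bi"][(prev, cur)] + model["alpha"]
        den = model["ctx"][prev] + model["alpha"] * model["V"]
        total += math.log(num / den)
    return total


def classify(models, sent):
    """Return the class label whose model scores the sentence highest."""
    return max(models, key=lambda label: log_likelihood(models[label], sent))


# Toy data (hypothetical): two classes over a 4-word visual vocabulary.
training = {
    "beach": [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2, 0, 1]],
    "forest": [[3, 3, 1, 3, 3, 1], [3, 1, 3, 3, 1, 3]],
}
models = {
    label: train_bigram_model(sents, vocab_size=4)
    for label, sents in training.items()
}
prediction = classify(models, [0, 1, 2, 0, 1])
```

Unlike a bag-of-words histogram, this score depends on which word follows which, so two images with the same word counts but different arrangements can receive different labels.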
REFERENCES
[1] J. Bai and J.-Y. Nie. Using language models for text classification. In Proceedings of the Asia Information Retrieval Symposium, Beijing, China, Oct 2004.
[2] A. Bosch, A. Zisserman, and X. Munoz. Scene classification via pLSA. In Proceedings of ECCV, 2006.
[3] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In Proceedings of ICCV, 2007.
[4] G. Carneiro and A. Jepson. Flexible spatial models for grouping local image features. In Proceedings of CVPR, volume 2, pages II-747--II-754, 27 Jun-2 Jul 2004.
[5] W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. In Proceedings of the Symposium on Document Analysis and Information Retrieval, pages 161--175, Las Vegas, US, 1994.
[6] S. F. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Technical report, Cambridge, MA, Aug 1998.
[7] P. Clarkson and R. Rosenfeld. Statistical language modeling using the CMU-Cambridge toolkit. In Proceedings of the Eurospeech Conference, pages 2707--2710, Rhodes, Greece, 1997.
[8] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV: Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, May 2004.
[9]
[10] S. Gao, D.-H. Wang, and C.-H. Lee. Automatic image annotation through multi-topic text categorization. volume 2, pages II--II, 2006.
[11] D. Gokalp and S. Aksoy. Scene classification using bag-of-regions representations. In Proceedings of CVPR, pages 1--8, 2007.
[12]
[13] M. Jamieson, A. Fazly, S. Dickinson, S. Stevenson, and S. Wachsmuth. Learning structured appearance models from captioned images of cluttered scenes. Oct 2007.
[14] I. Jolliffe. Principal Component Analysis. Springer, 2002.
[15] D. Larlus, G. Dorkó, and F. Jurie. Création de vocabulaires visuels efficaces pour la catégorisation d'images. In Reconnaissance des Formes et Intelligence Artificielle, 2006.
[16] D. Larlus and F. Jurie. Latent mixture vocabularies for object categorization. In Proceedings of the British Machine Vision Conference, 2006.
[17] K. Mc Donald. Discrete Language Models for Video Retrieval. PhD thesis, School of Computing, Dublin City University, Sep 2005.
[18]
[19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43--72, Nov 2005. doi:10.1007/s11263-005-3848-x
[20] F. Moosmann, B. Triggs, and F. Jurie. Randomized clustering forests for building fast and discriminative visual vocabularies. In Neural Information Processing Systems, Nov 2006.
[21]
[22]
[23]
[24]
[25]
[26] L. Wu, M. Li, Z. Li, W.-Y. Ma, and N. Yu. Visual language modeling for image classification. In Proceedings of the International Workshop on Multimedia Information Retrieval, Augsburg, Bavaria, Germany, 24-29 Sep 2007. doi:10.1145/1290082.1290101
[27] J. Yang, Y.-G. Jiang, A. G. Hauptmann, and C.-W. Ngo. Evaluating bag-of-visual-words representations in scene classification. In Proceedings of the International Workshop on Multimedia Information Retrieval, Augsburg, Bavaria, Germany, 24-29 Sep 2007. doi:10.1145/1290082.1290111
[28] J. Yuan, Y. Wu, and M. Yang. Discovery of collocation patterns: from visual words to visual phrases. In Proceedings of CVPR, pages 1--8, Jun 2007.
[29]
[30]
[31]