ABSTRACT
Although it has been studied for many years, image classification remains a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. The method transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram, and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under each category. Compared with traditional methods based on bag-of-words models, the proposed method can effectively exploit the spatial correlation of visual words in image classification. In addition, we propose using absent words, that is, words that appear frequently in a category but not in the target image, to help image classification. Experimental results show that our method achieves comparable accuracy while performing classification much more quickly.
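The bigram case described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each visual word is conditioned only on its left neighbor in the same row, probabilities use add-one (Laplace) smoothing, and all categories have equal prior probability. The function names `train_bigram`, `log_likelihood`, and `classify`, and the toy data, are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def train_bigram(images, vocab_size, alpha=1.0):
    """Count (left neighbor, word) pairs over one category's training images.

    Each image is a list of rows; each row is a list of visual-word ids.
    Conditioning on the left neighbor only is an assumption of this sketch.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for img in images:
        for row in img:
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                context_counts[left] += 1
    return pair_counts, context_counts, vocab_size, alpha


def log_likelihood(model, img):
    """Sum of smoothed log P(word | left neighbor) over the image.

    The first word of each row has no left neighbor and is skipped.
    """
    pair_counts, context_counts, vocab_size, alpha = model
    total = 0.0
    for row in img:
        for left, word in zip(row, row[1:]):
            num = pair_counts[(left, word)] + alpha          # add-alpha smoothing (assumed)
            den = context_counts[left] + alpha * vocab_size
            total += math.log(num / den)
    return total


def classify(models, img):
    """Pick the category whose model gives the image the highest likelihood."""
    return max(models, key=lambda cat: log_likelihood(models[cat], img))


# Toy data with a two-word vocabulary {0, 1} (invented for illustration).
models = {
    "stripes": train_bigram([[[0, 1, 0, 1], [1, 0, 1, 0]]], vocab_size=2),
    "flat": train_bigram([[[0, 0, 0, 0], [1, 1, 1, 1]]], vocab_size=2),
}
label_a = classify(models, [[0, 1, 0], [1, 0, 1]])
label_b = classify(models, [[0, 0, 0], [1, 1, 1]])
print(label_a, label_b)
```

Because the model scores ordered neighbor pairs, two images with identical word histograms (as in the toy data, where both categories use words 0 and 1 equally often) can still be told apart, which a bag-of-words model cannot do.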