Visual language modeling for image classification
Source
International Multimedia Conference archive
Proceedings of the international workshop on multimedia information retrieval
Augsburg, Bavaria, Germany
POSTER SESSION: Multimedia retrieval and modeling
Pages: 115 - 124  
Year of Publication: 2007
ISBN:978-1-59593-778-0
Authors
Lei Wu  University of Science and Technology of China, Hefei, China
Mingjing Li  Microsoft Research Asia, Beijing, China
Zhiwei Li  Microsoft Research Asia, Beijing, China
Wei-Ying Ma  Microsoft Research Asia, Beijing, China
Nenghai Yu  University of Science and Technology of China, Hefei, China
Sponsors
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1290082.1290101

ABSTRACT

Although image classification has been studied for many years, it remains a challenging problem. In this paper, we propose a visual language modeling method for content-based image classification. The method transforms each image into a matrix of visual words and assumes that each visual word is conditionally dependent on its neighbors. For each image category, a visual language model is constructed from a set of training images; the model captures both the co-occurrence and the proximity information of visual words. Depending on how many neighbors are taken into consideration, three kinds of language models can be trained: unigram, bigram, and trigram, each corresponding to a different level of model complexity. Given a test image, its category is determined by estimating how likely the image is to have been generated under each category. Compared with traditional methods based on bag-of-words models, the proposed method can effectively exploit the spatial correlation of visual words in image classification. In addition, we propose using absent words, that is, words that appear frequently in a category but not in the target image, to help image classification. Experimental results show that our method can achieve comparable accuracy while performing classification much more quickly.
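
The classification scheme described in the abstract can be illustrated with a small sketch. The abstract does not specify which neighbors are conditioned on, how probabilities are smoothed, or how visual words are extracted, so everything below is an assumption made for illustration: each image is already a grid of integer visual-word IDs, the bigram model conditions each word on its left neighbor only, and add-alpha smoothing handles unseen pairs. The function names (`train_bigram`, `log_likelihood`, `classify`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_bigram(images, vocab_size, alpha=1.0):
    """Build a per-category bigram model from grids of visual-word IDs.

    Assumption: each word is conditioned on its left neighbor in the same row.
    """
    pair_counts = Counter()     # counts of (left neighbor, word) pairs
    context_counts = Counter()  # counts of each left neighbor
    for grid in images:
        for row in grid:
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                context_counts[left] += 1
    return pair_counts, context_counts, vocab_size, alpha


def log_likelihood(grid, model):
    """Log-probability of an image grid under one category model."""
    pair_counts, context_counts, vocab_size, alpha = model
    total = 0.0
    for row in grid:
        for left, word in zip(row, row[1:]):
            # Add-alpha smoothing (an assumption) keeps unseen pairs nonzero.
            numerator = pair_counts[(left, word)] + alpha
            denominator = context_counts[left] + alpha * vocab_size
            total += math.log(numerator / denominator)
    return total


def classify(grid, models):
    """Return the category whose model assigns the highest likelihood."""
    return max(models, key=lambda category: log_likelihood(grid, models[category]))


# Toy data with a two-word vocabulary: alternating words vs. constant rows.
models = {
    "stripes": train_bigram([[[0, 1, 0, 1], [1, 0, 1, 0]]], vocab_size=2),
    "flat": train_bigram([[[0, 0, 0, 0], [1, 1, 1, 1]]], vocab_size=2),
}
striped_image = [[0, 1, 0], [1, 0, 1]]
flat_image = [[0, 0], [1, 1]]
print(classify(striped_image, models))
print(classify(flat_image, models))
```

Both toy categories contain the same words in the same proportions, so a bag-of-words histogram could not separate them; only the neighbor statistics differ. A unigram variant would drop the conditioning on `left`, and a trigram variant would condition on two neighbors instead of one.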



