Towards optimal bag-of-features for object categorization and semantic video retrieval
Source: Conference on Image and Video Retrieval
Proceedings of the 6th ACM International Conference on Image and Video Retrieval
Amsterdam, The Netherlands
Pages: 494 - 501  
Year of Publication: 2007
ISBN: 978-1-59593-733-9
Authors
Yu-Gang Jiang  City University of Hong Kong, Kowloon, Hong Kong
Chong-Wah Ngo  City University of Hong Kong, Kowloon, Hong Kong
Jun Yang  Carnegie Mellon University, Pittsburgh, PA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1282280.1282352

ABSTRACT

Bag-of-features (BoF) representations derived from local keypoints have recently shown promise for object and scene classification. Whether BoF can meet the challenges of visual classification, such as reliability and scalability, nevertheless remains uncertain because of the many implementation choices involved. In this paper, we evaluate the factors that govern the performance of BoF: the choice of keypoint detector, kernel, vocabulary size, and weighting scheme. We offer practical insights into how to optimize performance by choosing a good keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word to an image. We show experimentally that the proposed soft-weighting scheme consistently outperforms other popular weighting methods. On both the PASCAL-2005 and TRECVID-2006 datasets, our BoF setting achieves performance competitive with state-of-the-art techniques. We also show that BoF is highly complementary to global features: by combining BoF with color and texture features, an improvement of 50% is reported on the TRECVID-2006 dataset.
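
The abstract does not give the soft-weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions: each keypoint contributes to its `top_n` nearest visual words rather than to a single word, with the contribution halved at each successive rank and scaled by a similarity score. The function name `soft_weight_histogram`, the parameter `top_n`, and the similarity `1 / (1 + distance)` are illustrative choices, not details taken from the paper.

```python
import math

def soft_weight_histogram(descriptors, vocabulary, top_n=4):
    """Build a soft-weighted bag-of-features histogram.

    descriptors: list of keypoint descriptors (sequences of floats).
    vocabulary:  list of visual-word centers (same dimensionality).
    top_n:       number of nearest visual words each keypoint votes for
                 (hypothetical default, not taken from the abstract).
    """
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        # Rank every visual word by Euclidean distance to this keypoint.
        ranked = sorted((math.dist(d, w), k) for k, w in enumerate(vocabulary))
        for rank, (distance, k) in enumerate(ranked[:top_n]):
            # Assumed similarity measure: decreases with distance.
            similarity = 1.0 / (1.0 + distance)
            # The vote is halved for each step down the ranking.
            hist[k] += similarity / (2 ** rank)
    return hist

# Toy example: one 2-D keypoint, three visual words.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
descriptors = [(0.0, 0.0)]
soft = soft_weight_histogram(descriptors, vocabulary, top_n=2)
hard = soft_weight_histogram(descriptors, vocabulary, top_n=1)
```

With `top_n=1` the sketch reduces to conventional hard assignment, where every keypoint votes only for its single nearest visual word; larger values spread each vote across neighboring words.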


