| Evaluating bag-of-visual-words representations in scene classification |
|
Source
|
International Multimedia Conference
Proceedings of the international workshop on multimedia information retrieval
Augsburg, Bavaria, Germany
POSTER SESSION: Video retrieval and annotation
Pages: 197 - 206
Year of Publication: 2007
ISBN: 978-1-59593-778-0
|
ABSTRACT
Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words", and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these representation choices on scene classification is studied through extensive experiments on the TRECVID and PASCAL collections. This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance.
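To make the representation choices named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image has already been reduced to a list of visual-word ids (for example by quantizing keypoint descriptors against a clustered vocabulary), and it uses binary, tf, and tf-idf weighting as illustrative schemes borrowed from text categorization. The function name `build_vectors`, its parameters, and the toy data are hypothetical; treating the visual words that occur in the most images as "stop words" is likewise an assumption made for illustration.

```python
import math
from collections import Counter


def build_vectors(images, vocab_size, weighting="tf-idf", n_stop=0):
    """Turn lists of visual-word ids into weighted feature vectors.

    images     -- list of lists of visual-word ids in range(vocab_size)
    weighting  -- "binary", "tf", or "tf-idf" (illustrative choices)
    n_stop     -- number of most widespread visual words to drop
    Returns (kept_word_ids, vectors).
    """
    if weighting not in ("binary", "tf", "tf-idf"):
        raise ValueError("unknown weighting: " + weighting)

    n_images = len(images)
    counts = [Counter(words) for words in images]

    # Document frequency: number of images that contain each visual word.
    df = [0] * vocab_size
    for c in counts:
        for w in c:
            df[w] += 1

    # Assumed "stop words": the n_stop words found in the most images.
    by_df = sorted(range(vocab_size), key=lambda w: -df[w])
    stop = set(by_df[:n_stop])
    kept = [w for w in range(vocab_size) if w not in stop]

    vectors = []
    for c in counts:
        total = sum(c[w] for w in kept) or 1  # guard against empty images
        row = []
        for w in kept:
            if weighting == "binary":
                value = 1.0 if c[w] > 0 else 0.0
            elif weighting == "tf":
                value = c[w] / total
            else:  # tf-idf
                idf = math.log(n_images / df[w]) if df[w] else 0.0
                value = (c[w] / total) * idf
            row.append(value)
        vectors.append(row)
    return kept, vectors


# Toy data: three images over a four-word vocabulary (hypothetical values).
images = [[0, 0, 1], [0, 2], [0, 1, 1, 3]]
kept_binary, binary_vectors = build_vectors(images, 4, "binary", n_stop=1)
kept_tfidf, tfidf_vectors = build_vectors(images, 4, "tf-idf", n_stop=0)
```

In this toy data, visual word 0 appears in every image, so removing one stop word drops it from the binary vectors, and under tf-idf it receives zero weight. The resulting vectors could then be passed to any classifier; the vocabulary size, the number of stop words, and the weighting scheme are the kinds of parameters the study varies.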