ABSTRACT
An object can be a basic unit for multimedia content analysis. Beyond the similarities it shares with other objects, each object has unique characteristics that are not found in the surrounding objects in multimedia data. We call such unique characteristics object fingerprints. In this paper, we propose a novel approach to extract and match object fingerprints for multimedia content analysis. In particular, we focus on the problem of street landmark localization from images. Instead of modeling and matching a street landmark as a whole, our proposed approach extracts the landmark's object fingerprints from a given image and matches them against a new image or video in order to localize the landmark. We formulate the matching of the landmark's object fingerprints as a classification problem solved by a cascade of 1NN classifiers. We develop a street landmark localization system that combines salient region detection, segmentation, and object fingerprint extraction techniques for this purpose. For evaluation, we compiled a new dataset consisting of images and videos of 15 U.S. street landmarks. Our experiments on this dataset show superior performance to state-of-the-art recognition algorithms [20, 33]. The proposed approach also generalizes well to other objects of interest and other content analysis tasks. We demonstrate this by applying our approach to refine web image search results, with encouraging results.
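As a rough illustration of what a cascade of 1NN classifiers can look like, the sketch below passes a candidate region through a sequence of stages and rejects it as soon as one stage's nearest exemplar is not a fingerprint. The abstract does not specify the descriptors, the number of stages, or the distance measure, so the per-stage descriptors, the Euclidean distance, and the names `nearest_label` and `cascade_match` are all assumptions made for this example, not the paper's actual design.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical exemplar: (descriptor, True if it belongs to a landmark fingerprint)
Exemplar = Tuple[Vector, bool]


def nearest_label(query: Vector, exemplars: List[Exemplar]) -> bool:
    """1-NN decision: return the label of the exemplar closest to the query.

    Euclidean distance is an assumption; the abstract names no metric.
    """
    _, label = min(exemplars, key=lambda e: math.dist(query, e[0]))
    return label


def cascade_match(stage_queries: List[Vector],
                  stages: List[List[Exemplar]]) -> bool:
    """Accept a candidate only if every stage's 1-NN classifier accepts it.

    stage_queries[i] is the candidate's descriptor for stage i (hypothetical:
    for instance one stage per feature type). A rejection at any stage stops
    the cascade early.
    """
    for query, exemplars in zip(stage_queries, stages):
        if not nearest_label(query, exemplars):
            return False  # early rejection, later stages are skipped
    return True


# Toy data: two stages with made-up 2-D and 1-D descriptors.
stages = [
    [((0.0, 0.0), True), ((1.0, 1.0), False)],
    [((0.5,), True), ((0.9,), False)],
]
accepted = cascade_match([(0.1, 0.1), (0.6,)], stages)
rejected_early = cascade_match([(0.9, 0.8), (0.6,)], stages)
rejected_late = cascade_match([(0.1, 0.1), (0.85,)], stages)
```

In this toy setup only the first candidate survives both stages. The appeal of a cascade is that cheap early stages can discard most candidates before the later stages run.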
REFERENCES
[2] G. E. Burnett. Turn right at the king's head: Drivers' requirements for route guidance information. PhD thesis, Loughborough University, UK, 1998.
[4] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys (CSUR), v.40 n.2, p.1-60, April 2008. doi:10.1145/1348246.1348248
[5] A. de la Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol. Road traffic sign detection and classification. IEEE Transactions on Industrial Electronics, 1997.
[10] X. Hou and L. Zhang. Saliency detection: a spectral residual approach. In CVPR, 2007.
[12] Feng Jing, Changhu Wang, Yuhuan Yao, Kefeng Deng, Lei Zhang, and Wei-Ying Ma. IGroup: web image search results clustering. In Proceedings of the 14th annual ACM international conference on Multimedia, October 23-27, 2006, Santa Barbara, CA, USA. doi:10.1145/1180639.1180720
[14] Lyndon Kennedy, Mor Naaman, Shane Ahern, Rahul Nair, and Tye Rattenbury. How flickr helps us make sense of the world: context and content in community-contributed media collections. In Proceedings of the 15th international conference on Multimedia, September 25-29, 2007, Augsburg, Germany. doi:10.1145/1291233.1291384
[15] S. Kim, X. Jin, and J. Han. Sparclus: Spatial relationship pattern-based hierarchical clustering. In SIAM Int. Conf. on Data Mining, 2008.
[17] B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool. Dynamic 3D scene analysis from a moving vehicle. In CVPR, 2007.
[18] Michael S. Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain. Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), v.2 n.1, p.1-19, February 2006. doi:10.1145/1126004.1126005
[21] T. Malisiewicz and A. A. Efros. Improving spatial support for objects via multiple segmentations. In British Machine Vision Conference, 2007.
[22] Humera Noor, Shahid H. Mirza, Yaser Sheikh, Amit Jain, and Mubarak Shah. Model generation for video-based object recognition. In Proceedings of the 14th annual ACM international conference on Multimedia, October 23-27, 2006, Santa Barbara, CA, USA. doi:10.1145/1180639.1180791
[23] G. Piccioli, E. De Micheli, P. Parodi, and M. Campani. Robust method for road sign detection and recognition. Image and Vision Computing, 1996.
[24] J. Pilet, V. Lepetit, and P. Fua. Real-time non-rigid surface detection. In CVPR, 2005.
[27] G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In CVPR, 2007.
[31] J. Willamowski, D. Arregui, G. Csurka, C. R. Dance, and L. Fan. Categorizing nine visual classes using local appearance descriptors. In IWLAVS, 2004.
[36] C. L. Zitnick, J. Sun, R. Szeliski, and S. Winder. Object instance recognition using triplets of feature symbols. Microsoft Research Technical Report, 2007.