ABSTRACT
We propose a probabilistic framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and semantic ontology in a synergistic way. We use causal strengths to encode causalities between variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach achieves high-quality photo annotation and good interpretability, and that it substantially outperforms traditional methods.