ABSTRACT
Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict these shifts in eye movement and attention. Although the important role of high-level stimulus properties (e.g., semantic information) is undisputed, most models are based solely on low-level image properties. Here we demonstrate that a model combining high-level object detection with low-level saliency significantly outperforms a purely low-level saliency model in predicting the locations humans fixate. The data come from eye-movement recordings of humans observing photographs of natural scenes, which contained one of the following high-level stimuli: faces, text, scrambled text, or cell phones. We show that observers, even when not instructed to look for anything in particular, fixate on a face with a probability of over 80% within their first two fixations, on text and scrambled text with probabilities of over 65.1% and 57.9%, respectively, and on cell phones with a probability of 8.3%. This suggests that content carrying meaningful semantic information is significantly more likely to be looked at early. Adding regions of interest (ROIs), which mark the locations of the high-level meaningful features, significantly improves the prediction of a saliency model for stimuli with high semantic importance, while it has little effect for an object with no semantic meaning.
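The two quantities described above, a saliency map augmented with an ROI map and the probability of fixating an ROI within the first two fixations, can be illustrated with a minimal sketch. The abstract does not state how the two maps are combined, so the weighted linear sum below is an assumption, and the names `normalize`, `combine`, `hit_rate_within`, and `roi_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]          # 2-D map indexed as m[row][col]
Fixation = Tuple[int, int]       # (row, col) of one fixation


def normalize(m: Map) -> Map:
    """Rescale a map linearly to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine(saliency: Map, roi: Map, roi_weight: float = 0.5) -> Map:
    """Weighted sum of a normalized saliency map and a normalized ROI map.

    ASSUMPTION: the linear combination and the default weight are
    illustrative choices, not taken from the abstract.
    """
    s, r = normalize(saliency), normalize(roi)
    return [
        [(1.0 - roi_weight) * sv + roi_weight * rv for sv, rv in zip(srow, rrow)]
        for srow, rrow in zip(s, r)
    ]


def hit_rate_within(trials: Sequence[Sequence[Fixation]], roi: Map, n: int = 2) -> float:
    """Fraction of trials in which any of the first n fixations lands in the ROI."""
    hits = sum(
        1 for fixations in trials
        if any(roi[r][c] > 0 for r, c in fixations[:n])
    )
    return hits / len(trials)


# Toy 2x2 example: the ROI (e.g., a detected face) covers the top-left pixel.
saliency = [[0.0, 2.0], [1.0, 4.0]]
roi = [[1.0, 0.0], [0.0, 0.0]]
combined = combine(saliency, roi)

# Three hypothetical trials, each a sequence of fixated pixels.
trials = [
    [(0, 0), (1, 1)],
    [(1, 1), (0, 0), (0, 1)],
    [(0, 1), (1, 0), (0, 0)],
]
rate = hit_rate_within(trials, roi, n=2)
```

In this toy example the ROI pixel, which has the lowest raw saliency, ends up tied for the highest combined value, and two of the three trials reach the ROI within their first two fixations.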