Where is "it"? Event Synchronization in Gaze-Speech Input Systems
Source: International Conference on Multimodal Interfaces
Proceedings of the 5th International Conference on Multimodal Interfaces
Vancouver, British Columbia, Canada
Session: Speech and gaze
Pages: 151-158
Year of Publication: 2003
ISBN: 1-58113-621-8
Authors
Manpreet Kaur  Rutgers University, Piscataway, NJ
Marilyn Tremaine  New Jersey Institute of Technology, Newark, NJ
Ning Huang  Rutgers University, Piscataway, NJ
Joseph Wilder  Rutgers University, Piscataway, NJ
Zoran Gacovski  Rutgers University, Piscataway, NJ
Frans Flippo  Rutgers University, Piscataway, NJ
Chandra Sekhar Mantravadi  Rutgers University, Piscataway, NJ
Sponsors
ACM: Association for Computing Machinery
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/958432.958463

ABSTRACT

The relationship between gaze and speech is explored for the simple task of moving an object from one location to another on a computer screen. The subject moves a designated object from a group of objects to a new location on the screen by stating, "Move it there". Gaze and speech data are captured to determine if we can robustly predict the selected object and destination position. We have found that the source fixation closest to the desired object begins, with high probability, before the beginning of the word "Move". An analysis of all fixations before and after speech onset time shows that the fixation that best identifies the object to be moved occurs, on average, 630 milliseconds before speech onset with a range of 150 to 1200 milliseconds for individual subjects. The variance in these times for individuals is relatively small although the variance across subjects is large. Selecting a fixation closest to the onset of the word "Move" as the designator of the object to be moved gives a system accuracy close to 95% for all subjects. Thus, although significant differences exist between subjects, we believe that the speech and gaze integration patterns can be modeled reliably for individual users and therefore be used to improve the performance of multimodal systems.
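The selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Fixation` type, the function names, the optional per-user `lead_ms` offset, and all sample coordinates are assumptions made for the example. The abstract does not say whether "closest" is measured from a fixation's start, midpoint, or end; the sketch assumes the start time.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Fixation:
    """Hypothetical fixation record: start time (ms) and screen position."""
    start_ms: float
    x: float
    y: float


def pick_fixation(
    fixations: Sequence[Fixation],
    speech_onset_ms: float,
    lead_ms: float = 0.0,
) -> Optional[Fixation]:
    """Return the fixation whose start is closest to speech onset.

    lead_ms is an assumed per-user calibration: a positive value shifts
    the reference time earlier than the onset of the spoken command.
    """
    if not fixations:
        return None
    reference = speech_onset_ms - lead_ms
    return min(fixations, key=lambda f: abs(f.start_ms - reference))


def nearest_object(
    fixation: Fixation,
    objects: Dict[str, Tuple[float, float]],
) -> str:
    """Return the name of the object closest to the fixation point."""
    return min(
        objects,
        key=lambda name: math.hypot(
            objects[name][0] - fixation.x,
            objects[name][1] - fixation.y,
        ),
    )


# Illustrative data only; none of these values come from the paper.
fixations = [
    Fixation(start_ms=200, x=100, y=100),
    Fixation(start_ms=900, x=400, y=300),
    Fixation(start_ms=1600, x=50, y=500),
]
objects = {"circle": (110, 95), "square": (395, 310), "triangle": (60, 490)}
onset_of_move_ms = 1500

chosen = pick_fixation(fixations, onset_of_move_ms)
selected = nearest_object(chosen, objects)

# Same data with an assumed 630 ms per-user lead applied.
chosen_with_lead = pick_fixation(fixations, onset_of_move_ms, lead_ms=630)
selected_with_lead = nearest_object(chosen_with_lead, objects)
```

With a zero offset the rule picks the fixation nearest the onset of "Move". Because the abstract reports that timing is consistent within a subject but varies widely between subjects, a per-user offset such as `lead_ms` is one plausible way to adapt the rule to an individual.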


