ABSTRACT
Acknowledging an interruption with a nod of the head is a natural and intuitive communication gesture that can be performed without significantly disturbing a primary interface activity. In this paper we describe vision-based head gesture recognition techniques and their use for common user interface commands. We explore two prototype perceptual interface components that use detected head gestures for dialog box confirmation and document browsing, respectively. Tracking is performed using stereo-based alignment, and recognition uses a trained discriminative classifier. We also describe a context learning component that exploits interface context to make recognition more robust. User studies with the prototype recognition components indicate quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives.
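The abstract does not specify the classifier's input features. As an illustration only, here is a minimal Python sketch of the kind of signal a nod detector works from: a hand-written heuristic that flags a window of per-frame head-pose angles as a nod when pitch reverses direction repeatedly while yaw stays nearly constant. This is not the paper's method, which uses stereo-based tracking and a trained discriminative classifier. The function names, the degree units, and the thresholds (`min_step`, `min_reversals`, `max_yaw_range`) are hypothetical choices made for this sketch.

```python
from typing import Sequence


def count_reversals(pitch: Sequence[float], min_step: float) -> int:
    """Count direction changes in frame-to-frame pitch motion.

    Steps smaller than min_step (assumed degrees) are treated as tracker
    jitter and skipped, so they neither count nor reset the direction.
    """
    reversals = 0
    last_sign = 0  # 0 means no significant motion seen yet
    for prev, cur in zip(pitch, pitch[1:]):
        delta = cur - prev
        if abs(delta) < min_step:
            continue
        sign = 1 if delta > 0 else -1
        if last_sign != 0 and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals


def looks_like_nod(
    pitch: Sequence[float],
    yaw: Sequence[float],
    min_step: float = 1.0,       # hypothetical jitter threshold (degrees)
    min_reversals: int = 2,      # hypothetical: at least two direction changes
    max_yaw_range: float = 5.0,  # hypothetical: reject side-to-side motion
) -> bool:
    """Heuristic nod test on one window of per-frame head-pose angles."""
    if len(pitch) < 3 or len(pitch) != len(yaw):
        return False
    yaw_range = max(yaw) - min(yaw)
    return (
        count_reversals(pitch, min_step) >= min_reversals
        and yaw_range <= max_yaw_range
    )


if __name__ == "__main__":
    window_pitch = [0, 5, 10, 5, 0, 5, 10]  # synthetic up/down oscillation
    window_yaw = [0] * len(window_pitch)
    print(looks_like_nod(window_pitch, window_yaw))  # True
```

Fixed thresholds like these are brittle across users and camera setups, which is one reason to prefer a trained classifier, as the abstract describes, and to add interface context as a further cue.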
INDEX TERMS
Primary Classification:
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.10 Vision and Scene Understanding
Subjects: Motion
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
General Terms: Algorithms, Design, Experimentation, Human Factors, Performance
Keywords: IUI design, context-based recognition, head gesture, multi-modal input, nod recognition, nodding, user study