ABSTRACT
We describe an augmented reality system designed for online acquisition of visual knowledge and retrieval of memorized objects. The system relies on a head-mounted camera and display, which allow the user to view the environment together with augmentations overlaid by the system. In this setup, communication by hand gestures and speech is mandatory, as common input devices such as mouse and keyboard are not available. Using gesture and speech, three types of tasks must be handled: (i) communication with the system about the environment, in particular directing attention towards objects and commanding the memorization of sample views; (ii) control of system operation, e.g. switching between display modes; and (iii) re-adaptation of the interface itself when communication becomes unreliable due to changes in external factors such as illumination conditions. We present an architecture to manage these tasks, and we describe and evaluate several of its key elements, including modules for pointing gesture recognition, menu control based on gesture and speech, and control strategies for situations in which vision becomes unreliable and has to be re-adapted by speech.
CITED BY 6
Christian Bauckhage, Marc Hanheide, Sebastian Wrede, Thomas Käster, Michael Pfeiffer, Gerhard Sagerer. Vision systems with the human in the loop. EURASIP Journal on Applied Signal Processing, v.2005 n.1, p.2375-2390, 1 January 2005.
C. Bauckhage, S. Wachsmuth, M. Hanheide, S. Wrede, G. Sagerer, G. Heidemann, H. Ritter. The visual active memory perspective on integrated recognition systems. Image and Vision Computing, v.26 n.1, p.5-14, January 2008.
INDEX TERMS
Primary Classification:
H. Information Systems
H.4 INFORMATION SYSTEMS APPLICATIONS
H.4.m Miscellaneous
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
General Terms: Algorithms, Human Factors, Reliability, Verification
Keywords: augmented reality, human-machine-interaction, image retrieval, interfaces, memory, mobile systems, object recognition