ABSTRACT
To understand utterance-based human-robot interaction and to develop such a system, this paper first analyzes how loudly humans speak in a noisy environment. Experiments were conducted to measure how loudly humans speak with 1) different noise levels, 2) different numbers of sound sources, 3) different kinds of sound sources, and 4) different distances to a robot. Synchronized sound sources add noise to the auditory scene, and the resulting utterances are recorded and compared to a previously recorded noiseless utterance. From the experiments, we find that humans generate basically the same sound pressure level at their own location irrespective of distance and background noise. More precisely, the level falls within a band that depends on the distance and on the sound sources, including those that contain spoken language. Based on this understanding, we developed an online spoken command recognition system for a mobile robot. The system consists of two key components: 1) a low side-lobe microphone array that works as an omni-directional telescopic microphone, and 2) DSBF combined with the FBS method for sound source localization and segmentation. The caller location and the segmented sound stream are calculated, and the segmented sound stream is then sent to a voice recognition system. The system works with at most five simultaneous sound sources whose sound pressure levels differ by up to about 18 dB. Experimental results with the mobile robot are also shown.
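As an illustration of the localization idea mentioned above, the following is a minimal sketch of plain delay-and-sum beamforming, assuming that DSBF stands for delay-and-sum beamforming. It does not cover the FBS step or the paper's low side-lobe array design. The uniform linear array, far-field plane-wave model, integer-sample delays, speed of sound, and all function names are assumptions made for the example only.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_x, angle_deg, fs):
    """Arrival delay (in whole samples) at each microphone of a linear array
    for a far-field plane wave from angle_deg (0 = broadside)."""
    s = math.sin(math.radians(angle_deg))
    return [round(x * s / SPEED_OF_SOUND * fs) for x in mic_x]


def simulate(source, mic_x, angle_deg, fs):
    """Hypothetical test helper: delay the source signal per microphone."""
    n = len(source)
    return [
        [source[t - d] if 0 <= t - d < n else 0.0 for t in range(n)]
        for d in steering_delays(mic_x, angle_deg, fs)
    ]


def delay_and_sum(channels, delays):
    """Undo each channel's steering delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            k = t + d
            if 0 <= k < n:
                out[t] += ch[k]
    return [v / len(channels) for v in out]


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def localize(channels, mic_x, fs, angles):
    """Return the scan angle whose beamformer output has the most power."""
    return max(
        angles,
        key=lambda a: power(delay_and_sum(channels, steering_delays(mic_x, a, fs))),
    )


if __name__ == "__main__":
    fs = 16000                              # sampling rate in Hz (assumed)
    mic_x = [i * 0.05 for i in range(8)]    # 8 mics, 5 cm apart (assumed)
    rng = random.Random(0)
    source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # broadband noise
    channels = simulate(source, mic_x, 30, fs)
    print(localize(channels, mic_x, fs, range(-90, 91, 10)))
```

Steering toward the true direction aligns all channels so they add coherently, while other directions leave most channels misaligned and the averaged output loses power. A real system would also need fractional delays, a 2-D or 3-D geometry, and handling of several simultaneous sources.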