Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations
Source: International Conference on Multimodal Interfaces
Proceedings of the 8th International Conference on Multimodal Interfaces
Banff, Alberta, Canada
SESSION: Oral Session 5: Speech and Dialogue Systems
Pages: 347-356
Year of Publication: 2006
ISBN: 1-59593-541-X
Author
Edward C. Kaiser, Natural Interaction Systems, LLC, Seattle, Washington
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 10; Downloads (12 months): 59; Citation count: 6
DOI: http://doi.acm.org/10.1145/1180995.1181060

ABSTRACT

New language constantly emerges from complex, collaborative human-human interactions such as meetings, for instance when a presenter handwrites a new term on a whiteboard while saying it. Fixed-vocabulary recognizers fail on such new terms, which are often critical to understanding the dialogue. We present a proof-of-concept multimodal system that combines information from handwriting and speech recognition to learn the spelling, pronunciation and semantics of out-of-vocabulary terms from single instances of redundant multimodal presentation (e.g., saying a term while handwriting it). For the task of recognizing the spelling and semantics of abbreviated Gantt chart labels across a held-out test series of five scheduling meetings, we show a significant relative error rate reduction of 37% when our learning methods are used and allowed to persist across the meeting series, compared with when they are not used.


