ABSTRACT
Lecturers, presenters and meeting participants often say what they publicly handwrite. In this paper, we report on three empirical explorations of such multimodal redundancy -- during whiteboard presentations, during a spontaneous brainstorming meeting, and during the informal annotation and discussion of photographs. We show that redundantly presented words, compared to other words used during a presentation or meeting, tend to be topic-specific and are thus likely to be out-of-vocabulary. We also show that they have significantly higher tf-idf (term frequency-inverse document frequency) weights than other words, which we argue supports the hypothesis that they are dialogue-critical words. We frame the import of these empirical findings by describing SHACER, our recently introduced Speech and HAndwriting reCognizER, which can combine information from instances of redundant handwriting and speech to dynamically learn new vocabulary.