Multimodal redundancy across handwriting and speech during computer mediated human-human interactions
Source
Conference on Human Factors in Computing Systems
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
San Jose, California, USA
SESSION: Multimodal interactions
Pages: 1009 - 1018  
Year of Publication: 2007
ISBN: 978-1-59593-593-9
Authors
Edward C. Kaiser  Adapx, Seattle, WA
Paulo Barthelmess  Adapx, Seattle, WA
Candice Erdmann  Adapx, Seattle, WA
Phil Cohen  Adapx, Seattle, WA
Sponsors
ACM: Association for Computing Machinery
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1240624.1240778

ABSTRACT

Lecturers, presenters and meeting participants often say what they publicly handwrite. In this paper, we report on three empirical explorations of such multimodal redundancy -- during whiteboard presentations, during a spontaneous brainstorming meeting, and during the informal annotation and discussion of photographs. We show that redundantly presented words, compared to other words used during a presentation or meeting, tend to be topic specific and thus are likely to be out-of-vocabulary. We also show that they have significantly higher tf-idf (term frequency-inverse document frequency) weights than other words, which we argue supports the hypothesis that they are dialogue-critical words. We frame the import of these empirical findings by describing SHACER, our recently introduced Speech and HAndwriting reCognizER, which can combine information from instances of redundant handwriting and speech to dynamically learn new vocabulary.
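
As an illustration of the tf-idf weighting mentioned in the abstract, the following is a minimal sketch of one common textbook variant (relative term frequency times the natural log of inverse document frequency). The abstract does not say which variant the authors used, so the formula, the function name tf_idf, and the toy word lists are all assumptions made for this example, not the paper's method or data.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not taken from the paper):
    weight = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy transcripts, invented for illustration only.
docs = [
    ["the", "schedule", "shows", "the", "milestone"],
    ["the", "meeting", "starts", "now"],
    ["the", "budget", "is", "the", "issue"],
]
weights = tf_idf(docs)

# List the terms of the first document from highest to lowest weight.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this toy input, "the" occurs in every document, so its inverse document frequency is ln(1) = 0 and its weight is zero, while a word confined to one document, such as "milestone", receives a positive weight. That is the sense in which topic-specific words score higher than words used everywhere.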



