Multimodal model integration for sentence unit detection
Source: International Conference on Multimodal Interfaces
Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
Session: Multimodal communication
Pages: 121-128
Year of Publication: 2004
ISBN: 1-58113-995-0
Authors
Mary P. Harper  Purdue University, West Lafayette, IN
Elizabeth Shriberg  SRI International, Menlo Park, CA
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027955

ABSTRACT

In this paper, we adopt a direct modeling approach that uses conversational gesture cues to detect sentence unit (SU) boundaries in videotaped conversations. We treat SU detection as a classification task: for each inter-word boundary, the classifier decides whether or not an SU boundary is present. In addition to gesture cues, we use prosodic and lexical knowledge sources. In this first investigation, we find that gesture features complement the prosodic and lexical knowledge sources for this task. The model that uses all of the knowledge sources achieves the lowest overall SU detection error rate.
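
The per-boundary formulation in the abstract can be illustrated with a small sketch. This is not the paper's model: the abstract does not say how the knowledge sources are combined, so the example assumes each source yields a posterior estimate for every inter-word boundary and simply averages them with fixed weights. The names `Boundary`, `combine`, `detect_sus`, and `su_error_rate`, the 0.5 threshold, and all numeric values are hypothetical. The error rate is assumed to be misses plus false alarms divided by the number of reference SU boundaries, which the abstract itself does not spell out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Boundary:
    """One inter-word boundary with hypothetical per-source estimates of P(SU)."""
    prosody: float
    lexical: float
    gesture: float


def combine(b: Boundary, weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of the three source posteriors (an assumed combination rule)."""
    return weights[0] * b.prosody + weights[1] * b.lexical + weights[2] * b.gesture


def detect_sus(boundaries: List[Boundary], threshold: float = 0.5) -> List[bool]:
    """Binary decision at every inter-word boundary: SU boundary or not."""
    return [combine(b) >= threshold for b in boundaries]


def su_error_rate(hyp: List[bool], ref: List[bool]) -> float:
    """(misses + false alarms) / number of reference SU boundaries (assumed metric)."""
    if len(hyp) != len(ref):
        raise ValueError("hyp and ref must cover the same boundaries")
    n_ref = sum(ref)
    if n_ref == 0:
        raise ValueError("reference contains no SU boundaries")
    misses = sum(1 for h, r in zip(hyp, ref) if r and not h)
    false_alarms = sum(1 for h, r in zip(hyp, ref) if h and not r)
    return (misses + false_alarms) / n_ref


# Made-up scores for four inter-word boundaries, for illustration only.
boundaries = [
    Boundary(prosody=0.9, lexical=0.8, gesture=0.7),
    Boundary(prosody=0.1, lexical=0.2, gesture=0.1),
    Boundary(prosody=0.6, lexical=0.3, gesture=0.2),
    Boundary(prosody=0.3, lexical=0.4, gesture=0.95),
]
ref = [True, False, True, True]  # made-up reference labels
hyp = detect_sus(boundaries)
print(hyp, su_error_rate(hyp, ref))
```

In this toy data the fourth boundary is detected only because the gesture score pulls the average above the threshold, which is the kind of complementarity the abstract reports; the numbers themselves carry no empirical weight.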

