ABSTRACT
As performance gains in automatic speech recognition systems plateau, improvements to existing applications of speech recognition technology seem more likely to come from better user interface design than from further progress in core recognition components. Among all applications of speech recognition, the usability of systems for transcription of spontaneous speech is particularly sensitive to high word error rates. This paper presents a series of approaches to improving the usability of such applications. We propose new mechanisms for error correction, use of contextual information, and use of 3D visualisation techniques to improve user interaction with a recogniser and maximise the impact of user feedback. These proposals are illustrated through several prototypes which target tasks such as off-line transcript editing, dynamic transcript editing, and real-time visualisation of recognition paths. An evaluation of our dynamic transcript editing system demonstrates the gains that can be made by adding the corrected words to the recogniser's dictionary and then propagating the user's corrections.