ABSTRACT
A set of user interface design techniques for computer-assisted speech transcription is presented and evaluated with respect to task performance and usability. These techniques include error-correction mechanisms that originated in dictation systems and audio editors, as well as new techniques developed by us that exploit specific characteristics of existing speech recognition technologies. The new techniques aim to facilitate transcription in settings that typically yield considerable recognition inaccuracy, such as when the speech to be transcribed was produced by different speakers. In particular, we describe a mechanism for dynamic propagation of user feedback which progressively adapts the system to different speakers and lexical contexts. Results of usability and performance evaluation trials indicate that feedback propagation, menu-based correction coupled with keyboard interaction, and text-driven audio playback are positively perceived by users and result in improved transcript accuracy.