Interactive visualisation techniques for dynamic speech transcription, correction and training
Source: ACM International Conference Proceeding Series
Proceedings of the 9th ACM SIGCHI New Zealand Chapter's International Conference on Human-Computer Interaction: Design Centered HCI
Wellington, New Zealand
Pages 9-16
Year of Publication: 2008
ISBN: 978-1-60558-467-6
Authors
Saturnino Luz  Trinity College Dublin, Ireland
Masood Masoodian  The University of Waikato, New Zealand
Bill Rogers  The University of Waikato, New Zealand
Sponsors
Victoria University of Wellington
New Zealand Chapter of ACM SIGCHI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1496976.1496978

ABSTRACT

As performance gains in automatic speech recognition systems plateau, improvements to existing applications of speech recognition technology seem more likely to come from better user interface design than from further progress in core recognition components. Among all applications of speech recognition, the usability of systems for transcription of spontaneous speech is particularly sensitive to high word error rates. This paper presents a series of approaches to improving the usability of such applications. We propose new mechanisms for error correction, use of contextual information, and use of 3D visualisation techniques to improve user interaction with a recogniser and maximise the impact of user feedback. These proposals are illustrated through several prototypes which target tasks such as: off-line transcript editing, dynamic transcript editing, and real-time visualisation of recognition paths. An evaluation of our dynamic transcript editing system demonstrates the gains that can be made by adding the corrected words to the recogniser's dictionary and then propagating the user's corrections.

