SCANMail: a voicemail interface that makes speech browsable, readable and searchable
Source: Conference on Human Factors in Computing Systems
Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves
Minneapolis, Minnesota, USA
SESSION: Speech, Audio, Gesture
Pages: 275-282
Year of Publication: 2002
ISBN: 1-58113-453-3
Authors
Steve Whittaker, AT&T Labs-Research, Florham Park, NJ
Julia Hirschberg, AT&T Labs-Research, Florham Park, NJ
Brian Amento, AT&T Labs-Research, Florham Park, NJ
Litza Stark, AT&T Labs-Research, Florham Park, NJ
Michiel Bacchiani, AT&T Labs-Research, Florham Park, NJ
Philip Isenhour, AT&T Labs-Research, Florham Park, NJ
Larry Stead, AT&T Labs-Research, Florham Park, NJ
Gary Zamchick, AT&T Labs-Research, Florham Park, NJ
Aaron Rosenberg, AT&T Labs-Research, Florham Park, NJ
Sponsor
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/503376.503426

ABSTRACT

Increasing amounts of public, corporate, and private speech data are now available on-line. Their usefulness is limited, however, by the lack of tools for browsing and searching them. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
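
The WYSIAWYH idea of using a transcript as a visual index into the audio can be illustrated with a minimal sketch. This is not SCANMail's implementation, which the abstract does not describe; it only assumes that the recognizer emits words with start and end times. The names Word, search, and audio_region, the millisecond units, and the sample message are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One recognized word with its position in the audio (hypothetical format)."""
    text: str       # ASR hypothesis; may differ from what was actually said
    start_ms: int   # offset of the word's start from the beginning of the message
    end_ms: int     # offset of the word's end


def search(transcript: List[Word], query: str) -> List[int]:
    """Return the indices of transcript words equal to the query, ignoring case."""
    q = query.lower()
    return [i for i, w in enumerate(transcript) if w.text.lower() == q]


def audio_region(transcript: List[Word], first: int, last: int,
                 pad_ms: int = 500) -> Tuple[int, int]:
    """Map a selected span of words to the audio interval that should be played.

    The padding widens the interval so the listener hears some context,
    which helps when the transcript around the selection contains errors.
    """
    start = max(0, transcript[first].start_ms - pad_ms)
    end = transcript[last].end_ms + pad_ms
    return (start, end)


# Invented sample message: six words with made-up timings.
message = [
    Word("call", 0, 300), Word("me", 300, 500), Word("at", 500, 600),
    Word("extension", 600, 1200), Word("five", 1200, 1500), Word("nine", 1500, 1800),
]

hits = search(message, "extension")
# Play from the matched word through the two words that follow it.
region = audio_region(message, hits[0], hits[0] + 2)
```

In this sketch the user reads or searches the text, and the selection is resolved back to a time interval in the recording, so an imperfect transcript still serves as a way to navigate to the region worth hearing.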

