SCANMail: a voicemail interface that makes speech browsable, readable and searchable
Source
Conference on Human Factors in Computing Systems
Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves
Minneapolis, Minnesota, USA
SESSION: Speech, Audio, Gesture
Pages: 275-282
Year of Publication: 2002
ISBN: 1-58113-453-3
Authors
Steve Whittaker, AT&T Labs-Research, Florham Park, NJ
Julia Hirschberg, AT&T Labs-Research, Florham Park, NJ
Brian Amento, AT&T Labs-Research, Florham Park, NJ
Litza Stark, AT&T Labs-Research, Florham Park, NJ
Michiel Bacchiani, AT&T Labs-Research, Florham Park, NJ
Philip Isenhour, AT&T Labs-Research, Florham Park, NJ
Larry Stead, AT&T Labs-Research, Florham Park, NJ
Gary Zamchick, AT&T Labs-Research, Florham Park, NJ
Aaron Rosenberg, AT&T Labs-Research, Florham Park, NJ
ABSTRACT
Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
2. Askwall, S. Computer supported reading vs reading text on paper: a comparison of two reading situations, International Journal of Man Machine Studies, 22, 425-439, 1985.
3. John Boreczky, Andreas Girgensohn, Gene Golovchinsky, Shingo Uchihashi, An interactive comic book presentation for exploring video, Proceedings of the SIGCHI conference on Human factors in computing systems, p.185-192, April 01-06, 2000, The Hague, The Netherlands. doi:10.1145/332040.332428
4. Barbara L. Chalfonte, Robert S. Fish, Robert E. Kraut, Expressive richness: a comparison of speech and text as media for revision, Proceedings of the SIGCHI conference on Human factors in computing systems: Reaching through technology, p.21-26, April 27-May 02, 1991, New Orleans, Louisiana, United States. doi:10.1145/108844.108848
5. Leo Degen, Richard Mander, Gitta Salomon, Working with audio: integrating personal tape recorders and desktop computers, Proceedings of the SIGCHI conference on Human factors in computing systems, p.413-418, May 03-07, 1992, Monterey, California, United States. doi:10.1145/142750.142877
8. Hirschberg, J., Bacchiani, M., Hindle, D., Isenhour, P., Rosenberg, A., Stark, L., Stead, L., Zamchick, G., and Whittaker, S. SCANMail: Browsing and Searching Speech Data by Content, Proceedings of Eurospeech 2001, Aalborg, 2001.
9. Hirschberg, J. and Nakatani, C. Acoustic indicators of topic segmentation. In ICSLP98, 1998.
10. G. J. F. Jones, J. T. Foote, K. Spärck Jones, S. J. Young, Retrieving spoken documents by combining multiple index sources, Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, p.30-38, August 18-22, 1996, Zurich, Switzerland. doi:10.1145/243199.243208
12. Thomas P. Moran, Leysia Palen, Steve Harrison, Patrick Chiu, Don Kimber, Scott Minneman, William van Melle, Polle Zellweger, “I'll get that off the audio”: a case study of salvaging multimedia meeting records, Proceedings of the SIGCHI conference on Human factors in computing systems, p.202-209, March 22-27, 1997, Atlanta, Georgia, United States. doi:10.1145/258549.258704
13. Ronald E. Rice, Douglas E. Shook, Voice messaging, coordination, and communication, Intellectual teamwork: social and technological foundations of cooperative work, Lawrence Erlbaum Associates, Inc., Mahwah, NJ, 1990.
14. Rice, R.E., & Tyler, J. (1995). Individual and organizational influences on voicemail use and evaluation. Behaviour and Information Technology, 14(6), 329-341.
15. Salton, G. The SMART Retrieval System, Prentice-Hall, Englewood Cliffs, NJ, 1971.
16. Stark, L., Whittaker, S., and Hirschberg, J. ASR satisficing: the effects of ASR accuracy on speech retrieval. In Proceedings of International Conference on Spoken Language Processing, 2000.
18. Steve Whittaker, Richard Davis, Julia Hirschberg, Urs Muller, Jotmail: a voicemail interface that enables you to see what was said, Proceedings of the SIGCHI conference on Human factors in computing systems, p.89-96, April 01-06, 2000, The Hague, The Netherlands. doi:10.1145/332040.332411
19. Steve Whittaker, Julia Hirschberg, John Choi, Don Hindle, Fernando Pereira, Amit Singhal, SCAN: designing and evaluating user interfaces to support retrieval from speech archives, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.26-33, August 15-19, 1999, Berkeley, California, United States. doi:10.1145/312624.312639
20. Steve Whittaker, Julia Hirschberg, Christine H. Nakatani, All talk and all action: strategies for managing voicemail messages, CHI 98 conference summary on Human factors in computing systems, p.249-250, April 18-23, 1998, Los Angeles, California, United States. doi:10.1145/286498.286732
21. Steve Whittaker, Julia Hirschberg, Christine H. Nakatani, Play it again: a study of the factors underlying speech browsing behavior, CHI 98 conference summary on Human factors in computing systems, p.247-248, April 18-23, 1998, Los Angeles, California, United States. doi:10.1145/286498.286731
22. Steve Whittaker, Patrick Hyland, Myrtle Wiley, FILOCHAT: handwritten notes provide access to recorded conversations, Proceedings of the SIGCHI conference on Human factors in computing systems: celebrating interdependence, p.271-277, April 24-28, 1994, Boston, Massachusetts, United States. doi:10.1145/191666.191763
24. Wilcox, L., Chen, F., Kimber, D. and Balasubramanian, V. Segmentation of Speech Using Speaker Identification. Proc. ICASSP, 1994.
CITED BY 29
Sunil Vemuri, Philip DeCamp, Walter Bender, Chris Schmandt, Improving speech playback using time-compression and speech recognition, Proceedings of the SIGCHI conference on Human factors in computing systems, p.295-302, April 24-29, 2004, Vienna, Austria
Kent Lyons, Christopher Skeels, Thad Starner, Cornelis M. Snoeck, Benjamin A. Wong, Daniel Ashbrook, Augmenting conversations using dual-purpose speech, Proceedings of the 17th annual ACM symposium on User interface software and technology, October 24-27, 2004, Santa Fe, NM, USA
Steve Whittaker, Quentin Jones, Bonnie Nardi, Mike Creech, Loren Terveen, Ellen Isaacs, John Hainsworth, ContactMap: Organizing communication in a social desktop, ACM Transactions on Computer-Human Interaction (TOCHI), v.11 n.4, p.445-471, December 2004
Cosmin Munteanu, Ronald Baecker, Gerald Penn, Elaine Toms, David James, The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives, Proceedings of the SIGCHI conference on Human Factors in computing systems, April 22-27, 2006, Montréal, Québec, Canada
Abhishek Ranjan, Ravin Balakrishnan, Mark Chignell, Searching in audio: the utility of transcripts, dichotic presentation, and time-compression, Proceedings of the SIGCHI conference on Human Factors in computing systems, April 22-27, 2006, Montréal, Québec, Canada
Cosmin Munteanu, Gerald Penn, Ron Baecker, Yuecheng Zhang, Automatic speech recognition for webcasts: how good is good enough and what to do when it isn't, Proceedings of the 8th international conference on Multimodal interfaces, November 02-04, 2006, Banff, Alberta, Canada
Saverio Perugini, Taylor J. Anderson, William F. Moroney, A study of out-of-turn interaction in menu-based, IVR, voicemail systems, Proceedings of the SIGCHI conference on Human factors in computing systems, April 28-May 03, 2007, San Jose, California, USA
Lei Wang, Paul Roe, Binh Pham, Dian Tjondronegoro, An audio wiki supporting mobile collaboration, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
Siân E. Lindley, Richard Banks, Richard Harper, Anab Jain, Tim Regan, Abigail Sellen, Alex S. Taylor, Resilience in the face of innovation: Household trials with BubbleBoard, International Journal of Human-Computer Studies, v.67 n.2, p.154-164, February, 2009
Saturnino Luz, Masood Masoodian, Bill Rogers, Interactive visualisation techniques for dynamic speech transcription, correction and training, Proceedings of the 9th ACM SIGCHI New Zealand Chapter's International Conference on Human-Computer Interaction: Design Centered HCI, p.9-16, July 02-02, 2008, Wellington, New Zealand