ABSTRACT
Although talking is an integral part of collaboration, there has been little computer support for acquiring and accessing the contents of conversations. Our approach has focused on ubiquitous audio, or the unobtrusive capture of speech interactions in everyday work environments. Speech recognition technology cannot yet transcribe fluent conversational speech, so the words themselves are not available for organizing the captured interactions. Instead, the structure of an interaction is derived from acoustical information inherent in the stored speech and augmented by user interaction during or after capture. This article describes applications for capturing and structuring audio from office discussions and telephone calls, and mechanisms for later retrieval of these stored interactions. An important aspect of retrieval is choosing an appropriate visual representation, and this article describes the evolution of a family of representations across a range of applications. Finally, the article places this work in the broader context of desktop audio and mobile audio applications and considers its social implications.
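Because transcripts are unavailable, the abstract says structure must come from acoustic information in the stored speech. As a minimal sketch of one such cue, assuming that pauses can be found by thresholding short-term energy (a common technique, not necessarily the one used in this work), the Python below splits a mono sample sequence into speech segments separated by pauses. The function names and the default frame length, threshold, and minimum pause length are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each complete, non-overlapping frame (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def segment_by_pauses(
    samples: Sequence[float],
    frame_len: int = 160,        # hypothetical: 20 ms at an 8 kHz telephone rate
    threshold: float = 0.02,     # hypothetical energy level treated as "speech"
    min_pause_frames: int = 3,   # hypothetical: shorter silences are bridged
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) speech segments, end exclusive."""
    energies = frame_energies(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None      # frame index where the current segment began
    silent_run = 0    # consecutive low-energy frames inside the current segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause is long enough: close the segment after its last loud frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Recording ended mid-segment; trim any trailing short silence.
        segments.append((start, len(energies) - silent_run))
    return segments


if __name__ == "__main__":
    samples = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
    # The one-frame gap at frame 1 is bridged; the two-frame gap splits segments.
    print(segment_by_pauses(samples, frame_len=2, threshold=0.5, min_pause_frames=2))
    # [(0, 3), (5, 6)]
```

At 8 kHz, a 160-sample frame spans 20 ms, so the defaults treat silences of 60 ms or longer as pauses. Real recordings would need the threshold calibrated against background noise, and the resulting segments would only be a starting point for the user-supplied structure the abstract describes.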
CITED BY 21

Steve Whittaker, Richard Davis, Julia Hirschberg, Urs Muller, Jotmail: a voicemail interface that enables you to see what was said, Proceedings of the SIGCHI conference on Human factors in computing systems, p.89-96, April 01-06, 2000, The Hague, The Netherlands

Scott Minneman, Steve Harrison, Bill Janssen, Gordon Kurtenbach, Thomas Moran, Ian Smith, Bill van Melle, A confederation of tools for capturing and accessing collaborative activity, Proceedings of the third ACM international conference on Multimedia, p.523-534, November 05-09, 1995, San Francisco, California, United States

Steve Whittaker, Julia Hirschberg, Brian Amento, Litza Stark, Michiel Bacchiani, Philip Isenhour, Larry Stead, Gary Zamchick, Aaron Rosenberg, SCANMail: a voicemail interface that makes speech browsable, readable and searchable, Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, April 20-25, 2002, Minneapolis, Minnesota, USA

Lynn D. Wilcox, Bill N. Schilit, Nitin Sawhney, Dynomite: a dynamically organized ink and audio notebook, Proceedings of the SIGCHI conference on Human factors in computing systems, p.186-193, March 22-27, 1997, Atlanta, Georgia, United States

Donald G. Kimber, Lynn D. Wilcox, Francine R. Chen, Thomas P. Moran, Speaker segmentation for browsing recorded audio, Conference companion on Human factors in computing systems, p.212-213, May 07-11, 1995, Denver, Colorado, United States

Stuart Goose, Michael Wynblatt, Hans Mollenhauer, 1-800-hypertext: browsing hypertext with a telephone, Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space---structure in hypermedia systems, p.287-288, June 20-24, 1998, Pittsburgh, Pennsylvania, United States

Chris Schmandt, Jang Kim, Kwan Lee, Gerardo Vallejo, Mark Ackerman, Mediated voice communication via mobile IP, Proceedings of the 15th annual ACM symposium on User interface software and technology, October 27-30, 2002, Paris, France

Lei Wang, Paul Roe, Binh Pham, Dian Tjondronegoro, An audio wiki supporting mobile collaboration, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil

INDEX TERMS
Primary Classification:
C. Computer Systems Organization
Additional Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.4 INFORMATION SYSTEMS APPLICATIONS
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.1 Multimedia Information Systems
Subjects: Audio input/output
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Interaction styles (e.g., commands, menus, forms, direct manipulation)
H.5.3 Group and Organization Interfaces
Subjects: Synchronous interaction; Asynchronous interaction
General Terms: Design, Human Factors
Keywords: audio interactions, collaborative work, multimedia workstation software, semi-structured data, software telephony, stored speech, ubiquitous computing