Capturing, structuring, and representing ubiquitous audio
Source: ACM Transactions on Information Systems (TOIS)
Volume 11, Issue 4 (October 1993)
Pages: 376-400
Year of Publication: 1993
ISSN: 1046-8188
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/159764.159761

ABSTRACT

Although talking is an integral part of collaboration, there has been little computer support for acquiring and accessing the contents of conversations. Our approach has focused on ubiquitous audio, or the unobtrusive capture of speech interactions in everyday work environments. Speech recognition technology cannot yet transcribe fluent conversational speech, so the words themselves are not available for organizing the captured interactions. Instead, the structure of an interaction is derived from acoustical information inherent in the stored speech and augmented by user interaction during or after capture. This article describes applications for capturing and structuring audio from office discussions and telephone calls, and mechanisms for later retrieval of these stored interactions. An important aspect of retrieval is choosing an appropriate visual representation, and this article describes the evolution of a family of representations across a range of applications. Finally, this work is placed within the broader context of desktop audio, mobile audio applications, and social implications.



Authors: Debby Hindus, Chris Schmandt, Chris Horner