Audio privacy: reducing speech intelligibility while preserving environmental sounds
Source
International Multimedia Conference archive
Proceedings of the 16th ACM International Conference on Multimedia
Vancouver, British Columbia, Canada
SESSION: Content track short papers session 2: content analysis and applications
Pages 733-736
Year of Publication: 2008
ISBN: 978-1-60558-303-7
Authors
Francine Chen  FX Palo Alto Laboratory, Palo Alto, CA, USA
John Adcock  FX Palo Alto Laboratory, Palo Alto, CA, USA
Shruti Krishnagiri  FX Palo Alto Laboratory, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA

DOI: http://doi.acm.org/10.1145/1459359.1459472

ABSTRACT

Audio monitoring has many applications but also raises privacy concerns. To help alleviate these concerns, we have developed a method for reducing the intelligibility of speech while preserving intonation and the ability to recognize most environmental sounds. The method is based on identifying vocalic regions and replacing the vocal tract transfer function of these regions with the transfer function from prerecorded vowels, where the identity of the replacement vowel is independent of the identity of the spoken syllable. The audio signal is then re-synthesized using the original pitch and energy, but with the modified vocal tract transfer function. We performed an intelligibility study, which showed that environmental sounds remained recognizable while speech intelligibility could be reduced dramatically, to a 7% word recognition rate.
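
The substitution step described in the abstract can be illustrated with a small source-filter sketch. This is not the authors' implementation: the abstract does not say how the vocal tract transfer function is estimated, so the sketch assumes linear predictive coding (LPC), processes a single frame that is already known to be vocalic, and uses hypothetical names throughout (lpc, swap_vocal_tract, toy_vowel). The residual from inverse filtering carries the original pitch pulses, the replacement vowel supplies the new spectral envelope, and a final gain restores the original frame energy.

```python
import math
import random

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin. Returns [1, a1, ..., ap]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0] * (1.0 + 1e-9) + 1e-12  # tiny regularisation to avoid division by zero
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def inverse_filter(frame, a):
    """Residual e[n] = sum_k a[k] * x[n-k]: removes the estimated envelope."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]

def synthesis_filter(residual, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(residual):
        y.append(e - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

def energy(x):
    return sum(v * v for v in x)

def swap_vocal_tract(frame, vowel_frame, order=8):
    """Re-synthesise `frame` with the spectral envelope of `vowel_frame` (assumed LPC)."""
    residual = inverse_filter(frame, lpc(frame, order))  # keeps the original pitch pulses
    out = synthesis_filter(residual, lpc(vowel_frame, order))
    gain = math.sqrt(energy(frame) / (energy(out) + 1e-12))  # restore the original energy
    return [gain * v for v in out]

def toy_vowel(pitch_period, formant_hz, n=400, fs=8000, seed=0):
    """Hypothetical test signal: pulse train through one resonance, plus faint noise."""
    rng = random.Random(seed)
    radius, w = 0.95, 2 * math.pi * formant_hz / fs
    resonator = [1.0, -2 * radius * math.cos(w), radius * radius]
    src = [(1.0 if i % pitch_period == 0 else 0.0) + 0.01 * rng.uniform(-1, 1)
           for i in range(n)]
    return synthesis_filter(src, resonator)

# Usage: mask a "spoken" frame with an unrelated prerecorded vowel.
spoken = toy_vowel(pitch_period=80, formant_hz=700)
replacement = toy_vowel(pitch_period=64, formant_hz=2000, seed=1)
masked = swap_vocal_tract(spoken, replacement)
```

A complete system would also need the vocalic-region detector mentioned in the abstract, frame-by-frame processing with overlap, and a rule for choosing the replacement vowel independently of the spoken syllable; none of these are shown here, and non-vocalic audio such as environmental sound would pass through unmodified.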

