ABSTRACT
Despite recent advances, including sound source clustering and perceptual auditory masking, high-quality rendering of complex virtual scenes with thousands of sound sources remains a challenge. Two major bottlenecks appear as scene complexity increases: the cost of clustering itself, and the cost of pre-mixing source signals within each cluster. In this paper, we first propose an improved hierarchical clustering algorithm that remains efficient for large numbers of sources and clusters while providing progressive refinement capabilities. We then present a lossy pre-mixing method based on a progressive representation of the input audio signals and the perceptual importance of each sound source. Our quality-evaluation user tests indicate that the recently introduced audio saliency map is inappropriate for this task. Consequently, we propose a "pinnacle", loudness-based metric, which gives the best results for a variety of target computing budgets. We also performed a perceptual pilot study, which indicates that in audio-visual environments it is better to allocate more clusters to visible sound sources. We propose a new clustering metric based on this result. Together, these three solutions allow our system to provide high-quality rendering of thousands of 3D sound sources on a "gamer-style" PC.
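The abstract does not spell out how a computing budget is divided among sources, so the following is only a minimal sketch of the general idea of importance-driven budget allocation, not the paper's algorithm. It assumes the budget is an integer number of processing units (for example, frequency coefficients to mix) and that each source already has a non-negative loudness value. The function name `allocate_budget` and both parameters are hypothetical.

```python
def allocate_budget(loudness, budget):
    """Split an integer budget across sources in proportion to loudness.

    Hypothetical helper: `loudness` is a list of non-negative importance
    values, `budget` is the total number of units to hand out.
    """
    total = sum(loudness)
    if total <= 0 or budget <= 0:
        return [0] * len(loudness)

    # Ideal (fractional) share of the budget for each source.
    shares = [budget * value / total for value in loudness]

    # Start from the rounded-down shares.
    alloc = [int(share) for share in shares]

    # Give the remaining units to the largest fractional remainders.
    leftover = budget - sum(alloc)
    order = sorted(
        range(len(loudness)),
        key=lambda i: shares[i] - alloc[i],
        reverse=True,
    )
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Three sources, the third twice as loud as the others.
    print(allocate_budget([1.0, 1.0, 2.0], 8))
```

Under these assumptions, louder sources receive more of the budget and silent sources receive none, which is the behaviour a loudness-based importance metric is meant to produce as the budget shrinks.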