ABSTRACT
While monochrome unformatted text and richly colored graphical content can both convey a message, well-designed graphical content has the potential to engage the human sensory system more fully. We contend that the author of an audio presentation should likewise be able to judiciously exploit human aural perception to deliver content in a more compelling, concise and realistic manner. Contemporary streaming media players and voice browsers can both render content non-textually, but neither technology is currently capable of rendering three-dimensional media. This paper makes two contributions: proposed 3D audio extensions to SMIL, and a server-based framework that receives a request and, on demand, processes such a SMIL file, dynamically creates the multiple simultaneous audio objects, spatializes them in 3D space, multiplexes them into a single stereo audio signal and prepares it for transmission over an audio stream to a mobile device. To the knowledge of the authors, this is the first reported solution for delivering and rendering, on a commercially available wireless handheld device, a rich 3D audio listening experience as described by a markup language. In addition to mobile devices, the solution also works with desktop streaming media players.
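The server-side steps named in the abstract (parse the markup, create simultaneous audio objects, spatialize them, multiplex them into one stereo signal) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `azimuth` attribute is a hypothetical stand-in and is not taken from the proposed SMIL extensions, the sources are synthetic tones rather than speech or recorded audio, and simple constant-power panning is swapped in for whatever 3D spatialization the framework actually uses.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical markup: the "azimuth" attribute is an assumption for this
# sketch, not the syntax of the 3D audio extensions proposed in the paper.
DOC = """
<smil>
  <body>
    <par>
      <audio src="news" azimuth="-90"/>
      <audio src="alert" azimuth="90"/>
    </par>
  </body>
</smil>
"""


def tone(freq_hz, n_samples, rate=8000):
    """Synthetic mono source standing in for a real audio object."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]


def pan_gains(azimuth_deg):
    """Constant-power pan gains: -90 is hard left, +90 is hard right."""
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


def render(smil_text, sources):
    """Mix all <audio> children of <par> into one list of (left, right) frames."""
    root = ET.fromstring(smil_text.strip())
    elements = root.findall(".//par/audio")
    length = max((len(sources[e.get("src")]) for e in elements), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for e in elements:
        gain_l, gain_r = pan_gains(float(e.get("azimuth", "0")))
        for i, sample in enumerate(sources[e.get("src")]):
            left[i] += gain_l * sample   # contribution to the left channel
            right[i] += gain_r * sample  # contribution to the right channel
    return list(zip(left, right))


SOURCES = {"news": tone(440, 80), "alert": tone(880, 80)}
stereo = render(DOC, SOURCES)
```

Amplitude panning only places a source on the left-right axis; it is used here because it keeps the sketch short, and it should not be read as the spatialization method of the framework itself.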
INDEX TERMS
Primary Classification:
H. Information Systems
  H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
    H.5.5 Sound and Music Computing
      Subjects: Modeling
Additional Classification:
H. Information Systems
  H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
    H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
      Subjects: Voice I/O
    H.5.5 Sound and Music Computing
      Subjects: Methodologies and techniques
General Terms: Design, Human Factors, Languages
Keywords: 3D audio, PDA, SMIL, accessibility, location-based, mobile, spatialization, speech synthesis, streaming, wireless