Streaming speech³: a framework for generating and streaming 3D text-to-speech and audio presentations to wireless PDAs as specified using extensions to SMIL
Source: International World Wide Web Conference
Proceedings of the 11th international conference on World Wide Web
Honolulu, Hawaii, USA
SESSION: Multimedia
Pages: 37-44
Year of Publication: 2002
ISBN: 1-58113-449-5
Authors
Stuart Goose  Siemens Corporate Research, Inc., Princeton, NJ
Sreedhar Kodlahalli  Siemens Corporate Research, Inc., Princeton, NJ
William Pechter  Siemens Corporate Research, Inc., Princeton, NJ
Rune Hjelsvold  Siemens Corporate Research, Inc., Princeton, NJ
Sponsors
ACM: Association for Computing Machinery
: WWW'02
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 46,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/511446.511452

ABSTRACT

While monochrome unformatted text and richly colored graphical content are both capable of conveying a message, well-designed graphical content has the potential to engage the human sensory system more fully. It is our contention that the author of an audio presentation should likewise be able to exploit human aural perception judiciously to deliver content in a more compelling, concise and realistic manner. While contemporary streaming media players and voice browsers share the ability to render content non-textually, neither technology is currently capable of rendering three-dimensional media. The contributions described in this paper are proposed 3D audio extensions to SMIL and a server-based framework that receives a request and, on demand, processes such a SMIL file, dynamically creates the multiple simultaneous audio objects, spatializes them in 3D space, multiplexes them into a single stereo audio signal and prepares it for transmission over an audio stream to a mobile device. To the knowledge of the authors, this is the first reported solution for delivering and rendering, on a commercially available wireless handheld device, a rich 3D audio listening experience as described by a markup language. In addition to mobile devices, this solution also works with desktop streaming media players.
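
The spatialize-and-multiplex step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how spatialization is performed, so the sketch substitutes simple constant-power stereo panning and inverse-distance attenuation for a true 3D audio renderer. The function names `pan_gains` and `mix_to_stereo`, and the `(samples, azimuth, distance)` source format, are hypothetical and chosen only for illustration.

```python
import math


def pan_gains(azimuth_deg):
    """Constant-power stereo gains for an azimuth in [-90, 90] degrees.

    Assumption for this sketch: -90 is hard left, 0 is centre, +90 is hard
    right. Values outside the range are clamped.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2.0)  # map azimuth to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def mix_to_stereo(sources):
    """Mix several positioned mono sources into one stereo pair.

    sources: list of (mono_samples, azimuth_deg, distance) tuples
             (a hypothetical format, not taken from the paper).
    Returns (left, right) as two lists of floats of equal length.
    """
    length = max((len(samples) for samples, _, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth_deg, distance in sources:
        g_left, g_right = pan_gains(azimuth_deg)
        attenuation = 1.0 / max(1.0, distance)  # simple inverse-distance falloff
        for i, x in enumerate(samples):
            left[i] += x * g_left * attenuation
            right[i] += x * g_right * attenuation
    return left, right
```

For example, `mix_to_stereo([(voice, -45.0, 1.0), (chime, 60.0, 2.0)])` would place a hypothetical `voice` buffer to the listener's left and a quieter `chime` buffer to the right, yielding a single stereo pair that a server could then encode and stream.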


