ABSTRACT
Compared to text, audio-video content is much more challenging to browse. Time compression has been suggested as a key technology that can support browsing: it speeds up the playback of audio-video content without changing the pitch. Simple forms of time compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks. In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. The most advanced of these algorithms exploit the fine-grain structure of human speech (e.g., phonemes) to differentially speed up segments of speech, so that the overall speedup can be higher. We examine the actual gains that end users achieve with these advanced algorithms. Our results indicate that the gains are quite small in common cases and come with significant system complexity and some audio/video synchronization issues.
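As a rough illustration of why differential speedup can raise the overall rate, the sketch below computes the overall speedup of a clip whose segments are played at different rates. It is arithmetic only, not any of the algorithms evaluated in the paper; the function name, the segment labels, and all durations and rates are hypothetical.

```python
def overall_speedup(segments):
    """Return original duration divided by compressed duration.

    segments: list of (duration_seconds, speedup_factor) pairs.
    A factor of 2.0 means the segment plays in half its original time.
    """
    original = sum(duration for duration, _ in segments)
    compressed = sum(duration / factor for duration, factor in segments)
    return original / compressed


# Hypothetical 10-second clip (values chosen for illustration only).
linear = [(10.0, 1.5)]      # every segment sped up by the same factor
non_linear = [
    (2.0, 4.0),             # pauses: compressed aggressively
    (6.0, 2.0),             # segments assumed to tolerate more compression
    (2.0, 1.0),             # segments left at normal speed
]

print(f"linear:     {overall_speedup(linear):.2f}x")
print(f"non-linear: {overall_speedup(non_linear):.2f}x")
```

Because the overall figure is total original time over total compressed time, segments left at normal speed pull it down more than their share of the original duration would suggest.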