ABSTRACT
We describe MobileASL, a system for real-time video communication on the current U.S. mobile phone network. The goal of MobileASL is to enable Deaf people to communicate in sign language over mobile phones by compressing and transmitting sign language video in real time on an off-the-shelf mobile phone, which has a weak processor, limited bandwidth, and little battery capacity. We develop several H.264-compliant algorithms that save system resources while maintaining ASL intelligibility by focusing on the important segments of the video. We employ a dynamic skin-based region of interest (ROI) that encodes the skin at higher quality at the expense of the rest of the video. We also automatically recognize periods of signing versus not signing and raise and lower the frame rate accordingly, a technique we call variable frame rate (VFR). We show that VFR yields a 47% gain in battery life on the phone, corresponding to an extra 68 minutes of talk time. We also evaluate our system in a user study in which participants fluent in ASL engage in unconstrained conversations over mobile phones in a laboratory setting. We find that the ROI increases intelligibility and decreases guessing. VFR increases the need for signs to be repeated and the number of conversational breakdowns, but does not affect users' perception of adopting the technology. These results show that our sign-language-sensitive algorithms can save considerable resources without sacrificing intelligibility.
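The variable frame rate idea can be illustrated with a minimal sketch. It is not the MobileASL classifier: the abstract does not say how signing is detected, so plain frame differencing stands in for that step here. The function names, the threshold, and the two frame rates (10 fps and 1 fps) are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# A grayscale frame as rows of 0-255 luma values (illustrative representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average absolute luma difference between two equally sized frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def choose_frame_rates(frames: List[Frame],
                       threshold: float = 4.0,   # assumed activity threshold
                       signing_fps: int = 10,    # assumed rate while signing
                       idle_fps: int = 1) -> List[int]:  # assumed idle rate
    """Return a target frame rate for every frame after the first.

    A frame whose difference from its predecessor exceeds `threshold`
    is treated as signing; otherwise it is treated as not signing.
    """
    rates = []
    for prev, curr in zip(frames, frames[1:]):
        active = mean_abs_diff(prev, curr) > threshold
        rates.append(signing_fps if active else idle_fps)
    return rates


# Tiny 2x2 frames: one pixel changes between `still` and `moved`.
still = [[10, 10], [10, 10]]
moved = [[10, 60], [10, 10]]
print(choose_frame_rates([still, still, moved, still]))  # [1, 10, 10]
```

Lowering the rate during idle periods means fewer frames to encode and transmit, which is the kind of saving the battery-life result above refers to. A practical detector would likely need smoothing over several frames so that brief pauses within a sentence do not trigger a rate drop.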