Activity analysis enabling real-time video communication on mobile phones for deaf users
Source
Symposium on User Interface Software and Technology
Proceedings of the 22nd annual ACM symposium on User interface software and technology
Victoria, BC, Canada
Session: New solutions for old problems
Pages 79-88
Year of Publication: 2009
ISBN: 978-1-60558-745-5
Authors
Neva Cherniavsky  University of Washington, Seattle, WA, USA
Jaehong Chon  University of Washington, Seattle, WA, USA
Jacob O. Wobbrock  University of Washington, Seattle, WA, USA
Richard E. Ladner  University of Washington, Seattle, WA, USA
Eve A. Riskin  University of Washington, Seattle, WA, USA
Sponsors
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1622176.1622192

ABSTRACT

We describe MobileASL, our system for real-time video communication on the current U.S. mobile phone network. The goal of MobileASL is to enable Deaf people to communicate in sign language over mobile phones by compressing and transmitting sign language video in real time on an off-the-shelf mobile phone, which has a weak processor, limited bandwidth, and little battery capacity. We develop several H.264-compliant algorithms that save system resources while maintaining ASL intelligibility by focusing on the important segments of the video. We employ a dynamic skin-based region-of-interest (ROI) that encodes the skin at higher quality at the expense of the rest of the video. We also automatically recognize periods of signing versus not signing and raise and lower the frame rate accordingly, a technique we call variable frame rate (VFR).
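
The two techniques named above can be illustrated with a minimal sketch. This is not the MobileASL implementation: the abstract does not say how signing is detected or how the ROI is weighted, so the sketch substitutes simple frame differencing for the activity classifier and a per-block quantizer offset for the ROI. The function names, the threshold, the 10 fps and 1 fps rates, and the offset values are all hypothetical.

```python
from typing import List, Sequence

# A grayscale frame as rows of 0-255 luma values (assumed representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average absolute luma difference between two equally sized frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def choose_frame_rate(prev: Frame, curr: Frame, threshold: float = 4.0,
                      high_fps: int = 10, low_fps: int = 1) -> int:
    """Variable frame rate: a high rate while motion suggests signing,
    a low rate otherwise. Threshold and rates are illustrative only."""
    if mean_abs_diff(prev, curr) >= threshold:
        return high_fps
    return low_fps


def roi_qp_offsets(skin_mask: Sequence[Sequence[bool]],
                   skin_offset: int = -6,
                   background_offset: int = 6) -> List[List[int]]:
    """Skin-based ROI: give skin blocks a negative quantizer offset
    (finer quantization, higher quality) and the remaining blocks a
    positive one. Offset values are hypothetical."""
    return [[skin_offset if is_skin else background_offset for is_skin in row]
            for row in skin_mask]


still = [[10, 10], [10, 10]]
moving = [[10, 40], [10, 10]]
print(choose_frame_rate(still, still))    # no motion between frames
print(choose_frame_rate(still, moving))   # motion between frames
print(roi_qp_offsets([[True, False]]))
```

In H.264 a lower quantization parameter means finer quantization, which is why the skin blocks receive the negative offset in this sketch.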

We show that our variable frame rate technique results in a 47% gain in battery life on the phone, corresponding to an extra 68 minutes of talk time. We also evaluate our system in a user study in which participants fluent in ASL engage in unconstrained conversations over mobile phones in a laboratory setting. We find that the ROI increases intelligibility and decreases guessing. VFR increases the need for signs to be repeated and the number of conversational breakdowns, but does not affect the users' perception of adopting the technology. These results show that our sign-language-sensitive algorithms can save considerable resources without sacrificing intelligibility.
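
The battery figures can be related with a short calculation. It assumes, which the abstract does not state explicitly, that the 47% gain is measured relative to talk time without VFR and that the 68 minutes is the corresponding absolute increase. The function name is hypothetical and the results are approximations derived from the two rounded figures.

```python
def baseline_minutes(extra_minutes: float, relative_gain: float) -> float:
    """Talk time without VFR implied by an absolute and a relative gain."""
    return extra_minutes / relative_gain


without_vfr = baseline_minutes(68, 0.47)   # about 145 minutes
with_vfr = without_vfr + 68                # about 213 minutes
print(round(without_vfr), round(with_vfr))
```

Under that assumption, talk time rises from roughly 145 minutes to roughly 213 minutes.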

