Text versus speech: a comparison of tagging input modalities for camera phones
Source
ACM International Conference Proceeding Series
Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services
Bonn, Germany
SESSION: Camera-based interaction
Article No. 1  
Year of Publication: 2009
ISBN: 978-1-60558-281-8
Authors
Mauro Cherubini  Telefónica Research, Barcelona, Spain
Xavier Anguera  Telefónica Research, Barcelona, Spain
Nuria Oliver  Telefónica Research, Barcelona, Spain
Rodrigo de Oliveira  Telefónica Research, Barcelona, Spain
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1613858.1613860

ABSTRACT

Speech and typed text are two common input modalities for mobile phones. However, little research has compared their ability to support annotation and retrieval of digital pictures on mobile devices. In this paper, we report the results of a month-long field study in which participants took pictures with their camera phones and could choose to annotate them using speech, typed text, or both. Subsequently, the same participants took part in a controlled experiment in which they were asked to retrieve images based on annotations and to retrieve annotations based on images. The experiment studied how effectively each modality supports users' recall of previously captured pictures. Results demonstrate that each modality has advantages and shortcomings for the production of tags and the retrieval of pictures. We suggest several guidelines for designing tagging applications for portable devices.

