Sikuli: using GUI screenshots for search and automation
Source
Symposium on User Interface Software and Technology
Proceedings of the 22nd annual ACM symposium on User interface software and technology
Victoria, BC, Canada
Session: A.I./U.I.
Pages 183-192
Year of Publication: 2009
ISBN: 978-1-60558-745-5
Authors
Tom Yeh  MIT CSAIL, Cambridge, MA, USA
Tsung-Hsiang Chang  MIT CSAIL, Cambridge, MA, USA
Robert C. Miller  MIT CSAIL, Cambridge, MA, USA
Sponsors
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1622176.1622213

ABSTRACT

We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. We also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.

