A comprehensive human computation framework: with application to image labeling
Source
International Multimedia Conference
Proceeding of the 16th ACM international conference on Multimedia
Vancouver, British Columbia, Canada
SESSION: Applications track A5/H3: browsing
Pages 479-488
Year of Publication: 2008
ISBN: 978-1-60558-303-7
Authors
Yang Yang  University of Science and Technology of China, Hefei, Anhui, China
Bin B. Zhu  Microsoft Research Asia, Beijing, China
Rui Guo  Beihang University, Beijing, China
Linjun Yang  Microsoft Research Asia, Beijing, China
Shipeng Li  Microsoft Research Asia, Beijing, China
Nenghai Yu  University of Science and Technology of China, Hefei, Anhui, China
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 142,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1459359.1459423

ABSTRACT

Labeling images and videos is important both for helping computers understand visual content and for image and video search. Manual labeling is tedious and costly, and automatic image and video labeling remains out of reach. In this paper, we adopt a Web 2.0 approach to labeling images and videos efficiently: Internet users around the world are mobilized to apply their "common sense" to problems that are hard for today's computers, such as labeling images and videos. We first propose a general human computation framework that binds problem providers, Web sites, and Internet users together to solve large-scale common sense problems efficiently and economically. The framework addresses technical challenges such as preventing a malicious party from attacking others, removing answers from bots, and distilling human answers to produce high-quality solutions. The framework is then applied to image labeling in three incremental refinement stages. The first stage collects candidate labels for objects in an image. The second stage refines the candidate labels by presenting them to users as multiple choices; synonymous labels are also correlated in this stage. To prevent bots and lazy humans from selecting all the choices, trap labels are generated automatically and intermixed with the candidate labels. Semantic distance is used to ensure that the selected trap labels are different enough from the candidate labels that no human user would select them by mistake. The last stage asks users to locate an object in a segmented image, given its label. Experimental results are also reported. They indicate that the proposed schemes can successfully remove spurious answers from bots and distill human answers to produce high-quality image labels.
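
The trap-label idea in the second stage can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold, and the toy distance table below are all hypothetical, and the abstract does not say which semantic distance measure is used. The sketch only shows the general shape of the technique: choose trap labels that are far from every candidate label, then discard any response that selects a trap.

```python
from typing import Callable, Iterable, List, Sequence

def pick_trap_labels(
    candidates: Sequence[str],
    vocabulary: Iterable[str],
    distance: Callable[[str, str], float],
    min_distance: float,
    count: int,
) -> List[str]:
    """Return up to `count` vocabulary words that are at least
    `min_distance` away from every candidate label."""
    traps: List[str] = []
    for word in vocabulary:
        if word in candidates:
            continue
        if all(distance(word, c) >= min_distance for c in candidates):
            traps.append(word)
            if len(traps) == count:
                break
    return traps

def filter_responses(
    responses: Iterable[Sequence[str]], traps: Iterable[str]
) -> List[Sequence[str]]:
    """Drop every response that selected at least one trap label."""
    trap_set = set(traps)
    return [r for r in responses if not trap_set.intersection(r)]

# Hypothetical stand-in for a real semantic distance measure:
# words in the same coarse category are at distance 0, otherwise 1.
TOY_CATEGORY = {
    "dog": "animal", "puppy": "animal", "cat": "animal",
    "car": "vehicle", "truck": "vehicle", "banana": "fruit",
}

def toy_distance(a: str, b: str) -> float:
    return 0.0 if TOY_CATEGORY[a] == TOY_CATEGORY[b] else 1.0

candidates = ["dog", "puppy"]
vocabulary = ["cat", "car", "banana", "truck"]
traps = pick_trap_labels(candidates, vocabulary, toy_distance, 0.5, 2)

responses = [
    ["dog"],                             # plausible human answer
    ["dog", "car"],                      # selected a trap
    ["dog", "puppy", "car", "banana"],   # selected everything
]
kept = filter_responses(responses, traps)
```

In this toy setup "cat" is rejected as a trap because it is too close to the candidates, which mirrors the stated goal that a human should never pick a trap by mistake. A real system would replace `toy_distance` with a lexical or learned similarity measure.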



