Internet-scale collection of human-reviewed data
Source
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
Session: E-communities
Pages: 231-240
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Qi Su  Yahoo! Inc, Sunnyvale, CA
Dmitry Pavlov  Yahoo! Inc, Sunnyvale, CA
Jyh-Herng Chow  Yahoo! Inc, Sunnyvale, CA
Wendell C. Baker  Yahoo! Inc, Sunnyvale, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 weeks): 20; Downloads (12 months): 166; Citation count: 11
DOI: http://doi.acm.org/10.1145/1242572.1242604

ABSTRACT

Enterprise and web data processing and content aggregation systems often require extensive use of human-reviewed data (e.g., for training and monitoring machine-learning-based applications). Today these needs are often met by in-house efforts or outsourced offshore contracting. Emerging applications attempt to provide automated collection of human-reviewed data at Internet scale. We conduct extensive experiments to study the effectiveness of one such application. We also study the feasibility of using Yahoo! Answers, a general question-answering forum, for human-reviewed data collection.
