Internet-scale collection of human-reviewed data
Source
International World Wide Web Conference
Proceedings of the 16th international conference on World Wide Web
Banff, Alberta, Canada
SESSION: E-communities
Pages: 231 - 240
Year of Publication: 2007
ISBN: 978-1-59593-654-7
ABSTRACT
Enterprise and web data processing and content aggregation systems often require extensive use of human-reviewed data (e.g., for training and monitoring machine learning-based applications). Today these needs are often met by in-house efforts or outsourced offshore contracting. Emerging applications attempt to provide automated collection of human-reviewed data at Internet scale. We conduct extensive experiments to study the effectiveness of one such application. We also study the feasibility of using Yahoo! Answers, a general question-answering forum, for human-reviewed data collection.