ABSTRACT
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
CITED BY 28
Jinwen Guo , Shengliang Xu , Shenghua Bao , Yong Yu, Tapping on the potential of q&a community by recommending answer providers, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Marc Smith , Vladimir Barash , Lise Getoor , Hady W. Lauw, Leveraging social context for searching social media, Proceeding of the 2008 ACM workshop on Search in social media, October 30-30, 2008, Napa Valley, California, USA
Manos Tsagkias , Martha Larson , Wouter Weerkamp , Maarten de Rijke, PodCred: a framework for analyzing podcast preference, Proceeding of the 2nd ACM workshop on Information credibility on the web, October 30-30, 2008, Napa Valley, California, USA
Jiang Bian , Yandong Liu , Ding Zhou , Eugene Agichtein , Hongyuan Zha, Learning to recognize reliable users and content in social media with coupled mutual reinforcement, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
Xin-Jing Wang , Xudong Tu , Dan Feng , Lei Zhang, Ranking community answers by modeling question-answer relationships via analogical reasoning, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
Kevin Kyung Nam , Mark S. Ackerman , Lada A. Adamic, Questions in, knowledge in?: a study of naver's question answering community, Proceedings of the 27th international conference on Human factors in computing systems, April 04-09, 2009, Boston, MA, USA
REVIEW
Reviewer: Klaus K. Obermeier
The presented research proposes a framework for establishing a quality control system for user input into question answering (QA) systems such as Yahoo! Answers. This is a very well-presented and easy-to-read investigation into ways to improve ver