Finding high-quality content in social media
Source
Web Search and Web Data Mining
Proceedings of the international conference on Web search and web data mining
Palo Alto, California, USA
SESSION: Social search
Pages 183-194  
Year of Publication: 2008
ISBN:978-1-59593-927-9
Authors
Eugene Agichtein  Emory University, Atlanta, Georgia
Carlos Castillo  Yahoo! Research, Barcelona, Spain
Debora Donato  Yahoo! Research, Barcelona, Spain
Aristides Gionis  Yahoo! Research, Barcelona, Spain
Gilad Mishne  Search and Advertising Sciences, Yahoo!
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 144,   Downloads (12 Months): 749,   Citation Count: 28
DOI: http://doi.acm.org/10.1145/1341531.1341557

ABSTRACT

The quality of user-generated content varies drastically, from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
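
As a rough illustration of what "combining the evidence from different sources of information" could look like, the sketch below concatenates three hypothetical feature groups (content, links, community ratings) into one vector and fits a small classifier. Everything in it is an assumption made for illustration: the abstract does not name the learner or the features, so plain logistic regression stands in here, and the function names, feature values, and labels are invented toy data rather than anything from the paper.

```python
import math

def combine_features(content, links, ratings):
    """Concatenate per-source feature lists into a single vector (hypothetical grouping)."""
    return list(content) + list(links) + list(ratings)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias stored last) by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 (high quality) or 0 (the rest) for one combined feature vector."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0

# Made-up toy items: ((content features, link features, rating features), label).
toy_items = [
    (([0.9, 0.8], [0.7], [0.9]), 1),
    (([0.8, 0.9], [0.6], [0.8]), 1),
    (([0.2, 0.1], [0.1], [0.2]), 0),
    (([0.1, 0.3], [0.2], [0.1]), 0),
]
X = [combine_features(*feats) for feats, _ in toy_items]
y = [label for _, label in toy_items]
weights = train_logistic(X, y)
predictions = [predict(weights, x) for x in X]
```

Keeping the feature groups separate until `combine_features` makes it easy to drop or add a source and retrain, which is one plausible reading of a framework that "can be tuned automatically" for a different site or quality definition.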





REVIEW

Reviewer: Klaus K. Obermeier

The presented research proposes a framework for establishing a quality control system for user input into question answering (QA) systems such as Yahoo! Answers. This is a very well-presented and easy-to-read investigation into ways to improve ver...
