ABSTRACT
Online social media draws heavily on active reader participation, such as voting on or rating news stories, articles, or responses to a question. This user feedback is invaluable for ranking, filtering, and retrieving high-quality content, tasks that are crucial given the explosive amount of social content on the web. Unfortunately, as social media moves into the mainstream and gains in popularity, the quality of the user feedback degrades. Some of this is due to noise, but, increasingly, a small fraction of malicious users are trying to "game the system" by selectively promoting or demoting content for profit or fun. Hence, an effective ranking of social media content must be robust to noise in the user interactions, and in particular to vote spam. We describe a machine-learning-based ranking framework for social media that integrates user interactions and content relevance, and demonstrate its effectiveness for answer retrieval in a popular community question answering portal. We consider several vote spam attacks and introduce a method of training our ranker to increase its robustness to some common forms of vote spam. The results of our large-scale experimental evaluation show that our ranker is significantly more robust to vote spam than both a state-of-the-art baseline and the ranker not explicitly trained to handle malicious interactions.