Automatic seed set expansion for trust propagation based anti-spamming algorithms
Source
Conference on Information and Knowledge Management
Proceeding of the eleventh international workshop on Web information and data management
Hong Kong, China
SESSION: Web algorithms
Pages: 31-38
Year of Publication: 2009
ISBN: 978-1-60558-808-7
Authors
Xianchao Zhang, Dalian University of Technology, Dalian, China
Bo Han, Dalian University of Technology, Dalian, China
Wenxin Liang, Dalian University of Technology, Dalian, China
ABSTRACT
Seed sets are critically important for trust propagation based anti-spamming algorithms such as TrustRank. Conventional approaches construct the seed set by manual evaluation, which keeps the seed set small, because building a very large seed set by hand is too costly and may even be impossible. A small seed set can have a detrimental effect on the final ranking results, so it is desirable to expand an initial seed set automatically into a much larger one. In this paper, we propose the first automatic seed set expansion algorithm (ASE), which expands a small seed set by selecting seeds that are found, and guaranteed to be reputable, through a joint recommendation link structure. Experimental results on the WEBSPAM-2007 dataset show that, with the same manual evaluation effort, ASE automatically obtains a large number of reputable seeds with high precision, and thereby significantly improves the performance of the baseline algorithm in both reputable site promotion and spam site demotion.
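The abstract does not spell out how ASE works, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that "joint recommendation" can be approximated by adding a page to the seed set when at least `min_recommenders` existing seeds link to it, and it propagates trust with a simple seed-biased PageRank iteration in the style of TrustRank. The function names, the threshold, and the damping factor are all hypothetical choices made for this example.

```python
from collections import defaultdict


def expand_seeds(out_links, seeds, min_recommenders=2):
    """Add every non-seed page that at least `min_recommenders` seeds link to.

    This threshold rule is an assumption standing in for the paper's
    joint recommendation link structure, which the abstract does not define.
    """
    votes = defaultdict(int)
    for seed in seeds:
        # Count each seed at most once per target, even with duplicate links.
        for target in set(out_links.get(seed, ())):
            if target not in seeds:
                votes[target] += 1
    return set(seeds) | {page for page, n in votes.items() if n >= min_recommenders}


def propagate_trust(out_links, seeds, damping=0.85, iterations=50):
    """Seed-biased PageRank iteration (TrustRank-style sketch).

    Trust starts uniformly on the seeds and flows along out-links;
    trust reaching pages without out-links is simply dropped.
    """
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(base)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * base[n] for n in nodes}
        for src, targets in out_links.items():
            if targets:
                share = damping * trust[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        trust = nxt
    return trust


if __name__ == "__main__":
    # Toy graph: "a" and "b" are manually evaluated seeds.
    graph = {"a": ["c", "d"], "b": ["c"], "c": ["e"], "d": [], "e": [], "spam": ["e"]}
    expanded = expand_seeds(graph, {"a", "b"})
    scores = propagate_trust(graph, expanded)
    print(sorted(expanded), scores["spam"])
```

In this toy graph, page "c" is linked to by both seeds and joins the seed set, while "d" has only one recommender and does not. The page named "spam" receives no trust because no trusted page links to it.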