Automatic seed set expansion for trust propagation based anti-spamming algorithms
Source
Conference on Information and Knowledge Management
Proceedings of the Eleventh International Workshop on Web Information and Data Management
Hong Kong, China
Session: Web algorithms
Pages: 31-38
Year of Publication: 2009
ISBN: 978-1-60558-808-7
Authors
Xianchao Zhang  Dalian University of Technology, Dalian, China
Bo Han  Dalian University of Technology, Dalian, China
Wenxin Liang  Dalian University of Technology, Dalian, China
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1651587.1651596

ABSTRACT

Seed sets are critical to trust propagation based anti-spamming algorithms such as TrustRank. Conventional approaches construct a seed set by manual evaluation, which keeps the seed set small: building a very large seed set by hand would be too costly, if not impossible. A small seed set, however, can harm the final ranking results. It is therefore desirable to automatically expand an initial seed set into a much larger one. In this paper, we propose the first automatic seed set expansion algorithm (ASE), which expands a small seed set by selecting new seeds that are identified, and guaranteed to be reputable, through a joint recommendation link structure. Experimental results on the WEBSPAM-2007 dataset show that, with the same manual evaluation effort, ASE automatically obtains a large number of reputable seeds with high precision, and thereby significantly improves the performance of the baseline algorithm in both reputable site promotion and spam site demotion.
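
To make the role of the seed set concrete, here is a minimal Python sketch of TrustRank-style propagation: a biased PageRank whose teleport distribution is uniform over the seed set, so trust can only flow outward from seeds along links. The `expand_seeds` helper is purely hypothetical. It promotes any site linked from at least `k` current seeds as a simple stand-in for the idea of "joint recommendation"; it is not the ASE algorithm, whose actual selection rules are not given in this abstract. The graph, the parameter names (`alpha`, `iters`, `k`) and their defaults are illustrative assumptions.

```python
from collections import defaultdict


def expand_seeds(links, seeds, k=2):
    """HYPOTHETICAL expansion rule (not ASE): add every node that is
    linked from at least k distinct current seeds."""
    votes = defaultdict(int)
    for src in seeds:
        for dst in set(links.get(src, ())):
            if dst not in seeds:
                votes[dst] += 1
    return set(seeds) | {node for node, v in votes.items() if v >= k}


def trustrank(links, seeds, alpha=0.85, iters=50):
    """TrustRank-style propagation: biased PageRank whose teleport vector
    is uniform over the seeds. Trust at dangling nodes (no out-links) is
    not redistributed in this simplified sketch."""
    nodes = set(links) | {d for outs in links.values() for d in outs}
    seed_list = [n for n in nodes if n in seeds]
    if not seed_list:
        raise ValueError("seed set has no nodes in the graph")
    # Static score distribution d: uniform over seeds, zero elsewhere.
    d = {n: (1.0 / len(seed_list) if n in seeds else 0.0) for n in nodes}
    t = dict(d)
    for _ in range(iters):
        # Each step: (1 - alpha) * d  +  alpha * (trust pushed along links).
        nxt = {n: (1 - alpha) * d[n] for n in nodes}
        for src in nodes:
            outs = links.get(src, ())
            if outs:
                share = alpha * t[src] / len(outs)
                for dst in outs:
                    nxt[dst] += share
        t = nxt
    return t


# Usage on a toy graph: two spam sites link to each other (and out to "a"),
# but no reputable site links to them, so they receive no trust.
web = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "d"],
    "d": ["a"],
    "spam1": ["spam2"],
    "spam2": ["spam1", "a"],
}
seeds = expand_seeds(web, {"a", "b"}, k=2)
scores = trustrank(web, seeds)
```

The toy graph shows why seed coverage matters: a site earns trust only through link paths that start at a seed, so reputable sites far from a small seed set also end up with low scores. Enlarging the seed set is meant to reduce that effect.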


