Identifying link farm spam pages
Source: International World Wide Web Conference archive
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
SESSION: Industrial and practical experience track paper session 1
Pages: 820-829
Year of Publication: 2005
ISBN:1-59593-051-5
Authors
Baoning Wu, Lehigh University, Bethlehem, PA
Brian D. Davison, Lehigh University, Bethlehem, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1062745.1062762

ABSTRACT

With the increasing importance of search in guiding today's web traffic, more and more effort has been spent on creating search engine spam. Since link analysis is one of the most important factors in current commercial search engines' ranking systems, new kinds of spam aimed at links have appeared. Building link farms is one technique that can degrade link-based ranking algorithms. In this paper, we present algorithms for detecting these link farms automatically by first generating a seed set based on the common link set between the incoming and outgoing links of Web pages and then expanding it. Links between identified pages are re-weighted, providing a modified web graph to use in ranking page importance. Experimental results show that we can identify most link farm spam pages and that the final ranking results are improved for almost all tested queries.
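The three steps named in the abstract (seed selection from the overlap of incoming and outgoing links, expansion, and re-weighting) can be illustrated with a minimal sketch. The abstract does not give thresholds, the exact expansion rule, or the re-weighting scheme, so everything concrete here is an assumption made for illustration: the function names, the threshold of 3, the rule "add a page once it links to enough flagged pages", and the zero weight on links between flagged pages. It should not be read as the authors' implementation.

```python
from collections import defaultdict


def find_seed_set(out_links, threshold=3):
    """Flag pages whose incoming and outgoing link sets share >= threshold pages.

    `out_links` maps each page to the pages it links to. The threshold value
    is an assumption; the abstract does not state one.
    """
    in_links = defaultdict(set)
    for page, targets in out_links.items():
        for target in targets:
            in_links[target].add(page)

    seeds = set()
    for page, targets in out_links.items():
        common = set(targets) & in_links.get(page, set())
        common.discard(page)  # ignore self-links
        if len(common) >= threshold:
            seeds.add(page)
    return seeds


def expand(out_links, seeds, threshold=3):
    """Grow the flagged set: add any page linking to >= threshold flagged pages.

    This expansion rule is an assumed stand-in for the paper's expansion step.
    """
    flagged = set(seeds)
    changed = True
    while changed:
        changed = False
        for page, targets in out_links.items():
            if page not in flagged and len(set(targets) & flagged) >= threshold:
                flagged.add(page)
                changed = True
    return flagged


def reweight(out_links, flagged, penalty=0.0):
    """Return edge weights: `penalty` between two flagged pages, 1.0 otherwise."""
    return {
        (source, target): penalty if source in flagged and target in flagged else 1.0
        for source, targets in out_links.items()
        for target in targets
    }


# Hypothetical toy graph: a, b, c, d link to one another; e links into the group.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
    "e": ["a", "b", "c"],
    "f": ["a", "g"],
    "g": [],
}
seeds = find_seed_set(graph)
flagged = expand(graph, seeds)
weights = reweight(graph, flagged)
```

In this toy graph the four mutually linked pages form the seed set, page e is added during expansion because it links to three flagged pages, and page f stays unflagged because it links to only one. The resulting weighted edges could then be fed to a link-based ranking algorithm in place of the original graph.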


