ACM Home Page
Please provide us with feedback. Feedback
Transductive link spam detection
Full text PdfPdf (346 KB)
Source AIRWeb; Vol. 215 archive
Proceedings of the 3rd international workshop on Adversarial information retrieval on the web table of contents
Banff, Alberta, Canada
SESSION: Temporal and topological factors table of contents
Pages: 21 - 28  
Year of Publication: 2007
ISBN:978-1-59593-732-2
Authors
Dengyong Zhou  Microsoft Research, Redmond, WA
Christopher J. C. Burges  Microsoft Research, Redmond, WA
Tao Tao  Microsoft Corp., Redmond, WA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 64,   Citation Count: 6
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1244408.1244413
What is a DOI?

ABSTRACT

Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques have been proposed to detect link spam. Those approaches are basically built on link-based web ranking methods.

In contrast, we cast the link spam detection problem into a machine learning problem of classification on directed graphs. We develop discrete analysis on directed graphs, and construct a discrete analogue of classical regularization theory via discrete analysis. A classification algorithm for directed graphs is then derived from the discrete regularization. We have applied the approach to real-world link spam detection problems, and encouraging results have been obtained.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Ando and T. Zhang. Learning on graph with Laplacian regularization. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
2
 
3
M. Belkin, I. Matveeva, and P. Niyogi. Regression and regularization on large graphs. In Proc. 17th Annual Conference on Learning Theory, 2004.
 
4
 
5
 
6
C. Carlos, D. Debora, G. Aristides, M. Vanessa, and S. Fabrizio. Know your neighbors: Web spam detection using the web topology. Technical report, 2006.
7
 
8
F. Chung. Laplacian and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9:1--19, 2005.
 
9
F. Chung, A. Grigoryan, and S.-T. Yau. Higher eigenvalues and isoperimetric inequalities on Riemannian manifolds and graphs. Communications on Analysis and Geometry, 8:969--1026, 2000.
 
10
D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129--145, 1996.
 
11
S. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the Society for Information Science, 41(6):391--407, 1990.
 
12
R, El-Yaniv and D. Pechyony. Stable transductive learning. In Proc. 19th Annual Conference on Computational Learning Theory, pages 35--49, 2006.
 
13
Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In Proc. 30th International Conference on Very Large Data Bases, pages 576--587, 2004.
14
 
15
M. Hein, J. Audibert, and U. von Luxburg. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Proc. 18th Annual Conference on Learning Theory, pages 470--485, 2005.
 
16
 
17
T. Joachims. Transductive learning via spectral graph partitioning. In Proc. 20th I International Conference on Machine Learning, 2003.
 
18
J. Jost. Riemannian Geometry and Geometric Analysis. Springer-Verlag, Berlin-Heidelberg, third edition, 2002.
19
 
20
J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models, and methods. In Proc. 5th International Conference on Computing and Combinatorics, 1999.
21
 
22
R. Raj and V. Krishnan. Web spam detection with anti-trust rank. In Proc. 2nd International Workshop on Adversarial Information Retrieval on the Web, pages 37--40, 2006.
 
23
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
 
24
 
25
R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal on Computing, 1:146--160, 1972.
 
26
A. Tikhonov and V. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, Washington, DC, 1977.
 
27
V. Vapnik. Statistical Learning Theory. Wiley, NY, 1998.
 
28
G. Wahba. Spline Models for Observational Data. Number 59 in CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
29
 
30
D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
31
 
32
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. 20th International Conference on Machine Learning, 2003.


Collaborative Colleagues:
Dengyong Zhou: colleagues
Christopher J. C. Burges: colleagues
Tao Tao: colleagues