|
ABSTRACT
Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques have been proposed to detect link spam. Those approaches are basically built on link-based web ranking methods. In contrast, we cast the link spam detection problem into a machine learning problem of classification on directed graphs. We develop discrete analysis on directed graphs, and construct a discrete analogue of classical regularization theory via discrete analysis. A classification algorithm for directed graphs is then derived from the discrete regularization. We have applied the approach to real-world link spam detection problems, and encouraging results have been obtained.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Ando and T. Zhang. Learning on graph with Laplacian regularization. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
|
 |
2
|
|
| |
3
|
M. Belkin, I. Matveeva, and P. Niyogi. Regression and regularization on large graphs. In Proc. 17th Annual Conference on Learning Theory, 2004.
|
| |
4
|
|
| |
5
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking, p.309-320, June 2000, Amsterdam, The Netherlands
|
| |
6
|
C. Carlos, D. Debora, G. Aristides, M. Vanessa, and S. Fabrizio. Know your neighbors: Web spam detection using the web topology. Technical report, 2006.
|
 |
7
|
Carlos Castillo , Debora Donato , Luca Becchetti , Paolo Boldi , Stefano Leonardi , Massimo Santini , Sebastiano Vigna, A reference collection for web spam, ACM SIGIR Forum, v.40 n.2, p.11-24, December 2006
[doi> 10.1145/1189702.1189703]
|
| |
8
|
F. Chung. Laplacian and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9:1--19, 2005.
|
| |
9
|
F. Chung, A. Grigoryan, and S.-T. Yau. Higher eigenvalues and isoperimetric inequalities on Riemannian manifolds and graphs. Communications on Analysis and Geometry, 8:969--1026, 2000.
|
| |
10
|
D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129--145, 1996.
|
| |
11
|
S. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the Society for Information Science, 41(6):391--407, 1990.
|
| |
12
|
R, El-Yaniv and D. Pechyony. Stable transductive learning. In Proc. 19th Annual Conference on Computational Learning Theory, pages 35--49, 2006.
|
| |
13
|
Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In Proc. 30th International Conference on Very Large Data Bases, pages 576--587, 2004.
|
 |
14
|
|
| |
15
|
M. Hein, J. Audibert, and U. von Luxburg. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Proc. 18th Annual Conference on Learning Theory, pages 470--485, 2005.
|
| |
16
|
|
| |
17
|
T. Joachims. Transductive learning via spectral graph partitioning. In Proc. 20th I International Conference on Machine Learning, 2003.
|
| |
18
|
J. Jost. Riemannian Geometry and Geometric Analysis. Springer-Verlag, Berlin-Heidelberg, third edition, 2002.
|
 |
19
|
|
| |
20
|
J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models, and methods. In Proc. 5th International Conference on Computing and Combinatorics, 1999.
|
 |
21
|
|
| |
22
|
R. Raj and V. Krishnan. Web spam detection with anti-trust rank. In Proc. 2nd International Workshop on Adversarial Information Retrieval on the Web, pages 37--40, 2006.
|
| |
23
|
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
|
| |
24
|
|
| |
25
|
R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal on Computing, 1:146--160, 1972.
|
| |
26
|
A. Tikhonov and V. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, Washington, DC, 1977.
|
| |
27
|
V. Vapnik. Statistical Learning Theory. Wiley, NY, 1998.
|
| |
28
|
G. Wahba. Spline Models for Observational Data. Number 59 in CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
|
 |
29
|
|
| |
30
|
D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
|
 |
31
|
|
| |
32
|
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. 20th International Conference on Machine Learning, 2003.
|
CITED BY 6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reid Andersen , Christian Borgs , Jennifer Chayes , John Hopcroft , Kamal Jain , Vahab Mirrokni , Shanghua Teng, Robust PageRank and locally computable spam detection features, Proceedings of the 4th international workshop on Adversarial information retrieval on the web, April 22-22, 2008, Beijing, China
|
|
|
|
|