| A study of link farm distribution and evolution using a time series of web snapshots |
| Full text |
Pdf
(6.77 MB)
|
| Source
|
ACM International Conference Proceeding Series
archive
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
table of contents
Madrid, Spain
SESSION: Temporal analysis
table of contents
Pages 9-16
Year of Publication: 2009
ISBN:978-1-60558-438-6
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 38, Citation Count: 0
|
|
|
ABSTRACT
In this paper, we study the overall link-based spam structure and its evolution which would be helpful for the development of robust analysis tools and research for Web spamming as a social activity in the cyber space. First, we use strongly connected component (SCC) decomposition to separate many link farms from the largest SCC, so called the core. We show that denser link farms in the core can be extracted by node filtering and recursive application of SCC decomposition to the core. Surprisingly, we can find new large link farms during each iteration and this trend continues until at least 10 iterations. In addition, we measure the spamicity of such link farms. Next, the evolution of link farms is examined over two years. Results show that almost all large link farms do not grow anymore while some of them shrink, and many large link farms are created in one year.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
M. Kitsuregawa, T. Tamura, M. Toyoda and N. Kaji. Socio-Sense:A system for analysing the societal behavior from long term Web archive, In Proceedings of 10th Asia-Pacific Web conference, 2008.
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
Hiroo Saito , Masashi Toyoda , Masaru Kitsuregawa , Kazuyuki Aihara, A large-scale study of link spam detection by graph algorithms, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada
[doi> 10.1145/1244408.1244417]
|
 |
8
|
Dennis Fetterly , Mark Manasse , Marc Najork, Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004, June 17-18, 2004, Paris, France
[doi> 10.1145/1017074.1017077]
|
| |
9
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.33 n.1-6, p.309-320, June 2000
|
| |
10
|
Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In Proceedings of the 1st international workshop on Adversarial information retrieval on the Web, 2005.
|
| |
11
|
Z. Gyöngyi and H. Molina. Link Spam Alliance In Proceedings of the 31st international conference on Very large Data Bases, 2005.
|
| |
12
|
|
| |
13
|
A. A. Benczúr, K Csalogány, T Sarlós and M. Uher. SpamRank-fully automatic link spam detection. In Proceedings of the 1st international workshop on Adversarial information retrieval on the Web, 2005.
|
| |
14
|
L. Becchetti, C. Castillo, D. Donato, S. Leonardi and R. Baeza-Yates. Link-based characterization and detection of Web spam. In Proceedings of the 2nd international workshop on Adversarial information retrieval on the Web, 2006.
|
 |
15
|
André Luiz da Costa Carvalho , Paul - Alexandru Chirita , Edleno Silva de Moura , Pável Calado , Wolfgang Nejdl, Site level noise removal for search engines, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
[doi> 10.1145/1135777.1135793]
|
 |
16
|
|
 |
17
|
|
 |
18
|
Carlos Castillo , Debora Donato , Luca Becchetti , Paolo Boldi , Stefano Leonardi , Massimo Santini , Sebastiano Vigna, A reference collection for web spam, ACM SIGIR Forum, v.40 n.2, p.11-24, December 2006
[doi> 10.1145/1189702.1189703]
|
| |
19
|
Internet Archive Wayback Machine. http://www.archive.org.
|
| |
20
|
Y. Fujiwara, C. Di Guilmi, H. Aoyama, M. Gallegati and W. Souma. Do Pareto-Zipf and Gibrat laws hold true? An analysis with European firms. Physica A(335), 2004, pp. 197--216.
|
|