Detecting nepotistic links by language model disagreement
Source
International World Wide Web Conference
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
POSTER SESSION: Browsers and UI, web engineering, hypermedia & multimedia, security, and accessibility
Pages: 939-940
Year of Publication: 2006
ISBN: 1-59593-323-9
Authors
András A. Benczúr, Hungarian Academy of Sciences (MTA SZTAKI) and Eötvös University, Budapest
István Bíró, Hungarian Academy of Sciences (MTA SZTAKI) and Eötvös University, Budapest
Károly Csalogány, Hungarian Academy of Sciences (MTA SZTAKI) and Eötvös University, Budapest
Máté Uher, Hungarian Academy of Sciences (MTA SZTAKI) and Eötvös University, Budapest
ABSTRACT
In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks with no relevance to the target page without the need for whitelists, blacklists, or human interaction. We fight various forms of nepotism such as common maintainers, ads, link exchanges, and misused affiliate programs. Our method is tested on a 31-million-page crawl of the .de domain with a manually classified 1000-page random sample.
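The abstract does not say how disagreement is measured. As a rough illustration only, the sketch below assumes unigram language models for the text around a link and for the target page, smoothed against a background model and compared by Kullback-Leibler divergence. The function names, the smoothing constant `lam`, the tiny background model built from the two texts themselves, and the `exp(-KL)` weighting are all hypothetical choices, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"\w+", text.lower())


def smoothed_model(tokens, background, lam=0.8):
    """Unigram model interpolated with a background model (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: lam * counts[word] / total + (1.0 - lam) * p_bg
        for word, p_bg in background.items()
    }


def disagreement(anchor_context, target_text, lam=0.8):
    """KL divergence between the anchor-context model and the target-page model."""
    anchor_tokens = tokenize(anchor_context)
    target_tokens = tokenize(target_text)
    if not anchor_tokens or not target_tokens:
        return math.inf  # nothing to compare: treat as maximal disagreement

    # Assumption: the background model would normally come from the whole
    # crawl; here it is estimated from the two texts to stay self-contained.
    bg_counts = Counter(anchor_tokens + target_tokens)
    bg_total = sum(bg_counts.values())
    background = {w: c / bg_total for w, c in bg_counts.items()}

    p = smoothed_model(anchor_tokens, background, lam)
    q = smoothed_model(target_tokens, background, lam)
    # Both models are strictly positive on the shared vocabulary because lam < 1.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def link_weight(anchor_context, target_text, lam=0.8):
    """Map disagreement to a weight in [0, 1]; exp(-KL) is an illustrative choice."""
    return math.exp(-disagreement(anchor_context, target_text, lam))
```

Under these assumptions, a link whose surrounding text shares vocabulary with its target keeps a weight near 1, while a link pointing at a topically unrelated page receives a weight closer to 0 and so contributes less to any link-based ranking computed afterwards.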
CITED BY 3
Carlos Castillo, Debora Donato, Luca Becchetti, Paolo Boldi, Stefano Leonardi, Massimo Santini, Sebastiano Vigna, A reference collection for web spam, ACM SIGIR Forum, v.40 n.2, p.11-24, December 2006