| Beyond blacklists: learning to detect malicious web sites from suspicious URLs |
Source
|
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Industrial track papers
Pages 1245-1254
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Justin Ma, UC San Diego, La Jolla, CA, USA
Lawrence K. Saul, UC San Diego, La Jolla, CA, USA
Stefan Savage, UC San Diego, La Jolla, CA, USA
Geoffrey M. Voelker, UC San Diego, La Jolla, CA, USA
ABSTRACT
Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs. These methods are able to learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. The resulting classifiers obtain 95-99% accuracy, detecting large numbers of malicious Web sites from their URLs, with only modest false positives.
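As a rough illustration of what "lexical properties" of a URL could look like as classifier input, the sketch below maps a URL to a sparse feature dictionary. The abstract does not specify the feature set, so the delimiter set, the feature names, and the function `lexical_features` are all assumptions made for this example, not the authors' implementation. Host-based properties are omitted because they would require network lookups.

```python
import re
from urllib.parse import urlsplit

# Characters used to split the path and query into tokens.
# This delimiter set is an illustrative assumption.
PATH_DELIMS = re.compile(r"[/?.=\-_&]+")


def lexical_features(url):
    """Map a URL to a sparse dict of lexical features (hypothetical scheme)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path + ("?" + parts.query if parts.query else "")

    # Simple numeric properties of the URL string.
    feats = {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(url.count(".")),
    }

    # Bag-of-words over hostname tokens (split on dots).
    for tok in host.split("."):
        if tok:
            feats["host_tok=" + tok] = 1.0

    # Bag-of-words over path and query tokens.
    for tok in PATH_DELIMS.split(path):
        if tok:
            feats["path_tok=" + tok.lower()] = 1.0

    return feats


if __name__ == "__main__":
    for name, value in sorted(lexical_features(
            "http://www.example.com/login.php?id=1").items()):
        print(name, value)
```

Because every distinct token becomes its own feature, a dictionary like this grows quickly over a large URL corpus, which is consistent with the abstract's mention of tens of thousands of features. A sparse linear classifier would be one natural consumer of such vectors.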
CITED BY
|
|
Justin Ma , Lawrence K. Saul , Stefan Savage , Geoffrey M. Voelker, Identifying suspicious URLs: an application of large-scale online learning, Proceedings of the 26th Annual International Conference on Machine Learning, p.681-688, June 14-18, 2009, Montreal, Quebec, Canada
|
|