Identifying suspicious URLs: an application of large-scale online learning
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 681-688  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Justin Ma  UC San Diego, La Jolla, CA
Lawrence K. Saul  UC San Diego, La Jolla, CA
Stefan Savage  UC San Diego, La Jolla, CA
Geoffrey M. Voelker  UC San Diego, La Jolla, CA
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553462

ABSTRACT

This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly well suited to online algorithms, both because the training data is larger than can be processed efficiently in batch and because the distribution of features that typify malicious URLs changes continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
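
The online setting described in the abstract can be illustrated with a minimal sketch. It assumes a perceptron-style, mistake-driven update and a bag-of-words view of each URL as the lexical features. The names `lexical_features` and `OnlinePerceptron`, the label convention (+1 malicious, -1 benign), and the toy URLs on reserved domains are all hypothetical; none of this is taken from the authors' system, and host-based features are omitted.

```python
import re


def lexical_features(url):
    """Split a URL into lowercase alphanumeric tokens (bag-of-words view)."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return {tok for tok in tokens if tok}


class OnlinePerceptron:
    """Linear classifier updated one labeled example at a time."""

    def __init__(self):
        self.weights = {}  # token -> weight, grows as new tokens appear
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        # +1 = malicious, -1 = benign (an assumed convention)
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Predict first, then adjust weights only on a mistake."""
        guess = self.predict(features)
        if guess != label:
            for f in features:
                self.weights[f] = self.weights.get(f, 0.0) + label
            self.bias += label
        return guess


# Hypothetical labeled stream; a real feed would arrive continuously.
stream = [
    ("http://secure-login.bank.test/verify", 1),
    ("http://www.example.org/news", -1),
    ("http://login-verify.account.test/update", 1),
    ("http://www.example.com/docs/index", -1),
]

model = OnlinePerceptron()
mistakes = 0
for url, label in stream:
    # Each URL is seen once: classify it, then learn from its label.
    if model.update(lexical_features(url), label) != label:
        mistakes += 1

print("mistakes on first pass:", mistakes)
print(model.predict(lexical_features("http://account-verify.test/login")))
```

Because each example is processed once and then discarded, memory use depends on the number of distinct features rather than on the number of URLs seen, and the weights keep adapting as new tokens appear in the stream.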



