ABSTRACT
Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, in order to steal account information, logon credentials, and identity information in general. This attack method, commonly known as "phishing," is most often initiated by sending out emails with links to spoofed websites that harvest information. We present a method for detecting these attacks, which in its most general form is an application of machine learning to a feature set designed to highlight user-targeted deception in electronic communication. With slight modification, this method is applicable to detecting either phishing websites or the emails used to direct victims to them. We evaluate this method on a set of approximately 860 phishing emails and 6950 non-phishing emails, and correctly identify over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails. We conclude with thoughts on the future of such techniques for specifically identifying deception, particularly with respect to the evolving nature of the attacks and of the information available.
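The abstract describes the method only in general terms: machine learning applied to a feature set that highlights deception. The Python below is a minimal sketch of that general shape under stated assumptions. It turns an email body into a numeric feature vector, trains a classifier on labeled examples, and reports the two rates the evaluation quotes. Several things here are illustrative assumptions, not the paper's feature set or learner:

- the specific features (raw IP-address hosts, link text that names a different host than the link target, HTML content, URL and host counts, embedded scripts);
- the helper names `extract_features`, `train_logistic`, `predict`, and `rates`;
- the choice of plain logistic regression.

```python
import math
import re
from urllib.parse import urlparse

# Illustrative (hypothetical) feature set; NOT the paper's feature list.
URL_RE = re.compile(r'https?://[^\s"<>]+', re.I)
HREF_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
IP_HOST_RE = re.compile(r'^\d{1,3}(\.\d{1,3}){3}$')


def extract_features(body: str) -> list[float]:
    """Map a raw email body to a fixed-length numeric feature vector."""
    urls = URL_RE.findall(body)  # every URL occurrence, in hrefs or plain text
    hosts = {urlparse(u).hostname or "" for u in urls}
    # Count links whose visible text names a different host than the href target.
    mismatched = 0
    for href, text in HREF_RE.findall(body):
        shown = URL_RE.search(text)
        if shown and urlparse(shown.group()).hostname != urlparse(href).hostname:
            mismatched += 1
    return [
        1.0 if any(IP_HOST_RE.match(h) for h in hosts) else 0.0,  # raw-IP host
        1.0 if mismatched else 0.0,                               # deceptive link text
        1.0 if re.search(r"<html", body, re.I) else 0.0,          # HTML email
        float(len(urls)),                                         # URL occurrences
        float(len(hosts)),                                        # distinct hosts
        1.0 if re.search(r"<script", body, re.I) else 0.0,        # embedded script
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=500, lr=0.1):
    """Fit logistic-regression weights by per-example gradient descent.

    y[i] is 1 for phishing and 0 for legitimate mail.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, x, threshold=0.5) -> int:
    """Return 1 (phishing) if the predicted probability reaches the threshold."""
    w, b = model
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold)


def rates(y_true, y_pred):
    """Return (detection rate on phishing, false-positive rate on legitimate mail)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)
```

Raising `threshold` lowers the false-positive rate at the cost of detection rate. That trade-off matters when legitimate mail far outnumbers phishing, as in the evaluation set. For scale, with 860 phishing and 6950 legitimate emails, a detection rate over 96% corresponds to at least about 826 phishing emails caught. A false-positive rate on the order of 0.1% corresponds to roughly 7 legitimate emails flagged.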