research-article

VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity

Authors: Sahar Abdelnabi, Katharina Krombholz, Mario Fritz

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

Pages 1681 - 1698

https://doi.org/10.1145/3372297.3417233

Published: 02 November 2020

Abstract

Phishing websites are still a major threat in today's Internet ecosystem. Despite numerous previous efforts, similarity-based detection methods do not offer sufficient protection for the trusted websites, in particular against unseen phishing pages. This paper contributes VisualPhishNet, a new similarity-based phishing detection framework, based on a triplet Convolutional Neural Network (CNN). VisualPhishNet learns profiles for websites in order to detect phishing websites by a similarity metric that can generalize to pages with new visual appearances. We furthermore present VisualPhish, the largest dataset to date that facilitates visual phishing detection in an ecologically valid manner. We show that our method outperforms previous visual similarity phishing detection approaches by a large margin while being robust against a range of evasion attacks.
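The abstract names two ingredients: a triplet network that learns an embedding, and a similarity metric that compares a page against learned website profiles. The sketch below illustrates both ideas on plain vectors. It is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the margin of 1.0, the threshold of 0.5, the domain check, and all function and variable names (`triplet_loss`, `closest_profile`, `flag_phishing`, `profiles`, `trusted_domains`) are hypothetical choices made here, and the embeddings stand in for the output of a trained CNN.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive
    (same website) than to the negative (different website) by at
    least `margin`.
    """
    return max(
        0.0,
        l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin,
    )


def closest_profile(query, profiles):
    """Return (site, distance) of the stored embedding nearest to `query`.

    `profiles` maps a trusted site name to a list of embeddings of its pages.
    """
    best_site, best_dist = None, math.inf
    for site, embeddings in profiles.items():
        for emb in embeddings:
            d = l2_distance(query, emb)
            if d < best_dist:
                best_site, best_dist = site, d
    return best_site, best_dist


def flag_phishing(query, query_domain, profiles, trusted_domains, threshold=0.5):
    """Hypothetical decision rule: the page looks like a trusted site
    (distance within `threshold`) but is not served from that site's domains."""
    site, dist = closest_profile(query, profiles)
    if site is None or dist > threshold:
        return False
    return query_domain not in trusted_domains.get(site, set())
```

A call such as `flag_phishing([0.9, 1.1], "evil.example", {"bank": [[1.0, 1.0]]}, {"bank": {"bank.example"}})` flags the page, because its embedding lies close to the stored profile while its domain is not one the trusted site owns.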

Supplementary Material

Presentation video (MOV file, 297.88 MB)

Index Terms

VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Social engineering attacks
      1. Phishing
  2. Software and application security
    1. Web application security

Published In

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

October 2020, 2180 pages

ISBN: 9781450370899

DOI: 10.1145/3372297

General Chairs: Jay Ligatti (University of South Florida, USA), Xinming Ou (University of South Florida, USA)

Program Chairs: Jonathan Katz (University of Maryland, USA), Giovanni Vigna (University of California-Santa Barbara, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security

Sponsor: SIGSAC

November 9 - 13, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Article Metrics

Total Citations: 81
Total Downloads: 1,485
Downloads (Last 12 months): 265
Downloads (Last 6 weeks): 35

Reflects downloads up to 26 Jan 2025

Cited By

Yuan Y, Apruzzese G, Conti M. (2025). Beyond the west: Revealing and bridging the gap between Western and Chinese phishing website detection. Computers & Security 148 (104115). Online publication date: Jan-2025. https://doi.org/10.1016/j.cose.2024.104115

Rashid F, Ranaweera N, Doyle B, Seneviratne S. (2025). LLMs are one-shot URL classifiers and explainers. Computer Networks 258 (111004). Online publication date: Mar-2025. https://doi.org/10.1016/j.comnet.2024.111004

Liu R, Lin Y, Teoh X, Liu G, Huang Z, Dong J, Balzarotti D, Xu W. (2024). Less defined knowledge and more true alarms. Proceedings of the 33rd USENIX Conference on Security Symposium (523-540). Online publication date: 14-Aug-2024. https://dl.acm.org/doi/10.5555/3698900.3698930

Teoh X, Lin Y, Liu R, Huang Z, Dong J, Balzarotti D, Xu W. (2024). PhishDecloaker. Proceedings of the 33rd USENIX Conference on Security Symposium (505-522). Online publication date: 14-Aug-2024. https://dl.acm.org/doi/10.5555/3698900.3698929

Chen Y, Lu Y, Pan Z, Chen J, Shi F, Li Y, Jiang Y. (2024). APIMiner: Identifying Web Application APIs Based on Web Page States Similarity Analysis. Electronics 13:6 (1112). Online publication date: 18-Mar-2024. https://doi.org/10.3390/electronics13061112

Haq Q, Faheem M, Ahmad I. (2024). Detecting Phishing URLs Based on a Deep Learning Approach to Prevent Cyber-Attacks. Applied Sciences 14:22 (10086). Online publication date: 5-Nov-2024. https://doi.org/10.3390/app142210086

Komiya C, Yanai N, Yamashita K, Okamura S. (2024). JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection. Journal of Information Processing 32 (298-307). Online publication date: 2024. https://doi.org/10.2197/ipsjjip.32.298

Nguyen Q, Wu T, Nguyen V, Yuan X, Xue J, Rudolph C. (2024). Utilizing Large Language Models with Human Feedback Integration for Generating Dedicated Warning for Phishing Emails. Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems (35-46). Online publication date: 2-Jul-2024. https://dl.acm.org/doi/10.1145/3665451.3665531

Shukla S, Mirzaei O, Luo B, Liao X, Xu J, Kirda E, Lie D. (2024). Poster: Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security (4988-4990). Online publication date: 2-Dec-2024. https://dl.acm.org/doi/10.1145/3658644.3691381

Charmet F, Morikawa T, Tanaka A, Takahashi T. (2024). VORTEX : Visual phishing detectiOns aRe Through EXplanations. ACM Transactions on Internet Technology 24:2 (1-24). Online publication date: 6-May-2024. https://dl.acm.org/doi/10.1145/3654665
