DOI: 10.1145/3372297.3417233
research-article

VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity

Published: 02 November 2020

Abstract

Phishing websites are still a major threat in today's Internet ecosystem. Despite numerous previous efforts, similarity-based detection methods do not offer sufficient protection for the trusted websites, in particular against unseen phishing pages. This paper contributes VisualPhishNet, a new similarity-based phishing detection framework, based on a triplet Convolutional Neural Network (CNN). VisualPhishNet learns profiles for websites in order to detect phishing websites by a similarity metric that can generalize to pages with new visual appearances. We furthermore present VisualPhish, the largest dataset to date that facilitates visual phishing detection in an ecologically valid manner. We show that our method outperforms previous visual similarity phishing detection approaches by a large margin while being robust against a range of evasion attacks.
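As a rough illustration of the idea in the abstract, the sketch below shows the standard triplet loss on embedding vectors and a nearest-profile lookup over trusted sites. It is not the paper's implementation: the function names, the margin, the distance threshold, the toy two-dimensional embeddings, and the rule that a page is suspicious when it looks like a trusted site but is served from a different domain are all assumptions made for this example.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss: pull the positive closer to the
    anchor than the negative by at least `margin` (assumed value)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def closest_site(query, profiles):
    """Return (site, distance) for the trusted-site embedding nearest to `query`.
    `profiles` maps a site name to a list of embeddings of its screenshots."""
    candidates = (
        (site, math.sqrt(squared_distance(query, emb)))
        for site, embeddings in profiles.items()
        for emb in embeddings
    )
    return min(candidates, key=lambda pair: pair[1])

def looks_like_phishing(query, query_domain, profiles, threshold=0.5):
    """Assumed decision rule: visually close to a trusted site, but served
    from a domain other than that site's own."""
    site, dist = closest_site(query, profiles)
    return dist <= threshold and query_domain != site

# Hypothetical profiles with toy 2-D embeddings (real embeddings would come from a CNN).
profiles = {
    "bank.example": [[0.0, 0.0], [1.0, 1.0]],
    "mail.example": [[5.0, 5.0]],
}
print(closest_site([0.1, 0.0], profiles))
print(looks_like_phishing([0.1, 0.0], "evil.example", profiles))
```

In a real system the embeddings would be produced by the trained network from page screenshots, and the threshold would be tuned on held-out data.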

Supplementary Material

MOV File (Copy of CCS20_fp044_VisualPhishNet - Brian Hollendyke.mov)
Presentation video


Published In

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
October 2020
2180 pages
ISBN: 9781450370899
DOI: 10.1145/3372297
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. phishing detection
  2. triplet networks
  3. visual similarity

Qualifiers

  • Research-article

Conference

CCS '20

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Article Metrics

  • Downloads (Last 12 months): 265
  • Downloads (Last 6 weeks): 35
Reflects downloads up to 26 Jan 2025
