research-article

Improving malware classification: bridging the static/dynamic gap

Authors: Blake Anderson, Curtis Storlie, Terran Lane

AISec '12: Proceedings of the 5th ACM workshop on Security and artificial intelligence

Pages 3 - 14

https://doi.org/10.1145/2381896.2381900

Published: 19 October 2012

Abstract

Malware classification systems have typically used some machine learning algorithm in conjunction with either static or dynamic features collected from the binary. Recently, more advanced malware has introduced mechanisms to avoid detection in these views by using obfuscation techniques to avoid static detection and execution-stalling techniques to avoid dynamic detection. In this paper we construct a classification framework that is able to incorporate both static and dynamic views into a unified framework in the hopes that, while a malicious executable can disguise itself in some views, disguising itself in every view while maintaining malicious intent will prove to be substantially more difficult. Our method uses kernels to place a similarity metric on each distinct view and then employs multiple kernel learning to find a weighted combination of the data sources which yields the best classification accuracy in a support vector machine classifier. Our approach opens up new avenues of malware research which will allow the research community to elegantly look at multiple facets of malware simultaneously, and which can easily be extended to integrate any new data sources that may become popular in the future.
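
The idea of placing one kernel on each view and learning a weighted combination can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the toy feature vectors, and every function name below are assumptions made for illustration, and the weights come from a simple kernel-target alignment heuristic rather than from the multiple kernel learning solver the abstract refers to.

```python
import math


def gaussian_kernel(X, sigma=1.0):
    """Gram matrix with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K


def alignment(K, y):
    """Kernel-target alignment <K, y y^T>_F / (||K||_F * ||y y^T||_F).

    Assumes labels in {-1, +1}, so ||y y^T||_F equals len(y).
    """
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return num / (norm_k * n)


def alignment_weights(kernels, y):
    """Heuristic view weights: clipped alignments normalised to sum to 1."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        # No view is aligned with the labels: fall back to uniform weights.
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine(kernels, weights):
    """Weighted sum of Gram matrices: K = sum_m weights[m] * K_m."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: four samples, one feature per view.
# The "static" view separates the two classes; the "dynamic" view does not.
labels = [1, 1, -1, -1]
static_view = [[0.0], [0.1], [5.0], [5.1]]
dynamic_view = [[0.0], [5.0], [0.1], [5.1]]

view_kernels = [gaussian_kernel(static_view), gaussian_kernel(dynamic_view)]
view_weights = alignment_weights(view_kernels, labels)
combined_kernel = combine(view_kernels, view_weights)
```

The combined Gram matrix could then be passed to any support vector machine implementation that accepts a precomputed kernel. A nonnegative weighted sum of valid kernels is itself a valid kernel, which is why the combination step can stay this simple.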


Published In

AISec '12: Proceedings of the 5th ACM workshop on Security and artificial intelligence

October 2012

116 pages

ISBN:9781450316644

DOI:10.1145/2381896

General Chair: Ting Yu (North Carolina State University, USA)

Program Chairs: V. N. Venkatakrishan (University of Illinois at Chicago, USA), Apu Kapadia (Indiana University, Bloomington, USA)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

CCS'12

Sponsor:

SIGSAC

CCS'12: the ACM Conference on Computer and Communications Security

October 19, 2012

Raleigh, North Carolina, USA

Acceptance Rates

AISec '12 Paper Acceptance Rate 10 of 24 submissions, 42%;

Overall Acceptance Rate 94 of 231 submissions, 41%
