research-article

Open access

HiPeR - Early Detection of a Ransomware Attack using Hardware Performance Counters

Authors:

P. Mohan Anand,

P. V. Sai Charan,

Sandeep K. Shukla

Digital Threats: Research and Practice, Volume 4, Issue 3

Article No.: 43, Pages 1 - 24

https://doi.org/10.1145/3608484

Published: 06 October 2023

Abstract

Ransomware has been one of the most prevalent forms of malware over the past decade, and it remains one of the most significant threats today. Recently, ransomware strategies such as double extortion and rapid encryption have encouraged attacker communities to treat ransomware as a business model. With the advent of Ransomware as a Service (RaaS) models, ransomware spread and operations continue to increase. Even though machine learning and signature-based detection methods for ransomware have been proposed, they often fail to achieve highly accurate detection. Ransomware that evades detection moves to the execution phase after initial access and installation. Due to the catastrophic nature of a ransomware attack, it is crucial to detect it in the early stages of execution. If ransomware can be detected early enough in its execution phase, its processes can be killed to stop the attack. However, early detection with dynamic API call analysis is not an ideal solution, as contemporary ransomware variants use low-level system calls to circumvent such detection methods. In this work, we use hardware performance counters (HPC) as features to detect ransomware within 3-4 seconds, which may be sufficient, at least for ransomware that takes longer to complete its full execution.

1 Introduction

Ransomware is a type of malware that prohibits or restricts users’ access to their computers by locking the screen or encrypting their files until a ransom is paid. Recently, RaaS (Ransomware as a Service) has been widely used by the attacker community, as it offers the following services as a package: (1) identifying known and unknown vulnerabilities in services to design a payload; (2) spreading the payload through various channels such as phishing, spamming, malvertising, and exploit kits; (3) locating sensitive files on the victim’s network by scanning multiple extensions and encrypting them with effective cryptographic algorithms; and (4) demanding ransom for decryption services to restore the original files. Also, some ransomware families perform double extortion, i.e., they release sensitive files on the dark web to blackmail the victim into paying higher ransom amounts. Because money is involved in its operations, ransomware is seen as a business by the majority of attack groups. Recently, the Conti group attacked the Costa Rican government with ransomware, initially demanding $10 million in ransom, which was later increased to $20 million through double extortion. In another instance, the ransomware group Lapsus claimed responsibility for the assault on Nvidia and demanded $1 million to unlock 1 TB of sensitive data [12]. Three factors mainly contribute to the increase in ransomware activity. First, the emergence of ransomware as a service (RaaS), i.e., developers offering a simple-to-use ransomware creation kit that customers can purchase from the dark web to launch ransomware attacks on their intended targets. Second, ransomware operators achieve untraceability by employing cryptocurrencies to collect the ransom from users. Third, ransomware delivery is simple and diverse. To spread dangerous payloads, ransomware operators use spam, drive-by downloads, malicious advertising, exploit kits, and supply chain poisoning.
Ransomware threats typically hinder computer operations in two distinct ways. The first, known as locker ransomware, limits access to the computer, while the second, known as crypto-ransomware, encrypts user data and limits file access. In our work, we mainly consider Windows-based crypto-ransomware variants since they are the most prevalent forms of ransomware.

Modern-day ransomware strains encrypt files at rapid rates. LockBit version 3, a recent variant, is expected to encrypt around 25,000 data items per minute. Hence, it is crucial to detect ransomware activity before encryption begins. In general, ransomware does not encrypt files immediately; it first performs some activities in a pre-encryption stage to escape various detection methods. In our research, we approximated the duration before encryption for different ransomware payloads and compiled the findings in Table 1. The analysis was conducted in a sandbox environment containing 1,640 files that occupy 17.56 GB of disk space. The sandbox machine has a base RAM of 2048 MB, a total storage space of 128 GB, and Windows 7 installed as the operating system. We noticed that most ransomware executables start to perform encryption around 8 to 10 seconds after they start execution. Building upon this analysis, we highlight the importance of timely ransomware detection, whereby the detection method identifies ransomware’s suspicious behavior within 10 seconds of the beginning of its execution.

Ransomware Family	Pre-encryption duration (in seconds)
AtomSilo	12
AvosLocker	13
BlackMatter	12
BlackOut	12
Bubuk	4
CABP	18
Cerber	15
Conti	13
Cuba	25
DemonWare	23
GlobeImposter	18
HelloXD	95
Hive	15
InterCobros	22
Jigsaw	3
Karma	5
LockBit	11
Lorenz	5
Magniber	10
Makop	9
Mespinoza	10
MountLocker	15
Revil	10
Surtr	4
Vovobol	8
Zeppelin	45
Zeznzo	25

Table 1. Ransomware Families - Pre-encryption Duration
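
The durations in Table 1 can be summarized with a few lines of Python to see how much headroom a detector has before encryption starts. The sketch below only reuses the values from Table 1; the thresholds (4 s, matching the 3-4 second detection time quoted in the abstract, and 10 s) and the helper name `families_encrypting_after` are illustrative choices, not part of the original analysis.

```python
from statistics import median

# Pre-encryption durations in seconds, copied from Table 1.
PRE_ENCRYPTION_SECONDS = {
    "AtomSilo": 12, "AvosLocker": 13, "BlackMatter": 12, "BlackOut": 12,
    "Bubuk": 4, "CABP": 18, "Cerber": 15, "Conti": 13, "Cuba": 25,
    "DemonWare": 23, "GlobeImposter": 18, "HelloXD": 95, "Hive": 15,
    "InterCobros": 22, "Jigsaw": 3, "Karma": 5, "LockBit": 11, "Lorenz": 5,
    "Magniber": 10, "Makop": 9, "Mespinoza": 10, "MountLocker": 15,
    "Revil": 10, "Surtr": 4, "Vovobol": 8, "Zeppelin": 45, "Zeznzo": 25,
}


def families_encrypting_after(seconds):
    """Return the families whose encryption starts strictly later than `seconds`."""
    return sorted(name for name, t in PRE_ENCRYPTION_SECONDS.items() if t > seconds)


if __name__ == "__main__":
    total = len(PRE_ENCRYPTION_SECONDS)
    print("families:", total)
    print("median pre-encryption time (s):", median(PRE_ENCRYPTION_SECONDS.values()))
    for threshold in (4, 10):  # hypothetical detection deadlines, in seconds
        late = families_encrypting_after(threshold)
        print(f"start encrypting later than {threshold} s: {len(late)}/{total}")
```

With the Table 1 values, 24 of the 27 families begin encrypting later than 4 seconds after launch, so a detector that responds within that window would act before encryption for those families.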

1.1 Ransomware Pre-encryption Behavior

Ransomware performs multiple tasks before encrypting the files to avoid various signature and machine learning-based detection methods. The pre-encryption behavior of a ransomware executable is explained below.

Once initial access is obtained, ransomware proceeds to execute malicious payloads, leading to the execution of adversary-controlled code on either a local or remote system [18]. The most prevalent techniques for malicious code execution include malicious file execution, Windows command line, PowerShell, and Windows Management Instrumentation (WMI).

To ensure persistent access, attackers configure the ransomware to execute during boot or logon. This is achieved through techniques such as leveraging Windows services, modifying run keys, creating scheduled tasks, and manipulating user accounts to maintain compromised access. Next, ransomware developers exploit service flaws and system errors to elevate privileges. In general, the majority of them use the PowerShell Empire and Cobalt Strike frameworks to elevate their local privileges. To get past network safeguards, several ransomware families, such as Clop and Blackbyte, modify the system firewall using netsh.exe. Additionally, most ransomware samples mask their activity by masquerading as benign applications. For example, a BlackCat ransomware sample dropped an executable named cmd.exe during its execution. To move laterally in the network, ransomware requires credentials, which enable attackers to launch the attack remotely. Dumping the LSASS memory is the ransomware perpetrators’ preferred method. They employ well-known tools like Mimikatz, LaZagne, and Empire to carry out such operations.

After penetrating the system, threat actors often look for connections that are currently in use in order to encrypt neighboring hosts. To acquire data on the active connections, they often use the netstat and query session commands. Accounts with high privileges, such as local administrators, administrators of multiple services, service accounts, and groups with elevated permissions, are of interest to attackers. Enumerating files and directories plays a crucial role in the discovery process, as it determines whether specific items should be encrypted or exempted from encryption. Additionally, in order to encrypt files, ransomware payloads typically look for specific filename patterns or extensions, such as docx, pptx, xlsx, and the like. Finally, attackers remove backup files, disable automated repair and recovery options, and destroy shadow copies, which leaves the victims no option but to pay the ransom.

1.2 Motivation

All the above-mentioned techniques employed during ransomware execution make detection at the application level harder. To fool defensive solutions, ransomware mimics benign executables during its operations and uses techniques like process injection to hide its presence [10]. Ransomware also uses obfuscation and packing techniques to evade static detection. In addition, recent ransomware strains leverage low-level system operations to circumvent detection based on API calls [23]. Although dynamic analysis approaches result in highly accurate models, early detection of ransomware is still a major concern. Based on this insight, our work focuses on exploring hardware performance counters (HPC) to identify the suspicious nature of ransomware. Considering HPC features for ransomware detection serves two purposes. First, ransomware leaves enough unaltered traces at the hardware level, and one can easily capture them using performance counters. Second, HPC statistics can be obtained in real time and are helpful for the early detection of ransomware.

Hardware performance counters (HPC), commonly referred to as hardware counters, are specialized registers integrated into modern microprocessors. These counters serve the purpose of monitoring system performance across diverse conditions. Performance monitoring entails the collection of data on the functionality of an application or system. Many modern CPUs are equipped with a performance monitoring unit (PMU) comprising various registers, such as the performance monitor data (PMD) and the performance monitor configuration (PMC). Depending on the CPU configuration, the number of registers and counters will vary. These registers are used to measure the application load on the system.

Our primary objective is to detect ransomware activity at an early stage, which is why we place significant emphasis on the pre-encryption phase of ransomware execution. During our research, we observed a notable and statistically significant difference in HPC statistics between benign and ransomware executables during the installation process. To conduct a thorough analysis, we examined 27 recent variants of ransomware. We collected hardware-level performance counter information to create an extensive feature set, incorporating both single and multi-timeframe HPC data. Subsequently, we conducted a feature correlation analysis to determine the most valuable HPC events for early detection of ransomware. Leveraging this feature set, we developed a robust classifier capable of identifying ransomware activity in its initial execution stages.
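
The feature correlation step mentioned above can be illustrated with a small, dependency-free sketch. It computes Pearson correlation between per-sample HPC counts and greedily drops any feature that is highly correlated with one already kept. The 0.95 threshold, the greedy strategy, the helper names, and the toy numbers are all assumptions made for illustration; they are not the selection procedure or data used in this work.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    `features` maps an HPC event name to its list of per-sample values.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values, made up for illustration only (not measured data).
toy = {
    "instructions": [10.0, 20.0, 30.0, 40.0],
    "cpu-cycles": [11.0, 21.0, 31.0, 41.0],  # moves in lockstep with instructions
    "cache-misses": [5.0, 1.0, 4.0, 2.0],
}
print(drop_correlated(toy))
```

On the toy data, `cpu-cycles` is dropped because it is perfectly correlated with `instructions`, while `cache-misses` is retained.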

1.3 Key Contributions of Our Work

We propose an approach for the early detection of crypto-ransomware based on Hardware Performance Counter features.

We present a comprehensive analysis of multiple HPC registers that compares antivirus, web browser, file search, and file encryption applications with ransomware executables. This study aids in distinguishing between ransomware and benign applications by leveraging HPC features.

We analyze multiple timeframe settings to capture HPC data and identify the timeframe value that best suits the early detection of ransomware activity.

We conduct feature importance and correlation analysis on HPC registers to identify the optimal feature set for early detection of ransomware activity.

2 Related Works

In the last decade, ransomware has become a growing concern in the digital landscape. The rapid increase in file encryption rates and the implementation of sophisticated evasion techniques have made it increasingly challenging for cybersecurity experts to detect such attacks in real-time. Significant research efforts have been observed in this domain, driven by the intriguing and challenging task of ransomware identification. In general, research in this field can be classified into various areas, such as API call detection, honeypot detection, file entropy detection, encryption-based approaches, network detection, and analysis of hardware profiling counters.

2.1 API Call based Methods

In a study conducted by Sgandurra et al., an approach called “EldeRan” was proposed for dynamic ransomware detection with a decent level of accuracy [32]. The researchers employed the Cuckoo sandbox environment to dynamically analyze ransomware executables and extract various features, including API invocations, Registry Key modifications, dropped files, File/Directory changes, and embedded strings. This experiment focused on only 11 distinct ransomware families, and further validation is required to generalize the findings to newer variants of ransomware. Another study by Chen et al. centered on dynamic ransomware analysis and the extraction of API call characteristics [11]. The researchers created API call flow graphs that depict the sequential invocation of API calls during the execution of a sample executable. The frequencies of API call flows were used as features to develop a detection model based on Support Vector Machines (SVM), achieving a detection accuracy of 97.6%. Four types of ransomware, namely CryptoWall, Kollah, Trojan-Ransom, and TeslaCrypt, were considered for analysis in this work.

In a study conducted by Vinayakumar et al., a multi-layer perceptron (MLP) architecture was proposed for identifying ransomware behavior based on API call instances as features [35]. The researchers identified 131 key API calls through feature importance analysis and developed an MLP model for ransomware detection. Their approach reported an AUC score of 1.00, which may not hold when evaluated on a larger set of ransomware strains. Another approach by Kok et al., known as PEDA (Pre Encryption Detection Algorithm), analyzes API call occurrences as features to identify crypto variants of ransomware [19, 20]. This approach operates on two levels: first, by checking known variants of malicious executables in a signature repository, and second, by conducting dynamic analysis to extract characteristics such as API calls, registry changes, network activity, and file system changes. Bagging principles are then used to construct a robust classifier capable of detecting ransomware activity. Hampton et al. compared ransomware API requests to baselines of regular operating system activity [14]. They examined executables from 14 ransomware families and identified important features for detecting ransomware activity based on API call information. Furthermore, Anand et al. explored a feature set of 135 API calls to identify ransomware activity with an accuracy score of 96% [4]. The authors analyzed 46 ransomware families and compared API call frequencies among various modern-day ransomware variants.

2.2 File and Network based Methods

Jung and Won proposed an approach for detecting ransomware behavior that incorporates context-aware entropy analysis, specifically focusing on identifying unusual encryption operations [15]. Based on their findings, files infected with ransomware exhibit relatively high entropy values. By conducting an analysis based on entropy, the proposed method detects abnormal encryption behaviors and prevents the execution of relevant executables by examining their malicious operations at the host and network levels. The objective of this approach is to proactively anticipate ransomware attacks by creating data backups prior to encryption and identifying malicious activity through entropy analysis. Almashhadani et al. conducted an extensive behavioral analysis of network activities associated with crypto-ransomware, with a case study focusing on the Locky ransomware variant [3]. A network-based intrusion detection system was developed to detect suspicious network activities, including key exchanges and Domain Generation Algorithm (DGA) Command and Control (C&C) communication. The system incorporates two separate classifiers that work concurrently at different levels, namely packet and flow, to identify and flag suspicious activities. Additionally, Charan et al. conducted research on the detection of suspicious communication using character-based and word-based Domain Generation Algorithm (DGA) in their studies [5, 9].

Similarly, Alhawi et al. introduced “NetConverse,” which utilizes the J48 classifier to identify ransomware activity at the network level and achieved a detection rate of 97.1% [2]. The authors analyzed network traffic from nine ransomware families, collecting data on 13 network flow parameters. They used this data to build a classification model based on a sample count of 210. Furthermore, the study titled “PayBreak” presents an intriguing approach that utilizes the principles of a hybrid encryption system to recover encrypted data [21]. The concept behind “PayBreak” revolves around the notion that the security encryption employed on the targeted computer relies on hybrid encryption with symmetric session keys. This approach involves monitoring the utilization of session keys and storing them securely, which allows for the decryption of data that would otherwise require payment of the ransom to recover.
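
The entropy signal that the entropy-based approach discussed earlier in this subsection relies on can be illustrated with a short sketch. Encrypted data looks close to uniformly random, so its byte-level Shannon entropy approaches the maximum of 8 bits per byte, whereas plain text usually scores much lower. The function name `shannon_entropy` is a hypothetical helper written for this illustration; it is not taken from Jung and Won's implementation.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    repetitive = b"A" * 1024           # a single repeated byte
    uniform = bytes(range(256)) * 4    # every byte value equally likely
    print(shannon_entropy(repetitive))
    print(shannon_entropy(uniform))
```

A monitor built on this idea would compare the entropy of a file before and after a write; a jump toward 8 bits per byte across many files is the kind of anomaly such methods look for.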

2.3 Hardware Counters based Methods

When it comes to hardware-level detection, the authors of several studies have concentrated on various forms of malware, such as viruses, rootkits, worms, and so on, and their impact on hardware performance counters. Patel et al. collected HPC traces of 52 benign and 57 malicious applications tailored to the Linux system for analysis [29]. The authors constructed and assessed hardware-accelerated classifiers on FPGA, measuring power, latency, and area overhead. Their findings indicate that classifiers such as logistic regression and multilayer perceptron can achieve a detection accuracy of 90% in identifying malware activity. In another study, Kuruvila et al. used the Raspberry Pi 3 Model B device to collect HPC events [22]. A total of 300 benign and 300 malicious executables were taken into consideration. The researchers performed experiments using seven distinct classifiers trained solely on four HPC features. The Random Forest classifier outperformed the other classifiers, achieving the highest accuracy of 83.04%. Similarly, Garcia-Serrano proposed an anomaly-based technique for detecting malware activity that considers HPC statistics [13]. The author evaluated two common attacks to identify the anomaly: stack overflow and ROP (return-oriented programming). He employed the Local Outlier Factor (LOF), a density-based outlier detection technique, to identify malware activity based on six HPC characteristics. Kadiyala et al. conducted an analysis of hardware and hardware cache events in HPC to identify malware activity [16]. The study involved evaluating 322 malware samples and 293 benign executables. The authors employed Analysis of Variance (ANOVA) to select the most effective HPC features from the resulting feature set. Their approach considered nine HPC characteristics, resulting in an accuracy score of 98.9% and a low False Positive Rate (FPR) of 0.031%.

Additionally, Bahador et al. introduced HLMD, a hardware-based approach that utilizes behavioral indicators extracted from traces of hardware performance counters to detect malicious applications at their initial execution stage [7]. The behavioral signatures are created using the Singular Value Decomposition (SVD) technique. HLMD employs initial matching and signature matching methods to swiftly determine whether a running application is classified as malware or benign. Initial matching identifies potential malware families that the application may belong to, while signature matching confirms if the application is benign or a potential malware family. The authors evaluated 210 malware and 360 benign samples specific to the Linux operating system for the analysis. HLMD attained an overall precision and recall of 95.19% and 89.96% during the test. In their previous research, known as “HPCMalHunter,” the authors examined 11 malicious and 20 benign executables [6]. HPC events were taken into account, and a dataset was generated using Singular Value Decomposition (SVD) for classification purposes. The results demonstrated that SVM outperformed other methods, achieving a detection rate of 90.69% and a low False Positive Rate (FPR) of 0.79%.

Regarding the detection of ransomware, Olani et al. developed a technique called “DeepWare,” which converts HPC information into images and applies a Convolutional Neural Network (CNN) classifier to the image data for ransomware detection [28]. The authors collect HPC events every 100 ms in their study to create image data. The dataset used for analysis consists of 515 ransomware samples from 21 different families, along with an equal number of benign samples. According to the authors, DeepWare achieved a recall of 98.6% and nearly zero FPR. In another study, Pundir et al. proposed RanStop, a runtime detector for crypto ransomware that utilizes hardware performance counters [30]. The researchers collected HPC data at 20 timestamps with a 100 $\mu$s interval. They employed an ML model based on LSTM to achieve an accuracy of 97% while evaluating 76 benign and 80 ransomware executables. Recently, Alam et al. introduced a two-step detection framework named “RAPPER” that utilizes Artificial Neural Networks and Fast Fourier Transformation to offer a precise solution for ransomware detection using a limited number of tracepoints [1]. The researchers selected hardware events including instructions, cache misses, cache references, branches, and branch misses for their investigation. They developed a watchdog application to collect HPC information at a 10 ms interval. Furthermore, they performed an analysis based on HPC data and presented case studies on ransomware variants such as WannaCry, Petya, Locky, and Vipasana. The related works are summarized in Table 2.

Work	Features Considered	Takeaways
Sgandurra et al. [32]	API Call Information	• Considered only 11 distinct ransomware families. • The considered features include Registry Key modifications, API invocations, Dropped files, File/Directory actions, and embedded text.
Chen et al. [11]	API Call Information	• Considered only 4 distinct ransomware families. • API call flow graphs were generated to illustrate the sequential invocation of API calls during the execution of the given executable.
Vinayakumar et al. [35]	API Call Information	• Considered only 7 distinct ransomware families. • Presented a multi-layer perceptron architecture for identifying ransomware behavior based on API call information. • Identified 131 key API calls as part of feature importance to help ransomware detection.
Kok et al. [19]	API Call Information	• Considered only 10 distinct ransomware families. • The concept operates primarily on two levels. It begins with a signature repository that verifies known variants of ransomware executables. • If the sample is identified as new, dynamic analysis is performed to extract characteristics such as Registry changes, file system modifications, API calls, network activities, and more, prior to the start of encryption.
Anand et al. [4]	API Call Information	• Considered 46 distinct ransomware families. • Presented a comparison of API calls and feature sets amongst modern-day ransomware variants. • Explored a set of 135 API calls as features to identify ransomware activity, achieving an accuracy score of approximately 96%.
Jung and Won [15]	File entropy information along with API call information	• Emphasized that ransomware-infected files exhibit a relatively high entropy value. • If any abnormal encryption activities are detected through changes in entropy, the execution of relevant executables is prevented by analyzing their malicious operations at the host level and network level.
Almashhadani et al. [3]	Network based information	• A case study performed on one of the most virulent ransomware families, i.e., Locky. • It features two independent classifiers that operate in parallel on distinct levels, packet and flow, to identify the suspicious activity of ransomware.
Alhawi et al. [2]	Network based information	• Considered 9 distinct ransomware families. • Collected 13 network flow parameters for building a J48 classifier to identify ransomware with a detection rate of 97.1%.
Patel et al. [29]	HPC Features	• Collected HPC traces of 52 benign and 57 malicious apps tailored to the Linux system for analysis. • Specific analysis is not performed on ransomware. • Constructed and assessed hardware accelerated ML classifiers on FPGA, measuring power, latency, and area overhead.
Kuruvila et al. [22]	HPC Features	• Collected HPC traces of 300 benign and 300 malicious apps tailored to the Linux system for analysis. • Specific analysis is not performed on ransomware. • Conducted tests using seven different classifiers that were trained on only four HPC features. The findings indicated that the Random Forest classifier emerged as the most accurate, with an accuracy score of 83.04%.
Garcia-Serrano [13]	HPC Features	• Suggested an anomaly-based method for identifying malware behavior using HPC statistics. • To recognize the anomaly, the author evaluated two prevalent forms of attack: stack overflow and ROP. • Specific analysis is not performed on ransomware. • Used LOF (Local Outlier Factor), a density-based outlier detection technique, with six HPC characteristics to detect malware activity.
Kadiyala et al. [16]	HPC Features	• Performed analysis on HPC’s hardware and hardware cache events to detect malware activity. • Specific analysis is not performed on ransomware. • For the investigation, the authors considered 293 benign and 322 malware samples. • Performed Analysis of Variance (ANOVA) on the feature set to determine the most effective HPC feature selection.
Bahador et al. [7]	HPC Features	• The authors evaluate 210 malware, 360 benign samples specific to Linux operating system for the analysis. • Specific analysis is not performed on ransomware. • The behavioral signatures are generated from the HPC statistics using Singular Value Decomposition (SVD) techniques.
Olani et al. [28]	HPC Features	• Considered 21 distinct ransomware families. • Works by translating HPC information into images, and a CNN classifier is employed on the image data to detect ransomware activity.
Pundir et al. [30]	HPC Features	• Considered 76 benign and 80 ransomware executables. • In this study, HPC data was collected for 20 timestamps with a time interval of 100 $\mu$s between each timestamp. • The authors employed an ML classifier based on LSTM (Long Short-Term Memory) to achieve an accuracy rate of 97%.
Alam et al. [1]	HPC Features	• The authors present a two-step detection framework that utilizes Artificial Neural Networks and Fast Fourier Transformation. This framework provides an effective approach for ransomware detection, requiring only a few trace points. • The analysis is performed by considering only 4 variants of ransomware.

Table 2. Summary of Related Works

All of the API call-based, static analysis approaches mentioned above exhibit good detection accuracy. However, their ability to accurately classify newer ransomware variants needs to be assessed. At the application level, ransomware may use many techniques, such as obfuscation and packing, to avoid static detection. Similarly, recent ransomware exploits low-level system functions to avoid detection based on API calls. Furthermore, most studies focusing on HPC-based ransomware detection have analyzed only a limited number of ransomware variants. Examining a larger range of ransomware strains provides a more comprehensive understanding of the commonalities and differences in ransomware behavior. Another important factor to consider is the early detection of ransomware. Modern-day ransomware stages itself quickly to reach the encryption phase. Most of the discussed dynamic analysis methods focus on achieving accurate models at the expense of early detection, i.e., the majority of the HPC-based approaches place little emphasis on the pre-encryption behavior of ransomware. Additionally, contemporary processors limit how many HPC registers can be used simultaneously to collect performance information. To develop an early detection model for ransomware, it is important to balance the number of HPC registers and the time needed to capture HPC events [31]. Our research addresses the aforementioned shortcomings by analyzing samples of 27 ransomware families and extracting the essential HPC features that aid in the early detection of ransomware.

3 Proposed Methodology

We collected 183 ransomware samples for our analysis from a popular malware repository [24]. The executables span 27 ransomware families to adequately represent the diversity of ransomware behavior. The distribution of sample size for the various ransomware families is shown in Table 3. Similarly, we collected the same number of benign samples from the Software Informer website [33]. The benign sample set includes antivirus applications, text editors, gaming applications, drivers, browsers, compression software, media players, and so on.

Ransomware Family	Sample Size
CABP	1
Intercobros	1
Jigsaw	1
Zeznzo	1
AtomSilo	3
Cuba	3
DemonWare	3
HelloXD	3
Surtr	3
Zeppelin	3
Lorenz	4
Blackout	5
Makop	5
Hive	7
AvosLocker	8
Conti	8
Karma	6
MountLocker	6
Mespinoza	9
GlobeImposter	10
Vovabol	10
Cerber	11
BlackMatter	13
Magniber	13
Revil	13
LockBit	14
Bubuk	19

Table 3. Ransomware Families - Sample Size
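
As a quick consistency check, the per-family counts in Table 3 can be summed and compared with the totals stated in the text (183 samples across 27 families). The sketch below copies the counts from Table 3; the cutoff of five samples used to flag sparsely represented families is an arbitrary choice made for illustration.

```python
# Sample counts per family, copied from Table 3.
SAMPLES_PER_FAMILY = {
    "CABP": 1, "Intercobros": 1, "Jigsaw": 1, "Zeznzo": 1,
    "AtomSilo": 3, "Cuba": 3, "DemonWare": 3, "HelloXD": 3, "Surtr": 3,
    "Zeppelin": 3, "Lorenz": 4, "Blackout": 5, "Makop": 5, "Hive": 7,
    "AvosLocker": 8, "Conti": 8, "Karma": 6, "MountLocker": 6,
    "Mespinoza": 9, "GlobeImposter": 10, "Vovabol": 10, "Cerber": 11,
    "BlackMatter": 13, "Magniber": 13, "Revil": 13, "LockBit": 14,
    "Bubuk": 19,
}

total_samples = sum(SAMPLES_PER_FAMILY.values())
family_count = len(SAMPLES_PER_FAMILY)

# Families with fewer than 5 samples (illustrative cutoff).
sparse_families = sorted(f for f, n in SAMPLES_PER_FAMILY.items() if n < 5)

print("total samples:", total_samples)
print("families:", family_count)
print("families with fewer than 5 samples:", len(sparse_families))
```

The counts add up to the 183 samples and 27 families reported above; 11 of the families contribute fewer than five samples each, which is worth keeping in mind when reading per-family results.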

Modern CPUs from Intel and AMD support up to ten hardware performance counters, allowing statistics to be recorded simultaneously in multiple HPC registers. In our case, we use an Intel(R) Core(TM) i7-9700 CPU running at 3.00 GHz and equipped with ten hardware performance counters to configure the feature extraction process. To log all HPC events, we use ‘Perf,’ a tool that builds on Linux’s performance counters subsystem and offers tracing capabilities [8]. This allows us to capture and analyze the HPC data effectively. Performance counters are hardware registers within the CPU that monitor various hardware events, including instructions, cache misses, and branch predictions. These counters serve as the basis for profiling applications, enabling the tracing of dynamic control flow and the identification of performance bottlenecks. The Perf tool offers a set of generalized abstractions over the specific capabilities of different hardware platforms. It provides metrics at the per-task, per-CPU, and per-workload levels, facilitating in-depth performance analysis of the system. Table 4 lists the specific HPC events we consider in our analysis.

HPC events	HPC events (continued)
Cache-misses	LLC-store-misses
Cache-references	LLC-stores
Cpu-cycles	Branch-load-misses
Instructions	Branch-loads
Cpu-clock	dTLB-load-misses
Cpu-migrations	dTLB-loads
Branches	dTLB-store-misses
Branch-misses	dTLB-stores
Bus-cycles	iTLB-load-misses
Ref-cycles	iTLB-loads
Context-switches	Node-loads
Major-faults	Node-stores
Minor-faults	Top Down fetch bubbles
Page-faults	Top Down recovery bubbles
L1-dcache-load-misses	Top Down slots issued
L1-dcache-loads	Top Down slots retired
L1-dcache-stores	Top Down total slots
L1-cache-load-misses	Mem stores
LLC-load-misses	Mem loads
LLC-loads	

Table 4. HPC Events

Several of the HPC events listed in Table 4 refer to levels of the memory hierarchy. The Level-1 (L1) cache is the smallest and fastest cache within the system. LLC (last-level cache) corresponds to the final tier of the cache hierarchy, which is the largest but slowest cache. The prefixes ‘i’ and ‘d’ distinguish the instruction cache from the data cache. A Translation Look-aside Buffer (TLB) is a cache that stores recent translations of virtual addresses to physical addresses, enabling faster lookups. A miss count indicates how often a requested item was not found in the corresponding cache.

To prevent ransomware from harming other files or network devices, we executed the malicious payloads within a sandbox environment: a Windows 7 sandbox running on top of a Linux host. During the execution of each payload, we collected HPC statistics from the sandbox container. The sandbox environment used in our analysis contained 1,640 files occupying a total disk space of 17.56 GB. We selected Ubuntu 18.04 as the base operating system because it restricts the execution of Windows ransomware executables (.exe) on the host. Additionally, Ubuntu 18.04 offers a wide range of tools, including Perf, which greatly facilitates our data collection process.

As illustrated in Figure 1, our proposed methodology is made up of three major components:

Fig. 1.

• A sandbox environment in which both ransomware payloads and benign executables are executed.

• An automated script that handles the entire process of extracting HPC features.

• An ensemble classifier, built from the collected HPC features, that detects ransomware behavior.

We have developed a script file that automates the process of extracting HPC features. The script executes a payload within a sandbox environment and collects HPC information in a vectorized format for specified time intervals. The script begins by retrieving the list of executables, and each executable is then transferred to the sandbox environment for dynamic analysis. Upon delivering an executable to the sandbox environment, the script identifies the process ID (referred to as “sbpid”) of the sandbox machine. The subsequent step involves utilizing the sandbox machine’s process ID as a parameter in the Perf tool to capture the information stored in the HPC registers.

Perf command example:

perf stat -p $sbpid -x "," -o "HPCstat.csv" -e "branch-loads,L1-dcache-load-misses,L1-dcache-stores,iTLB-loads,dTLB-loads,L1-dcache-loads,cpu-migrations,mem-stores,branches" sleep T

Here ‘T’ is the capture interval in seconds; a fractional value such as 0.1 gives an interval in milliseconds. After collecting the HPC stats for the specified interval ‘T’, the captured HPC information is stored in a CSV file (HPCstat.csv in the above example). The script then ends the execution of the payload by terminating the corresponding process ID, and the sandbox environment is restored to its initial state.
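The capture step can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: the helper names `build_perf_command` and `parse_perf_csv` are hypothetical, the sample output is made up, and the parser assumes the field order that `perf stat -x` documents (counter value, unit, event name, then further fields).

```python
import csv
import io


def build_perf_command(pid, events, interval_s, out_path):
    """Return the argv list for one `perf stat` capture of `interval_s` seconds."""
    return [
        "perf", "stat",
        "-p", str(pid),            # attach to the sandbox process
        "-x", ",",                 # comma-separated output
        "-o", out_path,            # write the counters to a file
        "-e", ",".join(events),    # no spaces inside the event list
        "sleep", str(interval_s),  # perf counts until `sleep` exits
    ]


def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x ,` output.

    Comment lines ('#') and blank lines are skipped; events reported as
    '<not counted>' or '<not supported>' are stored as None.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#") or len(row) < 3:
            continue
        value, event = row[0].strip(), row[2].strip()
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None
    return counts


# Illustrative output only; the numbers are invented for this example.
sample = """# started on (illustrative header line)

1234567,,L1-dcache-loads,100123456,100.00,,
<not counted>,,iTLB-loads,0,0.00,,
42,,cpu-migrations,100123456,100.00,,
"""

cmd = build_perf_command(4242, ["L1-dcache-loads", "iTLB-loads"], 0.1, "HPCstat.csv")
features = parse_perf_csv(sample)
```

The argv list could be passed to `subprocess.run(cmd, check=True)` on a host where Perf is installed; the parsed dictionary then forms one row of the feature vector.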

As part of building a classifier, we employ ensemble learning approaches to build a strong classification model to accurately detect ransomware behavior. Ensemble learning aims to enhance generalization beyond that of a single estimator by combining the predictions of multiple base estimators trained using a specific learning algorithm.

4 Experiments and Results

Using HPC registers in malware analysis involves capturing the counter values during the execution of an application over a predefined time interval. These HPC counter statistics are then used to compare malicious and benign programs and to discern their differences and similarities. Based on the observed variations, a machine learning classifier is constructed to identify malicious behavior from the hardware performance counters (HPC).

Modern processors have a limitation on the number of HPC registers that can be utilized concurrently to gather performance data. Both contemporary Intel and AMD CPUs offer a maximum of 10 or 11 hardware performance counters, which is a significant factor to consider when designing a restricted feature set. This means that we can collect statistics from a maximum of ten HPC registers simultaneously. Furthermore, determining the optimal timeframe for collecting HPC data is essential to achieve optimal results in the early detection of ransomware. The timeframe refers to the duration, measured in milliseconds or seconds, for which the HPC register data is captured. Striking a balance between the number of HPC events considered during model construction and the time required for feature extraction is crucial for effective early ransomware detection.

Considering the restriction imposed by modern processors on the utilization of HPC registers for smooth data extraction, we can set a maximum limit of 10 HPC registers. However, since there are many HPC events listed in Table 4, it is important to identify which events play an important role in ransomware detection. To achieve this, we perform feature importance calculation for all the sets of HPC events and then select the Top-10 contributing HPC functionalities for seamless data extraction.

As we experimented with multiple ransomware strains, we observed that ransomware starts to encrypt the whole file system roughly 8 to 10 seconds after it begins to execute. Based on the aforementioned statistics, we have decided to collect HPC statistics ranging from 5 to 10 seconds of application execution to deal with the swift detection of ransomware.

As illustrated in Figure 1, we initially set the time frame value ‘T’ to 5 seconds and obtained HPC register information for both benign and ransomware samples from all available events (refer to Table 4). The HPC events for each sample are extracted three times to create a robust dataset that minimizes the variation across identical samples. The same extraction is then carried out with the time frame value ‘T’ set to 6, 7, 8, 9, and 10 seconds. For each time frame, the benign and ransomware datasets each have dimensionality (sample size * 3, total number of HPC features). With 183 benign samples, 183 ransomware samples, and 39 HPC features (Table 4), each of the two datasets has dimensionality (549, 39).

4.1 Correlation Analysis on HPC Features

Correlation analysis identifies positive and negative relationships among the given attributes. In our work, we use the Pearson correlation to assess the linear association between two continuous variables. With a positive correlation, feature B tends to increase when feature A increases and to decrease when feature A decreases, so the two features move in tandem. With a negative correlation, one feature tends to decrease when the other increases, and vice versa. When two or more independent variables are highly correlated, they carry redundant information, and one of them can be treated as a duplicate and removed. Retaining highly correlated variables also makes the model unstable: a slight change in the data or the model can cause substantial fluctuations in its predictions. Thus, Pearson’s correlation helps in understanding the dependency among the HPC features over a time frame, which aids the feature selection process. The Pearson correlation coefficient (r) is calculated using the formula below:

\begin{equation*} r = \frac{\sum_{i}(x_{i}-m_{x})(y_{i}-m_{y})}{\sqrt{\sum_{i}(x_{i}-m_{x})^{2} \sum_{i}(y_{i}-m_{y})^{2}}} \end{equation*}

Here:

$x_{i}$ - x variable samples

$y_{i}$ - y variable samples

$m_{x}$ and $m_{y}$ - mean of values in x and y, respectively.

r - Pearson correlation coefficient, which lies in the range $-1 \le r \le +1$; its key values are interpreted below:

\[\begin{gather*} {\left\lbrace \begin{array}{ll} r = -1 \text{ indicates perfect negative correlation} \\ r = 0 \text{ indicates no linear correlation} \\ r = +1 \text{ indicates perfect positive correlation} \end{array}\right.} \end{gather*}\]
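The Pearson coefficient can be computed directly from its definition. The sketch below is illustrative only: the counter values are hypothetical, and the 0.9 threshold used to flag redundant feature pairs is an assumption, not a value taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    num = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - m_x) ** 2 for a in x) * sum((b - m_y) ** 2 for b in y))
    return num / den  # undefined (division by zero) if either input is constant


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical counter values for four samples (not measured data).
columns = {
    "L1-dcache-loads": [10.0, 20.0, 30.0, 40.0],
    "dTLB-loads": [11.0, 19.0, 31.0, 41.0],
    "cpu-migrations": [4.0, 1.0, 3.0, 2.0],
}
redundant = highly_correlated_pairs(columns)
```

In this toy data only the pair (L1-dcache-loads, dTLB-loads) exceeds the threshold, so one of the two would be a candidate for removal.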

As we have carried out HPC data extraction for multiple time frame values ranging from 5 seconds to 10 seconds, we have a total of six different datasets, one for each time frame. We use Pearson correlation analysis on each dataset generated for each timeframe individually to understand the relationship between HPC features for that particular time frame. The correlation matrix representation for the time frames of 5 and 10 seconds is shown in Figures 2(a) and 2(b), respectively. We notice that positive correlations among the HPC features increase with the increase in the time frame. Also, we can see that negative correlations tend to vanish with an increase in the timeframe value. Drawing from these insights, we employed a feature selection algorithm called Boruta, which is based on the Random Forest approach, to identify the essential features crucial for constructing a robust classification model [26].

Fig. 2.

4.2 Feature Importance & Classification

We employ the Boruta algorithm, a wrapper built around the Random Forest classification technique, to select significant features while accounting for non-linear or complex relationships between variables [17]. The algorithm duplicates the original dataset’s features and shuffles the values in each duplicated column to remove any relationship with the target. These newly created features are known as “shadow features.” The shadow features are combined with the original features to create a new feature space with twice the dimension of the initial dataset. A classifier, preferably a Random Forest, is trained on this extended feature set. The algorithm then uses the Z-score of each feature’s importance to determine its significance, comparing each original feature’s importance with the maximum importance among the shadow features. If the original feature is more important, it is retained; otherwise, it is removed from the dataset. The next iteration starts from the attributes that qualified in the previous one: the algorithm constructs new shadow features from these attributes and recalculates their significance. This procedure is repeated for a predetermined number of iterations or until all features have been confirmed or eliminated.
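The shadow-feature comparison at the core of this procedure can be illustrated with a simplified sketch. It is not the Boruta implementation: the importance score here is a stand-in (absolute Pearson correlation with the label) rather than Random Forest importance, the statistical test over the hit counts is omitted, and the toy data and function names are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Stand-in importance score: absolute Pearson correlation with the label."""
    m_x, m_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - m_x) ** 2 for a in x) * sum((b - m_y) ** 2 for b in y))
    return abs(num / den) if den else 0.0


def shadow_hits(features, labels, n_trials=20, seed=0):
    """Count, per feature, the trials in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_trials):
        # Shadow features: each original column with its values shuffled.
        best_shadow = 0.0
        for column in features.values():
            shadow = column[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, labels))
        # A feature scores a hit when it is more important than every shadow.
        for name, column in features.items():
            if abs_corr(column, labels) > best_shadow:
                hits[name] += 1
    return hits


# Toy data (hypothetical): 20 benign (0) and 20 ransomware (1) samples.
labels = [0] * 20 + [1] * 20
data_rng = random.Random(1)
features = {
    "signal": [float(v) for v in labels],          # tracks the label exactly
    "noise": [data_rng.random() for _ in labels],  # unrelated to the label
}
hits = shadow_hits(features, labels)
```

A feature that consistently beats the best shadow accumulates hits in every trial, while an uninformative feature does so only by chance; Boruta turns such hit counts into confirm or reject decisions.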

Contemporary processors can gather statistics from up to 10 HPC registers simultaneously, so we cap the feature set for the dataset at 10. Certain older Intel processors, however, provide at most six hardware performance counters. With both limits in mind, we identified the Top-5 and Top-10 most contributing features using the Boruta feature selection algorithm. Based on these two feature sets, we generated datasets and constructed various ensemble classifiers to detect ransomware activity.

We used the Boruta feature selection technique to identify the Top-5 and Top-10 features from the HPC statistics obtained for each time frame from 5 to 10 seconds. For each time frame, we created a dataset based on the Top-5 and Top-10 features and constructed a classification model using RandomForest. To assess the model’s performance, we used a train-test split ratio of 75:25 and conducted 10-fold repeated cross-validation with three repetitions. Since our focus is on early detection of ransomware, we looked for the shortest time frame with high accuracy: the classifier achieved an accuracy of 0.9818 on the dataset created from HPC statistics collected over 7 seconds, the earliest time frame at which its accuracy exceeded 0.98 (Table 6). The corresponding Boruta feature significance results for the 7-second dataset are shown in Figure 3. Accuracy, sensitivity, and specificity were employed as evaluation metrics, and the outcomes are depicted in Figure 4.

Fig. 3.

Fig. 4.

In addition, Table 5 displays the overall test time for different classifiers, i.e., the total time required to generate a data frame with the HPC statistics and to determine the classifier output. We observed that Random Forest had the lowest test time and is therefore best suited for early detection of ransomware. The overall classification results across multiple datasets are presented in Table 6. Table 7 lists the Top-10 features (HPC events) for detecting ransomware activity.

Table 5.

Classifier	Test duration for a single sample (in ms)
Random Forest	630
CART	2293
GBM	2146
AdaBoost	2300

Table 5. Test Duration for Single Sample Executable (in ms)

Table 6.

Time Frame	Feature Count	Classifier	Accuracy	Sensitivity	Specificity	Kappa
5 Seconds	Top-5	RandomForest	0.7555	0.7956	0.7153	0.5109
		CART	0.7336	0.7591	0.7080	0.4672
		GBM	0.7555	0.8029	0.7080	0.5109
		AdaBoost	0.7153	0.7153	0.7153	0.4307
	Top-10	RandomForest	0.7482	0.7737	0.7226	0.4964
		CART	0.7518	0.7591	0.7445	0.5036
		GBM	0.7628	0.7737	0.7518	0.5255
		AdaBoost	0.7482	0.7518	0.7445	0.4964
6 Seconds	Top-5	RandomForest	0.9161	0.8832	0.9489	0.8321
		CART	0.9124	0.8905	0.9343	0.8248
		GBM	0.9088	0.8832	0.9343	0.8175
		AdaBoost	0.9197	0.8905	0.9489	0.8394
	Top-10	RandomForest	0.9234	0.8905	0.9562	0.8467
		CART	0.9197	0.8905	0.9489	0.8394
		GBM	0.9124	0.8905	0.9343	0.8248
		AdaBoost	0.9380	0.9051	0.9708	0.8759
7 Seconds	Top-5	RandomForest	0.9818	0.9854	0.9781	0.9635
		CART	0.9818	0.9927	0.9708	0.9635
		GBM	0.9854	0.9927	0.9781	0.9708
		AdaBoost	0.9854	0.9927	0.9781	0.9708
	Top-10	RandomForest	0.9818	0.9854	0.9781	0.9635
		CART	0.9818	0.9854	0.9781	0.9635
		GBM	0.9781	0.9854	0.9708	0.9562
		AdaBoost	0.9927	0.9927	0.9927	0.9854
8 Seconds	Top-5	RandomForest	0.9672	0.9562	0.9781	0.9343
		CART	0.9672	0.9635	0.9708	0.9343
		GBM	0.9708	0.9562	0.9854	0.9416
		AdaBoost	0.9672	0.9562	0.9781	0.9343
	Top-10	RandomForest	0.9781	0.9708	0.9854	0.9562
		CART	0.9708	0.9708	0.9708	0.9416
		GBM	0.9781	0.9781	0.9781	0.9562
		AdaBoost	0.9745	0.9781	0.9708	0.9489
9 Seconds	Top-5	RandomForest	0.9677	0.9648	0.9708	0.9355
		CART	0.9677	0.9577	0.9781	0.9355
		GBM	0.9677	0.9648	0.9708	0.9355
		AdaBoost	0.9677	0.9577	0.9781	0.9355
	Top-10	RandomForest	0.9821	0.9859	0.9781	0.9641
		CART	0.9857	0.9859	0.9854	0.9713
		GBM	0.9821	0.9859	0.9781	0.9641
		AdaBoost	0.9892	0.9930	0.9854	0.9785
10 Seconds	Top-5	RandomForest	0.9854	0.9854	0.9854	0.9708
		CART	0.9781	0.9781	0.9781	0.9562
		GBM	0.9891	0.9927	0.9854	0.9781
		AdaBoost	0.9964	0.9927	1.0000	0.9927
	Top-10	RandomForest	0.9891	0.9927	0.9854	0.9781
		CART	0.9781	0.9781	0.9781	0.9562
		GBM	0.9891	0.9781	1.0000	0.9781
		AdaBoost	0.9927	0.9854	1.0000	0.9854

Table 6. Performance of Various Classifiers on the HPC Datasets (5 to 10 Seconds)
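The four metrics reported in Table 6 can all be derived from a binary confusion matrix. The sketch below treats ransomware as the positive class, which is an assumption. The counts are also an assumption: the paper does not list confusion matrices, and these values (a balanced test set of 137 samples per class) were chosen only because they reproduce the 5-second, Top-5, RandomForest row.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from raw counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts consistent with the 5-second, Top-5, RandomForest row.
m = binary_metrics(tp=109, fn=28, tn=98, fp=39)
rounded = {name: round(value, 4) for name, value in m.items()}
```

For a balanced test set the chance agreement is 0.5, so kappa reduces to 2 * accuracy - 1, which matches the relationship between the Accuracy and Kappa columns of Table 6.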

Table 7.

Top-10 HPC Features identified using Boruta on the 7-second HPC dataset
Rank	HPC feature
1	L1-dcache-loads
2	dTLB-loads
3	L1-dcache-stores
4	branch-loads
5	L1-dcache-load-misses
6	mem-stores
7	cpu-migrations
8	iTLB-loads
9	branches
10	dTLB-stores

Table 7. Top-10 HPC Features

4.3 Experiment with Multi-timeframe HPC Data

In our previous experiment, we extracted HPC features over a single time frame, specifically the first ‘n’ seconds of application execution, where ‘n’ ranged from 5 to 10 seconds. Although we achieved good detection accuracy with a 7-second time frame, extracting HPC data over a single time frame has the following limitations:

• We do not have access to time-series data to observe the application’s load during the first few seconds of execution.

• A single-time-frame HPC dataset cannot capture sudden changes in the application load on specific HPC events.

Considering the above-mentioned limitations, we decided to capture HPC register information over multiple time frames. In essence, we partitioned the HPC capture duration into multiple time frames and extracted HPC register information for each individual frame. For instance, we set the total HPC capture duration to 2 seconds and the time frame value to 100 milliseconds. This means we captured HPC register information every 100 milliseconds during the first 2 seconds of application execution. This method of HPC feature extraction enables us to obtain a substantial amount of HPC register information in a short amount of time, and the resulting time-series data can effectively aid in building a better classifier.

We have configured the time frame value to be 100 milliseconds, representing the duration during which the HPC register data is recorded. As shown in Algorithm 1 (Perf Command), the sleep time is set to 0.1 seconds, equivalent to 100 milliseconds. In order to facilitate the early identification of ransomware samples, we have defined the overall duration for HPC capture as 2 seconds, with a specific emphasis on the first 2 seconds of application execution. As the total HPC capture duration is set to 2 seconds (i.e., 2000 milliseconds), and the time frame value is set to 100 milliseconds, we set the iteration limit to 20.

Iteration limit = (total duration of HPC capture/timeframe value) = 2000/100 = 20

The iteration limit indicates how many times we need to capture the HPC features during the given total duration. In our scenario, we set the iteration limit to 20, and each time we capture the HPC information from the Top-10 HPC features mentioned in Table 7. The captured HPC information for multiple time frames is merged to generate a row vector that stores the HPC register information for a sample executable. This feature extraction process is repeated three times for each sample to create a robust dataset that minimizes the variation across identical samples. In our case, we have 183 benign and 183 ransomware samples, and the dimensionality of the benign and ransomware datasets is defined below:

• Top-10 Features, 100 milliseconds timeframe = (sample size * 3, number of HPC features captured for each timeframe * iteration limit) = (183 * 3, 10 * 20) = (549, 200)

• Top-5 Features, 100 milliseconds timeframe = (sample size * 3, number of HPC features captured for each timeframe * iteration limit) = (183 * 3, 5 * 20) = (549, 100)
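The construction of one multi-timeframe row vector can be sketched as follows. The helper names and the placeholder counter values are hypothetical; only the event list, the 100-millisecond timeframe, the 2-second duration, and the sample counts come from the text.

```python
TOP5 = [
    "branch-loads",
    "L1-dcache-loads",
    "L1-dcache-stores",
    "dTLB-loads",
    "L1-dcache-load-misses",
]
TOTAL_MS = 2000
TIMEFRAME_MS = 100
ITERATION_LIMIT = TOTAL_MS // TIMEFRAME_MS  # 2000 / 100 = 20 captures


def to_row_vector(frames, events):
    """Concatenate per-frame counter readings into a single feature row."""
    return [frame[event] for frame in frames for event in events]


def column_names(events, n_frames):
    """Name each column by event and by the end time of its frame."""
    return [
        f"{event}@{(i + 1) * TIMEFRAME_MS}ms"
        for i in range(n_frames)
        for event in events
    ]


# Placeholder readings: frame i, event j -> 1000 * i + j (not measured data).
frames = [
    {event: 1000 * i + j for j, event in enumerate(TOP5)}
    for i in range(ITERATION_LIMIT)
]
row = to_row_vector(frames, TOP5)
names = column_names(TOP5, ITERATION_LIMIT)

# 183 samples, each executed three times, one row per execution.
dataset_shape = (183 * 3, len(row))
```

Keeping the frames in chronological order within the row preserves the time-series structure, so a classifier can pick up changes in a counter between consecutive 100-millisecond windows.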

The dataset collected for each case (Top-5 and Top-10 features) is used to build a classification model that identifies ransomware activity. To determine the most suitable classifier for this scenario, we explored boosting methods (GBM and AdaBoost), a bagging method (Random Forest), and CART (Classification and Regression Trees). The classification model was evaluated using a 75:25 train-test split ratio and 10-fold repeated cross-validation with three repetitions. Accuracy, sensitivity, specificity, and kappa score were selected as metrics to assess the model’s performance in each case, and the outcomes are presented in Table 8. Random Forest emerged as the best classifier for this scenario, with the highest accuracy (0.9909) and a low false positive rate (specificity of 0.9878). Table 8 also shows that reducing the number of HPC features from ten to five lowers the test accuracy of CART, GBM, and AdaBoost, whereas Random Forest maintains the same accuracy with the smaller feature set.

Table 8.

HPC Feature Count	Classifier	Accuracy	Sensitivity	Specificity	Kappa
Top-10	CART	0.9726	0.9815	0.9634	0.9451
	RF	0.9909	0.9939	0.9878	0.9817
	GBM	0.9848	0.9817	0.9878	0.9695
	AdaBoost	0.9878	0.9939	0.9817	0.9756
Top-5	CART	0.9695	0.9756	0.9634	0.9390
	RF	0.9909	0.9939	0.9878	0.9817
	GBM	0.9756	0.9817	0.9695	0.9512
	AdaBoost	0.9817	0.9878	0.9756	0.9634

Table 8. Performance of Various Classifiers on the Multi-timeframe HPC Datasets (Top-10 and Top-5 HPC Features; Parameters: Timeframe = 100ms, Total Duration = 2 Seconds)

One notable finding from this experiment is that dividing the total duration into multiple timeframes leads to significantly improved results. Sampling the counters at several points within the capture window yields more feature values in the same duration. This is evident when comparing the single-timeframe HPC dataset for 5 seconds (Table 6: accuracy 0.7555, specificity 0.7153) with the multi-timeframe HPC dataset for 2 seconds (Table 8: accuracy 0.9909, specificity 0.9878).

In our research, we chose a timeframe value of 100 milliseconds. Setting the timeframe value too low, such as 10ms or 50ms, would require capturing an excessive amount of HPC data within a very short duration. For example, with a timeframe of 10ms and a total duration of 2000ms, the HPC data capture would occur 200 times during the 2-second execution of the application. Each capture introduces a small delay when the data is saved, and these delays add to the overall time. On the other hand, increasing the timeframe value to 500 milliseconds or higher may lead to a model with lower accuracy and specificity. Since our goal is early detection, it is crucial to select a timeframe that balances capture overhead against detection accuracy and specificity. Based on our findings, we sample the Top-5 significant HPC registers (Branch-loads, L1-dcache-loads, L1-dcache-stores, dTLB-loads, L1-dcache-load-misses) every 100ms during the dynamic analysis of ransomware within the initial 2 seconds of execution. This approach evaluates a smaller subset of HPC registers over a shorter duration, which is ideal for early detection of ransomware.

4.3.1 Classification results on unseen data

A robust ransomware detection engine must be able to detect unseen or newer ransomware variants at test time. To evaluate this, we do not train the classifier on all ransomware variants; instead, we hold out a selected list of families and use them only in the testing phase. This technique allows us to evaluate the model’s effectiveness in recognizing newer ransomware variants.

In this experiment, we deliberately excluded from the training phase the HPC records of samples belonging to the Magniber, Makop, Mespinoza, MountLocker, Revil, Surtr, Vovabol, Zeppelin, and Zeznzo ransomware variants. Here, “HPC records” refers to the time series data obtained by extracting the HPC register data for multiple time frames during the dynamic analysis of an executable sample. Once the classifier has been built, the HPC records of the held-out ransomware variants are passed to it during the testing step to determine the classifier’s efficacy. For the dynamic analysis of the executables, we kept the same parameters as before, i.e., the timeframe value is set to 100 milliseconds, and the total capture duration is set to 2 seconds. The findings from this scenario are summarized in Table 9. Random Forest matches or outperforms the other classifiers in detection accuracy and specificity, and it is the only classifier that retains an accuracy of 0.9868 when the feature set is reduced from ten to five. Based on our findings, we emphasize that using the top five HPC registers (Branch-loads, L1-dcache-loads, L1-dcache-stores, dTLB-loads, L1-dcache-load-misses) for feature extraction during the dynamic analysis of ransomware within the first 2 seconds of its execution yields the best results, even with unseen data. The comparison between various state-of-the-art ransomware detection methods based on HPC features is shown in Table 10.
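Holding out entire families, rather than randomly chosen samples, can be sketched as below. The record layout and the function name `split_by_family` are hypothetical, and the sketch covers only ransomware records, since this section does not state how benign samples were divided between training and testing.

```python
# Families withheld from training, as listed in the text.
HELD_OUT = {
    "Magniber", "Makop", "Mespinoza", "MountLocker", "Revil",
    "Surtr", "Vovabol", "Zeppelin", "Zeznzo",
}


def split_by_family(records, held_out):
    """Send every record of a held-out family to the test set, the rest to training."""
    train, test = [], []
    for record in records:
        (test if record["family"] in held_out else train).append(record)
    return train, test


# Hypothetical HPC records; real records would also carry the feature vector.
records = [
    {"family": "Cerber", "label": 1},
    {"family": "LockBit", "label": 1},
    {"family": "Revil", "label": 1},
    {"family": "Magniber", "label": 1},
    {"family": "Cerber", "label": 1},
]
train, test = split_by_family(records, HELD_OUT)
```

Because the split is made on the family name, no variant of a held-out family can leak into training, which is what makes the test set "unseen" in the sense used here.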

Table 9.

Feature Count	Classifier	Accuracy	Sensitivity	Specificity	Kappa
10	CART	0.9788	0.9683	0.9894	0.9577
	RF	0.9868	0.9788	0.9947	0.9735
	GBM	0.9868	0.9788	0.9947	0.9735
	AdaBoost	0.9815	0.9735	0.9894	0.9630
5	CART	0.9815	0.9735	0.9894	0.9630
	RF	0.9868	0.9788	0.9947	0.9735
	GBM	0.9815	0.9735	0.9894	0.9630
	AdaBoost	0.9815	0.9735	0.9894	0.9630

Table 9. Performance of Various Classifiers on the Unseen Multi-timeframe HPC Datasets (Top-10 and Top-5 HPC Features; Parameters: Timeframe = 100ms, Total Duration = 2 Seconds)

Table 10.

Method Name	Approach Type	Number of HPC features considered	Detection Time	Experimentation on Unseen Ransomware samples	Observations
RAPPER [1]	Unsupervised - Auto Encoder	5	In order of seconds (took approximately 4 seconds to detect WannaCry variant)	Not applicable	• Works as an anomaly detector • Sample size is very small (only 4 ransomware variants are considered for the analysis) • Less emphasis on the pre-encryption behavior of ransomware
RanStop [30]	Supervised - RNN (LSTM)	4	In order of milliseconds (without considering the latency)	No	• 97% average accuracy is reported • Sample size is very small (only 80 ransomware and 76 benign samples are considered for the analysis) • The latency analysis for the generation of the HPC dataset is not mentioned • The latency analysis for testing a sample executable is not mentioned • Less emphasis on the pre-encryption behavior of ransomware
DeepWare [28]	Supervised - CNN	5	In order of seconds (around 3 seconds)	Yes	• Achieved 98.6% accuracy with nearly zero FPR • Considered 21 ransomware families for the analysis • Heavier model (packages and parameters of the model took 178MB) • Less emphasis on the pre-encryption behavior of ransomware
HiPeR (our proposed method)	Supervised - RandomForest	5	In order of seconds (around 3 seconds)	Yes	• Achieved 98.68% accuracy with nearly zero FPR • Considered 27 ransomware families for the analysis • Employs a lightweight ensemble model that is computationally less expensive and does not require a GPU for training • HiPeR detects ransomware activity in the first 3.5 seconds of payload execution (more emphasis on the pre-encryption behavior of ransomware)

Table 10. Comparison between Various State-of-the-art Ransomware Detection Methods based on HPC Features

Latency calculation: In our study, we used a timeframe value of 100 milliseconds and an iteration count of 20 to record the HPC information during the first 2 seconds of application execution. Each iteration adds approximately 35 milliseconds of latency to save the output of the HPC registers to a CSV file, so the overall overhead for the HPC capture is 35 * 20 = 700 milliseconds. Additionally, as indicated in Table 5, the test time for the Random Forest classifier is around 600 milliseconds. The entire test overhead, which includes constructing a data frame containing the HPC information and determining the classifier output, is therefore about 700 + 600 = 1300 milliseconds. Adding this overhead to the 2000-millisecond capture window gives roughly 3300 milliseconds, so our proposed method can detect ransomware activity in less than 3.5 seconds, including latency.
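The latency budget can be reproduced with a few lines of arithmetic. The function name is hypothetical, and applying the 35-millisecond save overhead to timeframes other than 100 milliseconds is an assumption made only to illustrate the trade-off; the paper reports that figure for the 100-millisecond setting.

```python
def detection_latency_ms(total_capture_ms, timeframe_ms, save_overhead_ms, classify_ms):
    """Break the end-to-end detection time into its components."""
    iterations = total_capture_ms // timeframe_ms
    capture_overhead = iterations * save_overhead_ms
    test_overhead = capture_overhead + classify_ms
    return {
        "iterations": iterations,
        "capture_overhead_ms": capture_overhead,
        "test_overhead_ms": test_overhead,
        "total_ms": total_capture_ms + test_overhead,
    }


# Values from the text: 2 s capture, 100 ms timeframe, 35 ms per save, ~600 ms test.
budget = detection_latency_ms(2000, 100, 35, 600)

# Assumption for illustration: the same 35 ms overhead at a 10 ms timeframe.
fine_grained = detection_latency_ms(2000, 10, 35, 600)
```

Under that assumption, a 10-millisecond timeframe would trigger 200 captures and push the capture overhead alone to 7000 milliseconds, which illustrates why a very small timeframe works against early detection.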

4.4 Comparison between Benign and Ransomware Executables based on HPC Features

Antivirus applications generally require a wide range of permissions during installation. They add registry keys to persist across system reboots, obtain higher privileges, and access various system information, such as date, geolocation, and file structure. Similarly, file encryption software requires permission to scan the entire file structure of the system and calls various built-in crypto APIs during its runtime. Although both kinds of application are benign, their behavior loosely resembles ransomware activity. Moreover, ransomware searches the entire directory path for files before encrypting them, which resembles the behavior of a file search tool. For this reason, we tested the Windows-based application “Everything.exe”, a freeware desktop search tool for Windows that can quickly locate files and folders by name [36]. To complete the comparison study, we also included a widely used browser application (mozilla-firefox.exe) [25]. In summary, we chose the Norton antivirus [27], the VeraCrypt file encryption software [34], the Mozilla Firefox browser, and the Everything file search program to investigate how their HPC statistics differ from those of multiple ransomware variants.

As part of this analysis, we captured the HPC statistics for the first 2 seconds of execution for all of the selected benign applications and compared them with those of the ransomware executables. Figure 5 shows the differences among the applications for three HPC events: ‘L1-dcache-loads’, ‘cpu-migrations’, and ‘L1-dcache-load-misses’. The findings from the comparative analysis are summarised below:

Fig. 5.

• Antivirus, browser, and file encryption applications exhibited more load on the HPC registers ‘L1-dcache-loads’ and ‘L1-dcache-load-misses’ than ransomware executables.

• When it comes to CPU migrations, the majority of benign programs and ransomware executables behave similarly. As a result, using CPU migrations as a primary characteristic when developing the classifier may result in misclassifications.

• The file search application Everything.exe exhibited loads similar to those of ransomware executables on the HPC registers ‘cpu-migrations’, ‘L1-dcache-loads’, and ‘L1-dcache-load-misses’. Because of this behavior, file search programs may be misclassified as ransomware.

4.5 Ransomware Family Wise Analysis based on HPC Statistics

We also analyze the HPC features of individual ransomware families in comparison to benign executables. In this experiment, the mean values of the HPC features are extracted from the executables of the most recent ransomware families, including AvosLocker, BlackMatter, Bubuk, Cerber, Conti, Hive, LockBit2.0, and Revil. We gathered HPC features during the first 2 seconds of program execution for executables from distinct ransomware families and compared them to benign applications based on the mean values of the relevant HPC features in the acquired time series data. We deduce the following from the data shown in Figure 6:

Fig. 6.

• The operational load imposed by benign executables is greater than that caused by ransomware executables on the majority of HPC events.

• The statistics for the HPC events ‘L1-dcache-loads’ and ‘branch-loads’ suggest that samples belonging to the notorious Bubuk and Cerber families exhibit a higher load on the system than other ransomware variants.

• Ransomware executables generate ‘cpu-migrations’ values similar to those of the benign applications throughout their operation. Conti and Hive ransomware executables, in particular, perform more CPU migrations than other ransomware variants.

• During their execution, most ransomware samples generated HPC values comparable to one another, with a low outlier ratio. This behavior resulted in a better detection rate with the ensemble classifiers, as shown in Table 9.
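The family-wise comparison rests on per-family mean values of each HPC event. A minimal sketch of that aggregation follows; the records and their counter values are hypothetical and do not reproduce the measurements behind Figure 6.

```python
from collections import defaultdict


def mean_by_family(records, event):
    """Average one HPC event over all records of each family."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for family, counters in records:
        totals[family] += counters[event]
        counts[family] += 1
    return {family: totals[family] / counts[family] for family in totals}


# Hypothetical readings (not measured data); "benign" groups the benign samples.
records = [
    ("Cerber", {"L1-dcache-loads": 900.0, "cpu-migrations": 2.0}),
    ("Cerber", {"L1-dcache-loads": 1100.0, "cpu-migrations": 4.0}),
    ("Conti", {"L1-dcache-loads": 400.0, "cpu-migrations": 9.0}),
    ("benign", {"L1-dcache-loads": 2000.0, "cpu-migrations": 3.0}),
]
load_means = mean_by_family(records, "L1-dcache-loads")
migration_means = mean_by_family(records, "cpu-migrations")
```

Comparing such per-family means against the benign mean for each event is one way to obtain the kind of comparison summarized in the bullets above.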

5 Conclusion and Future Work

The use of hardware performance counter registers for malware detection is an emerging field of research, especially in the context of detecting ransomware. At the application level, ransomware employs tactics such as obfuscation and packing to evade static detection methods. Additionally, modern ransomware variants leverage low-level system calls to evade detection based on API calls. A viable alternative is to analyze the performance counter registers to identify ransomware activity. In our study, we analyzed 27 ransomware families using hardware performance counters. Our findings indicate that early detection of ransomware can be achieved by profiling hardware counter information. We proposed a technique that achieves an accuracy of 0.9868 by monitoring just five HPC register values during the initial two seconds of application execution.

With recent advancements in ransomware payload distribution and persistence strategies, early detection of ransomware is considered a difficult task. In our work, we identified a few ransomware strains, such as Bubuk, Surtr, and Jigsaw, that started encrypting files within 5 seconds of their execution. Our proposed model effectively detects such ransomware activity before it enters the encryption phase. However, malware developers may in the future find ways to evade HPC-based detection by making ransomware produce HPC statistics equivalent to those of benign applications. In such cases, HPC statistics alone may not be sufficient. We are working towards a comprehensive solution that improves early ransomware detection by integrating file-based and kernel-based events with hardware performance counters.

References

[1]

Manaar Alam, Sayan Sinha, Sarani Bhattacharya, Swastika Dutta, Debdeep Mukhopadhyay, and Anupam Chattopadhyay. 2020. Rapper: Ransomware prevention via performance counters. arXiv preprint arXiv:2004.01712 (2020).
