Ransomware has been one of the most prevalent forms of malware over the previous decade, and it continues to be one of the most significant threats today. Recently, ransomware strategies such as double extortion and rapid encryption have encouraged attacker communities to treat ransomware as a business model. With the advent of Ransomware as a Service (RaaS) models, ransomware spread and operations continue to increase. Even though machine learning and signature-based detection methods for ransomware have been proposed, they often fail to achieve very accurate detection. Ransomware that evades detection moves to the execution phase after initial access and installation. Due to the catastrophic nature of a ransomware attack, it is crucial to detect it in the early stages of execution. If ransomware can be detected early enough in its execution phase, then one can kill the processes to stop the attack. However, early detection with dynamic API call analysis is not an ideal solution, as contemporary ransomware variants use low-level system calls to circumvent such detection methods. In this work, we use hardware performance counters (HPC) as features to detect ransomware within 3-4 seconds, which may be sufficient, at least in the case of ransomware that takes longer to complete its full execution.
1 Introduction
Ransomware is a type of malware that prohibits or restricts users' access to their computers by locking the screen or encrypting their files until a ransom is paid. Recently, RaaS (Ransomware as a Service) has been widely used by the attacker community, as it offers the following services as a package: (1) identifying known and unknown vulnerabilities in services to design a payload; (2) spreading the payload through various channels such as phishing, spamming, malvertising, and exploit kits; (3) locating sensitive files on the victim’s network by scanning multiple extensions and encrypting them with effective cryptographic algorithms; and (4) demanding ransom for decryption services to restore original files. Also, some ransomware families perform double extortion, i.e., they release sensitive files on the dark web to blackmail the victim into paying higher ransom amounts. As it involves money in its operations, ransomware is seen as a business by the majority of the attack groups. Recently, in the Conti group’s ransomware attack on the Costa Rican government, the attackers initially demanded $10 million in ransom, which was later increased to $20 million through double extortion. In another instance, the ransomware group Lapsus claimed responsibility for the assault on Nvidia and demanded $1 million to unlock 1 TB of sensitive data [12]. Three factors mainly contribute to the increase in ransomware activity. First, the emergence of ransomware as a service (RaaS) means that developers offer a simple-to-use ransomware creation kit that customers can purchase from the dark web to launch ransomware attacks on the intended targets. Second, ransomware operators achieve untraceability by employing cryptocurrencies to collect the ransom from users. Third, ransomware delivery is simple and diverse: to spread dangerous payloads, ransomware operators use spam, drive-by downloads, malicious advertising, exploit kits, and supply chain poisoning.
Ransomware threats typically hinder computer operations in two distinct ways. The first, known as locker ransomware, involves limiting computer access, while the second, known as crypto-ransomware, involves encrypting user data and limiting file access. In our work, we mainly consider Windows-based crypto-ransomware variants since they are the most prevalent forms of ransomware.
Modern-day ransomware strains encrypt files at rapid rates. LockBit version 3, a recent variant, is estimated to encrypt around 25,000 data items per minute. Hence, it is crucial to detect ransomware activity prior to the initiation of encryption. In general, ransomware does not encrypt files immediately; it first performs some activities in its pre-encryption stage to escape various detection methods. In our research, we approximated the duration before encryption for different ransomware payloads and compiled the findings in Table 1. The analysis is conducted on a sandbox environment containing 1,640 files that consume 17.56 GB of disk space. The sandbox machine has a base RAM of 2048 MB, a total storage space of 128 GB, and Windows 7 installed as the operating system. We noticed that most ransomware executables start to perform encryption around 8 to 10 seconds after they start execution. Building upon this analysis, we highlight the importance of timely ransomware detection, whereby the detection method identifies ransomware’s suspicious behavior within a 10-second timeframe from the beginning of its execution.
Ransomware performs multiple tasks before encrypting the files to avoid various signature and machine learning-based detection methods. The pre-encryption behavior of a ransomware executable is explained below.
Once initial access is obtained, ransomware proceeds to execute malicious payloads, leading to the execution of adversary-controlled code on either a local or remote system [18]. The most prevalent techniques for malicious code execution include malicious file execution, Windows command line, PowerShell, and Windows Management Instrumentation (WMI).
To ensure persistent access, attackers strive to establish persistence on systems by configuring the ransomware to execute during boot or logon processes. This is achieved through techniques such as leveraging Windows services, modifying run keys, creating scheduled tasks, and manipulating user accounts to maintain compromised access. Next, ransomware developers employ various methods to exploit service flaws and system errors to elevate privileges. In general, the majority of them use the PowerShell Empire and Cobalt Strike frameworks to elevate their local privileges. To get past network safeguards, several ransomware families, such as Clop and Blackbyte, modify the system firewall using netsh.exe. Additionally, most ransomware samples mask their activity by masquerading as benign applications. For example, a BlackCat ransomware sample dropped an executable with the name cmd.exe during its execution. To move laterally in the network, ransomware requires credentials, which enable attackers to launch the attack remotely. Dumping the LSASS memory is the ransomware perpetrators' preferred method. They employ well-known tools like Mimikatz, LaZagne, and Empire to carry out such operations.
After penetrating the system, threat actors often look for the connections that are currently in use in order to encrypt neighboring hosts. To acquire data on the active connections, they often use the netstat and query session commands. Accounts having high privileges, such as local administrators, administrators of multiple services, service accounts, and groups with elevated permissions, are of interest to attackers. The enumeration of files and directories plays a crucial role in the discovery process, as it determines whether specific items should be encrypted or exempted from encryption. Additionally, in order to encrypt files, ransomware payloads typically look for specific filename patterns or extensions, such as docx, pptx, xlsx, and the like. Finally, attackers remove backup files, disable automated repair and recovery options, and destroy shadow copies, which leaves the victims no option but to pay the ransom.
1.2 Motivation
All the above-mentioned techniques employed during ransomware execution make detection at the application level harder. To fool defensive solutions, ransomware mimics benign executables during its operations and uses techniques like process injection to hide its presence [10]. Ransomware uses obfuscation and packing techniques to evade static detection. Also, recent ransomware strains leverage low-level system operations to circumvent detection based on API calls [23]. Although dynamic analysis approaches result in highly accurate models, early detection of ransomware is still a major concern. Based on this insight, our work focuses on exploring hardware performance counters (HPC) to identify the suspicious nature of ransomware. Considering HPC features for ransomware detection serves two purposes. First, ransomware leaves enough traces unaltered at the hardware level, and one can easily capture them using performance counters. Second, HPC statistics can be obtained in real time and are helpful for the early detection of ransomware.
Hardware performance counters (HPC), commonly referred to as hardware counters, are specialized registers integrated into modern microprocessors. These counters serve the purpose of monitoring system performance across diverse conditions. Performance monitoring entails the collection of data on the functionality of an application or system. Many modern CPUs are equipped with a performance monitoring unit (PMU) comprising various registers, such as the performance monitor data (PMD) and the performance monitor configuration (PMC). Depending on the CPU configuration, the number of registers and counters will vary. These registers are used to measure the application load on the system.
Our primary objective is to detect ransomware activity at an early stage, which is why we place significant emphasis on the pre-encryption phase of ransomware execution. During our research, we observed a notable and statistically significant difference in HPC statistics between benign and ransomware executables during the installation process. To conduct a thorough analysis, we examined 27 recent variants of ransomware. We collected hardware-level performance counter information to create an extensive feature set, incorporating both single and multi-timeframe HPC data. Subsequently, we conducted a feature correlation analysis to determine the most valuable HPC events for early detection of ransomware. Leveraging this feature set, we developed a robust classifier capable of identifying ransomware activity in its initial execution stages.
1.3 Key Contributions of our Work:
We propose an approach for the early detection of crypto-ransomware based on Hardware Performance Counter features.
We present a comprehensive analysis of multiple HPC registers that compares AntiVirus, Web browser, File search, and File encryption applications with ransomware executables. This study aids in distinguishing between ransomware and benign applications by leveraging HPC features.
We analyze multiple timeframe settings to capture HPC data and identify the timeframe value that best suits the early detection of ransomware activity.
We conduct feature importance and correlation analysis on HPC registers to identify the optimal feature set for early detection of ransomware activity.
2 Related Works
In the last decade, ransomware has become a growing concern in the digital landscape. The rapid increase in file encryption rates and the implementation of sophisticated evasion techniques have made it increasingly challenging for cybersecurity experts to detect such attacks in real-time. Significant research efforts have been observed in this domain, driven by the intriguing and challenging task of ransomware identification. In general, research in this field can be classified into various areas, such as API call detection, honeypot detection, file entropy detection, encryption-based approaches, network detection, and analysis of hardware profiling counters.
2.1 API Call based Methods
In a study conducted by Sgandurra et al., an approach called “EldeRan” was proposed for dynamic ransomware detection with a decent level of accuracy [32]. The researchers employed the Cuckoo sandbox environment to dynamically analyze ransomware executables and extract various features, including API invocations, Registry Key modifications, dropped files, File/Directory changes, and embedded strings. It is important to note that this experiment focused on only 11 distinct ransomware families, and further validation is required to generalize the findings to newer variants of ransomware. Another study by Chen et al. centered around dynamic ransomware analysis and the extraction of API call characteristics [11]. The researchers created API call flow graphs that depict the sequential invocation of API calls during the execution of a sample executable. The frequencies of API call flows were utilized as features to develop a robust detection model using Support Vector Machines (SVM), achieving a detection accuracy of 97.6%. Four types of ransomware, namely CryptoWall, Kollah, Trojan-Ransom, and TeslaCrypt, were considered for analysis in this work.
In a study conducted by Vinayakumar et al., a multi-layer perceptron (MLP) architecture was proposed for identifying ransomware behavior based on API call instances as features [35]. The researchers identified 131 key API calls through feature importance analysis and developed an MLP model for ransomware detection. The AUC score of 1.00 reported for their approach may not hold true when evaluating a larger set of ransomware strains. Another approach by Kok et al., known as PEDA (Pre Encryption Detection Algorithm), focuses on analyzing API call occurrences as features to identify crypto variants of ransomware [19, 20]. This approach operates on two levels: first, by checking known variants of malicious executables in a signature repository, and second, by conducting dynamic analysis to extract characteristics such as API calls, registry changes, network activity, and file system changes. Bagging principles are then utilized to construct a robust classifier capable of detecting ransomware activity. Hampton et al. compared ransomware API requests to baselines of regular operating system activity [14]. They examined executables from 14 ransomware families and identified important features for detecting ransomware activity based on API call information. Furthermore, Anand et al. explored a feature set of 135 API calls to identify ransomware activity with an accuracy score of 96% [4]. The authors analyzed 46 ransomware families and compared API call frequencies among various modern-day ransomware variants.
2.2 File and Network based Methods
Jung and Won proposed an approach for detecting ransomware behavior that incorporates context-aware entropy analysis, specifically focusing on identifying unusual encryption operations [15]. Based on their findings, files infected with ransomware exhibit relatively high entropy values. By conducting an analysis based on entropy, the proposed method detects abnormal encryption behaviors and prevents the execution of relevant executables by examining their malicious operations at the host and network levels. The objective of this approach is to proactively anticipate ransomware attacks by creating data backups prior to encryption and identifying malicious activity through entropy analysis. Almashhadani et al. conducted an extensive behavioral analysis of network activities associated with crypto-ransomware, with a case study focusing on the Locky ransomware variant [3]. A network-based intrusion detection system was developed to detect suspicious network activities, including key exchanges and Domain Generation Algorithm (DGA) Command and Control (C&C) communication. The system incorporates two separate classifiers that work concurrently at different levels, namely packet and flow, to identify and flag suspicious activities. Additionally, Charan et al. conducted research on the detection of suspicious communication using character-based and word-based Domain Generation Algorithms (DGA) [5, 9].
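To make the entropy criterion concrete, the following is a minimal Python sketch of a Shannon entropy check over a file's bytes. It is an illustration only and not the method of [15]; the function names and the 7.5 bits-per-byte cutoff are assumptions introduced here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy is at or above a cutoff.

    The 7.5 bits/byte threshold is a hypothetical value chosen for illustration.
    """
    return shannon_entropy(data) >= threshold
```

Encrypted output tends toward the 8 bits-per-byte maximum, while plain text sits well below it. Compressed files also show high entropy, so a check of this kind would likely need additional context before it could be used on its own.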
Similarly, Alhawi et al. introduced “NetConverse,” which utilizes the J48 classifier to identify ransomware activity at the network level and achieved a detection rate of 97.1% [2]. The authors analyzed network traffic from nine ransomware families, collecting data on 13 network flow parameters. They used this data to build a classification model based on a sample count of 210. Furthermore, the study titled “PayBreak” presents an intriguing approach that utilizes the principles of a hybrid encryption system to recover encrypted data [21]. The concept behind “PayBreak” revolves around the notion that the security encryption employed on the targeted computer relies on hybrid encryption with symmetric session keys. This approach involves monitoring the utilization of session keys and storing them securely, which allows for the decryption of data that would otherwise require payment of the ransom to recover.
2.3 Hardware Counters based Methods
When it comes to hardware-level detection, the authors of several studies have concentrated on various forms of malware, such as viruses, rootkits, worms, and so on, and their impact on hardware performance counters. Patel et al. collected HPC traces of 52 benign and 57 malicious applications tailored to the Linux system for analysis [29]. The authors constructed and assessed hardware-accelerated classifiers on FPGA, measuring power, latency, and area overhead. Their research findings indicate that classifiers such as logistic regression and multilayer perceptron can achieve a detection accuracy of 90% in identifying malware activity. In another study, Kuruvila et al. used the Raspberry Pi 3 Model B device to collect HPC events [22]. In the study, a total of 300 benign and 300 malicious executables were taken into consideration. The researchers performed experiments using seven distinct classifiers trained solely on four HPC features. The findings of the study indicated that the Random Forest classifier exhibited superior performance compared to the other classifiers, achieving the highest accuracy of 83.04%. Similarly, Garcia-Serrano proposed an anomaly-based technique for detecting malware activity that considers HPC statistics [13]. The author evaluated two common malware attacks to identify the anomaly: stack overflow and ROP (return-oriented programming). He employed the Local Outlier Factor (LOF), which is a density-based clustering technique, to identify malware activity based on six HPC characteristics. Kadiyala et al. conducted an analysis of hardware and hardware cache events in HPC to identify malware activity [16]. The study involved evaluating 322 malware samples and 293 benign executables. The authors employed Analysis of Variance (ANOVA) to determine the most effective HPC feature selection from the resulting feature set. Their approach considered nine HPC characteristics, resulting in an accuracy score of 98.9% and a low False Positive Rate (FPR) of 0.031%.
Additionally, Bahador et al. introduced HLMD, a hardware-based approach that utilizes behavioral indicators extracted from traces of hardware performance counters to detect malicious applications at their initial execution stage [7]. The behavioral signatures are created using the Singular Value Decomposition (SVD) technique. HLMD employs initial matching and signature matching methods to swiftly determine whether a running application is classified as malware or benign. Initial matching identifies potential malware families that the application may belong to, while signature matching confirms if the application is benign or a potential malware family. The authors evaluated 210 malware and 360 benign samples specific to the Linux operating system for the analysis. HLMD attained an overall precision and recall of 95.19% and 89.96% during the test. In their previous research, known as “HPCMalHunter,” the authors examined 11 malicious and 20 benign executables [6]. HPC events were taken into account, and a dataset was generated using Singular Value Decomposition (SVD) for classification purposes. The results demonstrated that SVM outperformed other methods, achieving a detection rate of 90.69% and a low False Positive Rate (FPR) of 0.79%.
Regarding the detection of ransomware, Olani et al. developed a technique called “DeepWare,” which converts HPC information into images and then applies a Convolutional Neural Network (CNN) classifier to the image data for the purpose of ransomware detection [28]. The authors collect HPC events every 100 ms in their study to create image data. The dataset used for analysis consists of 515 ransomware samples from 21 different families, along with an equal number of benign samples. According to the authors, DeepWare achieved a recall score of 98.6% and nearly zero FPR. In another study, Pundir et al. proposed RanStop, a runtime detector for crypto-ransomware that utilizes hardware performance counters [30]. The researchers collected HPC data at 20 timestamps with a 100 \(\mu\)s interval. They employed an ML model based on LSTM to achieve an accuracy of 97% while evaluating 76 benign and 80 ransomware executables. Recently, Alam et al. introduced a two-step detection framework named “RAPPER” that utilizes Artificial Neural Networks and Fast Fourier Transformation to offer a precise solution for ransomware detection using a limited number of tracepoints [1]. The researchers selected hardware events including instructions, cache misses, cache references, branches, and branch misses for their investigation. They developed a watchdog application to collect HPC information at a 10 ms interval. Furthermore, they performed an analysis based on HPC data and presented case studies comparing their findings to ransomware variants such as WannaCry, Petya, Locky, and Vipasana. The summary of the related works is listed in Table 2.
• The concept operates primarily on two levels. It begins with a signature repository that verifies known variants of ransomware executables.
• If the sample is identified as new, dynamic analysis is performed to extract characteristics such as Registry changes, file system modifications, API calls, network activities, and more, prior to the start of encryption.

• File entropy information along with API call information.
• Emphasized that ransomware-infected files exhibit a relatively high entropy value.
• If any abnormal encryption activities are detected through changes in entropy, the execution of relevant executables is prevented by analyzing their malicious operations at the host level and network level.

• A case study performed on one of the most virulent ransomware families, i.e., Locky.
• It features two independent classifiers that operate in parallel on distinct levels, packet and flow, to identify the suspicious activity of ransomware.

• Collected HPC traces of 300 benign and 300 malicious apps tailored to the Linux system for analysis.
• Specific analysis is not performed on ransomware.
• Conducted tests using seven different classifiers that were trained on only four HPC features. The findings indicated that the Random Forest classifier emerged as the most accurate, with an accuracy score of 83.04%.

• The authors present a two-step detection framework that utilizes Artificial Neural Networks and Fast Fourier Transformation. This framework provides an effective approach for ransomware detection, requiring only a few trace points.
• The analysis is performed by considering only 4 variants of ransomware.
Table 2. Summary of Related Works
All of the API call-based and static analysis approaches mentioned above exhibit good detection accuracy. However, their ability to accurately classify newer ransomware variants needs to be assessed. At the application level, ransomware may use many techniques, such as obfuscation and packing, to avoid static detection. Similarly, recent ransomware exploits low-level system functions to avoid detection based on API calls. Furthermore, most studies focusing on HPC-based ransomware detection have only analyzed a limited number of ransomware variants, whereas examining a larger range of ransomware strains provides a more comprehensive understanding of the commonalities and differences in ransomware behavior. Another important factor to consider is the early detection of ransomware. Modern-day ransomware stages itself quickly to get into the encryption phase. Most of the discussed dynamic analysis methods focus on achieving accurate models at the expense of early detection; in particular, the majority of the HPC-based approaches place little emphasis on the pre-encryption behavior of ransomware. Additionally, contemporary processors impose a constraint on the simultaneous utilization of HPC registers for collecting performance information. To develop an early detection model for ransomware, it is important to balance the number of HPC registers and the time needed to capture HPC events [31]. Our research addresses the aforementioned shortcomings by analyzing samples of 27 ransomware families and extracting the essential HPC features that aid in the early detection of ransomware.
3 Proposed Methodology
We collected 183 ransomware samples for our analysis from the popular malware repository [24]. We consider the executables in our work to be part of a total of 27 ransomware families to adequately represent the diversity of ransomware behavior. The distribution of sample size for various ransomware families is shown in Table 3. Similarly, from the software informer website [33], we collected the same number of benign samples. The benign sample set we collected includes antivirus applications, text editors, gaming applications, drivers, browsers, compression software, media players, and so on.
Ransomware Family | Sample Size
CABP | 1
Intercobros | 1
Jigsaw | 1
Zeznzo | 1
AtomSilo | 3
Cuba | 3
DemonWare | 3
Hello XD | 3
Surtr | 3
Zeppelin | 3
Lorenz | 4
Blackout | 5
Makop | 5
Hive | 7
AvosLocker | 8
Conti | 8
Karma | 6
MountLocker | 6
Mespinoza | 9
GlobeImposter | 10
Vovabol | 10
Cerber | 11
BlackMatter | 13
Magniber | 13
Revil | 13
LockBit | 14
Bubuk | 19
Table 3. Ransomware Families - Sample Size
Modern CPUs from Intel and AMD are equipped with the capability to support up to ten hardware performance counters, allowing the simultaneous recording of statistics in multiple HPC registers. In our specific case, we utilize an Intel(R) Core(TM) i7-9700 CPU, running at a frequency of 3.00GHz and equipped with ten hardware performance counters, to configure the feature extraction process. To log all HPC events, we employ the use of ‘Perf,’ a tool that utilizes Linux’s performance counters subsystem and offers tracing capabilities [8]. This allows us to effectively capture and analyze the HPC data. Performance counters are hardware registers within the CPU that monitor various hardware events, including instructions, cache misses, and branch predictions. These counters serve as the basis for profiling applications, enabling the tracing of dynamic control flow and identification of performance bottlenecks. The Perf tool offers a set of generalized abstractions that encompass the specific capabilities of different hardware platforms. It provides metrics at the per-task, per-CPU, and per-workload levels, facilitating in-depth performance analysis of the system. Table 4 displays the specific HPC events we consider for our analysis.
LLC-store-misses, LLC-stores, Branch-load-misses, Branch-loads, dTLB-load-misses, dTLB-loads, dTLB-store-misses, dTLB-stores, iTLB-load-misses, iTLB-loads, Node-loads, Node-stores, Top Down fetch bubbles, Top Down recovery bubbles, Top Down slots issued, Top Down slots retired, Top Down total slots, Mem stores, Mem loads
Table 4. HPC Events
Among the HPC features listed in Table 4, the Level-1 cache (L1) stands out as the smallest and fastest cache within the system. LLC corresponds to the final tier of the cache hierarchy, signifying the biggest but slowest cache. The instruction cache is distinguished from the data cache by ‘i’ vs. ‘d’. A Translation Look-aside Buffer (TLB) is a cache memory that stores recent translations of virtual memory to physical addresses, facilitating faster retrieval. The count of misses indicates how often a requested data item was not found in the cache.
To ensure the safety of the system and prevent ransomware from causing harm to other files or network devices, we executed the malicious payloads within a sandbox environment. In this setup, the payload was run on a Windows-7 sandbox environment running on top of the Linux operating system. During the execution of the payload, we collected HPC statistics from the sandbox container. The sandbox environment utilized in our analysis consisted of 1,640 files occupying a total disk space of 17.56 GB. We selected Ubuntu 18.04 as the base operating system for our analysis due to its security measures that restrict the execution of ransomware executables (.exe). Additionally, Ubuntu 18.04 offers a wide range of tools, including Perf, which greatly facilitates our data collection process.
As illustrated in Figure 1, our proposed methodology is made up of three major components.
Fig. 1.
• A sandbox environment was utilized to execute both ransomware payloads and benign executables.
• An automated script handles the entire process of extracting HPC features.
• An ensemble classifier is constructed using the collected HPC features to efficiently detect ransomware behavior.
We have developed a script file that automates the process of extracting HPC features. The script executes a payload within a sandbox environment and collects HPC information in a vectorized format for specified time intervals. The script begins by retrieving the list of executables, and each executable is then transferred to the sandbox environment for dynamic analysis. Upon delivering an executable to the sandbox environment, the script identifies the process ID (referred to as “sbpid”) of the sandbox machine. The subsequent step involves utilizing the sandbox machine’s process ID as a parameter in the Perf tool to capture the information stored in the HPC registers.
Perf command example:
After collecting the HPC stats for the specified interval of ‘T’ seconds or milliseconds, the captured HPC information is stored in a CSV file (HPCstat.csv in the above example). Later, the script ends the execution of the payload by terminating the corresponding process ID, and the sandbox environment is restored to its initial state.
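The collection step described above could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions, not the script used in this work: the helper names (build_perf_command, parse_perf_csv, collect) are hypothetical, the measurement window is assumed to be timed by attaching perf stat to the sandbox process for the duration of a sleep command, and the output is assumed to be the comma-separated format produced by perf stat -x.

```python
import subprocess
from typing import Dict, List


def build_perf_command(sbpid: int, events: List[str], seconds: int, out_csv: str) -> List[str]:
    """Build a `perf stat` invocation that counts `events` for process `sbpid`."""
    return [
        "perf", "stat",
        "-e", ",".join(events),       # HPC events to count
        "-p", str(sbpid),             # attach to the sandbox process ID
        "-x", ",",                    # comma-separated output
        "-o", out_csv,                # write the counts to a file
        "--", "sleep", str(seconds),  # measurement window of T seconds
    ]


def parse_perf_csv(text: str) -> Dict[str, float]:
    """Parse `perf stat -x,` output into {event: count}; uncounted events are skipped."""
    counts: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]  # assumed layout: value, unit, event name, ...
        try:
            counts[event] = float(value)
        except ValueError:
            continue  # e.g. "<not counted>" or "<not supported>"
    return counts


def collect(sbpid: int, events: List[str], seconds: int, out_csv: str = "HPCstat.csv") -> Dict[str, float]:
    """Run perf for `seconds` against `sbpid` and return the parsed counts."""
    subprocess.run(build_perf_command(sbpid, events, seconds, out_csv), check=True)
    with open(out_csv) as fh:
        return parse_perf_csv(fh.read())
```

A call such as collect(sbpid, ["LLC-stores", "dTLB-loads"], 5) would then yield one feature vector for a 5-second window. Terminating the payload and restoring the sandbox are left out of the sketch.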
As part of building a classifier, we employ ensemble learning approaches to build a strong classification model to accurately detect ransomware behavior. Ensemble learning aims to enhance generalization beyond that of a single estimator by combining the predictions of multiple base estimators trained using a specific learning algorithm.
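As an illustration of the idea of combining base estimators, the following is a minimal pure-Python sketch of bagging with majority voting. The choice of decision stumps as base estimators, the function names, and the toy feature values are all assumptions made for this example; it is not the classifier built in this work.

```python
import random
from typing import List, Sequence, Tuple

# (feature index, threshold, label predicted when the value exceeds the threshold)
Stump = Tuple[int, float, int]


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, label_above = stump
    return label_above if row[j] > t else 1 - label_above


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Pick the single-feature threshold rule with the fewest training errors."""
    best: Stump = (0, 0.0, 1)
    best_err = len(y) + 1
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for label_above in (0, 1):
                cand = (j, t, label_above)
                err = sum(predict_stump(cand, row) != yi for row, yi in zip(X, y))
                if err < best_err:
                    best, best_err = cand, err
    return best


def fit_bagging(X, y, n_estimators: int = 15, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps: List[Stump], row: Sequence[float]) -> int:
    """Majority vote over the base estimators (1 = ransomware, 0 = benign)."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0


# Toy data: two hypothetical HPC features per sample, labels 0 (benign) and 1 (ransomware).
X = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0], [8.0, 5.5], [9.0, 4.5], [7.5, 5.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_bagging(X, y)
```

Each stump sees a different resample, so an unlucky resample that produces a poor rule is outvoted by the others, which is the generalization benefit the paragraph above refers to.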
4 Experiments and Results
The concept of utilizing HPC registers in malware analysis involves capturing the counter values during the execution of an application over a predefined time interval. These HPC counter statistics are subsequently employed to compare malicious and benign programs and to discern their dissimilarities and resemblances. Based on the observed variations, an appropriate machine learning classifier is constructed to identify malicious behavior from the hardware performance counters (HPC).
Modern processors have a limitation on the number of HPC registers that can be utilized concurrently to gather performance data. Both contemporary Intel and AMD CPUs offer a maximum of 10 or 11 hardware performance counters, which is a significant factor to consider when designing a restricted feature set. This means that we can collect statistics from a maximum of ten HPC registers simultaneously. Furthermore, determining the optimal timeframe for collecting HPC data is essential to achieve optimal results in the early detection of ransomware. The timeframe refers to the duration, measured in milliseconds or seconds, for which the HPC register data is captured. Striking a balance between the number of HPC events considered during model construction and the time required for feature extraction is crucial for effective early ransomware detection.
Considering the restriction imposed by modern processors on the utilization of HPC registers for smooth data extraction, we can set a maximum limit of 10 HPC registers. However, since there are many HPC events listed in Table 4, it is important to identify which events play an important role in ransomware detection. To achieve this, we perform feature importance calculation for all the sets of HPC events and then select the Top-10 contributing HPC functionalities for seamless data extraction.
As we experimented with multiple ransomware strains, we observed that ransomware starts to encrypt the whole file system roughly 8 to 10 seconds after it begins to execute. Based on the aforementioned statistics, we have decided to collect HPC statistics ranging from 5 to 10 seconds of application execution to deal with the swift detection of ransomware.
As illustrated in Figure 1, we initially assigned the time frame value ‘T’ to 5 seconds and obtained HPC register information for both benign and ransomware samples from all available events (refer to Table 4). This process is repeated three times, i.e., the HPC events for each sample are extracted three times to create a robust dataset that minimizes the variation across identical samples. Similarly, the HPC data extraction is carried out by changing the timeframe value ‘T’ to 6, 7, 8, 9, and 10 seconds. For each case, the dimensionality of the benign and ransomware datasets is (sample size * 3, total number of HPC features). In our case, we have 183 benign samples and 183 ransomware samples, and the total number of HPC features is 39 (reference: Table 4), which gives us the dimensionality of the benign and ransomware datasets as (549, 39), respectively.
4.1 Correlation Analysis on HPC Features
The correlation analysis assists in identifying the positive and negative relationships among the given attributes. In our work, we utilize the Pearson correlation to assess the statistical association between two continuous variables. Correlations can be broadly categorized into two types. Positive correlation: when feature A increases, feature B also tends to increase, and when feature A decreases, feature B tends to decrease; both features move in tandem along a consistent linear relationship. Negative correlation: when one feature increases, the other tends to decrease, and vice versa; there is an inverse relationship between the variables. When two or more independent variables are highly correlated, they carry redundant information, and one of them can be treated as a duplicate and removed. Retaining highly correlated variables also makes the model unstable, since even a slight alteration in the data or the model can cause substantial fluctuations in its predictions. Thus, Pearson’s correlation helps in understanding the dependency among various HPC features over a time frame, which aids the feature selection process. The Pearson correlation coefficient \(r\) between two features \(x\) and \(y\) observed over \(n\) samples is calculated using the formula below:
\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
where \(\bar{x}\) and \(\bar{y}\) denote the sample means of \(x\) and \(y\).
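As a small illustration, the Pearson coefficient can be computed directly from its standard definition (the covariance of two features divided by the product of their standard deviations). The sketch below is pure Python; the function name `pearson_r` and the toy counter readings are hypothetical and are not taken from the study's dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical readings of three counters across five samples.
loads = [10.0, 20.0, 30.0, 40.0, 50.0]
stores = [5.0, 9.0, 16.0, 21.0, 24.0]   # rises together with loads
idle = [50.0, 40.0, 30.0, 20.0, 10.0]   # falls as loads rise

r_pos = pearson_r(loads, stores)  # close to +1 (positive correlation)
r_neg = pearson_r(loads, idle)    # exactly linear and inverse
```

Applying such a function to every pair of HPC feature columns would yield a correlation matrix of the kind described for each time frame.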
As we have carried out HPC data extraction for multiple time frame values ranging from 5 seconds to 10 seconds, we have a total of six different datasets, one for each time frame. We use Pearson correlation analysis on each dataset generated for each timeframe individually to understand the relationship between HPC features for that particular time frame. The correlation matrix representation for the time frames of 5 and 10 seconds is shown in Figures 2(a) and 2(b), respectively. We notice that positive correlations among the HPC features increase with the increase in the time frame. Also, we can see that negative correlations tend to vanish with an increase in the timeframe value. Drawing from these insights, we employed a feature selection algorithm called Boruta, which is based on the Random Forest approach, to identify the essential features crucial for constructing a robust classification model [26].
Fig. 2.
4.2 Feature Importance & Classification
We employ the Boruta algorithm, an extension of the Random Forest classification technique, to select significant features by considering non-linear or complex relationships between variables [17]. The algorithm operates by duplicating the original dataset’s features and rearranging the values in each column to achieve randomness. These newly created features are known as “shadow features.” The shadow features are combined with the original features to create a new feature space with double the dimension of the initial dataset. A classifier, preferably a Random Forest, is constructed on this new set of features. The algorithm then uses a statistical test called the Z-Score to determine the significance of each feature. It compares the actual feature’s relevance with the maximum importance of shadow features. If the actual feature is more important, it is retained; otherwise, it is removed from the dataset. The dataset used in the next iteration is derived from the attributes that qualified in the first iteration. The algorithm then constructs shadow features using these attributes and calculates their significance. This procedure is repeated until a predetermined number of iterations or until all features have been confirmed or eliminated.
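The shadow-feature idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not the Boruta implementation used in the study: the importance score is a stand-in (the absolute Pearson correlation with the label instead of a Random Forest Z-score), the statistical test is replaced by a fixed hit ratio, and rejected features are not dropped between iterations. All names and the toy data are hypothetical.

```python
import math
import random

def abs_corr(col, labels):
    """Stand-in importance: |Pearson r| between a feature column and the label."""
    n = len(col)
    mc, ml = sum(col) / n, sum(labels) / n
    cov = sum((c - mc) * (l - ml) for c, l in zip(col, labels))
    vc = sum((c - mc) ** 2 for c in col)
    vl = sum((l - ml) ** 2 for l in labels)
    return 0.0 if vc == 0 or vl == 0 else abs(cov) / math.sqrt(vc * vl)

def shadow_select(features, labels, iterations=50, keep_ratio=0.9, seed=0):
    """Keep features that beat the best shadow feature in most iterations."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(iterations):
        # Shadow features: shuffled copies of every original column.
        shadow_scores = []
        for col in features.values():
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, labels))
        best_shadow = max(shadow_scores)
        # A feature scores a hit when it is more important than every shadow.
        for name, col in features.items():
            if abs_corr(col, labels) > best_shadow:
                hits[name] += 1
    return [name for name, h in hits.items() if h >= keep_ratio * iterations]

# Toy data: one column tracks the label, the other is uncorrelated with it.
labels = [i % 2 for i in range(200)]
features = {
    "informative": [(i % 2) + 0.1 * ((i % 5) - 2) for i in range(200)],
    "uncorrelated": [(i // 2) % 2 for i in range(200)],
}
selected = shadow_select(features, labels)
```

Shuffling destroys any relationship between a column and the label, so the best shadow score estimates how much importance can arise by chance; only features that consistently exceed it are kept.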
Considering the maximum number of hardware performance counters available on contemporary processors, we have set the feature set limit for the dataset to 10, so that all selected events can be collected simultaneously. Additionally, certain older Intel processors feature a maximum of six hardware performance counters, which motivates an even smaller feature set. We have therefore identified the Top-5 and Top-10 most contributing features using the Boruta feature selection algorithm. Based on these two distinct feature sets, we generated datasets and constructed various ensemble classifiers to detect ransomware activity with decent accuracy.
We used the Boruta feature selection technique to identify the Top-5 and Top-10 features from the HPC statistics obtained for each time frame spanning from 5 to 10 seconds. For each time frame, we created a dataset based on the Top-5 and Top-10 features and constructed a classification model using RandomForest. To assess the model’s performance, we utilized a train-test split ratio of 75:25 and conducted 10-fold repeated cross-validation with three repeats. Since our focus is on early detection of ransomware, we favor the shortest time frame that yields high accuracy: the 7-second dataset is the earliest at which the classifier reaches an accuracy of 0.9818. The corresponding Boruta feature significance results for the 7-second dataset are shown in Figure 3. Accuracy, sensitivity, and specificity were employed as evaluation metrics, and the outcomes are depicted in Figure 4.
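The evaluation protocol can be illustrated with an index generator for repeated k-fold cross-validation. This is a sketch under stated assumptions: the cross-validation is taken to run on the 75% training portion, the folds are not stratified by class, and the function name and seed are hypothetical. The source does not specify these details.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_idx, valid_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                       # new random partition per repeat
        folds = [order[i::k] for i in range(k)]  # k nearly equal folds
        for held_out in range(k):
            valid = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, valid

# 183 benign + 183 ransomware samples, each captured three times.
n_total = 2 * 183 * 3              # 1098 records
n_test = n_total // 4              # 25% held out for testing
n_train = n_total - n_test         # 75% used for training and cross-validation
splits = list(repeated_kfold(n_train))
```

Each of the three repeats partitions the training records into ten folds, so a model is fitted and validated 30 times before the held-out test set is scored.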
Fig. 3.
Fig. 4.
In addition, Table 5 displays the overall test time for different classifiers, i.e., the total time required to generate a data frame with the HPC statistics and determine the classifier output. We observed that Random Forest had the minimal test time and is therefore best suited for early detection of ransomware. The overall classification results across multiple datasets are presented in Table 6. Table 7 indicates the Top-10 features (HPC events) for detecting ransomware activity with decent accuracy.
| Classifier | Test duration for a single sample (in ms) |
|---|---|
| Random Forest | 630 |
| CART | 2293 |
| GBM | 2146 |
| AdaBoost | 2300 |

Table 5. Test Duration for Single Sample Executable (in ms)
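| Time Frame | Feature Count | Classifier | Accuracy | Sensitivity | Specificity | Kappa |
|---|---|---|---|---|---|---|
| 5 Seconds | Top-5 | RandomForest | 0.7555 | 0.7956 | 0.7153 | 0.5109 |
| 5 Seconds | Top-5 | CART | 0.7336 | 0.7591 | 0.7080 | 0.4672 |
| 5 Seconds | Top-5 | GBM | 0.7555 | 0.8029 | 0.7080 | 0.5109 |
| 5 Seconds | Top-5 | AdaBoost | 0.7153 | 0.7153 | 0.7153 | 0.4307 |
| 5 Seconds | Top-10 | RandomForest | 0.7482 | 0.7737 | 0.7226 | 0.4964 |
| 5 Seconds | Top-10 | CART | 0.7518 | 0.7591 | 0.7445 | 0.5036 |
| 5 Seconds | Top-10 | GBM | 0.7628 | 0.7737 | 0.7518 | 0.5255 |
| 5 Seconds | Top-10 | AdaBoost | 0.7482 | 0.7518 | 0.7445 | 0.4964 |
| 6 Seconds | Top-5 | RandomForest | 0.9161 | 0.8832 | 0.9489 | 0.8321 |
| 6 Seconds | Top-5 | CART | 0.9124 | 0.8905 | 0.9343 | 0.8248 |
| 6 Seconds | Top-5 | GBM | 0.9088 | 0.8832 | 0.9343 | 0.8175 |
| 6 Seconds | Top-5 | AdaBoost | 0.9197 | 0.8905 | 0.9489 | 0.8394 |
| 6 Seconds | Top-10 | RandomForest | 0.9234 | 0.8905 | 0.9562 | 0.8467 |
| 6 Seconds | Top-10 | CART | 0.9197 | 0.8905 | 0.9489 | 0.8394 |
| 6 Seconds | Top-10 | GBM | 0.9124 | 0.8905 | 0.9343 | 0.8248 |
| 6 Seconds | Top-10 | AdaBoost | 0.9380 | 0.9051 | 0.9708 | 0.8759 |
| 7 Seconds | Top-5 | RandomForest | 0.9818 | 0.9854 | 0.9781 | 0.9635 |
| 7 Seconds | Top-5 | CART | 0.9818 | 0.9927 | 0.9708 | 0.9635 |
| 7 Seconds | Top-5 | GBM | 0.9854 | 0.9927 | 0.9781 | 0.9708 |
| 7 Seconds | Top-5 | AdaBoost | 0.9854 | 0.9927 | 0.9781 | 0.9708 |
| 7 Seconds | Top-10 | RandomForest | 0.9818 | 0.9854 | 0.9781 | 0.9635 |
| 7 Seconds | Top-10 | CART | 0.9818 | 0.9854 | 0.9781 | 0.9635 |
| 7 Seconds | Top-10 | GBM | 0.9781 | 0.9854 | 0.9708 | 0.9562 |
| 7 Seconds | Top-10 | AdaBoost | 0.9927 | 0.9927 | 0.9927 | 0.9854 |
| 8 Seconds | Top-5 | RandomForest | 0.9672 | 0.9562 | 0.9781 | 0.9343 |
| 8 Seconds | Top-5 | CART | 0.9672 | 0.9635 | 0.9708 | 0.9343 |
| 8 Seconds | Top-5 | GBM | 0.9708 | 0.9562 | 0.9854 | 0.9416 |
| 8 Seconds | Top-5 | AdaBoost | 0.9672 | 0.9562 | 0.9781 | 0.9343 |
| 8 Seconds | Top-10 | RandomForest | 0.9781 | 0.9708 | 0.9854 | 0.9562 |
| 8 Seconds | Top-10 | CART | 0.9708 | 0.9708 | 0.9708 | 0.9416 |
| 8 Seconds | Top-10 | GBM | 0.9781 | 0.9781 | 0.9781 | 0.9562 |
| 8 Seconds | Top-10 | AdaBoost | 0.9745 | 0.9781 | 0.9708 | 0.9489 |
| 9 Seconds | Top-5 | RandomForest | 0.9677 | 0.9648 | 0.9708 | 0.9355 |
| 9 Seconds | Top-5 | CART | 0.9677 | 0.9577 | 0.9781 | 0.9355 |
| 9 Seconds | Top-5 | GBM | 0.9677 | 0.9648 | 0.9708 | 0.9355 |
| 9 Seconds | Top-5 | AdaBoost | 0.9677 | 0.9577 | 0.9781 | 0.9355 |
| 9 Seconds | Top-10 | RandomForest | 0.9821 | 0.9859 | 0.9781 | 0.9641 |
| 9 Seconds | Top-10 | CART | 0.9857 | 0.9859 | 0.9854 | 0.9713 |
| 9 Seconds | Top-10 | GBM | 0.9821 | 0.9859 | 0.9781 | 0.9641 |
| 9 Seconds | Top-10 | AdaBoost | 0.9892 | 0.9930 | 0.9854 | 0.9785 |
| 10 Seconds | Top-5 | RandomForest | 0.9854 | 0.9854 | 0.9854 | 0.9708 |
| 10 Seconds | Top-5 | CART | 0.9781 | 0.9781 | 0.9781 | 0.9562 |
| 10 Seconds | Top-5 | GBM | 0.9891 | 0.9927 | 0.9854 | 0.9781 |
| 10 Seconds | Top-5 | AdaBoost | 0.9964 | 0.9927 | 1.0000 | 0.9927 |
| 10 Seconds | Top-10 | RandomForest | 0.9891 | 0.9927 | 0.9854 | 0.9781 |
| 10 Seconds | Top-10 | CART | 0.9781 | 0.9781 | 0.9781 | 0.9562 |
| 10 Seconds | Top-10 | GBM | 0.9891 | 0.9781 | 1.0000 | 0.9781 |
| 10 Seconds | Top-10 | AdaBoost | 0.9927 | 0.9854 | 1.0000 | 0.9854 |

Table 6. Performance of Various Classifiers on the HPC Datasets (5 to 10 Seconds)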
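To make the four reported metrics concrete, the sketch below derives accuracy, sensitivity, specificity, and Cohen's kappa from a binary confusion matrix, treating ransomware as the positive class. The confusion-matrix counts are an assumption: the paper does not report them, and they were chosen only because, with 137 test records per class, they are consistent with the 5-second, Top-5 Random Forest row of Table 6.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Cohen's kappa from raw counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (ransomware detected)
    specificity = tn / (tn + fp)   # true negative rate (benign kept benign)
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }

# Hypothetical counts (not reported in the paper), 137 records per class.
m = binary_metrics(tp=109, fn=28, tn=98, fp=39)
rounded = {name: round(value, 4) for name, value in m.items()}
```

With balanced classes the chance agreement is 0.5, so kappa reduces to twice the accuracy minus one, which matches the relationship between the Accuracy and Kappa columns in Table 6.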
Table 7. Top-10 HPC Features Identified Using Boruta on the 7-Second HPC Dataset
In our previous experiment, we extracted HPC features at a single time frame, specifically after ‘n’ seconds of application execution, where ‘n’ ranged from 5 to 10 seconds. Although we achieved decent detection accuracy with a 7-second time frame, extracting HPC data over a single time frame has its limitations. They are:
• We do not have access to time-series data to observe the application’s load during the first few seconds of execution.
• Obtaining a single time frame HPC dataset cannot capture sudden changes in the application load on specific HPC events.
Considering the above-mentioned limitations, we decided to capture HPC register information over multiple time frames. In essence, we partitioned the HPC capture duration into multiple time frames and extracted HPC register information for each individual frame. For instance, we set the total HPC capture duration to 2 seconds and the time frame value to 100 milliseconds. This means we captured HPC register information every 100 milliseconds during the first 2 seconds of application execution. This method of HPC feature extraction enables us to obtain a substantial amount of HPC register information in a short amount of time, and the resulting time-series data can effectively aid in building a better classifier.
We have configured the time frame value to be 100 milliseconds, representing the duration during which the HPC register data is recorded. As shown in Algorithm 1 (Perf Command), the sleep time is set to 0.1 seconds, equivalent to 100 milliseconds. In order to facilitate the early identification of ransomware samples, we have defined the overall duration for HPC capture as 2 seconds, with a specific emphasis on the first 2 seconds of application execution. As the total HPC capture duration is set to 2 seconds (i.e., 2000 milliseconds), and the time frame value is set to 100 milliseconds, we set the iteration limit to 20.
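The capture schedule described above can be sketched as a list of `perf stat` invocations, one per 100-millisecond frame. This is an illustration under assumptions and not the paper's Algorithm 1: the helper names, the process ID, and the output file names are hypothetical, and the event names are the lower-case `perf` spellings of the Top-5 events reported in this work. The code only builds the command lines; it does not run `perf`.

```python
# Top-5 events named in this work, written as perf event identifiers.
TOP5_EVENTS = [
    "branch-loads",
    "L1-dcache-loads",
    "L1-dcache-stores",
    "dTLB-loads",
    "L1-dcache-load-misses",
]

def perf_stat_command(pid, events, interval_s, out_path):
    """Build a `perf stat` call that counts `events` for `pid` during one frame."""
    return [
        "perf", "stat",
        "-x,",                   # comma-separated, machine-readable output
        "-e", ",".join(events),  # events to count
        "-p", str(pid),          # attach to the monitored process
        "-o", out_path,          # write the counts to a file
        "sleep", str(interval_s),  # measure for the length of one frame
    ]

def capture_plan(pid, total_ms=2000, frame_ms=100):
    """One command per frame; the iteration limit is total_ms / frame_ms."""
    iterations = total_ms // frame_ms
    return [
        perf_stat_command(pid, TOP5_EVENTS, frame_ms / 1000, f"frame_{i:02d}.csv")
        for i in range(iterations)
    ]

plan = capture_plan(pid=4242)  # 4242 is a placeholder process ID
```

Each entry could be passed to `subprocess.run` in sequence; the 2-second duration and 100-millisecond frame give the iteration limit of 20 stated in the text.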
The iteration limit indicates how many times we need to capture the HPC features during the given total duration. In our scenario, we set the iteration limit to 20, and each time we capture the HPC information from the Top-10 HPC features mentioned in Table 7. The captured HPC information for multiple time frames is merged to generate a row vector that stores the HPC register information for a sample executable. This feature extraction process is repeated three times for each sample to create a robust dataset that minimizes the variation across identical samples. In our case, we have 183 benign and 183 ransomware samples, and the dimensionality of the benign and ransomware datasets is defined below:
• Top-10 Features, 100 milliseconds timeframe = (sample size * 3, number of HPC features captured for each timeframe * iteration limit) = (183 * 3, 10 * 20) = (549, 200)
• Top-5 Features, 100 milliseconds timeframe = (sample size * 3, number of HPC features captured for each timeframe * iteration limit) = (183 * 3, 5 * 20) = (549, 100)
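The dimensionalities above follow from concatenating the per-frame readings of one run into a single row vector. A minimal sketch of that merge and of the shape arithmetic is given below; the function names and the placeholder counter values are hypothetical.

```python
def flatten_frames(frames):
    """Concatenate per-frame counter readings into one row vector."""
    return [value for frame in frames for value in frame]

def dataset_shape(samples_per_class, repeats, n_features, iterations):
    """(rows, columns) of the per-class dataset described in the text."""
    return (samples_per_class * repeats, n_features * iterations)

# Placeholder readings: 20 frames, each holding 10 counter values.
frames = [[frame * 10 + f for f in range(10)] for frame in range(20)]
row = flatten_frames(frames)

top10_shape = dataset_shape(183, 3, n_features=10, iterations=20)
top5_shape = dataset_shape(183, 3, n_features=5, iterations=20)
```

The frame order is preserved in the row vector, so column positions encode both the HPC event and the 100-millisecond slot in which it was read.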
The dataset collected for each case, i.e., the top five features and the top ten features, is used to build a classification model capable of identifying ransomware activity with decent accuracy. To determine the most suitable classifier for the specific scenario, we explored boosting methods (GBM and AdaBoost) and bagging (Random Forest), together with CART (Classification and Regression Tree), in our study. The classification model was evaluated using a 75:25 train-test split ratio and 10-fold repeated cross-validation with three repetitions. Accuracy, sensitivity, specificity, and kappa score were selected as metrics to assess the model’s performance in each case, and the outcomes are presented in Table 8. Notably, Random Forest emerged as the optimal classifier for the given scenario, exhibiting high accuracy and a low false positive rate. Also, from Table 8, we can observe that the accuracy of CART, GBM, and AdaBoost on the test dataset drops when the number of HPC features is reduced, whereas Random Forest maintains the same accuracy with the smaller feature set.
| HPC Feature Count | Classifier | Accuracy | Sensitivity | Specificity | Kappa |
|---|---|---|---|---|---|
| Top-10 | CART | 0.9726 | 0.9815 | 0.9634 | 0.9451 |
| Top-10 | RF | 0.9909 | 0.9939 | 0.9878 | 0.9817 |
| Top-10 | GBM | 0.9848 | 0.9817 | 0.9878 | 0.9695 |
| Top-10 | AdaBoost | 0.9878 | 0.9939 | 0.9817 | 0.9756 |
| Top-5 | CART | 0.9695 | 0.9756 | 0.9634 | 0.9390 |
| Top-5 | RF | 0.9909 | 0.9939 | 0.9878 | 0.9817 |
| Top-5 | GBM | 0.9756 | 0.9817 | 0.9695 | 0.9512 |
| Top-5 | AdaBoost | 0.9817 | 0.9878 | 0.9756 | 0.9634 |

Table 8. Performance of Various Classifiers on the Multi-timeframe HPC Datasets (Top-10 and Top-5 HPC Features; Parameters: Timeframe = 100 ms, Total Duration = 2 Seconds)
One notable finding from this experiment is that dividing the capture duration into multiple timeframes leads to significantly improved results. This approach provides the classifier with a time series of counter readings rather than a single aggregate value per event, so more information is captured in a shorter duration. This is evident when comparing the Random Forest outcomes on the single-timeframe HPC dataset for 5 seconds (Table 6: accuracy 0.7555, specificity 0.7153) with those on the multi-timeframe HPC dataset for 2 seconds (Table 8: accuracy 0.9909, specificity 0.9878).
In our research, we have chosen a timeframe value of 100 milliseconds. Setting the timeframe value too low, such as 10ms or 50ms, would require capturing an excessive amount of HPC data within a very short duration. For example, if the timeframe parameter is set to 10ms and the total duration is 2000ms, the HPC data capture would occur 200 times throughout the 2-second execution of the application. This frequent capturing of HPC values introduces a small delay each time the data is saved, adding to the overall execution time. On the other hand, increasing the timeframe value to 500 milliseconds or higher may lead to a model with lower accuracy and specificity scores. Since our goal is to achieve early detection, it is crucial to select the optimal timeframe that balances detection accuracy and specificity. Based on our findings, we have identified the Top-5 significant HPC registers (Branch-loads, L1-dcache-loads, L1-dcache-stores, dTLB-loads, L1-dcache-load-misses) for every 100ms during the dynamic analysis of ransomware within the initial 2 seconds of execution. This approach allows us to evaluate a smaller subset of HPC registers for a shorter duration, which is ideal for early detection of ransomware.
4.3.1 Classification Results on Unseen Data
The purpose of developing a robust ransomware detection engine is to detect unseen/newer ransomware variants. To test this, we do not train the classifier with all ransomware variants, but instead hold out a selected list of families and use them only in the testing phase. This technique allows us to evaluate the model’s effectiveness in recognizing newer ransomware variants.
While conducting this study, we deliberately avoided using HPC records for samples belonging to Magniber, Makop, Mespinoza, MountLocker, Revil, Surtr, Vovabol, Zeppelin, and Zeznzo ransomware variants during the training phase. Here, “HPC Records” refers to the time series data obtained by extracting the HPC register data for multiple time frames during the dynamic analysis of an executable sample. Once the classifier has been built, the HPC records for the held-out ransomware variants are passed during the testing step to determine the classifier’s efficacy. For the dynamic analysis of the executables, we kept the same parameters as before, i.e., the timeframe value is set to 100 milliseconds, and the total capture duration is set to 2 seconds. The findings from this scenario are summarized in Table 9. Based on the results, Random Forest matches or outperforms the other ensemble classifiers in terms of detection accuracy and specificity for both feature sets. Based on our findings, we emphasize that using the top five HPC registers (Branch-loads, L1-dcache-loads, L1-dcache-stores, dTLB-loads, and L1-dcache-load-misses) for feature extraction during the dynamic analysis of ransomware within the first 2 seconds of its execution yields the best results, even with unseen data. The comparison between various state-of-the-art ransomware detection methods based on HPC features is shown in Table 10.
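The family-wise hold-out described above amounts to partitioning the HPC records by ransomware family before training. The sketch below illustrates this with a handful of placeholder records; the record layout, file names, and function name are hypothetical, and how benign records were divided between training and testing is not stated in the text and is not modelled here.

```python
# Families withheld from training, as listed in the text.
HELD_OUT_FAMILIES = {
    "Magniber", "Makop", "Mespinoza", "MountLocker", "Revil",
    "Surtr", "Vovabol", "Zeppelin", "Zeznzo",
}

def split_by_family(records, held_out=HELD_OUT_FAMILIES):
    """Separate records of held-out families from those usable for training."""
    train, unseen = [], []
    for rec in records:
        if rec["family"] in held_out:
            unseen.append(rec)
        else:
            train.append(rec)
    return train, unseen

# Placeholder records; real records would also carry the HPC time series.
records = [
    {"sample": "a.exe", "family": "Conti"},
    {"sample": "b.exe", "family": "Revil"},
    {"sample": "c.exe", "family": "Hive"},
    {"sample": "d.exe", "family": "Zeppelin"},
]
train, unseen = split_by_family(records)
```

Splitting on the family label rather than on individual samples ensures that no variant of a held-out family leaks into the training set.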
| Feature Count | Classifier | Accuracy | Sensitivity | Specificity | Kappa |
|---|---|---|---|---|---|
| 10 | CART | 0.9788 | 0.9683 | 0.9894 | 0.9577 |
| 10 | RF | 0.9868 | 0.9788 | 0.9947 | 0.9735 |
| 10 | GBM | 0.9868 | 0.9788 | 0.9947 | 0.9735 |
| 10 | AdaBoost | 0.9815 | 0.9735 | 0.9894 | 0.9630 |
| 5 | CART | 0.9815 | 0.9735 | 0.9894 | 0.9630 |
| 5 | RF | 0.9868 | 0.9788 | 0.9947 | 0.9735 |
| 5 | GBM | 0.9815 | 0.9735 | 0.9894 | 0.9630 |
| 5 | AdaBoost | 0.9815 | 0.9735 | 0.9894 | 0.9630 |

Table 9. Performance of Various Classifiers on the Unseen Multi-timeframe HPC Datasets (Top-10 and Top-5 HPC Features; Parameters: Timeframe = 100 ms, Total Duration = 2 Seconds)
In order of seconds (approximately 4 seconds to detect a WannaCry variant)
Not applicable
• Works as an anomaly detector • Sample size is very small (only 4 variants of ransomware are considered for the analysis) • Less emphasis on the pre-encryption behavior of ransomware
In order of milliseconds (without considering the latency)
No
• 97% average accuracy is reported • Sample size is very small (only 80 ransomware and 76 benign samples are considered for the analysis) • The latency analysis for the generation of the HPC dataset is not mentioned • The latency analysis for testing a sample executable is not mentioned • Less emphasis on the pre-encryption behavior of ransomware
• Achieved 98.6% accuracy with nearly zero FPR • Considered 21 ransomware families for the analysis • Heavier model (packages and parameters of the model took 178 MB) • Less emphasis on the pre-encryption behavior of ransomware
HiPeR (our proposed method)
Supervised - RandomForest
5
In order of seconds (around 3 seconds)
Yes
• Achieved 98.68% accuracy with nearly zero FPR • Considered 27 ransomware families for the analysis • Employs a lightweight ensemble model which is computationally less expensive and doesn’t require a GPU for training the model • HiPeR detects ransomware activity in the first 3.5 seconds of payload execution (more emphasis on the pre-encryption behavior of ransomware)
Table 10. Comparison between Various State-of-the-art Ransomware Detection Methods based on HPC Features
Latency calculation: In our study, we used a timeframe value of 100 milliseconds and an iteration count of 20 to record the HPC information during the first 2 seconds of application execution. For each iteration, approximately 35 milliseconds of latency is added to save the output of the HPC registers in a CSV file. Since the iteration count has been set to 20, the overall overhead for the HPC capture is 35 * 20 = 700 milliseconds. Additionally, as indicated in Table 5, the test time for the Random Forest classifier, which includes constructing a data frame containing the HPC information and determining the classifier output, is 630 milliseconds. The total overhead is therefore roughly 1300 milliseconds on top of the 2-second capture window, i.e., 2000 + 700 + 630 = 3330 milliseconds overall. Based on these overhead statistics, our proposed method can detect ransomware activity in less than 3.5 seconds, including the latency.
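The latency budget can be reproduced with a few lines of arithmetic. The sketch below uses the figures given in the text and in Table 5 (2-second capture, 100-millisecond frames, about 35 milliseconds per save, 630 milliseconds of Random Forest test time); the function name is hypothetical, and the 10-millisecond case assumes, purely for illustration, that the per-save overhead stays at 35 milliseconds.

```python
def detection_latency_ms(capture_ms, frame_ms, save_overhead_ms, classify_ms):
    """End-to-end detection time: capture window + save overhead + test time."""
    iterations = capture_ms // frame_ms
    capture_overhead = iterations * save_overhead_ms
    return capture_ms + capture_overhead + classify_ms

# Configuration used in the study: 20 frames of 100 ms each.
total_100ms = detection_latency_ms(2000, 100, 35, 630)

# Illustrative comparison: 10 ms frames would require 200 saves.
total_10ms = detection_latency_ms(2000, 10, 35, 630)
```

Under these assumptions the 100-millisecond setting stays within the 3.5-second bound, whereas a 10-millisecond frame would multiply the save overhead tenfold and push detection past the point where the fastest strains begin encrypting.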
4.4 Comparison between Benign and Ransomware Executables based on HPC Features
Antivirus applications, in general, require a wide range of permissions during installation. They add registry keys to persist across system reboots, obtain higher privileges, and access various system information, such as date, geolocation, and file structure. Similarly, file encryption software requires permission to scan the entire file structure of the system and calls various built-in crypto APIs during its runtime. Although these two types of applications are benign, their behavior loosely resembles ransomware activity. Moreover, ransomware searches the entire directory path for target files before encrypting them, which resembles the behavior of a file search tool. For this reason, we tested the Windows-based application “Everything.exe”, a freeware desktop search tool for Windows that can quickly locate files and folders by name [36]. To complete the comparison study, we also took into account a widely used and well-known browser application (mozilla-firefox.exe) [25]. In summary, we chose the Norton antivirus [27], VeraCrypt file encryption software [34], Mozilla Firefox browser, and Everything file search program to investigate how their HPC statistics differ from those of multiple ransomware variants.
As a part of this analysis, we captured the HPC statistics for the first 2 seconds of the application execution for all of the selected benign applications, and compared with the ransomware executables. Figure 5 represents the differences among the applications based on three HPC events: ‘L1 dcache loads’, ‘cpu migrations’ and ‘L1 dcache load misses’, respectively. The findings from the comparative analysis are summarised below:
Fig. 5.
• Antivirus, browser, and file encryption applications exhibited more load on the HPC registers ‘L1-dcache-loads’ and ‘L1-dcache-load-misses’ compared to ransomware executables.
• When it comes to CPU migrations, the majority of benign programs and ransomware executables behave similarly. As a result, using CPU migrations as a primary characteristic when developing the classifier may result in misclassifications.
• Compared to ransomware executables, the file search application Everything.exe exhibited similar loads on the HPC registers ‘cpu-migrations’, ‘L1-dcache-loads’, and ‘L1-dcache-load-misses’. Because of this behavior, file search programs may be misclassified as ransomware.
4.5 Ransomware Family Wise Analysis based on HPC Statistics
We also analyze the HPC features of various ransomware families in comparison to benign executables. In this experiment, the mean values of the HPC features are extracted from the executables of the most recent ransomware families, including AvosLocker, BlackMatter, Bubuk, Cerber, Conti, Hive, LockBit2.0, Revil, and so on. We gathered HPC features during the first 2 seconds of program execution for executables from distinct ransomware families and compared them to benign applications based on the mean values of the relevant HPC features in the acquired time series data. We may deduce the following from the data shown in Figure 6:
Fig. 6.
• The operational load imposed by benign executables is greater than that caused by ransomware executables on the majority of HPC events.
• The statistics for the HPC events ‘L1 dcache loads’ and ‘branch loads’ suggest that the samples belonging to the notorious families of Bubuk and Cerber exhibited a higher load on the system in comparison with other ransomware variants.
• In comparison to the benign applications, ransomware executables generate similar ‘cpu migrations’ values throughout their operation. Conti and Hive ransomware executables, in particular, perform more CPU migrations than other ransomware variants.
• During their execution, most ransomware samples generated HPC values comparable to one another, with a minimal outlier ratio. This behavior resulted in a better detection rate with the ensemble classifiers, as shown in Table 9.
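The family-wise comparison rests on averaging each HPC event over all records of a family. A minimal sketch of that aggregation follows; the record layout, the function name, and the counter values are placeholders invented for illustration and are not measurements from the study.

```python
from collections import defaultdict
from statistics import mean

def mean_by_family(records, event):
    """Mean value of one HPC event for each family label."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["family"]].append(rec[event])
    return {family: mean(values) for family, values in grouped.items()}

# Placeholder records (values are not real measurements).
records = [
    {"family": "Conti", "cpu-migrations": 4.0},
    {"family": "Conti", "cpu-migrations": 6.0},
    {"family": "Hive", "cpu-migrations": 5.0},
    {"family": "benign", "cpu-migrations": 3.0},
    {"family": "benign", "cpu-migrations": 5.0},
]
means = mean_by_family(records, "cpu-migrations")
```

Repeating the call for each event of interest gives the per-family profile that a bar chart like the one referenced above would display.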
5 Conclusion and Future Work
The utilization of hardware performance counter registers for malware detection is an emerging field of research, especially in the context of detecting ransomware. Ransomware at the application level employs various tactics such as obfuscation and packing to evade static detection methods. Additionally, modern ransomware variants leverage low-level system calls to evade detection based on API calls. A viable alternative approach is to analyze the performance counter registers to identify ransomware activity. In our study, we conducted an analysis of 27 ransomware families using hardware performance counters. Our findings indicate that early detection of ransomware can be achieved by profiling hardware counter information. We proposed a technique that achieves an accuracy of 0.9868 by monitoring just five HPC register values during the initial two seconds of application execution.
With recent advancements in ransomware payload distribution and persistence strategies, early detection of ransomware is considered to be a difficult task. In our work, we identified a few ransomware strains, such as Bubuk, Surtr, and Jigsaw, that started encrypting files within 5 seconds of their execution. Our proposed model effectively detects such ransomware activity before it enters the encryption phase. On the flip side, malware developers may integrate ways to avoid this HPC-based detection in the future, making ransomware have equivalent statistics to benign applications. In such cases, HPC statistics alone may not be sufficient. We are working towards building a comprehensive solution to improve early ransomware detection by integrating file-based and kernel-based events, as well as hardware profiling counters.
Omar M. K. Alhawi, James Baldwin, and Ali Dehghantanha. 2018. Leveraging machine learning techniques for windows ransomware network traffic detection. In Cyber Threat Intelligence. Springer, 93–106.
Ahmad O. Almashhadani, Mustafa Kaiiali, Sakir Sezer, and Philip O’Kane. 2019. A multi-classifier network-based crypto ransomware detection system: A case study of Locky ransomware. IEEE Access 7 (2019), 47053–47067.
P. Mohan Anand, P. V. Sai Charan, and Sandeep K. Shukla. 2022. A comprehensive API call analysis for detecting windows-based ransomware. In 2022 IEEE International Conference on Cyber Security and Resilience (CSR). IEEE, 337–344.
P. Mohan Anand, T. Gireesh Kumar, and P. V. Sai Charan. 2020. An ensemble approach for algorithmically generated domain name detection using statistical and lexical analysis. Procedia Computer Science 171 (2020), 1129–1136.
Mohammad Bagher Bahador, Mahdi Abadi, and Asghar Tajoddin. 2014. HPCMalHunter: Behavioral malware detection using hardware performance counters and singular value decomposition. In 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE, 703–708.
Mohammad Bagher Bahador, Mahdi Abadi, and Asghar Tajoddin. 2019. HLMD: A signature-based approach to hardware-level behavioral malware detection and classification. The Journal of Supercomputing 75, 8 (2019), 5551–5582.
P. V. Charan, Sandeep K. Shukla, and P. Mohan Anand. 2020. Detecting word based DGA domains using ensemble models. In International Conference on Cryptology and Network Security. Springer, 127–143.
P. V. Sai Charan, P. Mohan Anand, Sandeep K. Shukla, Naveen Selvan, and Hrushikesh Chunduri. 2022. DOTMUG: A threat model for target specific APT attacks–misusing google teachable machine. In 2022 10th International Symposium on Digital Forensics and Security (ISDFS). IEEE, 1–8.
Zhi-Guo Chen, Ho-Seok Kang, Shang-Nan Yin, and Sung-Ryul Kim. 2017. Automatic ransomware detection and analysis based on dynamic API calls flow graph. In Proceedings of the International Conference on Research in Adaptive and Convergent Systems. 196–201.
Nikolai Hampton, Zubair Baig, and Sherali Zeadally. 2018. Ransomware behavioural analysis on windows platforms. Journal of Information Security and Applications 40 (2018), 44–51.
Hiromasa Kaneko. 2021. Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables. Heliyon 7, 6 (2021), e07356.
S. H. Kok, Azween Abdullah, N. Z. Jhanjhi, and Mahadevan Supramaniam. 2019. Prevention of crypto-ransomware using a pre-encryption detection algorithm. Computers 8, 4 (2019), 79.
S. H. Kok, A. Azween, and N. Z. Jhanjhi. 2020. Evaluation metric for crypto-ransomware detection using machine learning. Journal of Information Security and Applications 55 (2020), 102646.
Eugene Kolodenker, William Koch, Gianluca Stringhini, and Manuel Egele. 2017. PayBreak: Defense against cryptographic ransomware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 599–611.
Abraham Peedikayil Kuruvila, Shamik Kundu, and Kanad Basu. 2020. Analyzing the efficiency of machine learning classifiers in hardware-based malware detectors. In 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 452–457.
Mohan Anand Putrevu, Venkata Sai Charan Putrevu, and Sandeep Kumar Shukla. 2023. Early detection of ransomware activity based on hardware performance counters. In Proceedings of the 2023 Australasian Computer Science Week. 10–17.
Daniele Sgandurra, Luis Muñoz-González, Rabih Mohsen, and Emil C. Lupu. 2016. Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020 (2016).
R. Vinayakumar, K. P. Soman, K. K. Senthil Velan, and Shaunak Ganorkar. 2017. Evaluating shallow and deep networks for ransomware detection and classification. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 259–265.
Mohamed TAl-rimy BAlmalki S(2024)A Ransomware Early Detection Model based on an Enhanced Joint Mutual Information Feature Selection MethodEngineering, Technology & Applied Science Research10.48084/etasr.709214:4(15400-15407)Online publication date: 2-Aug-2024
Woralert CLiu CBlasingame Z(2024)Towards Effective Machine Learning Models for Ransomware Detection via Low-Level Hardware InformationProceedings of the International Workshop on Hardware and Architectural Support for Security and Privacy 202410.1145/3696843.3696847(10-18)Online publication date: 2-Nov-2024
ACSW '23: Proceedings of the 2023 Australasian Computer Science Week
Modern-day ransomware variants are quick in their operations and start to encrypt the files within a few seconds after the initial payload execution. This poses an exigency towards early detection of ransomware payloads. Although there are multiple ...
In recent years, ransomware attacks have exploded globally, and it has become one of the most significant cyber threats to digital infrastructure. Such attacks have been targeting ranging from individuals to critical infrastructure or large ...
With the increasing number of ransomware attacks on critical infrastructures, there is an urgent need to develop effective systems that can detect ransomware early. In order to achieve this objective, many detection solutions rely on machine ...
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].