survey

Open access

A Survey on Thwarting Memory Corruption in RISC-V

Authors:

Marco Brohet,

Francesco RegazzoniAuthors Info & Claims

ACM Computing Surveys, Volume 56, Issue 2

Article No.: 28, Pages 1 - 29

https://doi.org/10.1145/3604906

Published: 14 September 2023 Publication History

PDF eReader

Abstract

With embedded devices becoming more pervasive and entrenched in society, it is paramount to keep these systems secure. A threat plaguing these systems consists of software vulnerabilities that cause memory corruption, potentially allowing an attacker to breach the device. Software-based countermeasures exist, but suffer from high overhead. In this survey, we investigate whether this could be mitigated using dedicated hardware. Driven by the advancements of open hardware, we focus on implementations for RISC-V, a novel and open architecture tailored for customization. We distinguish between methods validating memory accesses beforehand, obfuscating information necessary for an attack, and detecting memory values corrupted earlier. We compare on qualitative metrics, such as the security coverage and level of transparency, and performance in both software and hardware. Although current implementations do not easily allow for a fair comparison as their evaluation methodologies vary widely, we show that current implementations are suitable to minimize the runtime overhead with a relatively small area overhead. Nevertheless, we identified that further research is still required to mitigate more fine-grained attacks such as intra-object overflows, to integrate into more sophisticated protected execution environments towards resilient systems that are automatically recoverable, and to move towards more harmonized evaluation.

1 Introduction

In today’s world, every aspect of our daily life is surrounded by embedded systems. We always have our smartphones within reach to make life easier. Even critical applications, such as the power grid and health care, rely on electronic systems. With the development of autonomous cars and the Internet of Things (IoT), embedded systems will be even more pervasive. A successful attack could therefore have a tremendous impact. For instance, Greenberg [30] explains how his Jeep was brought down while driving on the highway. The Mirai botnet, consisting solely out of IoT devices, was able to perform one of the largest distributed denial-of-service (DDoS) attacks with a volume of more than 600 Gbps [2]. Hence, it is paramount that embedded systems remain secure.

Embedded systems face many security threats [78]. One of these consists of software vulnerabilities that lead to memory corruption. That is, an attacker is able to read from and/or write to sensitive memory regions, possibly allowing an attacker to gain full control over the victim process and wreak havoc. The survey of Szekeres et al. [66] gives a comprehensive overview of the different types of memory corruption attacks and software mitigations that have been developed in response. Strikingly, the main conclusion is that these mitigations are not always suitable due to high overhead in performance and binary compatibility issues. The former is especially troublesome in embedded environments, which are known to be constrained in terms of available resources [27].

Interestingly, Xiao et al. [78] propose to look beyond the realm of software. The authors suggest to build a framework that has in its inner layer an operating system (OS) that is tightly coupled with the hardware to provide strong process isolation. However, isolation alone does not solve the vulnerability itself, but rather contains the security breach to the affected process. Nevertheless, securing software through hardware seems as an appealing solution to the problem while keeping the performance overhead to a minimum and maintaining full binary compatibility. Additionally, the increase in circuit area and power consumption of such a hardware-assisted implementation must be kept low to reduce the device’s cost. Investigating the feasibility of such a hardware-oriented approach is nevertheless not trivial, as the architecture of embedded devices is often purchased as intellectual property. As such, it cannot easily be customized for research purposes [31].

To overcome this challenge, at least in the microprocessor domain, Hill et al. [34] suggest to make use of RISC-V, a promising modern instruction set architecture (ISA) that is open-source and freely customizable. Thus, one could take an open RISC-V platform, implement a new security primitive on top of it, and evaluate its impact on the performance. Thanks to new advancements, reconfigurable hardware, usually in the form of a Field-Programmable Gate Array (FPGA), can now fit a whole System-on-Chip (SoC), including one based on RISC-V. ASIC implementations of open hardware are also sufficiently mature to be deployed in the real world. Being open hardware, it is fundamental that we are aware of all security mechanisms available. To the best of our knowledge, in literature, there exists no systematic review yet of the state-of-the-art in hardware-based defenses against memory corruption. Our work fills this gap with a particular focus on RISC-V-based implementations. We provide a taxonomy of how such defenses can be implemented and whether they fully protect against all memory corruption attacks. This allows us to establish to which extent a hardware-assisted approach is indeed beneficial in RISC-V-based architectures, being used in embedded environments with constrained resources or in larger systems, and provide suggestions for future research in this area.

This survey is structured as follows. We begin by discussing preliminaries of our research in Section 2. Next, we review the state-of-the-art in RISC-V defenses that aim to thwart memory corruption in Section 3. We highlight novelties of our survey compared to previous work in Section 4 and discuss and conclude our work in Section 5, where we also recommend directions for future research.

2 Preliminaries

In this section, we briefly discuss the rise of RISC-V and motivate why we focus on this architecture in our survey. We also explain what a memory corruption attack entails, we define our threat model and finish with the evaluation criteria that we employ.

2.1 RISC-V

The main benefit of an open-source ISA is that it can easily be used within a platform to develop and test new concepts in, whether in the area of security or performance. RISC-V is an example of such an ISA that was formally published for the first time in 2014 [4]. In their search for the ideal open ISA, the authors concluded that a reduced instruction set computer (RISC) ISA would be preferred, as it is easy to execute and had a better commercial track record than competing ISAs such as those of a complex instruction set computer (CISC) nature. Even though open-source RISC ISAs such as SPARC and OpenRISC were already available, the authors decided to build upon RISC-V, an ISA that the authors and others already conceived in 2010 for research and education purposes. They were mainly motivated by the fact that over time several pitfalls have been discovered in the designs of pre-existing ISAs. A detailed comparison between RISC-V and other architectures is given by Waterman [72], but Asanović and Patterson [4] highlight among others that it has built-in support for 128-bit addressing and was designed to easily add custom ISA extensions. Thus, RISC-V was designed to learn from the lessons from previous architectures to become future-proof.

Naturally, this makes RISC-V very appealing for architectural research. Several open cores implementing the ISA have been developed in response, ranging from Ibex (formerly Zero-riscy) for ultra-low-power conditions [17] to SonicBOOM for high performance [80]. These soft-cores, which are written in synthesizable code, can be run on reprogrammable hardware, out of which field-programmable gate arrays (FPGAs) are the most widely known example. This allows for far more flexibility in prototyping new functionality rather than if we were to target an application-specific integrated circuit (ASIC). On the other hand, industry is also starting to adopt RISC-V, with support coming from Microsoft and Intel, among others [31]. Semico Research even estimates that 62.4 billion RISC-V cores will be in use by 2025 [57]. The hardware security solutions discussed here therefore do not only have academic value, but even have the potential to be deployed directly in practice.

2.2 Memory Corruption

Switching from the RISC-V architecture to the realm of software, let us consider a program that is vulnerable to memory corruption. Szekeres et al. [66] classifies the different memory attacks such a program can be exposed to as either a control-flow hijack, a data-only attack, or as an information leak. Note that for data-only attacks, we opted to follow the more recent insights in the work of Cheng et al. [14] to refer to these as data-oriented attacks.

2.2.1 Control-Flow Hijack.

The classic example of exploiting memory corruption is to overwrite a code pointer, out of which the return address on the stack is the most famous one. When the function call ends, the control-flow is then not returned to the callee, but to the corrupted address. In a code-injection attack, it is used to divert the control-flow to malicious code that the attacker has placed in a writable area. This attack is commonly mitigated by marking writable buffers as non-executable, but can be circumvented with a code-reuse attack. In that case, the control-flow is diverted to pre-existing code in the binary that the attacker uses in its advantage. With return-oriented programming (ROP), an attacker tries to find gadgets in the binary, which are small snippets of code that end with a return statement. The attacker then stitches a sequence of these gadgets and executes this by manipulating the return addresses. A variant where indirect jumps are misused is called jump-oriented programming (JOP).

2.2.2 Data-Oriented Attack.

An attacker is not necessarily restricted to merely overwriting code pointers. To illustrate, let us consider the snippet of code in Listing 1. If an attacker is able to overwrite the authenticated variable with a non-zero value before it is read, the attacker can mislead the program into executing the sensitive code while keeping the control-flow untouched. These attacks are referred to as direct data manipulation (DDM) [14] or non-control-data attacks [66]. A data-oriented version of ROP and JOP, data-oriented programming (DOP), has also been observed in the wild. Cheng et al. [14] describe a DOP attack where a while-loop is abused to dispatch the execution of DOP gadgets.

2.2.3 Information Leak.

While the previous two types of attacks concentrate on overwriting memory, an attacker can also aim to read from sensitive memory, for example, to leak cryptographic keys [66]. A typical example is a format-string attack. Functions such as printf() receive as their first argument a format-string that specifies how the printed output should be formatted using the other arguments that are passed on. When the attacker is able to control the format-string, it can make use of special formatting capabilities to arbitrarily read from memory. Another example is the Heartbleed bug [33], which caused sensitive data beyond the intended buffer to be sent back to the attacker. This was caused by the fact that an attacker could request more data than was reserved for.

2.3 Threat Model

Now that it is clear what a memory corruption attack actually entails, we proceed with formalizing the threat model that we consider in this survey. We assume a RISC-V device on which a program is running that is susceptible to memory corruption. We already established that this is a valid assumption, as code-level vulnerabilities facilitating memory corruption attacks are one of the root causes of the insecurity of embedded devices [78]. This threat is also recognized by Fournaris and Sklavos [27]. The assumed attacker is able to interact with the victim program. A connection to the internet is not necessarily a prerequisite, as the Stuxnet attack illustrates that even air-gapped devices with no physical connection to the outside world can still be attacked [39].

Our focus is on defenses integrated in the RISC-V core’s execution pipeline to detect or even prevent memory corruption from occurring. An intrusion detection system (IDS) that monitors at the network level for malicious activity, for example, is hence out of scope. We also do not cover inter-process isolation techniques, because then these vulnerabilities are neither prevented nor detected—it merely limits the reach an attacker can have. Finally, we rule out defenses that presume flaws in any protocol in use by the victim process or implementation thereof. For instance, we assume that all encryption algorithms in use are secure, and that only authenticated users can perform sensitive operations.

2.4 Evaluation Criteria

In this survey, we focus on hardware-assisted defenses implemented in a RISC-V core. In several cases, dedicated software routines were also required to fully unlock the protection. As in these settings the software support is only ancillary, these types of defenses are within the scope of this survey. Defenses that do not require changes to the processor architecture’s hardware are not considered. The defenses are evaluated based on the following metrics. We distinguish between qualitative metrics and performance. For performance, we further subdivide into software- and hardware-based metrics.

2.4.1 Qualitative Metrics.

Security coverage. Indicates whether the security solution aims to thwart all memory corruption attacks, or only a subset thereof. This allows the system designer to finetune which parts of the system to protect.

Transparent protection. Indicates if the source code of the program is not required to enforce the protection. This is especially beneficial when only the closed-source binary of the program is available. Sometimes, only partial protection is available without, e.g., recompilation.

Fully automated. Indicates, if the protection cannot be applied fully transparently, whether the procedure for adding support to a program has been fully automated. If manual instrumentation is required, then the burden is increased for the programmer to enable the protection, which makes it less likely for the protection to be deployed. Sometimes, manual instrumentation is only required to properly support special use cases.

OS-independent. Indicates whether the OS needs to explicitly support the protection scheme. Deployment is easier if the OS does not need to be modified. Additionally, if an attacker is able to gain administrative access to the OS through other means, for example, by means of a protocol error, then OS-dependent protection schemes could be switched off.

Legacy support. Indicates whether unmodified binaries can still run on the processor. This is important if the system designer expects legacy programs to be used. Note that this is different from our transparent protection metric, which indicates whether unmodified binaries are also protected or not.

Open source. Indicates whether the hardware implementation of the modified processor is publicly available. This allows the designer to integrate the solution directly in its own processor, or perform experiments with workloads that the system is expected to process.

2.4.2 Software-Based Metrics.

Software benchmark suites can be used to quantify the overhead incurred by the protection scheme on the software execution. We report the benchmarks used, if applicable, and the following metrics relative to the original, unprotected execution.

Binary size increase. Indicates the average increase in size of the binary, i.e., due to added instructions and/or metadata, quantified in bytes. This gives the system designer an estimation of how much storage space is required to protect the software.

Runtime overhead. Indicates the average increase in runtime in seconds. This gives the system designer an estimation of the slowdown experienced with the protection enabled.

2.4.3 Hardware-Based Metrics.

The processor can be evaluated on hardware, either on FPGA or as an ASIC. We report what type of FPGA and/or ASIC technology is targeted and the following metrics relative to the original, unprotected processor.

Area. Indicates the amount of area occupied to estimate the fabrication cost. For FPGAs, this is usually done in the amount of required lookup tables (LUTs). Note that LUTs for different FPGA families can be different and one should therefore not compare two works evaluated on different FPGA types directly with each other. For ASIC, the area can be expressed independent of the technology used in Gate Equivalents (GE), which is the area expressed in \(\,\mathrm{\mu }\mathrm{m}^{2}\) divided by the area required for a simple two-input NAND gate for the technology targeted. When the area is reported for ASIC, we convert this to GE wherever possible to allow for a technology-independent comparison.

Power. Indicates the average amount of extra power consumed by the security solution. This is especially important for embedded systems with small power supplies. Although one can distinguish between the static consumption already incurred by the circuit itself and dynamic consumption which depends on the switching activity and thus the input, in literature we mostly found references to the static consumption, and thus we only consider this one.

3 RISC-V Defenses

As can be witnessed from Figure 1, a memory corruption attack consists of three different stages. First of all, a pointer needs to be crafted that refers to a memory area that is of the attacker’s interest, i.e., to either read from or write to that memory. To prevent the read or write operation from occurring, access validation can be performed to verify whether the source and/or destination of that operation is legitimate. If that fails, then in the case of a write operation, the attacker would still need to find out with what value it must overwrite the memory—or, when leaking information, how it must interpret this data. Information obfuscation aims to make it virtually impossible for the attacker to understand how the data is organized in memory. For an information leak, this is the last line of defense. In the case of a control-flow hijack or a data-oriented attack, corruption detection can be employed to verify whether a memory value to be read has not been altered maliciously in the meantime. If so, then the execution can be halted before the corrupted value is actually used. In this section, we review the state-of-the-art of implementations of such defenses in RISC-V cores along this categorization.

Listing 1.

Fig. 1.

3.1 Access Validation

An invalid memory access violates what we refer to as memory safety. What this entails is illustrated in Figure 2, where the pointer buf is associated with the memory area colored in green. When buf references memory beyond its intended area, a spatial violation occurs. However, when buf is within bounds but is used after the memory area has been released, we refer to this as a temporal violation. Although enforcing full memory safety can thwart all memory corruption attacks, this approach is associated with a significant performance overhead. The software-only approach of AddressSanitizer [58] incurred a 73% slowdown and a 150% code size increase on average with SPEC CPU2006 on Intel Westmere-EP.

Fig. 2.

3.1.1 Fat Pointers.

An approach to achieve spatial safety is by leveraging fat pointers. The idea is to determine for each pointer to which object it is supposed to belong. Then, the base address and the bound of the object are stored intertwined with the pointer, for example, by using unused bits in the pointer itself. This information allows one to verify for each memory access whether its destination is within bounds or not.

Shakti-T [49], a modified Shakti processor, has a slightly different approach as it rather associates the base and bound with the object itself. This optimizes for the case where many pointers refer to the same object. These aliased pointers then get the same identifier, thus also guaranteeing consistency in case of re-allocation. For temporal safety, the identifier is invalidated upon de-allocation. The compiler automatically inserts instructions to dynamically provide the base and bound at runtime, which are then stored in dedicated hardware storage. The resource occupation on the Virtex UltraScale FPGA VCU108 amounts to 1914 LUTs, while in ASIC after synthesis with UMC’s \(55 \,{\rm nm}\) library the area occupation is 11834 cells. Effectiveness was evaluated with several custom test cases of spatial and temporal errors, and no software metrics were analyzed.

Shakti-MS [16] aims to reduce the hardware complexity by storing the metadata not in dedicated hardware storage, but on the stack. Using 32-bit addresses, a 32-bit base and bound are stored alongside to prevent spatial attacks. Although now the address space becomes limited to \(4 \,{\rm GiB}\), this is not necessarily a great concern in embedded systems. Each address is also extended with a 32-bit hash ID that is based on a 64-bit random number generated for each stack frame and heap object. This number is randomized after each deallocation to prevent temporal attacks in both the stack and the heap. Evaluation reported a runtime overhead of 13% for several test cases from SPEC and SARD, and a code size overhead of 11%. The increase in area is significantly less than with Shakti-T, with 700 LUTs on FPGA and 4100 cells on ASIC, using the same technology as Shakti-T. Effectiveness was shown by mitigating all buffer overflow attacks from several SARD datasets.

Woodruff et al. [76] were the first to present Capability Hardware Enhanced RISC Instructions (CHERI), an alternative approach to memory addressing for RISC architectures by replacing pointers with capabilities. These are unforgeable objects that are associated with specific regions in memory [26]. Each capability memory access consults a CHERI co-processor to retrieve the associated metadata, such as base and bound for spatial safety. Temporal safety is achieved through garbage collection and revocation [77]. In addition, CHERI allows for other use cases such as sandboxing, although this needs to be explicitly added by the programmer. Capability-enforced memory is tagged, allowing supported and legacy software to coexist, but full protection can only be achieved when the OS is aware of the capability. The eighth version [73], which is the most current version at the time of writing, contains a draft application of CHERI to RISC-V. However, to the best of our knowledge, this approach was not yet evaluated on RISC-V, although Woodruff et al. [76] did evaluate an earlier version of CHERI on MIPS, and all tools have been made publicly available [73].

3.1.2 Segmentation.

Instead of tracking all pointers, the following solutions segment the program code and then define for each segment which regions in memory it is allowed to access. The granularity of segmentation is adjustable, for example, by treating each function as a segment. This enables more dynamic policies, such as employing fine-grained segmentation when sensitive code is executed while switching to a more coarse-grained version in more trustworthy environments. We remark that this approach of data isolation is different from the concept of inter-process isolation that we deemed out of scope in Section 2.3. To be precise, segmentation, or inner-process isolation, is primarily focused at mitigating memory corruption attacks within a vulnerable process, while inter-process isolation focuses on preventing an attacker from taking over other processes after a vulnerable process has been hijacked.

In-Process Memory Protection via User-Level Partitioning (IMPULP) [82] allows the programmer to manually specify code segments that are untrusted—in their case, the program itself is considered to be trusted, but library functions are not. Thus, for each library call, the programmer can specify up to 16 objects in memory that the call is allowed to access. This information is stored in dedicated hardware storage. The compiler automatically inserts instructions to let the hardware dynamically know which objects to track. A copy of the return address is also stored to detect ROP attacks. The OS needs to be adapted for this to work. Effectiveness was shown on a modified Rocket core on the Virtex-7 FPGA VC707 evaluation board by mitigating Heartbleed and benchmarks from MIT [83]. By protecting only library calls, SPEC CPU2006 incurred less than 0.2% runtime overhead. On FPGA, 5.4% (2619) more LUTs were used, while the power usage increased with 4.3% (\(0.164 \,{\rm W}\)). Targeting unknown technology, the area on ASIC increased with 0.7% (\(16598 \,\mathrm{\mu }\mathrm{m}^{2}\)) and power with 0.5% (\(0.982 \,{\rm W}\)). Because the area expressed in \(\,\mathrm{\mu }\mathrm{m}^{2}\) is directly proportional to the area expressed in GE, assuming \(2607004 \,{\rm GE}\) for the original Rocket core [38], the area would then amount to approximately \(18 \,{\rm kGE}\).

HardScope [50, 52] follows a more automated approach by automatically instrumenting for each function which static and automatic variables it is allowed to access. Only for dynamically allocated objects on the heap, manual intervention by the programmer is required. The metadata is stored in dedicated hardware storage. Effectiveness was demonstrated by mitigating the DOP attack in ProFTPD with a Pulpino core on the Zynq-7000 ZedBoard FPGA board. With CoreMark, an average runtime overhead of 3.2% was reported with a binary size increase of 11%. The configuration of HardScope required to run CoreMark increased the number of LUTs with 198% (30520). Note that Pulpino strives to be a small core, and hence any increase in area quickly becomes relatively large. On ASIC, after synthesis with the NanGate \(45 \,{\rm nm}\) technology, the area overhead amounted to 800000 transistors. Assuming CMOS with four transistors for a simple two-input NAND gate, this equates to \(200 \,{\rm kGE}\). A simulated version is available publicly [51].

Dam [48] does not perform adjustable segmentation, but aims to discover sensitive data and only protect that memory. The authors argue that DOP attacks rely on corrupting (1) data pointers to stitch data-oriented gadgets, and (2) loop decision variables to schedule the gadgets, and hence only these variables are under protection. Static compile-time analysis reveals the locations of these sensitive variables, and these locations are tagged. Then, at runtime, a second tag is used that identifies the instruction that is trying to write to it. When an illegitimate write has occurred, the second tag is invalid and an attack is detected. A runtime overhead of 6.48% was observed with the RISC-V test suite while simulating with a Rocket core. No analysis on hardware was performed, but we remark that storing a tag for each 64-bit word can be a costly exercise. Effectiveness was demonstrated by mitigating a DOP attack in wu-ftpd.

3.1.3 Guarded Buffers.

When a write to a buffer occurs without proper bounds checking, it could linearly overflow into memory beyond the buffer itself. Guard Lines [69] aims to exploit this linear property by employing guards, which are small memory regions that trigger a CPU exception when accessed. They are placed around allocated memory for spatial safety and within de-allocated memory for temporal safety. Dynamically allocated memory is automatically protected for all binaries, as long as the pre-modified system allocator is used, but recompilation is required to protect static variables and the stack. However, temporal violations occurring after de-allocated memory is allocated again remain under the radar. OS support is required to enable the usage of the guards. Evaluation for RISC-V was only performed to verify that Heartbleed and MIT test cases were mitigated.

3.1.4 Hardware Monitoring.

The previous methods required compiler support for full protection. The following three solutions are hardware monitors that track the conditions of the CPU while it executes a program. The main benefit of this approach is that it can be deployed transparently, because it should not require explicit support in software.

ARMOR [32] is a hardware monitor that protects the heap. It tracks all allocation and de-allocation calls and stores this information in a special range cache, for which the authors performed an exhaustive exploration to find the most optimal configuration. ARMOR then uses this metadata to determine for each heap access whether the destination is currently allocated or not. However, this approach is unable to detect overflow errors within the heap, which caused it to miss two vulnerabilities in SARD’s Juliet test suite. Like Guard Lines, it cannot detect incorrect references to de-allocated memory that has been allocated again. Evaluation with a Rocket core showed an increase of 0.88% (\(4200 \,\mathrm{\mu }\mathrm{m}^{2}\)) in area for the control logic and 6.9% (\(10.6 \,\mathrm{m}\mathrm{W}\)) in power with NanGate’s \(15 \,{\rm nm}\) open library. Following the same reasoning as with IMPULP, the area occupation would then amount to approximately \(23 \,{\rm kGE}\). The cache incurred a 2.9% runtime overhead with SPEC CPU2006.

RiskiM [38] is a hardware monitor for the kernel. It aims to protect the integrity of the kernel memory and of the control status registers (CSRs) that are used to regulate the system configuration. The kernel provides at boot time, where its integrity is assumed to be still untouched, metadata about which protected regions are writable from which instructions and which values are allowed; and the same for the CSRs. This information is stored in dedicated hardware storage. Only static policies are supported, which means that dynamically allocated memory is not protected. Results were reported for a modified Rocket core on a Zynq-7000 ZC706 FPGA board, which showed an increase of 2322 LUTs and 0.53% (\(9 \,\mathrm{m}\mathrm{W}\)) power overhead. The approach was also evaluated on ASIC with a commercial \(45 \,{\rm nm}\) library. In that case, the area increase was 1.3% (33664 GE) in post-synthesis gate count. With LMbench, a 0.73% runtime overhead was measured.

PHMon [22] employs a more generic approach by providing a programmable hardware monitor. A PHMon application consists of a number of Match Units (MUs) that monitor the received information trace and trigger an Action Unit (AU) when a match is found to act upon. Similar to CHERI, this concept is suitable for various use cases, but most importantly it can monitor all memory accesses, although the programmer must manually interface with the monitor to do so. Thus, memory safety could be enforced in a similar vein as described in Section 3.1.2. The authors created a proof-of-concept with two MUs on a Rocket core by manually mitigating Heartbleed with a negligible runtime overhead. On ASIC, a 5% power overhead and a 13.5% area overhead was measured with NanGate’s \(45 \,{\rm nm}\) library, which would be equivalent to approximately \(352 \,{\rm kGE}\) assuming the same Rocket configuration as with RiskiM. On the Zynq-7000 Zedboard FPGA, a 16% increase in the number of LUTs was reported. The hardware implementation has been made open source [21].

3.1.5 Discussion.

The reported evaluation metrics have been summarized in Table 1. In terms of the qualitative metrics, we observe that only Shakti-T, Shakti-MS, CHERI, and HardScope offer full memory safety. Recompilation is in selected use-cases not necessary for HardScope and Guard Lines, but it is required in all other cases for full protection, even for the hardware monitors. Except for IMPULP and PHMon, a fully automated approach was provided in the form of compiler extensions. For CHERI, manual instrumentation is available in case more advanced control is needed. Half of the implementations required some form of patches to the OS to operate optimally, but since most implementations targeted simple single-threaded execution environments, the question is whether in more advanced execution contexts explicit support from the OS would be required. Luckily, for all implementations, it was still possible to run legacy binaries albeit unprotected, but only for CHERI and PHMon, the full stack was open source. Conclusively, while these implementations do require access to the original source code of the program to be protected, the protection can usually be applied fully automated, although a compatible OS might need to be installed first.

Table 1.

								Software			FPGA			ASIC
	Security coverage\(^{\rm I}\)	Transparent	Fully automated	OS-independent	Legacy support	Open source	RISC-V core	Benchmark	Binary size	Runtime	Technology	Area (LUTs)	Power (mW)	Technology	Area	Power (mW)
3.1.1 Fat pointers
Shakti-T [49]	●	○	●	●	●	○	Shakti	-	-	-	Ultrascale VCU108	1.9k	-	UMC \(55 \,{\rm nm}\)	11k cells	-
Shakti-MS [16]	●	○	●	●	●	○	Shakti	SPEC+SARD (selected)	11%	13%	Ultrascale VCU108	0.7k	-	UMC \(55 \,{\rm nm}\)	4.1k cells	-
CHERI [73, 76]	●	○	●	◐	●	●	-	-	-	-	-	-	-	-	-	-
3.1.2 Segmentation
IMPULP [82]	◐	○	○	○	●	○	Rocket	SPEC’06	-	<0.2%	Virtex-7 VC707	5.4% (3k)	4.3% (164)	-	0.7% (\(18 \,{\rm kGE}\))\(^{\rm II}\)	0.5% (982)
HardScope [50, 52]	●	◐	◐	●	●	◐	RI5CY	CoreMark	11%	3.2%	Zynq-7000 Zedboard	198% (31k)	-	NanGate \(45 \,{\rm nm}\)	\(200 \,{\rm kGE}\)\(^{\rm II}\)	-
Dam [48]	◐	○	●	●	●	○	Rocket	RISC-V tests	-	6.5%	-	-	-	-	-	-
3.1.3 Guarded buffers
Guard Lines [69]	◐	◐	●	○	●	○	Rocket	-	-	-	-	-	-	-	-	-
3.1.4 Hardware monitors
ARMOR [32]	◐	◐	●	●	●	○	Rocket	SPEC’06	11%	3.2%	-	-	-	NanGate \(15 \,{\rm nm}\)	0.9% (\(23 \,{\rm kGE}\))\(^{\rm II}\)	6.9% (10.6)
RiskiM [38]	◐	○	●	○	●	○	Rocket	LMBench	-	0.7%	Zynq-7000 ZC706	3k	0.6% (10)	Commercial \(45 \,{\rm nm}\)	1.6% (\(40 \,{\rm kGE}\))	-
PHMon [22]	◐	○	○	●	●	●	Rocket	OpenSSL (Heartbleed)	-	\(\pm\)0%	Zynq-7000 Zedboard	16%	-	NanGate \(45 \,{\rm nm}\)	13.5% (\(352 \,{\rm kGE}\))\(^{\rm II}\)	5%

Table 1. Comparison of RISC-V Policies for Enforcing Memory Safety

\(^{\rm I}\)Indicates whether memory safety is enforced for all stack, heap and global variables, or only for selected use cases.

\(^{\rm II}\)Converted to GE from what was originally reported.

The dash (-) indicates that the data was not available in literature. A fully colored circle (●) indicates “yes” or “always the case”, halfway colored (◐) indicates “partially”, and an empty circle indicates “no” or “never the case”. Relative increases are with respect to the (execution on the) unmodified processor core. A detailed explanation of our metrics can be found in Section 2.4.

If we focus on the effectiveness of enforcing memory safety, then we can conclude that this depends on the level of granularity that is employed. For example, ARMOR monitored calls to the heap allocator and used this information to verify whether each heap access was to currently allocated memory or not. Hence, overflow from one heap object into a consecutive object would still remain unnoticed. An even more complicated example can be found in Listing 2, where we observe overflow within a heap object. As malloc() is unaware of the structure of the object being allocated and only knows its size, detecting such overflows would only be possible with information available at compile-time. To the best of our knowledge, all implementations discussed are unable to detect this spatial violation, as they all treat heap objects as a single entity. Further research would be required to evaluate the overhead of an approach that is able to detect such an attack.

Table 1 also shows that the performance metrics we considered were not reported homogeneously. Shakti-MS and HardScope were the only works to report on the increase of the size of the program’s binary file due to added instructions and/or metadata. While most implementations did report on the incurred runtime overhead, a wide variety of benchmark suites was used. This ranged from the very lightweight RISC-V unit tests to the more powerful benchmarks of SPEC CPU2006. Guard Lines did report that benchmark suites have limited compatibility with RISC-V, but our review shows that it is possible to run widely recognized benchmarks such as those from SPEC and LMBench on RISC-V, or at least a subset. Although most of the implementations were evaluated on the Rocket platform, it would not be fair to directly compare the data for different benchmarks. Additionally, half of the implementations only protected against a subset of the attacks, and hence the close-to-zero overhead of PHMon’s proof-of-concept is incomparable with the 13% of Shakti-MS providing full memory safety. Nevertheless, it seems promising for embedded systems that the solutions reported here have a runtime overhead of a much favorable order of magnitude compared to the results for AddressSanitizer, at most 13% versus 73%. The exact improvements in runtime overhead by employing a hardware-assisted solution can be quantified once the source code of all solutions is accessible.

In the case of the performance metrics for hardware, power consumption was often missing. Only IMPULP and the hardware monitors considered at least the static power consumption, but for different technologies. Area occupancy was available most of the times, but also here it was reported heterogeneously. The number of LUTs were reported for various types of FPGAs and mixed with absolute and relative increases compared to the size of the original processor core. This is easier to compare for ASIC, as the technology-independent unit of measure GE is directly proportional to the area expressed in \(\,\mathrm{\mu }\mathrm{m}^{2}\). Since Riskim [38] reported Rocket’s original size in GE, we decided to derive this for the Rocket-based implementations. IMPULP and ARMOR have a relatively similar impact to Rocket’s area occupancy, but is much smaller than HardScope’s area of \(200 \,{\rm kGE}\). Then again, HardScope offers more protection and is still relatively small compared to PHMon’s overhead at around \(352 \,{\rm kGE}\) without security applications. The question is how much this will increase for a PHMon-based solution enforcing full memory safety. For the Shakti-based implementations we could not estimate these numbers, but since Shakti-T and Shakti-MS were evaluated using the same methodology, we can conclude that Shakti-MS’s shift from storing the metadata in the program’s memory rather than in dedicated hardware has indeed decreased its hardware complexity.

Taking everything into account, the data reported in Table 1 seems to be promising for hardware-accelerated solutions enforcing memory safety to be implemented in embedded systems. While access to the original source code seems to be a requirement for full protection, and a modified OS in some cases, this process can usually be fully automated. The numbers reported for the runtime overhead seem to point that a hardware-assisted implementation could be beneficial to significantly reduce the overhead, but access to the source code of these works would be required to develop a uniform platform to exactly determine the benefit. The increase in power consumption and binary size was not often part of the evaluation, and the area occupation was not reported homogeneously, but we could still infer the size of the implementations for the majority of the solutions that were based on Rocket. We look forward to see PHMon- and CHERI-based implementations for enforcing full memory safety in RISC-V and how more advanced types of buffer overflows such as within arrays can be thwarted.

3.2 Information Obfuscation

Even if arbitrary memory can be overwritten, an attacker still needs to know which value must be written to which part of the memory to carry out the attack. For an information leak, the location must be known, but also how the data obtained must be interpreted. To ensure that this information remains obscured, Bhatkar and Sekar [8] originally proposed data space randomization (DSR) to randomize the representation of pointers and the contents of data variables. Their implementation incurred an average runtime overhead of 15%. A weaker policy is address space randomization (ASR), which only randomizes the representation of code pointers to prevent control-flow hijacking. An example of what can be randomized is shown in Figure 3.

Listing 2.

Fig. 3.

3.2.1 Address Space Randomization.

Morpheus [28] randomizes all pointers by displacing the code and data address spaces with distinct offsets. Additionally, both code and pointers are encrypted on the fly with the lightweight QARMA block cipher [5] to obfuscate their internal representations. As encryption and decryption is performed at runtime, the OS is responsible for generating the keys. Because it must be known before execution where code and pointers are stored in the binary, compile-time analysis is used to expose this information in the resulting binary. To break the defense, an attacker must exhaustively probe the system to retrieve the information needed. Morpheus counters this by re-randomizing the encryption and displacement periodically, by default every 50 ms. The frequency is dynamically increased when malicious behavior such as a memory scan is detected at runtime. The authors argue that an attack must take several minutes, and hence the default period of \(50 \,{\rm ms}\) offers sufficient protection to make an attack practically impossible. However, relative distances within an address space are not randomized. The implementation was evaluated with a software-based gem5 simulation of the MinorCPU, where on average a runtime overhead of 0.9% was measured with MiBench and SPEC CPU2006. Its effectiveness was evaluated with custom test cases.

3.2.2 Data Space Randomization.

Hardware Assisted Randomization of Data (HARD) [7] encrypts all program data to thwart data-oriented attacks. The compiler analyzes which objects in memory are referenced by which pointers in the program. These objects are categorized in a maximum of 2048 equivalence classes. That is, pointers in the same class reference the same objects. The OS generates for every class a 64-bit encryption key with which the objects are XOR-ed. When a pointer requests access to memory, the memory is only decrypted when the pointer belongs to the correct class. For interaction with external code and data, encryption can be temporarily disabled. However, as XOR encryption is susceptible for a known-plaintext attack, the authors recommend to combine it with ASR. Evaluation with a Rocket core on a Xilinx Zynq ZC702 evaluation board showed a 6.61% runtime overhead with SPEC CINT 2000. In ASIC, after synthesis with yosys, the die area increased with 21% compared to a Rocket core with L1 caches only. As this configuration is smaller than used by RiskiM [38], we can only set an upper limit of \(547 \,{\rm kGE}\) for the approximated area occupation. Its effectiveness was demonstrated by mitigating the wu-ftpd DOP attack.

PUFCanary [19] does not randomize existing variables, but adds randomization in the form of canaries at the end of each stack buffer. Therefore, if an attacker aims to make use of buffer overflow, it is forced to also overwrite the canary before it reaches the actual value of interest. This means that it will only detect linear buffer overflows, similar to Guard Lines as discussed in Section 3.1.3. The generated canary is validated after execution of a sensitive function, such as memcpy(). Thus, if the attacker overwrites the canary with an incorrect value, an attack should immediately be detected. The canary is generated using a physically unclonable function (PUF) that is used to output a device-specific random value for the address location of the canary. The PUF response is then XOR-ed with a secret that is generated by the OS at initial setup of the process. Effectiveness was demonstrated with a Rocket core on a Zynq-7000 ZYBO board by mitigating stack-based attacks in the Wilander Buffer Overrun Suite [75], with which a runtime overhead of 2.3% was observed. The security module increased the area with 2.9%.

3.2.3 Discussion.

The reported evaluation metrics have been summarized in Table 2. With only three solutions found for enforcing ASR and DSR in RISC-V, this is the least popular research direction in this survey. Similar to the policies enforcing memory safety, source code access and recompilation is necessary for full protection, although legacy binaries can still be executed. The two types of defenses diverge on support from the OS. While for enforcing memory safety this was not always necessary, in ASR and DSR, the implementations all assume that the OS can be trusted to be responsible for managing the assignment of random numbers for each process. Because even the OS is not bulletproof, additional measures are required to guarantee the integrity of the OS itself, for example, by combining it with RiskiM’s kernel monitor. Even under the assumption that the kernel is secure, the question remains whether the ASR and DSR policies proposed can mitigate all memory corruption attacks. For instance, although Morpheus is able to randomize addresses with a marginal runtime overhead, it does not randomize the distances between two stack variables. This still leaves room for linear buffer overflows and format-string attacks based on relative addressing. A combined approach with a DSR solution would be required to fully mitigate this kind of attack. Moreover, the main focus of current DSR policies is to protect the stack. As the heap can also suffer from memory corruption attacks, a combination with a heap memory safety solution such as ARMOR would be preferred.

Table 2.

								Software			FPGA			ASIC
	Security coverage\(^{\rm I}\)	Transparent	Fully automated	OS-independent	Legacy support	Open source	RISC-V core	Benchmark	Binary size	Runtime	Technology	Area (LUTs)	Power (mW)	Technology	Area	Power (mW)
3.2.1 Address space randomization (ASR)
Morpheus [28]	●	○	●	○	●	○	MinorCPU (gem5)	MiBench & SPEC’06	-	0.9%	-	-	-	-	-	-
3.2.2 Data space randomization (DSR)
HARD [7]	●	○	●	○	●	○	Rocket	SPEC’00	-	6.6%	-	-	-	-	21% (<\(547 \,{\rm kGE}\))\(^{\rm II}\)	-
PUFCanary [19]	◐	○	●	○	●	○	Rocket	Wilander	-	2.3%	Zynq-7000 ZYBO	2.9%	-	-	-	-

Table 2. Comparison of RISC-V Policies for Enforcing Memory Safety

\(^{\rm I}\)Indicates whether either ASR or DSR is fully enforced, or only for selected use cases.

\(^{\rm II}\)Converted to GE from what was originally reported.

We consider the same parameters and report the data in a similar way as in Table 1.

An interesting observation is that these ASR and DSR policies depend on modules that can be shared, i.e., the PUF-based random number generator in PUFCanary and the QARMA encryption module in Morpheus. These modules might already be available in the form of crypto-accelerators in the processor to provide accelerated encryption and random number generation for software. If these are indeed already present, then the ASR and DSR policies can simply attach to them instead of providing them themselves. The QARMA module used in Morpheus can occupy \(18 \,{\rm kGE}\), which is almost half the size of RiskiM (\(40 \,{\rm kGE}\)). PUFCanary did not provide an estimation for the PUF’s size in hardware. Continuing on the performance results presented in this section, we observe similarly as in Section 3.1.5 that it was not reported homogeneously. Morpheus had the most extensive benchmark evaluation by using both SPEC CPU2006 and MiBench and retaining on average less than 1% runtime overhead, but it was in software-based simulation and so no estimate on the required hardware resources could be given. On the other hand, PUFCanary only measured the runtime overhead on the stack-based Wilander attacks, which were not designed to benchmark performance with. For the hardware metrics, we could set an upper bound for the size of HARD, but further research would be required if it is larger than, e.g., PHMon (\(\pm\)\(352 \,{\rm kGE}\)).

Conclusively, we observe that there is still room for improvement in how addresses and data can be randomized in RISC-V. Current solutions are by themselves insufficient to thwart all memory corruption attacks, although randomization makes it harder to execute such an attack. The measured runtime overheads do not seem to diverge much from the memory safety previously discussed, although we must be careful with comparing these numbers as the used benchmarks differ. On the other hand, software-based DSR did not seem to incur much larger slowdowns. Further research is required to better quantify the performance improvements of a hardware-assisted solution for information obfuscation. A benefit of these approaches is that ASR and DSR policies depend on cryptographic modules that can be shared. Therefore, if secure hardware-based random number generators and encryption implementations are already available, then these can already be utilized to reduce the added overhead.

3.3 Corruption Detection

Our last line of defense against control-flow hijacking and data-oriented attacks consists of techniques that aim to determine upon usage of a particular value in memory whether it has been corrupted previously [66]. Policies enforcing control-flow integrity (CFI) aim to detect corrupted code pointers, which can be further divided into backward-edge protection for return addresses and forward-edge protection for the destination of indirect calls and jumps. Likewise, with the advent of data-oriented attacks, the scope was broadened to address corruption in any kind of data variable with data-flow integrity (DFI). While DFI is a stronger policy than CFI, it can be a costly exercise. The software-based approach of DOPdefenderPlus [70] increased the code size with 18.1% and the runtime with 32.9% using unknown x86 hardware running SPEC CPU2006.

A remark on the terminology used. Abadi et al. [1] introduced the concept of CFI. Their approach consisted of the runtime enforcement of a statically computed control-flow graph (CFG), which is depicted in Figure 4. A CFG node denotes a basic block (BB) of code, and a direct edge from one BB to another indicates an allowed call, jump or return. Thus, the CFG captures all valid source-destination pairs in the control-flow. Several authors, such as De et al. [18], constrain the definition of CFI to the runtime enforcement of a CFG. However, we follow the broader notion of Szekeres et al. [66] and consider any kind of corruption detection in code pointers as a CFI-enforcing policy, even if only partial protection is provided. The same applies to our notion of DFI.

Fig. 4.

Listing 3.

3.3.1 Control-Flow Graphs.

As stated earlier, the first policy to enforce CFI was to statically compute a CFG and enforce it at runtime. In the case of CCFI-Cache [15], this is done with the following two components. Firstly, a cache module contains the valid source-destination pairs that are statically determined after compilation. This metadata is also hashed to ensure its integrity. The checker then uses this cache at runtime to verify that the control flow has not been diverted. Backward-edge protection is guaranteed with a shadow stack. An interrupt is raised when a violation is detected, which puts the OS in charge with processing this event. Evaluation with PicoRV32 on a Nexys 4 DDR Artix-7 FPGA incurred an increase of 11.3% (1250) in LUTs. A runtime overhead between 2% and 63% and a code size overhead between 9% and 30% were measured with custom test cases.

FIXER [18, 19] uses a similar approach as CCFI-Cache, but implements this in a dedicated security co-processor. The main benefit of this approach is that the main core does not have to be modified, which makes it easier to be integrated in existing SoC architectures. Each component, such as the shadow stack, is implemented as a separate security module. However, since there is only one, multi-processor environments do require more effort to be supported. Additionally, FIXER requires source-code annotations, while CCFI-Cache does not. Coupled with a Rocket core on a Zynq FPGA, the security module occupies an additional 2.9% in area. A runtime overhead of 1.5% was measured with the RISC-V tests.

3.3.2 Data Shadowing.

One of the implemented use cases of PHMon, the programmable hardware monitor discussed in Section 3.1.4, was a shadow stack. It relies on a protected memory area allocated by the OS where a copy of the return address is stored when a function is called. After the call is finished, the return address on the stack is compared with the copy to verify its integrity, thus solely enforcing backward-edge protection. Nevertheless, recompilation is not required for the shadow stack to be enabled. The average runtime overhead measured with Rocket running MiBench, SPECint2000 and SPECint2006 on the Zynq-7000 Zedboard FPGA was 0.9%. On ASIC after synthesis with NanGate’s \(45 \,{\rm nm}\) library, a 5% power overhead and a 13.5% area overhead were measured, while on FPGA 16% more LUTs were utilized. Effectiveness was demonstrated using custom test cases.

TrustFlow-X [11] generalizes the concept of a shadow stack by supplementing the ISA with a generic interface for handling sensitive data. When memory is being stored using this interface, TrustFlow-X automatically stores a copy of it in trusted memory outside of the control of the attacker. When this memory is loaded again, TrustFlow-X verifies whether the output from memory matches the corresponding copy in trusted memory. Compiler extensions are provided that can automatically detect and instrument sensitive code pointers for full CFI, but the programmer can also manually indicate to protect application-speicifc sensitive data, such as buffers holding encryption keys. Although TrustFlow-X is able to repair corrupted data, the size of the trusted memory defines the maximum amount of data that can be protected, and this must be determined prior to manufacturing the device. Only static data is supported, which means that dynamically allocated data is out of scope, and no support for Linux was developed. Its effectiveness was demonstrated using a Rocket core on an Artix-7 35T Arty FPGA with the RIPE benchmark suite. Negligible runtime overhead was measured with CoreMark, while the binary size did not grow as the regular load/store instructions were replaced with those of TrustFlow-X. Excluding the trusted memory, the number of occupied LUTs increased with 1.03% (118).

3.3.3 Dynamic Information Flow Tracking.

Originally proposed by Suh et al. [65], dynamic information flow tracking (DIFT) does not focus on instructions that illegitimately modify memory, but rather on how potentially malicious input is used during the execution of a program. It does so by tagging—or tainting—the memory that contains the untrusted input. When tainted memory is used for a computation, the taint is propagated to the location where the result is stored. An implementation is free to decide under which conditions the tag is propagated. The final step consists of detecting whether sensitive data has been tainted. For example, when the program counter has been tainted, this might indicate that the control-flow has been maliciously diverted. The problem is that you need to keep the tag per data block. A 4-bit tag for each 32-bit block already means that 12.5% of your memory is occupied by the tags themselves.

Chen et al. [13] implemented DIFT in a separate co-processor like FIXER in Section 3.3.1. By monitoring the program counter in parallel for being tainted, control-flow hijacking was mitigated. The OS must place the required metadata in the CSRs. Evaluation was conducted with a Rocket core on an Artix-7 35T Arty FPGA and the area overhead consisted of 1.9% (290) in LUTs. PAGURUS [55] is also implemented off-chip, as it aims to protect accelerators in heterogeneous SoCs. The metadata is placed in dedicated hardware storage. Effectiveness was demonstrated by detecting a code pointer being overwritten by a custom accelerator connected with a RI5CY core on a Virtex-7 V2000T FPGA, requiring 1600 LUTs. Palmiero et al. [54] implemented DIFT to check for taint in load addresses, store addresses, and the program counter. This approach successfully mitigated buffer overflows [75] and format-string attacks [67], thus allowing to enforce DFI. The metadata is stored in CSRs by the OS. The RI5CY core was extended with 0.82% in the number of LUTs on a Zynq-7000 Zedboard FPGA. Because the tags are processed in parallel, no runtime overhead is imposed.

3.3.4 Crypto-Based Measures.

With DSR, as discussed in Section 3.2, encryption was used to obfuscate the representation of values of interest. The following three implementations employ either hashing or encryption to enforce CFI by ensuring that only valid control-flow transitions, respectively, have the correct hash or can successfully be decrypted at runtime.

Zipper [42] enforces backward-edge protection by computing the Keccak hash value of the return addresses in a chained structure. The Rocket core was used in the evaluation, and as that core only uses a 40-bit address space, the upper 24 bits of the return address register are used to store the upper 24 bits of the hash value. The hash value itself is computed by combining the latest return address with the previous hash value. This design ensures that even if the key is leaked, overwriting the hash value is still not possible. Compiler support was added to insert new instructions for calculating and verifying the hash value, although precompiled binaries could also be protected by simply adding these instructions to all function calls and returns. Evaluation with a Rocket core on a Virtex-7 FPGA VC707 evaluation board revealed a runtime overhead of 1.86% with SPEC CPU2000. The hash module consisted of 793 LUTs. Its effectiveness was validated with custom test cases.

Li et al. [43] enforce forward-edge protection by encrypting the first instruction of all valid destinations. Before execution of a program, which needs to be signalled by the OS, each first instruction of a call-site is XOR-ed by the CPU with the response of a PUF to establish forward-edge protection. While this ensures that every device has a unique key, the defense of the entire device is broken when the key is leaked. No support is required from the compiler or the kernel, as the protection is transparently applied solely by the processor. For backward-edge protection, the return addresses are hashed with a tx3tx3tRL-inspired hash module [35] that also uses the PUF response as the key, but it is not cryptographically secure. Evaluation was conducted with a RI5CY core. Its effectiveness was validated using the RIPE benchmark and the runtime overhead was 1.3% with MiBench.

Werner et al. [74] employ a stronger approach than Li et al. [43] by using a sponge-based authenticated encryption scheme with a 64-bit sponge state and the PRINCE block cipher [9]. As the internal state for authentication and encryption is shared, the decryption of code depends on all preceding instructions. Therefore, during compilation and encryption of the program, valid paths in the code are identified and only those encrypted. Because there is not a single path in the code, e.g., when a function is called from two distinct places, the binary is instrumented with patching information that is used during decryption to support multiple flows. Evaluation was conducted with a RI5CY core, where the area occupation was increased by 32% (\(28.8 \,{\rm kGE}\)) after synthesis with UMC’s \(65 \,{\rm nm}\) library. An average increase of 19.8% in code size and 9.1% in runtime was observed with several custom tests.

3.3.5 Runtime Attestation.

The concept of runtime attestation was introduced in RISC-V by LO-FAT [24]. Its protocol consists of a verifier \(\mathcal {V}\) that wants to attest a program’s execution path on a remote prover \(\mathcal {P}\). For all encountered control-flow related transitions, \(\mathcal {P}\) computes a 512-bit SHA-3 hash. This happens in parallel with the execution of the program, and hence does not incur any runtime overhead. To defend against the corruption of loop variables, which is a type of data-oriented attack, \(\mathcal {P}\) also tracks branch instructions and stores information about detected loops. However, this incurs two resp. five extra cycles. The hash value, together with a signing key and a nonce, form an attestation report that is sent to \(\mathcal {V}\). Finally, \(\mathcal {V}\) uses the CFG of the program to verify whether the report constitutes for a legitimate execution path. RI5CY core. On a Zynq-7000 Zedboard FPGA, around 2000 LUTs extra were in use. Effectiveness was analyzed in a simulated RIC5Y core with the Open Syringe Pump, which was shown to incur no overhead on the runtime.

In order to thwart more advanced DOP attacks, LiteHAX [23] implements runtime attestation with the addition of data-flow attestation. \(\mathcal {P}\) still reports control-flow related transitions, but the dedicated loop handling is replaced by computing the hash value of the memory reads and writes in order. An intermediate attestation report is sent when the buffer is full to \(\mathcal {V}\), which performs a dynamic, context-sensitive analysis of the execution path to verify that it is legitimate. Data-flow analysis with symbolic execution is performed to retrieve the allowed memory accesses each instruction may perform. This is only conducted within the current execution path to reduce the computational overhead. The report can then be used to verify that each read or write was valid. For \(\mathcal {P}\), a zero performance penalty is incurred, and the area increase on a Zynq-7000 Zedboard FPGA is limited to around 1600 LUTs. Effectiveness was analyzed in simulation also with the Open Syringe Pump, utilizing 35 BRAMs of \(36 \,{\rm Kb}\). No performance impact was incurred.

3.3.6 Data-Flow Graphs.

HDFI [62] enforces DFI by making use of tagged memory. The 1-bit tag is only set for sensitive memory areas, such as the storage locations of pointers. Static analysis reveals which instructions are allowed to write to these locations. This way, it could be interpreted as if a data-flow graph (DFG) is generated, similar to how a CFG is used to enforce CFI. Only when legitimate instructions write to sensitive memory, the tag is retained. Thus, when a sensitive value is used and its tag is set, the last write to it must have corrupted the data. Similar to PHMon, not all data is immediately protected, but security applications can be written to define what data should be protected and how. The OS allocates memory for the metadata to be stored in. Evaluation was conducted with a Rocket chip on a Zynq-7000 ZC706 FPGA board. Without security applications activated, a runtime overhead of 0.98% was observed with SPEC CINT 2000. Heartbleed, RIPE test cases, and several custom attacks were successfully mitigated. However, the authors state that the data-flow analysis might not be precise enough to thwart all attacks, and hence recommend to couple it with memory safety enforcement for full protection. The source code has been made publicly available [63].

TMDFI [46] also generates a DFG to enforce DFI, but extends on HDFI by making use of 8-bit tags. This allowed more fine-grained policies to be set up by supporting up to 255 data flows, while HDFI is only capable of differentiating between sensitive and regular memory. When an instruction writes to a memory location, the tag of the instruction is propagated to the location in memory. When this memory is used, a dedicated co-processor verifies whether the tag of the last write is allowed by the computed DFG. The tags themselves are stored in main memory. With a subset of the RISC-V test cases, an average runtime overhead of 39.34% was observed in simulation. The implementation has been made open source [45].

3.3.7 Discussion.

The reported evaluation metrics have been summarized in Table 3. We found 14 detection methods, making this area more popular than the 10 access validation policies reported in Section 3.1 and significantly more than the 3 obfuscation methods in Section 3.2. Half of them focused on enforcing CFI, and the others on DFI. Especially the CFI policies implemented the protection transparently for the program, or at least provided automated means to enable the protection. A majority of the policies also required support from the OS, but all works still allowed legacy binaries to be executed. Three works had their source publicly available. Both the Rocket and the RI5CY core seem on par in terms of usage in this area, but a wide variety of benchmarks was again employed. Hardware evaluation was usually conducted to report on FPGA LUT usage, with only PHMon and Werner et al. [74] providing estimates for ASIC post-synthesis area occupancy.

Table 3.

								Software			FPGA			ASIC
	Security coverage\(^{\rm I}\)	Transparent	Fully automated	OS-independent	Legacy support	Open source	RISC-V core	Benchmark	Binary size	Runtime	Technology	Area (LUTs)	Power (mW)	Technology	Area	Power (mW)
3.3.1 Control-flow graphs
CCFI-Cache [15]	CFI: ●	○	●	○	●	○	PicoRV32	Custom	9–30%	2–63%	Artix-7 Nexys4 DDR	11.3%	-	-	-	-
FIXER [18, 19]	CFI: ●	○	○	●	●	○	Rocket	RISC-V tests	-	1.5%	Zynq	2.9%	-	-	-	-
3.3.2 Data shadowing
PHMon [22]	CFI: ◐	●	●	○	●	●	Rocket	SPEC & MiBench	0%	0.9%	Zynq-7000 Zedboard	16%	-	NanGate \(45 \,{\rm nm}\)	13.5% (\(352 \,{\rm kGE}\))\(^{\rm II}\)	5%
TrustFlow-X [11]	DFI: ◐	○	◐	○	●	○	Rocket	CoreMark	0%	<1%	Artix-7 35T	1.0% (118)	-	-	-	-
3.3.3 Dynamic information flow tracking
Chen et al. [13]	DFI: ◐	○	○	○	●	○	Rocket	Custom	-	0%	Artix-7 35T	1.9% (290)	-	-	-	-
PAGURUS [55]	DFI: ◐	○	○	●	●	○	RI5CY	-	-	-	Virtex-7 V2000T	1.5k	-	-	-	-
Palmiero et al. [54]	DFI: ◐	○	○	○	●	○	RI5CY	-	-	-	Zynq-7000 Zedboard	0.8%	-	-	-	-
3.3.4 Crypto-based measures
Zipper [42]	CFI: ◐	◐	●	●	●	○	Rocket	SPEC’00	-	1.9%	Virtex-7 VC707	0.8k	-	-	-	-
Li et al. [43]	CFI: ●	●	●	○	●	○	RI5CY	MiBench	-	1.3%	-	-	-	-	-	-
Werner et al. [74]	CFI: ●	○	●	●	●	○	RI5CY	Custom	19.8%	9.1%	-	-	-	UMC \(65 \,{\rm nm}\)	32% (\(29 \,{\rm kGE}\))	-
3.3.5 Runtime attestation
LO-FAT [24]	CFI: ●	●	●	○	●	○	RI5CY	OpenSy- ringePump	0%	0%	Zynq-7000 Zedboard	2k	-	-	-	-
LiteHAX [23]	DFI: ●	○	●	○	●	○	RI5CY	Custom	0%	0%	Zynq-7000 Zedboard	1.6k	-	-	-	-
3.3.6 Data-flow graphs
HDFI [62]	DFI: ◐	○	○	○	●	●	Rocket	SPEC’00	-	1.0%	-	-	-	-	-	-
TMDFI [46]	DFI: ◐	○	○	●	●	●	Rocket	RISC-V tests	-	39%	-	-	-	-	-	-

Table 3. Comparison of RISC-V Policies for Enforcing CFI and DFI

\(^{\rm I}\)Indicates whether either ASR or DSR is fully enforced, or only for selected use cases.

\(^{\rm II}\)Converted to GE from what was originally reported.

We consider the same parameters and report the data in a similar way as in Table 1.

In the area of CFI, PHMon’s shadow stack and Zipper were the only solutions to solely implement backward-edge protection. For the other policies, forward-edge protection mainly involved a static approach by inferring a CFG at compile-time and enforcing it at runtime. This still leaves room for a specific type of control-flow hijacking. Consider, for example, Listing 3, where the function pointer panel can either point to admin_panel or user_panel, depending on whether the correct password is entered. As the length of the input for name is not checked, an attacker could overflow the buffer and use that to overwrite panel with the location of admin_panel. A CFG would not classify this as malicious behavior, as there is a valid execution possible where panel can point to admin_panel.

Only the DIFT implementations, of which Chen et al. [13] and Piccolboni et al. [55] focused on enforcing CFI, are capable of identifying this as an attack, because overwriting panel will taint the pointer. When the call is executed, the program counter will also become tainted, and hence checking the program counter for taint would suffice to thwart this attack. Moreover, a major benefit of the DIFT implementations is that the taint tags can be processed in parallel with the execution. Nevertheless, concerns have been raised whether DIFT’s propagation rules are capable of completely preventing memory corruption without incurring false-positives [60, 61]. Additionally, it can be expensive to store a tag for each memory block, especially when caches are involved that require aligned accesses. For instance, TMDFI keeps an 8-bit tag for each 64-bit block, which means that 12.5% of the memory becomes occupied by metadata. This is also the reason why an implementation such as that of TrustFlow-X, where a copy of sensitive data is stored, cannot be used to completely thwart all memory corruption.

An interesting approach is the remote attestation implementations of LO-FAT and LiteHAX. LiteHAX’s approach of periodically sending an attestation report of the current execution to an external verifier incurs no overhead, and allows the verification of the execution to be offload to a more powerful machine. Both works were evaluated on the same FPGA as HardScope, and we see that the remote attestation works occupy much less LUTs (around 2k) than HardScope (around 31k). However, the frequency at which the reports are sent must not be too short, and it must not take too long before the device learns that malicious execution was detected. The authors of Morpheus [28] gave an analysis of what time an attacker would need to break their solution. We would welcome such an analysis for outrunning the remote attestation reports.

All in all, while we found many solutions for enforcing CFI and DFI, the majority of these implementations do not aim to fully thwart data-oriented attacks. For those that do, we can distinguish between solutions based on memory tagging, such as DIFT, and remote attestation. The main question with memory tagging is what the ideal tag size would be. Already 8-bit tags incurred a 39% runtime overhead in TMDFI, which does not seem promising, especially considering the fact that a software-based solution already incurs a similar overhead. Besides that, HDFI admits that for programs with complicated data flow, the analysis conducted here might be insufficient to detect an attack there. On the other hand, remote attestation might detect an attack too late, allowing an attacker to take over the system before the verifier’s message comes in. The most promising solutions are in the area of CFI, where often the protection is already implemented in a way transparent to the program. Further research is required to better compare these solutions on software- and hardware-based metrics and more examples of data-oriented attacks to improve the comparison of the security performance of DFI implementations.

4 Related Work

The issue of securing embedded systems is multifaceted and several aspects have already been reviewed in previous literature. In this section, we discuss the main surveys on related topics. Previous literature mostly focuses on other architectures, as RISC-V is relatively new. One exception is the work of Lu [47] that provides a broad overview of the different security components in a RISC-V processor, ranging from cryptographic accelerators to protection against physical attacks. Because of this, we also discuss mitigations in other architectures and provide an overview on security measures for other threat models.

Surveys on memory corruption. Szekeres et al. [66] provide a comprehensive overview of memory corruption attacks and defenses. While we use this taxonomy in our work, we distinguish ourselves in the fact that we take a novel approach by focusing on defenses with a foundation in hardware. Larsen et al. [40] target a different branch of software mitigations by exploring the field of diversification. In short, the idea is to randomize several implementation aspects of a program on each device to minimize the impact of a single vulnerability. Additionally, Song et al. [64] survey sanitizers that dynamically analyze a specific execution of a program to identify bugs before the actual deployment. Lastly, Wang et al. [71] specifically target data-oriented attacks under the name program-data attacks. Although the authors also discuss hardware-based measures such as HardScope and HFDI, their focus is still primarily on a software-oriented approach. de Clercq and Verbauwhede [20] dicuss hardware-based CFI implementations, but do not consider data-oriented attacks which is part of our work.

Mitigation in ISAs other than RISC-V. Intel introduced Memory Protection Extensions (MPX) to provide spatial safety, but the runtime overhead was substantial and did not compare well with state-of-the-art software-based approaches [53]. Nowadays, MPX is no longer supported [36]. Intel’s Control-flow Enforcement Technology (CET) to mitigate control-flow hijacking in hardware seems more promising [37]. The extension consists of a shadow stack for backward-edge protection and a new ENDBRACH instruction which is placed near valid targets of indirect calls and jumps to enforce forward-edge protection. Platforms without support for CET interpret this instruction as a NOP, which enables legacy platforms to properly execute binaries compiled with CET support. However, CET-enabled Intel hardware is yet to be released [10]. Meanwhile, ARMv8.3-A adds support for Pointer Authentication, which uses a fat-pointer inspired approach to achieve memory safety [44]. This concept was generalized with the introduction of memory tagging in ARMv8.5-A. SPARC supports memory tagging out-of-the-box [59].

Other attack models. Ravi et al. [56] summarized the threats for embedded systems and lists, besides what we address in our survey, physical attacks and protocol weaknesses. The former is also discussed by the work of Batina et al. [6], which covers several hardware-based defenses. The study contains two other areas of interest. Firstly, the authors review Trusted Execution Environments (TEEs) that provide secure inter-process isolation. Secondly, the authors also discuss Spectre- and Meltdown-like micro-architectural attacks. However, they did not cover memory corruption, which is the focus of our work. Previous work already covered TEEs [68, 79, 81] and micro-architectural attacks [12, 29] in depth.

5 Conclusions and Future Work

With the proliferation of embedded systems in daily life, it is of the upmost importance that such systems remain secure. Recent advancements, amongst others the rise of open hardware with RISC-V, have enabled the community to validate architectural extensions for security in actual hardware. In this survey, we have reviewed the state-of-the-art in defenses against memory corruption in software that have been implemented in a RISC-V core. The countermeasures have been categorized as either validating memory accesses, obfuscating information necessary to launch an attack, or detecting whether data has been corrupted previously. Purely software-based approaches are known for inducing high performance impacts and potentially raising binary incompatibility issues. The data reported in literature shows that a hardware-assisted approach has the potential to improve on runtime overhead while keeping the increase in circuit area to a minimum, making this approach viable for embedded systems that are known to be subject to constrained environments. Still, full protection is only possible when information available at compile-time is taken into consideration, and therefore the issue of binary compatibility remains unsolved. Besides our observation that there is still room for improvement for the effectiveness of the reviewed countermeasures, especially with respect to how we can protect against fine-grained data-oriented attacks, we identified the following directions for future research.

Effectiveness of DIFT. With information flow tracking, the question was raised whether this could fully mitigate all data-oriented attacks while not incurring false positives, especially in the case of programs with overly complicated data flows. For future research, we recommend to investigate if it could be worthwhile to use DIFT in conjunction with other methods to provide full protection. An example could be to enforce CFI with DIFT and thwart data-oriented attacks using data-flow attestation.

Beyond simple execution environments. Various implementations only evaluated programs running within a simple execution context, e.g., inside a baremetal environment without an OS available. This raises the question how these solutions perform in a multi-threaded and/or multi-process environment. For example, as each thread needs its own distinct stack within the same process, a shadow stack must be added for each hardware thread available. Additionally, to which extent realtime deadlines in critical systems can still be met with these protections enabled has not been evaluated yet. Future research is required to investigate the most efficient protection in more advanced execution environments.

Integration into TEEs. Even though TEEs provide inter-process isolation, it remains important that memory corruption vulnerabilities are mitigated within these execution environments. Keystone [41] is an example of a RISC-V-based TEE that does not require changes to the hardware. It would be interesting to explore how Keystone could be enhanced with a hardware-assisted approach to ensure that memory corruption cannot occur within the isolated processes.

Post-detection policies. When a memory corruption attack has successfully been detected, the next question becomes what to do afterwards. Most implementations have chosen to generate an exception and abort execution of the program. However, for example, in the case of a server that needs to stay alive, an attacker could leverage this to launch a denial-of-service (DoS) attack as the server is constantly brought down and restarted. This could be addressed by having the hardware automatically investigate the root cause of the vulnerability, for example, by using a runtime attestation report that captures all relevant events. An interesting idea to explore would be to have the device automatically fix the vulnerability to make sure it does not happen again. This way, computer systems become more resilient towards software attacks.

Evaluation harmonization. In this survey, we focused on evaluating the overhead imposed by the hardware-assisted defenses in terms of code size, runtime, area, and power. The main difficulty we encountered was that various evaluation base platforms were used, which does not easily allow us to directly compare the results with each other. If all implementations would make their source code publicly accessible, then we could adopt the approach of Dörflinger et al. [25] of using the same evaluation procedure to benchmark various RISC-V processors. This also allows us to report on metrics that were missing in the evaluation, which was usually the case for code size and power overhead. Moreover, when more advanced data-oriented attacks surface, we can easily check whether current solutions are still able to thwart them or not.

References

[1]

Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. 2005. Control-flow integrity. In Proceedings of the 12th ACM Conference on Computer and Communications Security. DOI:

Abstract

1 Introduction

2 Preliminaries

2.1 RISC-V

2.2 Memory Corruption

2.2.1 Control-Flow Hijack.

2.2.2 Data-Oriented Attack.

2.2.3 Information Leak.

2.3 Threat Model

2.4 Evaluation Criteria

2.4.1 Qualitative Metrics.

2.4.2 Software-Based Metrics.

2.4.3 Hardware-Based Metrics.

3 RISC-V Defenses

3.1 Access Validation

3.1.1 Fat Pointers.

3.1.2 Segmentation.

3.1.3 Guarded Buffers.

3.1.4 Hardware Monitoring.

3.1.5 Discussion.

3.2 Information Obfuscation

3.2.1 Address Space Randomization.

3.2.2 Data Space Randomization.

3.2.3 Discussion.

3.3 Corruption Detection

3.3.1 Control-Flow Graphs.

3.3.2 Data Shadowing.

3.3.3 Dynamic Information Flow Tracking.

3.3.4 Crypto-Based Measures.

3.3.5 Runtime Attestation.

3.3.6 Data-Flow Graphs.

3.3.7 Discussion.

4 Related Work

5 Conclusions and Future Work

References

Cited By

Index Terms

Recommendations

An Overview of Prevention/Mitigation against Memory Corruption Attack

Dynamic memory access monitoring based on tagged memory

SEnFuzzer: Detecting SGX Memory Corruption via Information Feedback and Tailored Interface Analysis

Comments

Information

Published In

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations