Our last line of defense against control-flow hijacking and data-oriented attacks consists of techniques that aim to determine, upon usage of a particular value in memory, whether it has been corrupted previously [66]. Policies enforcing control-flow integrity (CFI) aim to detect corrupted code pointers; they can be further divided into backward-edge protection for return addresses and forward-edge protection for the destinations of indirect calls and jumps. Likewise, with the advent of data-oriented attacks, the scope was broadened to address corruption in any kind of data variable with data-flow integrity (DFI). While DFI is a stronger policy than CFI, it can be a costly exercise: the software-based approach of DOPdefenderPlus [70] increased the code size by 18.1% and the runtime by 32.9% on unspecified x86 hardware running SPEC CPU2006.
3.3.1 Control-Flow Graphs.
As stated earlier, the first policy to enforce CFI was to statically compute a CFG and enforce it at runtime. In the case of CCFI-Cache [15], this is done with two components. First, a cache module contains the valid source-destination pairs that are statically determined after compilation; this metadata is also hashed to ensure its integrity. The checker then uses this cache at runtime to verify that the control flow has not been diverted. Backward-edge protection is guaranteed with a shadow stack. An interrupt is raised when a violation is detected, which puts the OS in charge of processing this event. Evaluation with PicoRV32 on a Nexys 4 DDR Artix-7 FPGA incurred an increase of 11.3% (1250) in LUTs. A runtime overhead between 2% and 63% and a code size overhead between 9% and 30% were measured with custom test cases.
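The CFG-based check can be sketched as follows. This is a minimal software model of the idea, not CCFI-Cache's hardware; the edge addresses, function names, and the choice of SHA-256 for the metadata digest are our illustrative assumptions.

```python
import hashlib

# Hypothetical model of a CCFI-Cache-style checker: the set of valid
# (source, destination) pairs is determined after compilation, and a digest
# of this metadata guards the metadata's own integrity.
VALID_EDGES = {(0x1000, 0x2000), (0x1000, 0x3000), (0x2500, 0x4000)}

def metadata_digest(edges):
    """Hash the edge set so tampering with the metadata itself is detectable."""
    blob = b"".join(f"{s:08x}{d:08x}".encode() for s, d in sorted(edges))
    return hashlib.sha256(blob).hexdigest()

EXPECTED_DIGEST = metadata_digest(VALID_EDGES)

def check_transfer(source, destination, edges=VALID_EDGES):
    """Model of the runtime checker; raising here stands in for the interrupt
    that hands the violation to the OS."""
    if metadata_digest(edges) != EXPECTED_DIGEST:
        raise RuntimeError("CFI metadata corrupted")
    if (source, destination) not in edges:
        raise RuntimeError("CFI violation: invalid control transfer")

check_transfer(0x1000, 0x2000)   # a legitimate indirect call passes silently
```

In hardware, the lookup and digest check happen in parallel with execution, which is where the low runtime overhead comes from; the Python model only illustrates the policy.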
FIXER [18, 19] uses a similar approach to CCFI-Cache, but implements it in a dedicated security co-processor. The main benefit of this approach is that the main core does not have to be modified, which makes it easier to integrate into existing SoC architectures. Each component, such as the shadow stack, is implemented as a separate security module. However, since there is only a single co-processor, multi-processor environments require more effort to support. Additionally, FIXER requires source-code annotations, while CCFI-Cache does not. Coupled with a Rocket core on a Zynq FPGA, the security module occupies an additional 2.9% in area. A runtime overhead of 1.5% was measured with the RISC-V tests.
3.3.2 Data Shadowing.
One of the implemented use cases of PHMon, the programmable hardware monitor discussed in Section
3.1.4, was a
shadow stack. It relies on a protected memory area allocated by the OS where a copy of the return address is stored when a function is called. After the call is finished, the return address on the stack is compared with the copy to verify its integrity, thus solely enforcing backward-edge protection. Nevertheless, recompilation is not required for the shadow stack to be enabled. The average runtime overhead measured with Rocket running MiBench, SPECint2000 and SPECint2006 on the Zynq-7000 Zedboard FPGA was 0.9%. On ASIC after synthesis with NanGate’s
\(45 \,{\rm nm}\) library, a 5% power overhead and a 13.5% area overhead were measured, while on FPGA 16% more LUTs were utilized. Effectiveness was demonstrated using custom test cases.
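The shadow-stack mechanism can be summarized in a few lines. This is a sketch of the general technique under our own naming, not PHMon's interface; the list stands in for the OS-allocated protected memory area.

```python
# Minimal sketch of a shadow stack as implemented on PHMon: return addresses
# are mirrored in a region the monitored program cannot write, and compared
# on every return. Class and method names are ours.
class ShadowStack:
    def __init__(self):
        self._copies = []            # stands in for the OS-allocated protected area

    def on_call(self, return_address):
        self._copies.append(return_address)

    def on_return(self, return_address_on_stack):
        expected = self._copies.pop()
        if return_address_on_stack != expected:
            raise RuntimeError("backward-edge violation: return address corrupted")

shadow = ShadowStack()
shadow.on_call(0x10400)              # function entry: mirror the return address
shadow.on_return(0x10400)            # matching return: accepted
```

A return address overwritten on the regular stack (e.g., by a buffer overflow) no longer matches the mirrored copy, so the monitor flags the return before it is taken.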
TrustFlow-X [
11] generalizes the concept of a shadow stack by supplementing the ISA with a generic interface for handling sensitive data. When data is stored through this interface, TrustFlow-X automatically places a copy of it in trusted memory outside the control of the attacker. When this memory is loaded again, TrustFlow-X verifies whether the value from memory matches the corresponding copy in trusted memory. Compiler extensions are provided that can automatically detect and instrument sensitive code pointers for full CFI, but the programmer can also manually mark application-specific sensitive data for protection, such as buffers holding encryption keys. Although TrustFlow-X is able to repair corrupted data, the size of the trusted memory defines the maximum amount of data that can be protected, and this must be determined prior to manufacturing the device. Only static data is supported, which means that dynamically allocated data is out of scope, and no support for Linux was developed. Its effectiveness was demonstrated using a Rocket core on an Artix-7 35T Arty FPGA with the RIPE benchmark suite. Negligible runtime overhead was measured with CoreMark, while the binary size did not grow, as the regular load/store instructions were replaced with those of TrustFlow-X. Excluding the trusted memory, the number of occupied LUTs increased by 1.03% (118).
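The store-copy-verify-repair cycle can be modeled as follows. This is a toy under our own names, assuming a fixed-capacity trusted region as described above; TrustFlow-X itself does this in hardware via dedicated load/store instructions.

```python
# Toy model of trusted memory in the style of TrustFlow-X: sensitive stores
# keep a copy out of the attacker's reach, sensitive loads are checked
# against it, and a corrupted value is repaired from the copy.
class TrustedMemory:
    def __init__(self, capacity):
        self.capacity = capacity         # fixed prior to "manufacturing"
        self._copies = {}

    def secure_store(self, memory, addr, value):
        if addr not in self._copies and len(self._copies) >= self.capacity:
            raise MemoryError("trusted memory exhausted")
        memory[addr] = value
        self._copies[addr] = value       # shadow copy in the trusted region

    def secure_load(self, memory, addr):
        if memory[addr] != self._copies[addr]:
            memory[addr] = self._copies[addr]   # repair the corrupted value
        return memory[addr]

ram = {}
tm = TrustedMemory(capacity=4)
tm.secure_store(ram, 0x80, 0xDEADBEEF)
ram[0x80] = 0x41414141                   # attacker corrupts the value in RAM
assert tm.secure_load(ram, 0x80) == 0xDEADBEEF   # detected and repaired
```

The `MemoryError` branch illustrates the stated limitation: the amount of protectable data is bounded by the trusted memory's size, which is fixed at design time.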
3.3.3 Dynamic Information Flow Tracking.
Originally proposed by Suh et al. [
65],
dynamic information flow tracking (
DIFT) does not focus on instructions that illegitimately modify memory, but rather on how potentially malicious input is used during the execution of a program. It does so by tagging—or
tainting—the memory that contains the untrusted input. When tainted memory is used in a computation, the taint is propagated to the location where the result is stored. An implementation is free to decide under which conditions the tag is propagated. The final step consists of detecting whether sensitive data has been tainted. For example, when the program counter has been tainted, this might indicate that the control flow has been maliciously diverted. A drawback is that a tag must be kept for every data block: a 4-bit tag for each 32-bit block already means that 12.5% of the memory is occupied by the tags themselves.
Chen et al. [
13] implemented DIFT in a separate co-processor like FIXER in Section
3.3.1. By monitoring the program counter in parallel for being tainted, control-flow hijacking was mitigated. The OS must place the required metadata in the CSRs. Evaluation was conducted with a Rocket core on an Artix-7 35T Arty FPGA and the area overhead consisted of 1.9% (290) in LUTs. PAGURUS [
55] is also implemented off-chip, as it aims to protect accelerators in heterogeneous SoCs. The metadata is placed in dedicated hardware storage. Effectiveness was demonstrated by detecting a code pointer being overwritten by a custom accelerator connected with a RI5CY core on a Virtex-7 V2000T FPGA, requiring 1600 LUTs. Palmiero et al. [
54] implemented DIFT to check for taint in load addresses, store addresses, and the program counter. This approach successfully mitigated buffer overflows [
75] and format-string attacks [
67], thus allowing DFI to be enforced. The metadata is stored in CSRs by the OS. The RI5CY core was extended by 0.82% in the number of LUTs on a Zynq-7000 Zedboard FPGA. Because the tags are processed in parallel, no runtime overhead is imposed.
3.3.4 Crypto-Based Measures.
With DSR, as discussed in Section
3.2, encryption was used to obfuscate the representation of values of interest. The following three implementations employ hashing or encryption to enforce CFI, ensuring that only valid control-flow transitions have the correct hash or can be successfully decrypted at runtime.
Zipper [
42] enforces backward-edge protection by computing the Keccak hash value of the return addresses in a chained structure. The Rocket core was used in the evaluation, and as that core only uses a 40-bit address space, the upper 24 bits of the return address register are used to store the upper 24 bits of the hash value. The hash value itself is computed by combining the latest return address with the previous hash value. This design ensures that even if the key is leaked, overwriting the hash value is still not possible. Compiler support was added to insert new instructions for calculating and verifying the hash value, although precompiled binaries could also be protected by simply adding these instructions to all function calls and returns. Evaluation with a Rocket core on a Virtex-7 FPGA VC707 evaluation board revealed a runtime overhead of 1.86% with SPEC CPU2000. The hash module consisted of 793 LUTs. Its effectiveness was validated with custom test cases.
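The chained hash construction can be sketched as follows. We use SHA3-256 from the Python standard library as a stand-in for the hardware Keccak module; the key, the example return addresses, and the exact byte layout are our assumptions, with the 24-bit truncation mirroring the unused upper bits of the 40-bit address space.

```python
import hashlib

# Sketch of Zipper-style chained return-address hashing: each hash combines
# the latest return address with the previous hash value, so the tag at the
# top of the chain depends on the whole call history.
def chain(prev_hash, return_address, key=b"per-device-key"):
    data = key + prev_hash + return_address.to_bytes(8, "little")
    return hashlib.sha3_256(data).digest()

def tag24(h):
    # The 24 bits stored in the upper part of the return address register.
    return int.from_bytes(h[:3], "little")

h = b"\x00" * 32                          # initial chain state
for ra in (0x10400, 0x10800, 0x10C00):    # three nested calls
    h = chain(h, ra)
tag = tag24(h)

# Recomputing the chain with a forged return address yields a different hash,
# so a mismatch is detected when the corresponding return is verified.
```

Because each link depends on the previous hash value, an attacker who overwrites one return address cannot produce a consistent chain without replaying every earlier call, which is the property behind the claim that a leaked key alone does not suffice.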
Li et al. [
43] enforce forward-edge protection by encrypting the first instruction of all valid destinations. Before execution of a program, which needs to be signalled by the OS, each first instruction of a call-site is XOR-ed by the CPU with the response of a PUF to establish forward-edge protection. While this ensures that every device has a unique key, the defense of the entire device is broken when the key is leaked. No support is required from the compiler or the kernel, as the protection is transparently applied solely by the processor. For backward-edge protection, the return addresses are hashed with a
tx3tx3tRL-inspired hash module [
35] that also uses the PUF response as the key, but it is not cryptographically secure. Evaluation was conducted with a RI5CY core. Its effectiveness was validated using the RIPE benchmark and the runtime overhead was 1.3% with MiBench.
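The forward-edge scheme amounts to a per-device XOR of each valid call target's first instruction. The sketch below is our toy model: the PUF value and instruction words are illustrative, and a real PUF response is device-unique rather than a constant.

```python
# Toy model of forward-edge protection in the style of Li et al.: the first
# instruction word of each valid call target is XOR-ed with the device's PUF
# response before execution, and the CPU reverses the XOR on indirect transfers.
PUF_RESPONSE = 0x5A5A5A5A                  # stands in for the device-unique PUF output

def encrypt_first_word(word):
    return word ^ PUF_RESPONSE             # applied once, when execution is set up

def fetch_at_indirect_target(word):
    return word ^ PUF_RESPONSE             # applied transparently by the CPU

valid_entry = 0x00008067                   # illustrative 32-bit instruction word
stored = encrypt_first_word(valid_entry)
assert fetch_at_indirect_target(stored) == valid_entry

# Jumping to an address that is not a valid call target fetches a word that
# was never encrypted; XOR-ing it with the PUF response yields a garbled
# instruction instead of attacker-chosen code.
```

This also makes the single-key weakness concrete: since one XOR mask protects every target, recovering `PUF_RESPONSE` once breaks the defense for the entire device.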
Werner et al. [
74] employ a stronger approach than Li et al. [
43] by using a
sponge-based authenticated encryption scheme with a 64-bit sponge state and the PRINCE block cipher [
9]. As the internal state for authentication and encryption is shared, the decryption of code depends on all preceding instructions. Therefore, during compilation and encryption of the program, valid paths in the code are identified and only those are encrypted. Because code can generally be reached along more than one path, e.g., when a function is called from two distinct places, the binary is instrumented with patching information that is used during decryption to support multiple flows. Evaluation was conducted with a RI5CY core, where the area occupation was increased by 32% (
\(28.8 \,{\rm kGE}\)) after synthesis with UMC’s
\(65 \,{\rm nm}\) library. An average increase of 19.8% in code size and 9.1% in runtime was observed with several custom tests.
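The path dependence of sponge-based decryption can be illustrated with a toy construction. This is ours and far simpler than the 64-bit PRINCE-based design: SHA3-256 replaces the cipher, and the instruction words are arbitrary; it only demonstrates that deviating from an encrypted valid path garbles everything that follows.

```python
import hashlib

# Toy sponge-style instruction stream cipher: the keystream for each
# instruction depends on every previously processed instruction, so the
# shared state ties decryption to the execution path.
def _absorb(state, word):
    return hashlib.sha3_256(state + word.to_bytes(4, "little")).digest()[:8]

def _keystream(state):
    return int.from_bytes(hashlib.sha3_256(state).digest()[:4], "little")

def encrypt_path(words, state=b"\x00" * 8):
    out = []
    for w in words:
        out.append(w ^ _keystream(state))
        state = _absorb(state, w)          # state absorbs the plaintext instruction
    return out

def decrypt_path(cipher, state=b"\x00" * 8):
    out = []
    for c in cipher:
        w = c ^ _keystream(state)
        out.append(w)
        state = _absorb(state, w)          # a wrong word here corrupts all later ones
    return out

path = [0x00000513, 0x00150513, 0x00008067]   # illustrative instruction words
assert decrypt_path(encrypt_path(path)) == path
```

Entering the stream anywhere but its start desynchronizes the state, which is why merge points (such as a function with two callers) need the patching information mentioned above to re-align the state.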
3.3.5 Runtime Attestation.
The concept of
runtime attestation was introduced in RISC-V by LO-FAT [
24]. Its protocol consists of a verifier
\(\mathcal {V}\) that wants to attest a program’s execution path on a remote prover
\(\mathcal {P}\). For all encountered control-flow related transitions,
\(\mathcal {P}\) computes a 512-bit SHA-3 hash. This happens in parallel with the execution of the program, and hence does not incur any runtime overhead. To defend against the corruption of loop variables, which is a type of data-oriented attack,
\(\mathcal {P}\) also tracks branch instructions and stores information about detected loops. However, this incurs two and five extra cycles, respectively. The hash value, together with a signing key and a nonce, forms an attestation report that is sent to
\(\mathcal {V}\). Finally,
\(\mathcal {V}\) uses the CFG of the program to verify whether the report constitutes a legitimate execution path. Evaluation was conducted with a RI5CY core on a Zynq-7000 Zedboard FPGA, where around 2000 extra LUTs were in use. Effectiveness was analyzed in a simulated RI5CY core with the Open Syringe Pump, which was shown to incur no runtime overhead.
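The prover/verifier exchange can be modeled in a few lines. Names and data layouts below are ours, and enumerating candidate paths is a simplification of the verifier's actual CFG-based analysis; it only shows how the hashed trace plus a nonce lets a remote party check the execution path.

```python
import hashlib

# Sketch of the runtime attestation protocol: the prover hashes the observed
# control-flow transitions together with a fresh nonce; the verifier accepts
# the report only if it matches a path that its CFG deems legitimate.
def prover_report(transitions, nonce):
    h = hashlib.sha3_512()
    for src, dst in transitions:
        h.update(src.to_bytes(8, "little") + dst.to_bytes(8, "little"))
    h.update(nonce)                        # nonce prevents replaying old reports
    return h.digest()

def verifier_accepts(report, legitimate_paths, nonce):
    # Simplification: compare against digests of paths derived from the CFG.
    return any(prover_report(path, nonce) == report for path in legitimate_paths)

nonce = b"\x13" * 16                       # fresh per attestation request
executed = [(0x100, 0x200), (0x200, 0x300)]
legitimate = [executed, [(0x100, 0x400), (0x400, 0x300)]]
assert verifier_accepts(prover_report(executed, nonce), legitimate, nonce)
```

Since the prover only accumulates a hash, the measurement runs alongside execution; the verification effort is pushed entirely to \(\mathcal {V}\).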
In order to thwart more advanced DOP attacks, LiteHAX [
23] implements runtime attestation with the addition of
data-flow attestation.
\(\mathcal {P}\) still reports control-flow related transitions, but the dedicated loop handling is replaced by computing the hash value of the memory reads and writes in order. When the buffer is full, an intermediate attestation report is sent to
\(\mathcal {V}\), which performs a dynamic, context-sensitive analysis of the execution path to verify that it is legitimate. Data-flow analysis with symbolic execution is performed to retrieve the allowed memory accesses each instruction may perform. This is only conducted within the current execution path to reduce the computational overhead. The report can then be used to verify that each read or write was valid. For
\(\mathcal {P}\), a zero performance penalty is incurred, and the area increase on a Zynq-7000 Zedboard FPGA is limited to around 1600 LUTs. Effectiveness was analyzed in simulation also with the Open Syringe Pump, utilizing 35 BRAMs of
\(36 \,{\rm Kb}\). No performance impact was incurred.
3.3.6 Data-Flow Graphs.
HDFI [
62] enforces DFI by making use of
tagged memory. The 1-bit tag is only set for sensitive memory areas, such as the storage locations of pointers. Static analysis reveals which instructions are allowed to write to these locations. This way, it could be interpreted as if a
data-flow graph (
DFG) is generated, similar to how a CFG is used to enforce CFI. Only when a legitimate instruction writes to sensitive memory is the tag retained. Thus, when a sensitive value is used and its tag is cleared, the last write to it must have corrupted the data. Similar to PHMon, not all data is immediately protected, but security applications can be written to define what data should be protected and how. The OS allocates memory for the metadata to be stored in. Evaluation was conducted with a Rocket chip on a Zynq-7000 ZC706 FPGA board. Without security applications activated, a runtime overhead of 0.98% was observed with SPEC CINT 2000. Heartbleed, RIPE test cases, and several custom attacks were successfully mitigated. However, the authors state that the data-flow analysis might not be precise enough to thwart all attacks, and hence recommend coupling it with memory safety enforcement for full protection. The source code has been made publicly available [
63].
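The 1-bit tag policy can be sketched as follows. This is a toy model under our own names: in HDFI the distinction between sensitive and regular writes is made by dedicated instruction variants chosen through static analysis, which the `sensitive_write` flag stands in for.

```python
# Toy model of 1-bit tagged memory in the style of HDFI: only writes that
# static analysis classified as legitimate sensitive writes set the tag; any
# other write clears it, so a sensitive load that finds the tag cleared
# reveals that the last write corrupted the data.
class TaggedMemory:
    def __init__(self):
        self.data, self.tag = {}, {}

    def store(self, addr, value, sensitive_write=False):
        self.data[addr] = value
        self.tag[addr] = sensitive_write   # a regular write clears the tag

    def load_sensitive(self, addr):
        if not self.tag.get(addr, False):
            raise RuntimeError("DFI violation: sensitive data overwritten illegally")
        return self.data[addr]

mem = TaggedMemory()
mem.store(0x40, 0x2000, sensitive_write=True)   # legitimate code-pointer write
assert mem.load_sensitive(0x40) == 0x2000
mem.store(0x40, 0x4141)                         # e.g., an overflowing buffer write
# mem.load_sensitive(0x40) now raises instead of using the corrupted pointer
```

With a single tag bit, the policy can only separate sensitive from regular flows, which is exactly the limitation TMDFI addresses below with wider tags.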
TMDFI [
46] also generates a DFG to enforce DFI, but extends HDFI by making use of 8-bit tags. This allows more fine-grained policies to be set up by supporting up to 255 data flows, while HDFI is only capable of differentiating between sensitive and regular memory. When an instruction writes to a memory location, the tag of the instruction is propagated to that location. When this memory is used, a dedicated co-processor verifies whether the tag of the last write is allowed by the computed DFG. The tags themselves are stored in main memory. With a subset of the RISC-V test cases, an average runtime overhead of 39.34% was observed in simulation. The implementation has been made open source [
45].
3.3.7 Discussion.
The reported evaluation metrics have been summarized in Table
3. We found 14 detection methods, making this area more popular than the 10 access validation policies reported in Section
3.1 and significantly more than the 3 obfuscation methods in Section
3.2. Half of them focused on enforcing CFI, and the other half on DFI. The CFI policies in particular implemented the protection transparently to the program, or at least provided automated means to enable it. A majority of the policies also required support from the OS, but all works still allowed legacy binaries to be executed. Three works made their source publicly available. The Rocket and RI5CY cores seem on par in terms of usage in this area, but a wide variety of benchmarks was again employed. Hardware evaluation was usually conducted to report FPGA LUT usage, with only PHMon and Werner et al. [
74] providing estimates for ASIC post-synthesis area occupancy.
In the area of CFI, PHMon’s shadow stack and Zipper were the only solutions to solely implement backward-edge protection. For the other policies, forward-edge protection mainly involved a static approach by inferring a CFG at compile-time and enforcing it at runtime. This still leaves room for a specific type of control-flow hijacking. Consider, for example, Listing 3, where the function pointer panel can either point to admin_panel or user_panel, depending on whether the correct password is entered. As the length of the input for name is not checked, an attacker could overflow the buffer and use that to overwrite panel with the location of admin_panel. A CFG would not classify this as malicious behavior, as there is a valid execution possible where panel can point to admin_panel.
Only the DIFT implementations, of which Chen et al. [
13] and Piccolboni et al. [
55] focused on enforcing CFI, are capable of identifying this as an attack, because overwriting
panel will taint the pointer. When the call is executed, the program counter will also become tainted, and hence checking the program counter for taint would suffice to thwart this attack. Moreover, a major benefit of the DIFT implementations is that the taint tags can be processed in parallel with the execution. Nevertheless, concerns have been raised about whether DIFT's propagation rules are capable of completely preventing memory corruption without incurring false positives [
60,
61]. Additionally, it can be expensive to store a tag for each memory block, especially when caches are involved that require aligned accesses. For instance, TMDFI keeps an 8-bit tag for each 64-bit block, which means that 12.5% of the memory becomes occupied by metadata. This is also the reason why an implementation such as that of TrustFlow-X, where a copy of sensitive data is stored, cannot be used to completely thwart all memory corruption.
An interesting approach is found in the remote attestation implementations of LO-FAT and LiteHAX. LiteHAX's approach of periodically sending an attestation report of the current execution to an external verifier incurs no overhead, and allows the verification of the execution to be offloaded to a more powerful machine. Both works were evaluated on the same FPGA as HardScope, and we see that the remote attestation works occupy far fewer LUTs (around 2k) than HardScope (around 31k). However, the interval at which the reports are sent must not be too long, or it may take too long before the device learns that malicious execution was detected. The authors of Morpheus [
28] gave an analysis of what time an attacker would need to break their solution. We would welcome such an analysis for outrunning the remote attestation reports.
All in all, while we found many solutions for enforcing CFI and DFI, the majority of these implementations do not aim to fully thwart data-oriented attacks. For those that do, we can distinguish between solutions based on memory tagging, such as DIFT, and those based on remote attestation. The main question with memory tagging is what the ideal tag size would be. 8-bit tags already incurred a 39% runtime overhead in TMDFI, which does not seem promising, especially considering that a software-based solution already incurs a similar overhead. Besides that, the authors of HDFI admit that for programs with complicated data flows, their analysis might be insufficient to detect an attack. On the other hand, remote attestation might detect an attack too late, allowing an attacker to take over the system before the verifier's message comes in. The most promising solutions are in the area of CFI, where the protection is often already implemented in a way that is transparent to the program. Further research is required to better compare these solutions on software- and hardware-based metrics, and more examples of data-oriented attacks are needed to improve the comparison of the security performance of DFI implementations.