GoFetch is a microarchitectural side-channel attack
that can extract secret keys from
constant-time cryptographic implementations via
data memory-dependent prefetchers (DMPs).
We show that DMPs are present in many Apple CPUs
and pose a real threat to multiple cryptographic implementations,
allowing us to extract keys from OpenSSL Diffie-Hellman,
Go RSA, as well as CRYSTALS Kyber and Dilithium.
In follow-up work, we reverse engineered the semantics of the Intel DMP and demonstrated new techniques that can leak information even when invalid pointers are dereferenced by the DMP. For more information, see our paper “Peek-a-Walk: Leaking Secrets via Page Walk Side Channels” (paper linked here, code linked here)!
GoFetch won the Pwnie Award 2024 for Best Cryptographic Attack 🎆 !!!
A HID configuration bit (SYS_APL_HID11_EL1[30]) was found by Hector Martin (marcan) to disable DMPs on m1 and m2 CPUs. Setting this chicken bit requires kernel support that is not available in macOS at this time. See @marcan's post for further details (and thank you Hector!).
Contact us at [email protected]
The GoFetch attack is based on a CPU feature called data memory-dependent prefetcher (DMP), which is present in the latest Apple processors. We reverse-engineered DMPs on Apple m-series CPUs and found that the DMP activates (and attempts to dereference) data loaded from memory that "looks like" a pointer. This explicitly violates a requirement of the constant-time programming paradigm, which forbids mixing data and memory access patterns.
To exploit the DMP, we craft chosen inputs to cryptographic operations, in a way where pointer-like values only appear if we have correctly guessed some bits of the secret key. We verify these guesses by monitoring whether the DMP performs a dereference through cache-timing analysis. Once we make a correct guess, we proceed to guess the next batch of key bits. Using this approach, we show end-to-end key extraction attacks on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).
The Apple m-series DMP was first discovered by Augury, which suggested that DMPs might mix data and addresses under some conditions. However, we found that the DMP activation criteria outlined by Augury are overly restrictive. This prevents Augury's findings from being sufficient to mount attacks on real-world constant-time cryptography.
GoFetch shows that the DMP is significantly more aggressive than previously thought, and thus poses a much greater security risk. Specifically, we find that any value loaded from memory is a candidate for being dereferenced (literally!). This allows us to sidestep many of Augury's limitations and demonstrate end-to-end attacks on real constant-time code.
Constant-time programming is a paradigm that aims to harden code against side-channel attacks by ensuring that all operations take the same amount of time, regardless of their operands. In particular, constant-time code cannot contain secret-dependent branches, loops, or other control structures. Moreover, as the CPU caches different addresses with attacker-observable latency, constant-time code cannot mix data and addresses in any way and prohibits the use of secret-dependent memory accesses or array indices.
We show that even if a victim correctly separates data from addresses by following the constant-time paradigm, the DMP will generate secret-dependent memory access on the victim's behalf, resulting in variable-time code susceptible to our key-extraction attacks.
Yes, but only on some processors. We observe that the DIT bit set on m3 CPUs effectively disables the DMP. This is not the case for the m1 and m2. Also, Intel's counterpart, DOIT bit, can be used to disable DMP on the Raptor Lake processors.
Update (April 2024): A HID configuration bit (SYS_APL_HID11_EL1[30]) was found by Hector Martin (marcan) to disable DMPs on m1 and m2 CPUs. Setting this chicken bit requires kernel support that is not available in macOS at this time. See @marcan's post for further details (and thank you Hector!).
This work was partially supported by the Air Force Office of Scientific Research (AFOSR) under award number FA9550-20-1-0425; the Defense Advanced Research Projects Agency (DARPA) under contract numbers W912CG-23-C-0022 and HR00112390029; the National Science Foundation (NSF) under grant numbers 1954712, 1954521, 2154183, 2153388, and 1942888; the Alfred P. Sloan Research Fellowship; and gifts from Intel, Qualcomm, and Cisco.