Compiler correctness, in its simplest form, is defined as the inclusion of the set of traces of the compiled program in the set of traces of the original program. This is equivalent to the preservation of all trace properties. Here, traces collect, for instance, the externally observable events of each execution. However, this definition requires the set of traces of the source and target languages to be the same, which is not the case when the languages are far apart or when observations are fine-grained. To overcome this issue, we study a generalized compiler correctness definition, which uses source and target traces drawn from potentially different sets and connected by an arbitrary relation. We set out to understand what guarantees this generalized compiler correctness definition gives us when instantiated with a non-trivial relation on traces. When this trace relation is not equality, it is no longer possible to preserve the trace properties of the source program unchanged. Instead, we provide a generic characterization of the target trace property ensured by correctly compiling a program that satisfies a given source property, and dually, of the source trace property one is required to show to obtain a certain target property for the compiled code. We show that this view on compiler correctness can naturally account for undefined behavior, resource exhaustion, different source and target values, side channels, and various abstraction mismatches. Finally, we show that the same generalization also applies to many definitions of secure compilation, which characterize the protection of a compiled program linked against adversarial code.
1 Introduction
Compiler correctness is an old idea [46, 49, 50] that has seen a significant revival in recent times. This new wave was started by the creation of the CompCert verified C compiler [41] and continued by the proposal of many significant extensions and variants of CompCert [10, 11, 15, 29, 36, 37, 51, 67, 73, 76, 80] and the success of many other milestone compiler verification projects, including Vellvm [83], Pilsner [56], CakeML [77], and CertiCoq [5]. Verification through proof assistants allows the user of a compiler to trust the proofs without diving into all of the details. Still, to clearly understand the benefits and limitations of using a verified compiler, she has to deeply understand the statement of correctness. This is true not just for correct compilation, but also for secure compilation, which is the more recent idea that a compilation chain should not just provide correctness but also security against co-linked adversarial components [4, 32].
Basic Compiler Correctness. The gold standard for compiler correctness is semantic preservation, which intuitively says that the semantics of a compiled program (in the target language) is compatible with the semantics of the original program (in the source language). For practical verified compilers, such as CompCert [41] and CakeML [77], semantic preservation is stated extrinsically, by referring to traces. In these two settings, a trace is an ordered sequence of events—such as inputs from and outputs to an external environment—that are produced by the execution of a program.
A basic definition of compiler correctness can be given by the inclusion of the set of traces of the compiled program in the set of traces of the original program. Formally [41]:
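Writing W for a (whole) source program, W↓ for the result of compiling it, and W ⇝ t for "W can produce trace t", Definition 1.1 can be stated, in the notation used throughout this article, as:

    CC= :  ∀W. ∀t.  W↓ ⇝ t  ⇒  W ⇝ t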
This definition says that for any whole¹ source program W, if we compile it (denoted W↓), execute it in the semantics of the target language, and observe a trace t, then the original W can produce the same trace t in the semantics of the source language.² This definition is simple and easy to understand, since it only references a few familiar concepts: a compiler between a source and a target language, each equipped with a trace-producing semantics (usually nondeterministic).
Beyond Basic Compiler Correctness. Definition 1.1 implicitly assumes that the source and target traces are drawn from the very same set, and requires that any target trace produced by a compiled program can be faithfully reproduced by the source program. In practice, existing verified compilers adopt a less restrictive formulation of compiler correctness:
CompCert [41] The original compiler correctness theorem of CompCert [41] can be seen as an instance of basic compiler correctness, but it does not provide any guarantees for programs that can exhibit undefined behavior [68]. As allowed by the C standard, such unsafe programs are not even considered to be in the source language, so they are not quantified over. This has important practical implications, since undefined behavior often leads to exploitable security vulnerabilities [16, 30, 31] and serious confusion even among experienced C and C++ developers [40, 68, 78, 79]. As such, since 2010, CompCert provides an additional top-level correctness theorem³ that better accounts for the presence of unsafe programs by providing guarantees for them up to the point when they encounter undefined behavior [68]. This new theorem goes beyond the basic correctness definition above, as a target trace need only correspond to a source trace up to the occurrence of undefined behavior in the source trace.
CakeML [77] Compiler correctness for CakeML accounts for memory exhaustion in target executions. Crucially, memory exhaustion events cannot occur in source traces, only in target traces. Hence, dually to CompCert, compiler correctness only requires source and target traces to coincide up to the occurrence of a memory exhaustion event in the target trace.
Trace-relating Compiler Correctness. Generalized formalizations of compiler correctness like the ones above can be naturally expressed as instances of a uniform definition, which we call trace-relating compiler correctness. This generalizes basic compiler correctness by (a) considering that source and target traces belong to possibly distinct sets TraceS and TraceT, and (b) being parameterized by an arbitrary trace relation ~ between source and target traces.
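In the same notation as above, with s ranging over source traces and t over target traces related by ~, the generalized definition reads:

    CC~ :  ∀W. ∀t.  W↓ ⇝ t  ⇒  ∃s. s ~ t ∧ W ⇝ s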
This definition requires that, for any target trace t produced by the compiled program W↓, there exists a source trace s that can be produced by the original program W and is related to t according to ~ (i.e., s ~ t). By choosing the trace relation appropriately, one can recover the different notions of compiler correctness presented above:
Basic CC Take ~ to be equality on traces. Trivially, CC~ instantiated with equality is exactly the basic CC of Definition 1.1, which we denote CC=.
CompCert Undefined behavior is modeled in CompCert as a trace-terminating event Goes_wrong that can occur in any of its languages (source, target, and all intermediate languages), so for a given phase (or composition thereof), we have TraceS = TraceT. Nevertheless, the relation between source and target traces with which to instantiate CC~ to obtain CompCert's current theorem is the following (note that we denote finite traces, or prefixes, as m):
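Spelled out, and consistently with the explanation below, the relation is:

    s ~ t  ≜  s = t  ∨  (∃m.  s = m·Goes_wrong  ∧  m ≤ t)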
A compiler satisfying CC~ for this trace relation can turn a source prefix ending in undefined behavior, m·Goes_wrong (where "·" is concatenation), either into the same prefix in the target (first disjunct) or into a target trace that starts with the prefix m but then continues arbitrarily (second disjunct, where "≤" is the prefix relation).
CakeML Here, target traces are sequences of symbols from an alphabet that has a specific trace-terminating event, Resource_limit_hit, which is not available in the source alphabet (i.e., Resource_limit_hit ∉ ΣS). Then, the compiler correctness theorem of CakeML can be obtained by instantiating CC~ with the following relation:
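Spelled out, and consistently with the explanation below, the relation is:

    s ~ t  ≜  s = t  ∨  (∃m.  t = m·Resource_limit_hit  ∧  m ≤ s)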
The resulting CC~ instance relates a target trace ending in Resource_limit_hit after executing prefix m to a source trace s that first produces m and then continues in a way given by the semantics of the source program.
Beyond undefined behavior and resource exhaustion, there are many other practical uses for CC~: In this article, we show that it also accounts for differences between source and target values, for a single source output being turned into a series of target outputs, and for side channels.
On the flip side, the compiler correctness statement and its implications can be more difficult to understand for CC~ than for CC=. The full implications of choosing a particular relation can be subtle. In fact, using a bad relation can make the compiler correctness statement trivial or unexpected. For instance, it should be easy to see that if one uses the total relation, which relates all source traces to all target ones, the CC~ property holds for every compiler, yet it might take a bit more effort to understand that the same is true even for the following relation:
Reasoning about Trace Properties. To understand more about a particular CC~ instance, we propose to also look at how it preserves trace properties—defined as sets of allowed traces [39]—from the source to the target. For instance, it is well known that CC= is equivalent to the preservation of all trace properties (where W ⊨ π reads "W satisfies property π" and stands for ∀t. W ⇝ t ⇒ t ∈ π):
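That is, in the notation above:

    TP :  ∀π. ∀W.  W ⊨ π  ⇒  W↓ ⊨ π        and        CC=  ⟺  TP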
However, to the best of our knowledge, similar results have not been formulated for trace relations beyond equality, when it is no longer possible to preserve the trace properties of the source program unchanged. For trace-relating compiler correctness, where source and target traces can be drawn from different sets and related by an arbitrary trace relation, there are two crucial questions to ask:
(1)
For a source trace property πS of a program—established for instance by formal verification—what is the strongest target property that any CC~ compiler is guaranteed to ensure for the produced target program?
(2)
For a target trace property πT, what is the weakest source property we need to show of the original source program to obtain πT for the result of any CC~ compiler?
Far from being mere hypothetical questions, they can help the developer of a verified compiler better understand the compiler correctness theorem they are proving, and we expect that any user of such a compiler will need to ask either one or the other if they are to make use of that theorem. In this work, we provide a simple and natural answer to these questions, for any instance of CC. Building upon a bijection between relations and Galois connections [6, 26, 54], we observe that any trace relation corresponds to two property mappings and , which are functions mapping source properties to target ones ( standing for “to target”) and target properties to source ones ( standing for “to source”):
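These two mappings are the existential and universal images of the trace relation (with πS and πT ranging over source and target trace properties):

    τ̃(πS) ≜ { t | ∃s. s ~ t ∧ s ∈ πS }        σ̃(πT) ≜ { s | ∀t. s ~ t ⇒ t ∈ πT }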
The existential image of ~, τ̃, answers the first question above by mapping a given source property πS to the target property that contains all target traces for which there exists a related source trace that satisfies πS. Dually, the universal image of ~, σ̃, answers the second question by mapping a given target property πT to the source property that contains all source traces for which all related target traces satisfy πT. We introduce two new correct compilation definitions in terms of trace property preservation (spelled out after the following two points):
•
TPτ̃ quantifies over all source trace properties and uses τ̃ to obtain the corresponding target properties;
•
TPσ̃ quantifies over all target trace properties and uses σ̃ to obtain the corresponding source properties.
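Concretely, in the notation above:

    TPτ̃ :  ∀πS. ∀W.  W ⊨ πS  ⇒  W↓ ⊨ τ̃(πS)
    TPσ̃ :  ∀πT. ∀W.  W ⊨ σ̃(πT)  ⇒  W↓ ⊨ πT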
We prove that these two definitions are equivalent to CC~, yielding a novel trinitarian view of compiler correctness (Figure 1).
Contributions.
•
We propose a new trinitarian view of compiler correctness that accounts for non-trivial relations between source and target traces. While, as discussed above, specific instances of the CC~ definition have already been used in practice, we seem to be the first to propose assessing the meaningfulness of CC~ instances in terms of how properties are preserved between the source and the target, and in particular by looking at the property mappings τ̃ and σ̃ induced by the trace relation ~. We prove that CC~, TPτ̃, and TPσ̃ are equivalent for any trace relation (Section 2.2), as illustrated in Figure 1. In the opposite direction, we show that for every trace relation corresponding to a given Galois connection [26], an analogous equivalence holds.
•
We extend these results from the preservation of trace properties to the larger class of subset-closed hyperproperties, e.g., noninterference (Section 3.1),⁴ and to the classes of safety properties (Section 3.2) and all hyperproperties (Section 3.3).
•
We use CC~ compilers of various complexities to illustrate that our view on compiler correctness naturally accounts for undefined behavior (Section 4.1), resource exhaustion (Section 4.2), different source and target values (Section 4.3), and differences in the granularity of data and observable events (Section 4.4). We expect these ideas to extend to other discrepancies between source and target traces. For each compiler, we show how to choose the relation between source and target traces and how the induced property mappings preserve interesting trace properties and subset-closed hyperproperties. We look at the way particular τ̃ and σ̃ mappings act on different kinds of properties and how the resulting properties can be expressed for different kinds of traces.
•
We analyze the impact of correct compilation on noninterference [28], showing what can still be preserved (and thus also what is lost) when target observations are finer than source ones, e.g., side-channel observations (Section 5). We formalize the guarantee obtained by correct compilation of a noninterfering program as abstract noninterference [27], a weakening of target noninterference. Dually, we identify a family of declassifications of target noninterference for which source reasoning is possible.
•
We show that the trinitarian view also extends to a large class of secure compilation definitions [3], formally characterizing the protection of the compiled program against linked adversarial code (Section 6). For each secure compilation definition, we again propose both a property-free characterization in the style of CC and two characterizations in terms of preserving a class of source or target properties satisfied against arbitrary adversarial contexts. The additional quantification over contexts allows for finer distinctions when considering different property classes, so we study mapping classes not only of trace properties and hyperproperties, but also of relational hyperproperties [3].
•
We provide instances of secure compilers that preserve three different classes of properties (trace properties, safety properties, and hypersafety) when targeting a language with additional trace events that are not possible in the source (Section 7).
The results and insights that we provide often follow one’s expected intuition and may be considered unsurprising. However, our framework is the first to capture such expectations formally and precisely, and as such it provides a uniform way to discuss these and to formalize future (possibly surprising) ones. The article closes with discussions of related (Section 8) and future work (Section 9). Some technical proofs can be found in the Appendix (Section B).
The traces considered in our examples are structured, usually as sequences of events. We note, however, that unless explicitly mentioned, all our definitions and results are more general and make no assumption whatsoever about the structure of traces. Most of the theorems formally or informally mentioned in the article were mechanized in the Coq proof assistant and are marked as such. This development has around 10K lines of code and is available at the following address: https://github.com/secure-compilation/different_traces.
2 Trace-relating Compiler Correctness
In this section, we start by generalizing the trace property preservation definitions at the end of the introduction to TPτ and TPσ, which depend on two arbitrary mappings τ and σ (Section 2.1). We prove that, whenever τ and σ form a Galois connection, TPτ and TPσ are equivalent (Theorem 2.4). We then exploit a bijective correspondence between trace relations and Galois connections to close the trinitarian view (Section 2.2), with two main benefits: First, it helps us assess the meaningfulness of a given trace relation by looking at the property mappings it induces; second, it allows us to construct new compiler correctness definitions starting from a desired mapping of properties. Finally, we generalize the classic result that compiler correctness (i.e., CC=) is enough to preserve not just trace properties but also all subset-closed hyperproperties [18]. For this, we show that CC~ is also equivalent to subset-closed hyperproperty preservation, for which we also define both a version in terms of τ̃ and a version in terms of σ̃ (Section 3.1).
2.1 Property Mappings
As explained in Section 1, trace-relating compiler correctness CC~, by itself, lacks a crisp description of which trace properties are preserved by compilation. Since even the syntax of traces can differ between source and target, one can either focus on trace properties of the source (and then interpret them in the target) or on trace properties of the target (and then interpret them in the source). Formally, we need two property mappings, τ : 2^TraceS → 2^TraceT and σ : 2^TraceT → 2^TraceS, which lead us to the following generalization of trace property preservation (TP):
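In this notation, the two criteria of Definition 2.1 read:

    TPτ :  ∀πS. ∀W.  W ⊨ πS  ⇒  W↓ ⊨ τ(πS)
    TPσ :  ∀πT. ∀W.  W ⊨ σ(πT)  ⇒  W↓ ⊨ πT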
For an arbitrary source program W, τ interprets a source property πS as the target guarantee τ(πS) for W↓. Dually, σ defines a source obligation σ(πT) sufficient for the satisfaction of a target property πT after compilation. Ideally:
(i)
Given πT, the target interpretation of the source obligation should actually guarantee that πT holds, i.e., τ(σ(πT)) ⊆ πT;
(ii)
Dually for πS, we would not want the source obligation for the target interpretation τ(πS) to be harder to establish than πS itself, i.e., πS ⊆ σ(τ(πS)).
These requirements are satisfied when the two maps form a Galois connection between the posets of source and target properties ordered by inclusion. We briefly recall the definition and the characteristic property of Galois connections [20, 47].
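Recall the standard definition: given two posets (A, ≤) and (B, ⊑), a pair of monotone maps α : A → B and γ : B → A forms a Galois connection if and only if, for all a ∈ A and b ∈ B, α(a) ⊑ b ⟺ a ≤ γ(b); α is called the lower adjoint and γ the upper adjoint.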
We will often write α : (A, ≤) ⇆ (B, ⊑) : γ to denote a Galois connection, or simply α : A ⇆ B : γ, or even ⟨α, γ⟩ when the involved posets are clear from context.
If two property mappings, τ and σ, form a Galois connection on trace properties ordered by set inclusion, then Lemma 2.3 (with α = τ and γ = σ) tells us that they satisfy conditions (i) and (ii) above, i.e., τ(σ(πT)) ⊆ πT and πS ⊆ σ(τ(πS)).⁵ These conditions on τ and σ are sufficient to show the equivalence of the criteria they define, respectively, TPτ and TPσ.
Proof. Notice that if a program satisfies a property π, then it satisfies every less restrictive, i.e., bigger, property π' ⊇ π. Building on this:
(TPτ ⇒ TPσ) Assume TPτ and that W satisfies σ(πT). Apply TPτ to W and σ(πT) and deduce that W↓ satisfies τ(σ(πT)); since τ(σ(πT)) ⊆ πT, W↓ satisfies πT.
(TPσ ⇒ TPτ) Assume TPσ and that W satisfies πS; since πS ⊆ σ(τ(πS)), W also satisfies σ(τ(πS)). Apply TPσ to τ(πS), deducing that W↓ satisfies τ(πS).
2.2 Trace Relations and Property Mappings
We now investigate the relation between CC~, TPτ̃, and TPσ̃. We show that for a trace relation ~ and its corresponding Galois connection ⟨τ̃, σ̃⟩ (Lemma 2.7), the three criteria are equivalent (Theorem 2.8). This equivalence offers interesting insights for both verification and the design of a correct compiler. For a CC~ compiler, the equivalence makes explicit both the guarantees one has after compilation (via τ̃) and the source proof obligations that ensure the satisfaction of a given target property (via σ̃). However, a compiler designer might first determine the target guarantees the compiler itself must provide, i.e., τ̃, and then prove an equivalent statement, CC~, for which more convenient proof techniques exist in the literature [9, 77].
When trace relations are considered, the corresponding existential and universal images can be used to instantiate Definition 2.1 leading to the trinitarian view already mentioned in Section 1.
This result relies both on Theorem 2.4 and on the fact that the existential and universal images of a trace relation form a Galois connection. The theorem can be stated in a slightly more general form (Theorem 2.8), exploiting an isomorphism between the category of sets and relations and a subcategory of monotonic predicate transformers [26]. We specialize this isomorphism to what is of interest for our purposes and deduce a bijective correspondence between trace relations and Galois connections on properties.
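The following Coq fragment (a self-contained sketch in the spirit of, but not taken from, the accompanying development; all names are ours) makes this fact concrete: trace properties are modeled as predicates, τ̃ and σ̃ are the existential and universal images of a relation, and the adjunction lemma is exactly the Galois-connection condition of Lemma 2.3.

    Definition prop (X : Type) := X -> Prop.

    Section Images.
      Variables (TraceS TraceT : Type).
      Variable rel : TraceS -> TraceT -> Prop.

      (* Existential image: target traces related to some source trace in pS. *)
      Definition tau (pS : prop TraceS) : prop TraceT :=
        fun t => exists s, rel s t /\ pS s.

      (* Universal image: source traces all of whose related target traces are in pT. *)
      Definition sigma (pT : prop TraceT) : prop TraceS :=
        fun s => forall t, rel s t -> pT t.

      (* Adjunction: tau pS ⊆ pT iff pS ⊆ sigma pT. *)
      Lemma adjunction : forall (pS : prop TraceS) (pT : prop TraceT),
          (forall t, tau pS t -> pT t) <-> (forall s, pS s -> sigma pT s).
      Proof.
        intros pS pT; unfold tau, sigma; split.
        - intros H s Hs t Hrel. apply H. exists s. split; assumption.
        - intros H t [s [Hrel Hs]]. exact (H s Hs t Hrel).
      Qed.
    End Images.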
The bijection just introduced allows us to generalize Theorem 2.6 and switch anytime between the three views of compiler correctness described earlier.
Note that sometimes the lifted properties may be trivial: The target guarantee can be the true property (the set of all traces) or the source obligation the false property (the empty set of traces). This might be the case when source observations abstract away too much information (Section 4.2 presents an example).
3 Preserving Other (Hyper)Property Classes
In this section, we investigate how to preserve other classes of (hyper)properties beyond trace properties: subset-closed hyperproperties (Section 3.1), safety properties (Section 3.2), and arbitrary hyperproperties that are not just subset-closed (Section 3.3). For each of these classes, we start by giving an intuition of what it means to preserve such a class in the equal-trace setting; then we study preservation of that class in the trace-relating setting. For subset-closed hyperproperties, we have to refine the Galois connection to ensure the information "the hyperproperty is subset-closed" is not lost with the application of τ̃. Similarly, when looking at safety properties, we have to preserve the information that a property is a safety property. For arbitrary hyperproperties one might instead require that no information at all is lost in the (pre or post) composition of τ̃ and σ̃. The section concludes with a comparison of the criteria in terms of relative strength (Section 3.4).
3.1 Preservation of Subset-closed Hyperproperties
Hyperproperty preservation is a strong requirement in general. Fortunately, many interesting hyperproperties are subset-closed (SCH for short), e.g., noninterference, and these are known to be preserved by refinement [18]. When the trace semantics is common to source and target languages, a subset-closed hyperproperty is preserved if the behaviors of the compiled program refine the behaviors of the source program, which coincides with the statement of CC=. We generalize this result to the trace-relating setting, introducing two other equivalent characterizations of CC~ in terms of preservation of subset-closed hyperproperties (Theorem 3.3). To do so, we close under subsets the images of both τ̃ and σ̃, so that source subset-closed hyperproperties are mapped to target subset-closed ones and vice versa.
First, a hyperproperty is defined as a set of sets of traces, i.e., an element of 2^(2^Trace) (recall that Trace is the set of all traces) [18]. A program satisfies a hyperproperty when its complete set of traces, which from now on we will call its behavior, is a member of the hyperproperty.
To talk about hyperproperty preservation in the trace-relating setting, we need an interpretation of source hyperproperties into the target and vice versa. The one we consider builds on top of the two trace property mappings τ̃ and σ̃, which are naturally lifted to hyperproperty mappings. This way, we are able to extract two hyperproperty mappings from a trace relation similarly to Section 2.2:
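One natural such lifting applies the trace-level mappings elementwise to the members of a hyperproperty:

    τ̃(HS) ≜ { τ̃(b) | b ∈ HS }        σ̃(HT) ≜ { σ̃(b) | b ∈ HT }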
Formally, we are defining two new mappings, this time on hyperproperties; with a small abuse of notation, we still denote them τ̃ and σ̃.
Interestingly, it is not possible to apply the argument used for trace properties to show that a CC~ compiler guarantees W↓ ⊨ τ̃(HS) whenever W ⊨ HS. This is because direct images do not necessarily preserve subset-closure [44, 55]. We therefore close the images of τ̃ and σ̃ under subsets (denoted Cl⊆) and obtain the following result:
The use of Cl⊆ in Theorem 3.3 implies a loss of precision in preserving subset-closed hyperproperties through compilation. In Section 5, we focus on a specific security-relevant subset-closed hyperproperty, noninterference, and show that such a loss of precision can be seen as a declassification. Next, we define the trinity and the related formal machinery for the preservation of safety properties.
3.2 Preserving Safety Properties
The class of safety properties collects all trace properties prescribing that "something bad never happens" or, equivalently, all trace properties whose violation can be monitored and, once observed, cannot be undone [18]. More abstractly, safety properties can be defined as the closed sets of a topology [18, 58], with no need to consider any particular structure on the traces. To ease the presentation, we consider the trace model adopted by Abate et al. [3], where traces resemble lists and streams of events. This model naturally comes with a notion of prefixes and a relation between a prefix m and a trace t, written m ≤ t. Intuitively, π is a safety property if any trace violating π extends a "bad prefix" that witnesses such a violation. Every safety property is therefore uniquely defined by the set of its "bad prefixes." We recall below the definition and the characterization of safety properties in terms of sets of finite prefixes.
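In this trace model, the class of safety properties can be characterized as follows (the standard prefix-based characterization, stated in the notation above):

    π ∈ Safety  ⟺  ∀t ∉ π. ∃m ≤ t. ∀t'. m ≤ t' ⇒ t' ∉ π

That is, every trace violating π extends a finite "bad prefix" m all of whose extensions also violate π; equivalently, each safety property is determined by a set M of bad prefixes.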
Due to this characterization of safety properties through finite prefixes (Definition 3.4), the preservation of all and only the safety properties is equivalent to CC= restricted to finite prefixes.
Unfolding this criterion, we can interpret it as follows: Whenever W↓ produces a trace that violates a specific safety property, namely, the one defined by the singleton prefix set {m}, then W violates the same safety property, producing a trace that also extends m but is possibly distinct from the target one.
The generalization of this criterion that we propose for the trace-relating setting states that whenever W↓ produces a trace that violates a target safety property, then W violates the source interpretation of that property, i.e., its image through σ̃.⁷ The following theorem defines this criterion and its two equivalent formulations:
Consistent with the informal meaning we aimed to give to this criterion, the first formulation quantifies over target safety properties, while the second quantifies over arbitrary source properties but composes τ̃ with the operator that maps an arbitrary target property to the target safety property that best over-approximates it.⁸ More precisely, this operator is a closure operator on target properties, whose fixpoints are exactly the target safety properties.
In Figure 2, the blue and red ellipses represent source and target properties, respectively, and are connected by the Galois connection ⟨τ̃, σ̃⟩. The red ellipse is the class of all target safety properties. Since the over-approximation operator above is a closure operator, it induces a Galois connection between target properties and the target safety properties [21]. Finally, the composition of Galois connections is still a Galois connection [21]. Hence, the composition of these two connections is a Galois connection between source properties and target safety properties, which we used to prove the equivalence. We notice that this argument generalizes to arbitrary closure operators on target properties. We come back to this in Section 6, where more such results will be needed when considering other classes of properties being preserved by secure compilers. Next, we define the trinity for arbitrary hyperproperties, not just the subset-closed ones.
3.3 Preserving Non-subset Closed Hyperproperties
Subset-closed hyperproperties are not expressive enough to capture all interesting properties, e.g., possibilistic notions of information flow [18], so we briefly discuss the preservation of arbitrary hyperproperties. In general, one cannot lift a Galois connection over trace properties to a Galois connection over arbitrary hyperproperties.
While two of the three criteria we introduce in this section are equivalent under no assumptions, for a comparison with the third one we require that no information be lost in the pre or post composition of τ̃ and σ̃. For this reason, we label the trinity in Theorem 3.8 as weak.
To start, we note that the following strengthening of CC=, which requires beh(W↓) to equal beh(W), is equivalent to the preservation of arbitrary hyperproperties. Here, beh(W) ≜ {t | W ⇝ t} is the set of all traces of W.
This strengthening requires that the behavior of W↓ be exactly the same as the behavior of W. We generalize it to the trace-relating setting by requiring that the behavior of W↓ coincide with the target interpretation of the source properties describing the behavior of W.⁹
In other words, it is still possible (and sound) to deduce a source obligation for a given target hyperproperty when no information is lost in one of the two compositions of τ̃ and σ̃; dually, the target-guarantee criterion (and hence the property-free one) is a consequence of the source-obligation criterion when no information is lost in the composition in the other direction.
3.4 Comparing the Presented Criteria
At this point, we have presented four trinities of criteria, preserving trace properties, subset-closed hyperproperties, safety properties, and arbitrary hyperproperties, respectively. Figure 3 sums up our trinities and orders them according to their relative strength.
In Section 6, we will also consider, in the setting of secure compilation, the classes of safety hyperproperties (or hypersafety) and relational hyperproperties. In the setting of correct compilation—which focuses only on whole programs—it is straightforward to show that the trinity for hypersafety coincides with the one for safety properties, in the same way the trinities for trace properties and for subset-closed hyperproperties coincide. Similarly, the trinity for relational hyperproperties coincides with the one for hyperproperties.
4 Instances of Trace-relating Compiler Correctness
The trace-relating view of compiler correctness above can serve as a unifying framework for studying a range of interesting compilers. This section provides several representative instantiations of the framework: source languages with undefined behavior that compilation can turn into arbitrary target behavior (Section 4.1), target languages with resource exhaustion that cannot happen in the source (Section 4.2), changes in the representation of values (Section 4.3), and differences in the granularity of data and observable events (Section 4.4).
4.1 Undefined Behavior
We start by expanding upon the discussion of undefined behavior in Section 1. We first study the model of CompCert, where source and target alphabets are the same, including the Goes_wrong event for undefined behavior. The trace relation weakens equality by allowing undefined behavior to be replaced with an arbitrary sequence of events.
This relation can be easily generalized to other settings. For instance, consider the setting in which we compile down to a low-level language like machine code. Target traces can now contain new events that cannot occur in the source: Indeed, in modern architectures like x86, a compiler typically uses only a fraction of the available instruction set. Some instructions might even perform dangerous operations, such as writing to the hard drive or controlling a device that is hidden from the source language. Formally, the source and target do not have the same events anymore. Thus, we consider a source alphabet ΣS and a larger target alphabet ΣT ⊇ ΣS. The trace relation is defined in the same way and we obtain the same property mappings as above, except that, since target traces now have more events (some of which may be dangerous), the arbitrary continuations of target traces become more interesting. For instance, consider a new event that represents writing data to the hard drive, and suppose we want to prove that this event cannot happen for a compiled program. Then, proving this property requires exactly proving that the source program exhibits no undefined behavior [14]. More generally, what one can prove about target-only events is either that they cannot appear (because there is no undefined behavior) or that any of them can appear (in the case of undefined behavior).
In Section 7.1, we study a similar example, showing that even in a safe language, linked adversarial contexts can cause dangerous target events that have no source correspondent.
4.2 Resource Exhaustion
Let us return to the discussion about resource exhaustion in Section 1.
We conclude this subsection by noting that the resource exhaustion relation and the undefined behavior relation from the previous subsection can easily be combined. Indeed, given an undefined-behavior relation ~UB and a resource-exhaustion relation ~RE defined as above on the same sets of traces, we can build a new relation that allows both refinement of undefined behavior and resource exhaustion by taking their union: ~ ≜ ~UB ∪ ~RE. A compiler that is CC~UB or CC~RE is trivially CC~ for the union, though the converse is not true.
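The following self-contained Coq sketch (ours, not part of the accompanying development) makes the last remark precise: modeling program semantics abstractly as trace predicates, CC~ is monotone in the trace relation, so correctness for either component relation implies correctness for their union.

    Section CCUnion.
      Variables (ProgS ProgT TraceS TraceT : Type).
      Variable compile : ProgS -> ProgT.
      Variable semS : ProgS -> TraceS -> Prop.
      Variable semT : ProgT -> TraceT -> Prop.

      (* Trace-relating compiler correctness for a relation rel. *)
      Definition CC (rel : TraceS -> TraceT -> Prop) : Prop :=
        forall W t, semT (compile W) t -> exists s, rel s t /\ semS W s.

      (* CC is monotone in the relation. *)
      Lemma CC_mono : forall r1 r2,
          (forall s t, r1 s t -> r2 s t) -> CC r1 -> CC r2.
      Proof.
        intros r1 r2 Hsub Hcc W t Ht.
        destruct (Hcc W t Ht) as [s [Hr Hs]].
        exists s. split; [apply Hsub; exact Hr | exact Hs].
      Qed.

      Definition rel_union (rUB rRE : TraceS -> TraceT -> Prop)
                 (s : TraceS) (t : TraceT) : Prop :=
        rUB s t \/ rRE s t.

      (* Hence CC for either component relation yields CC for the union. *)
      Corollary CC_union_left : forall rUB rRE, CC rUB -> CC (rel_union rUB rRE).
      Proof.
        intros rUB rRE H.
        apply (CC_mono rUB).
        - intros s t Hr. unfold rel_union. left. exact Hr.
        - exact H.
      Qed.
    End CCUnion.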
4.3 Different Source and Target Values
This section first presents the common language formalization (Section 4.3.1) that the following instance (Section 4.3.2) and later instances (Section 4.4 and Section 7.1) build upon. This shared language formalization does not contain a key language feature, namely, the expressions that generate actions and thus labels. This is because each instance deals with specific ways to generate actions, so each instance will define its own extension to each of the languages defined below. Additionally, each instance will define its own compiler and the trace relation used to attain CC~.
4.3.1 Shared Source and Target Language Formalization.
The source language is a pure, statically typed expression language whose expressions include naturals, Booleans, two forms of conditional (one on Booleans, one on the value an expression reduces to), arithmetic and relational operations, and sequencing.
Types are either Nat (naturals) or Bool (Booleans), and typing is standard.
The language semantics deals with actions, lists of actions, and expression results. A list of actions is a list of individual actions, which are instance-dependent and thus presented later; the same holds for source traces.
The source language has a standard big-step operational semantics that tells how an expression generates a list of actions and a result.
The target language is analogous to the source one, except that it is untyped, only has naturals, and has a single conditional.
The semantics of the target language is also given in big-step style; since its rules are a subset of the source rules, they are omitted. Since we only have naturals and all expressions operate on them, no error result is possible in the target.
4.3.2 Different Source and Target Values.
In this instance, we extend the source language with expressions that input Booleans and naturals, while the target only has expressions that input naturals. To compile conditionals on Booleans, the target is also extended with a conditional that checks whether one expression is less than another.
Source actions are Boolean and natural inputs, and source traces are lists of actions together with a final result. Target actions are just natural inputs.
The source extensions respect typing, and thus well-typed programs never produce an error. The semantics of the extensions adds elements to the traces.
The compiler is homomorphic, translating a source expression to the same target expression; the only differences concern Booleans, which are compiled to natural numbers (and conditionals).
When compiling an if-then-else, the then and else branches of the source are swapped in the target because of the compilation of Booleans.
Relating Traces. We relate basic values (naturals and Booleans) in a non-injective fashion, as noted below. Then, we extend the relation to lists of inputs pointwise (Rules Empty and Cons) and lift that relation to traces (Rules Nat and Bool).
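Concretely, a value relation consistent with the discussion below (and with the flipping argument at the end of this subsection) relates each natural to itself, false to 0, and true to every strictly positive natural:

    n ~ n            false ~ 0            true ~ n   (for any n > 0)

The input-list and trace relations are then obtained by the pointwise extension and lifting mentioned above.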
Property mappings. The property mappings τ̃ and σ̃ induced by the trace relation defined above capture the intuition behind encoding Booleans as naturals:
•
the source-to-target mapping τ̃ allows true to be encoded by any non-zero number;
•
the target-to-source mapping σ̃ requires that each target number be accounted for both as the natural it denotes and as the Boolean it may encode (e.g., 0 as both 0 and false).
Compiler correctness. With the relation above, the compiler is proven to satisfy CC~.
Simulations with different traces. In settings where source and target traces coincide, it is customary to prove compiler correctness by showing a forward simulation (i.e., a simulation between the source and target transition systems); then, using determinacy [24, 48] of the target language and input totality [25, 82] (receptiveness) of the source, this forward simulation is flipped into a backward simulation (a simulation between the target and source transition systems), as described by Beringer et al. [9] and Leroy [42]. This "flipping" is useful because forward simulations are often much easier to prove (by induction on the transitions of the source) than backward ones. For the proof of Theorem 4.3, we had to show a backward simulation directly, as it was not possible to define a forward one and then flip it. Hereafter, we show that the reason lies in the shape of the trace relation itself and discuss when it is possible to generalize the flipping to the trace-relating setting.
We first give the main idea of the flipping proof, when the inputs are the same in the source and the target [9, 42]. We only consider inputs, as that is the most interesting case: With determinacy, nondeterminism can only occur on inputs. Given a forward simulation and a target program that simulates a source program, the target program is able to perform an input iff so is the source program: Otherwise, say, for instance, that the source program performs an output; by forward simulation the target program would also perform an output, which is impossible because of determinacy. By input totality of the source, the source program must be able to perform the exact same input as the target one; using forward simulation and determinacy, the resulting programs must be related.
The trace relation from Section 4.3.2 is not injective (for instance, both the source values 1 and true are related to the target value 1); therefore, these arguments do not apply: Not all possible inputs of target programs are accounted for in the forward simulation. To flip a forward simulation into a backward one, it is necessary that, for any source program and target program related by the forward simulation, the following diagram is satisfied:
We say that a forward simulation for which this property holds is flippable. For our example compiler, a flippable forward simulation works as follows: Whenever a Boolean input occurs in the source, the target program must perform every strictly positive input (and not just 1, as suggested by the compiler). Using this property, determinacy of the target, input totality of the source, as well as the fact that any target input has an inverse image through the relation, we can indeed show that the forward simulation can be turned into a backward one: Starting from a target program and an input, we show that there is a related source program able to perform a related input, as in the diagram above, using the same arguments as when the inputs are the same; because the simulation is flippable, we can close the diagram and obtain the existence of an adequate source step. From this, we obtain CC~.
In fact, we showed that the flippable hypothesis is sufficient to flip a forward simulation into a backward one, even in the trace-relating setting, and proved this in a general (i.e., language-independent) "flipping theorem." We have also shown that if the relation defines a bijection between the inputs of the source and the target, then any forward simulation is flippable, hence recovering the usual proof technique [9, 42] as a special case.
4.4 Abstraction Mismatches
We now consider how to relate traces where a single source action is compiled to multiple target ones. To illustrate this, we extend our source language to output (nested) pairs of arbitrary size and our target language to send values of a fixed size. Concretely, the source is analogous to the language of Section 4.3, except that it does not have inputs (nor Booleans, for simplicity) but it has pairs. Additionally, it has an expression that can emit a (nested) pair of values in a single action: Given a subexpression that reduces to a pair, the send expression emits that pair as a single action. Such an expression is eventually compiled into a sequence of individual sends in the target language, since in the target a send emits the single natural number its argument reduces to; the target language cannot send pairs (although it has pair constructs).
The source and target languages are formally extended (respectively, in the first and second lines below) with pairs and sending constructs as follows: For reasons that we explain when the compiler is presented, we extend the target language with a let-in construct and variables. Finally, source traces are sequences of sent values (which include nested pairs) and target traces are only sequences of natural numbers.
The source additions are well-typed and their semantics is unsurprising; the semantics relies on the usual capture-avoiding substitution of a result for a variable.
The compiler is defined inductively on the type derivation of a source expression. The only interesting case is the compilation of a send, where we use the source type information concerning the message (i.e., a pair) being sent to deconstruct that pair into a sequence of natural numbers, which is what is sent in the target. This is the reason we need the let-in construct in the target: We evaluate the pair once (as the argument of the let-in) and then send all of its projections, to avoid duplicating side effects. Technically, since it is defined on the type derivations of terms, the compiler is defined inductively on type derivations (and not simply on terms). Thus, compiling a send would look like the following (using a metavariable to range over derivations):
However, note that each judgment uniquely identifies which typing rule is being applied and the underlying derivation. Thus, for compactness, we only write the judgment in the compilation and implicitly apply the related typing rule to obtain the underlying judgments for recursive calls. To differentiate it from the compiler of Section 4.3.2, this compiler has parentheses around its input.
Relating Traces. We start with the trivial relation between numbers: n ~ n, i.e., numbers are related when they are the same. We cannot build a relation between single actions, since a single source action is related to multiple target ones. Therefore, we define a relation between a source action and a target trace (a list of numbers), inductively on the structure of the source action.
A pair of naturals is related to the two actions that send each element of the pair (Rule Trace-Rel-N-N). If a pair is made of sub-pairs, then we require all such sub-pairs to be related (Rules Trace-Rel-N-M to Trace-Rel-M-M).
We build on these rules to define the relation between source and target traces for which the compiler is correct (Theorem 4.5). Trivially, traces are related when they are both empty. Alternatively, given related traces, we can concatenate a source action and a second target trace provided that they are related (Rule Trace-Rel-Single). Before proving that the compiler is correct, we need Lemma 4.4. Intuitively, that lemma tells us that the way we break down a source sent value into multiple target sends is correct.
With our trace relation, the trace property mappings capture the following intuitions:
•
The target-to-source mapping σ̃ states that a source property can regroup target actions as it sees fit. For example, a target trace consisting of three sent numbers is related to source traces that group those numbers into nested pairs in different ways (and to many more variations). This gives freedom to the source implementation of a target behavior, which follows from the non-injectivity of the trace relation.¹⁰
•
The source-to-target mapping τ̃ "forgets" about the way pairs are nested, but is faithful w.r.t. the values contained in a message. Notice that source safety properties are always mapped to target safety properties. For instance, if a source property πS prescribes that some bad number is never sent, then τ̃(πS) prescribes that the same number is never sent in the target, and it is again a safety property. Of course, if πS prescribes that a particular nested pairing never happens, then τ̃(πS) is still a target safety property, but the trivial one (the set of all target traces), since every target trace is related to some source trace that nests the same values differently and thus satisfies πS.
5 Trace-relating Compilation and Noninterference Preservation
We now study the relation between trace-relating compilation and noninterference preservation. As mentioned earlier (Section 3.1), in the particular case where source and target observations are drawn from the same set, a correct compiler (CC=) is enough to ensure the preservation of all subset-closed hyperproperties, in particular of noninterference (NI) [28]. But in the scenario where target observations are strictly more informative than source observations, this is not the case. In fact, as we will show, the best guarantee one may expect from a correct trace-relating compiler (CC~) in such a setting is a weakening (or declassification) of target noninterference that matches the noninterference property satisfied in the source. In certain scenarios, it turns out that the noninterference property of interest in the target comes "for free," while in others it does not, and therefore establishing noninterference requires additional proof effort beyond CC~. To formalize this reasoning, this section applies the trinitarian view of trace-relating compilation to the general framework of abstract noninterference (ANI) [27], clarifying the kind of noninterference preservation that follows from a given trace relation and correct compilation.
We first define NI and explain the issue of preserving source NI via a CC~ compiler (Section 5.1). We then introduce ANI, which allows characterizing various forms of noninterference (Section 5.2), and formulate a theory of ANI preservation via CC~, both with respect to a timing-insensitive declassification (Section 5.3) and in general (Section 5.4). We also study how to deal with cases such as undefined behavior in the target (Section 5.5). We then answer the dual question, i.e., which source NI should be satisfied to guarantee that compiled programs are noninterfering with respect to target observers (Section 5.6). Finally, we use this formal development to analyze recent work on correct compilers with interesting noninterference guarantees [7, 74], clarifying whether these guarantees follow from correctness alone or not (Section 5.7).
5.1 Noninterference and Trace-relating Compilation
Intuitively, noninterference (NI) requires that publicly observable outputs do not reveal information about private inputs. To define this formally, we need a few additions to our setup. We indicate the (disjoint) input and output projections of a trace t as t↾in and t↾out, respectively.¹¹ We denote by [t] the equivalence class of a trace t, obtained using a standard low-equivalence relation that relates low (public) events only if they are equal and ignores any difference between private events. Then, NI for source traces can be defined as:
That is, source NI comprises the sets of traces that have equivalent low output projections as long as their low input projections are equivalent.
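Spelled out in the notation just introduced, source NI can be written as:

    NIS  ≜  { b ∈ 2^TraceS | ∀s1, s2 ∈ b.  [s1↾in] = [s2↾in]  ⇒  [s1↾out] = [s2↾out] }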
When additional observations are possible in the target, it is unclear whether a noninterfering source program is compiled to a noninterfering target program, and if so, whether the notion of NI in the target is the expected (or desired) one. We illustrate this issue by considering a scenario where target traces extend source traces by exposing the execution time. While source noninterference requires that private inputs do not affect public outputs, target noninterference additionally requires that the execution time not be affected by varying private inputs.
To model the scenario described, we represent target traces as pairs of a source trace and a natural number that denotes the time spent to produce the trace (using ∞ for infinite executions). Formally, if TraceS denotes the set of source traces, then TraceT = TraceS × ℕ∞ is the set of target traces, where ℕ∞ = ℕ ∪ {∞}.
Notice that if two source traces s1 and s2 are low-equivalent, then so are the target traces (s1, n) and (s2, n) for any given time n, but (s1, n1) and (s2, n2) are not when n1 ≠ n2.
Consider the following straightforward trace relation, which relates a source trace to any target trace whose first component is equal to it, irrespective of execution time:
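In symbols, with n ranging over ℕ∞:

    s ~ t  ≜  ∃n ∈ ℕ∞.  t = (s, n)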
A compiler is CC~ for this trace relation if any trace that can be exhibited in the target can be simulated in the source in some amount of time. For such a compiler, Theorem 3.3 says that if W satisfies NIS, then W↓ satisfies Cl⊆(τ̃(NIS)). This hyperproperty is, however, strictly weaker than target NI, as it contains, for example, behaviors whose execution time depends on private inputs, and one cannot conclude that W↓ is noninterfering in the target. It is easy to compute Cl⊆(τ̃(NIS)) explicitly, using the definition of τ̃ for the first step and the fact that NIS is subset-closed for the second. As we will see, the resulting hyperproperty can be characterized as a form of NI, which one might call timing-insensitive noninterference, i.e., NI ensured only against attackers that cannot measure execution time. For this characterization, and to describe different forms of noninterference as well as formally analyze their preservation by a CC~ compiler, we rely on the general framework of abstract noninterference [27].
5.2 Abstract Noninterference
ANI [27] is a generalization of NI whose formulation relies on abstractions (in the sense of abstract interpretation [20]) to encompass arbitrary variants of NI. ANI is parameterized by an observer abstraction ρ, which denotes the distinguishing power of the attacker, and a selection abstraction φ, which specifies when to check NI and therefore captures a form of declassification [69].¹² Formally:
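With φ applied to input projections and ρ to output projections (both lifted to sets as explained below), ANI can be written as:

    ANI^φ_ρ  ≜  { b | ∀t1, t2 ∈ b.  φ(t1↾in) = φ(t2↾in)  ⇒  ρ(t1↾out) = ρ(t2↾out) }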
By picking both φ and ρ to be the low-equivalence abstraction [·] used above, we recover the standard noninterference defined earlier, where NI must hold for all low inputs (i.e., no declassification of private inputs), and the observational power of the attacker is limited to distinguishing low outputs. The observational power of the attacker can be weakened by choosing a more liberal relation for ρ. For instance, one may limit the attacker to observing the parity of output integer values. Another way to weaken ANI is to use φ to specify that noninterference is only required to hold for a subset of low inputs.
The operators φ and ρ are defined over sets of (input and output projections of) traces. When we write φ(t↾in) as above, this should be understood as a convenience notation for φ({t↾in}); likewise, ρ(t↾out) should be understood as ρ({t↾out}), i.e., the application of ρ to the corresponding singleton. Additionally, φ and ρ are required to be upper-closed operators (uco)—i.e., monotonic, idempotent, and extensive (i.e., X ⊆ φ(X))—on the poset that is the powerset of (input and output projections of) traces ordered by inclusion [27].
5.3 Trace-relating Compilation and ANI for Timing
We can now reformulate our example with observable execution times in target traces in terms of ANI. We instantiate the trace relation as above, pairing each source trace with an arbitrary execution time. In this case, the hyperproperty that a compiled program W↓ satisfies whenever W satisfies NIS can be described as an instance of ANI.
The definition of the resulting selection abstraction tells us that the trace relation does not affect declassification. The definition of the resulting observer abstraction characterizes an observer that cannot distinguish execution times for noninterfering traces (the time component is simply discarded): For instance, target output projections that differ only in their execution times are identified, for any such times. Therefore, in this setting, we know explicitly, through the derived observer abstraction, that a CC~ compiler degrades source noninterference to target timing-insensitive noninterference.
5.4 Trace-relating Compilation and ANI in General
While the particular selection and observer abstractions above can be discovered by intuition, we want to know whether there is a systematic way of obtaining them in general. In other words, for any trace relation and any notion of source NI, what property is guaranteed for noninterfering source programs by any CC~ compiler?
We can now answer this question in general (Theorem 5.1): Any source notion of noninterference expressible as an instance of ANI is mapped to a corresponding instance of ANI in the target, whenever source traces are an abstraction of target ones (i.e., when the relation is a total and surjective map from target traces to source traces). For this result, we consider trace relations that can be split into input and output trace relations, so that two traces are related iff both their input projections and their output projections are. The trace relation corresponds to a Galois connection between the sets of trace properties, as described in Section 2.2. Similarly, the input and output relations correspond to a pair of Galois connections between the sets of input and, respectively, output properties. In the timing example, time is an output, so the input relation is equality and the output relation relates a source output projection to any target output projection extending it with an arbitrary execution time.
The target abstract noninterference is to be understood as the best correct approximation of the source one. The mappings involved are the existential and universal images of the output relation; therefore, they are lower and upper adjoints, respectively (Section 2). The resulting observer abstraction is the best correct approximation of the source one w.r.t. this Galois connection [20] (hence the choice of the notation). A similar result holds for the selection abstraction.
Coming back to our example above, we can formally recover the intuitively justified definitions of the selection and observer abstractions given in Section 5.3.
5.5 Noninterference and Undefined Behavior
As stated above, Theorem 5.1 does not apply to several scenarios from Section 4, such as undefined behavior (Section 4.1). Indeed, in these cases, the relation is not a total map. Nevertheless, we can still exploit our framework to reason about the impact of compilation on noninterference.
Let us consider a trace relation split into input and output parts, where the input relation is any total and surjective map from target to source inputs (e.g., equality) and the output relation is the undefined-behavior relation from Section 4.1. Intuitively, a CC~ compiler guarantees noninterference for the compiled program, provided that the target attacker cannot exploit undefined behavior to learn private information. This intuition can be made formal by the following theorem:
Technically, instead of giving us a definition of the target observer abstraction, the theorem gives a property of it. The property states that, given a target output trace, the attacker cannot distinguish it from any other target output trace produced by other possible compilations of the source trace it relates to, up to the observational power of the source-level attacker. Therefore, given a source attacker, the theorem characterizes a family of target attackers that cannot observe any interference for a correctly compiled noninterfering program. Notice that the target attacker that identifies all output traces satisfies the premise of the theorem but defines a trivial hyperproperty, so we cannot prove in general that target noninterference itself is preserved. Still, this degenerate attacker shows that the family of attackers described in Theorem 5.2 is nonempty, which ensures the existence of a most powerful attacker among them [27].
5.6 From Target NI to Source NI
We now explore the dual question: Under what hypothesis does trace-relating compiler correctness alone allow target noninterference to be reduced to source noninterference? This is of practical interest, as one would be able to protect from target attackers by ensuring noninterference in the source. This task can be made easier if the source language has some static enforcement mechanism [1, 44].
Let us consider the languages from Section 4.4, extended with the ability to accept inputs as (pairs of) values. It is easy to show that the compiler described in Section 4.4 (extended to treat the new input expressions homomorphically) is still CC~: Given a target trace with the same inputs as a source one, the compiler of Section 4.4 ensures that the source program can simulate the same outputs. Assume that we want compiled programs to satisfy a given notion of target noninterference. Recall that the observational power of the target attacker is expressed as a property of sequences of values. To express the same property (or attacker) in the source, we have to abstract the way pairs of values are nested. For instance, the source attacker should not distinguish two messages that contain the same values but nest them differently. In general (i.e., when the trace relation is not the identity), this argument is valid only when the target attacker can be represented in the source. More precisely, the target abstraction must consider as equivalent all target inputs that are related to the same source input, because in the source it is not possible to make a finer distinction between inputs. This intuitive correspondence can be formalized as follows:
The results presented in this section formalize and generalize some intuitive facts about compiler correctness and noninterference, clarifying which noninterference property follows "for free" from trace-relating compiler correctness. Of course, in the general case, compiler correctness alone is not a strong enough criterion for dealing with many security properties [8, 23].

The remainder of this section exploits our ANI-based framework and results to analyze two compilers from the recent literature [7, 74] that are both proven to be correct and to preserve two interesting notions of noninterference: cryptographic constant time (Section 5.7.1) and value-dependent noninterference (Section 5.7.2). For each, we explain how to express compiler correctness as an instance of CC~, describe the noninterference property that is implied by the trace relation and the correctness result, and compare it with the noninterference properties of interest as established by their authors.
5.7.1 A Correct Compiler Preserving Cryptographic Constant Time.
Barthe et al. [7] provide a correct compiler (an extension of CompCert) that also preserves cryptographic constant time (CT). CT is a security property stating that the runtime of a program does not depend on its secrets, so that an attacker cannot extract secrets from a program by observing its execution time. A CT-preserving compiler takes code that is CT and generates code that is also CT. Thus, a CT-preserving compiler must translate runtime-equivalent source programs into runtime-equivalent target ones. Notice that it is not necessary for the leakage of target programs to be the same as that of their source counterparts; rather, source programs with the same leakage must be compiled to target programs with the same leakage.
Barthe et al. [7] prove CT preservation for 17 passes of CompCert. The authors partition these 17 passes into four categories, depending on the proof technique they use to show CT preservation. Every category proves an instance of CC~ by improving on the existing CompCert simulation. In three out of the four cases this is sufficient to also prove CT preservation, while for the last category a further proof is necessary. In what follows, we first encode CT as an instance of abstract noninterference, i.e., we show for which operators φ and ρ it is an instance of ANI, and then use our framework to understand why modifying the CompCert simulation is sufficient in the first three categories but not in the last one. For each category, Theorem 5.2 applies, so no target observer abstraction that respects its premise (Equation (1)) can notice any interference on compiled programs that were constant-time in the source. In the first three categories the attacker that defines CT respects that premise¹³ (we refer to this instantiated condition as Equation (2)), and CT preservation is therefore a consequence of CC~. In the last category, the CT attacker does not respect Equation (2) and the authors have to prove an additional theorem, the CT-diagram.
Trace Model and CT as an instance of ANI. The formal definition of CT is given by extending the semantics of the languages in CompCert and enriching the traces of input and output events with leakages. Leakages are results of execution steps that involve conditional branching or memory access. A program is CT w.r.t. a certain relation over program states [7, Definition 3.2] iff for every two initial states such that , the leakages that can be observed are the same. Notice that in Reference [7, Definition 3.2] the secret is stored in the program states and defined by ; therefore, to regard CT as an instance of abstract noninterference, program states will be regarded as inputs and events together with their leakages as outputs. More precisely, a trace is a sequence of triples where and are program states and an event in the instrumented semantics, i.e., input/output event and associated leakage.
We consider:
•
to be (the uco corresponding to) the relation defined by iff have the same length with , and .
•
to be (the uco corresponding to) the relation defined by iff have the same length with , and , where denotes the leakage in the event (projection of on the leak-only semantics [7]).
It is easy to check that for the and given above.
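To make the shape of these definitions concrete, here is a small executable sketch of the constant-time check over a finite trace semantics. The state layout (a public and a secret component), the leakage events, and the example semantics are hypothetical stand-ins for the elided formal definitions, not the CompCert development: a deterministic program is modeled by a map from initial states to the sequence of leakages it produces, and the check verifies that states related by (here: states agreeing on the public component) yield the same leakage.

```python
# A finite-model sketch: a deterministic program is modeled by a map from
# initial states to the tuple of leakages observed along its execution, and
# the constant-time condition requires phi-related initial states (same
# public component) to yield equal leakage.

def related_phi(s1, s2):
    # phi: initial states are related iff they agree on the public component
    return dict(s1)["public"] == dict(s2)["public"]

def is_constant_time(leakage_of):
    states = list(leakage_of)
    return all(leakage_of[s1] == leakage_of[s2]
               for s1 in states for s2 in states
               if related_phi(s1, s2))

# Branching on the secret leaks through the branch event: not constant time.
branch_on_secret = {
    (("public", 0), ("secret", 0)): ("branch:true",),
    (("public", 0), ("secret", 1)): ("branch:false",),
}
# Branching on the public input only: constant time.
branch_on_public = {
    (("public", 0), ("secret", 0)): ("branch:true",),
    (("public", 0), ("secret", 1)): ("branch:true",),
    (("public", 1), ("secret", 0)): ("branch:false",),
    (("public", 1), ("secret", 1)): ("branch:false",),
}

assert not is_constant_time(branch_on_secret)
assert is_constant_time(branch_on_public)
```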
We now present more details for each of the four proof techniques adopted by Barthe et al. [7]. Since CT is defined only for safe programs [7, Definition 3.1], we can assume that no undefined behavior is ever encountered, which simplifies the presentation. We also omit coming from the application of Theorem 5.2, as it always coincides with .
Constant-time security preservation by leakage preservation ([7, Section 5.2]). For compilation passes that belong to this category, the authors prove that the source leakage is preserved exactly in the target. Thus, in this simple case, the theorem proved is CC where is point-wise equality of events together with leakages, the identity, and satisfies Equation (2) by idempotency of .
CT preservation from leakage-erasing simulation ([7, Section 5.3]). In this case, CC is proved for a relation that erases source leakage-only events, i.e., those events that do not contain inputs or outputs, but only the amount of leakage revealed. More precisely (see also Reference [7, Fig. 8]) for and of the same length, iff
The property mapping associated with the above relation, , erases all leak-only events from the traces of a source property. If an attacker cannot notice at any point any difference in the leakages of two traces and we erase the leak-only events from them, then the attacker will still not notice any difference in leakages; therefore, it is easy to check that Equation (2) also holds in this case.
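As an illustration of the erasure just described, here is a minimal sketch, assuming a hypothetical event representation (tuples tagged "leak" for leak-only events): the mapping erases leak-only events from every trace of a property, so any distinction an attacker could make on leakages after erasure was already available before it.

```python
# A minimal sketch of the leakage-erasing mapping, over finite traces
# represented as tuples of tagged events; the tags are hypothetical.

def erase_leak_only(trace):
    return tuple(ev for ev in trace if ev[0] != "leak")

def map_property(prop):
    # erase leak-only events from every trace of a (source) property
    return {erase_leak_only(t) for t in prop}

src = (("in", 3), ("leak", 2), ("out", 4), ("leak", 1))
assert erase_leak_only(src) == (("in", 3), ("out", 4))
assert map_property({src}) == {(("in", 3), ("out", 4))}
```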
CT preservation via memory injection ([7, Section 5.4]). This case is analogous to the one above, save that it rests on a more complex relation involving a memory injection relation (see [7, Definition 5.8]). Intuitively, relates source and target traces that differ at most in leakages due to memory accesses. While in the previous case leakages were simply erased, here they are modified, crucially with some uniformity. Reasoning as in the previous case, if an attacker cannot notice a difference in the leakages of two traces and we modify equal leakages by the same factor, then the attacker will still not notice any difference in leakages; thus, Equation (2) holds.
CT preservation from CT-diagram ([7, Section 5.5]). In this case, does not satisfy Equation (2) because the counting simulation ([7, Definition 5.10]) does not necessarily relate source and target leakages but only the inputs and outputs.14 CC alone does not ensure that an attacker cannot observe any interference in the target leakages; to show preservation of CT, the authors need to prove an extra condition, the so-called CT diagram [8].
5.7.2 Value-Dependent Noninterference.
Sison and Murray [74] introduce a compiler that provably preserves value-dependent noninterference (VDNI) for a concurrent language with shared variables. Value-dependent means that the secrecy level of a variable—low or high—may depend on the value of some other variable, called the control variable of the first, and therefore could change throughout its lifetime.
Preservation of VDNI for concurrent programs enjoys compositionality, meaning that it follows from the preservation of VDNI for each single thread [52] under certain conditions. As the compositionality result is orthogonal to our framework, we can study either (1) the preservation of VDNI for one local thread or (2) its preservation for the whole program.
In the remainder of this section, we focus on the preservation of VDNI for a single thread, which is proven by showing a secure refinement relation between source and compiled threads. Similarly to the previous section, the secure refinement is expressed via a cube diagram (Reference [74], Figure 1) and can be proven directly [52] or split into more obligations [74].
As Sison and Murray [74] use a state transition-based semantics, we first show how to encode this semantics into a trace model by defining the relation based on the secure refinement relation. We then show how to encode VDNI as an instance of abstract noninterference (i.e., both and ). Finally, we apply Theorem 5.2 and conclude that if satisfies , then satisfies given that the trace relation has properties defined in Reference [52, Theorem 5.1].
Source (WHILE) and target (RISC-like assembly) languages are equipped with a determined evaluation step semantics (i.e., a semantics where the only source of nondeterminism is external inputs; Reference [74], Section 2) between thread-local configurations, which are triples of the form . In such a configuration, mds is the access mode state for program variables and mem is a map relating global program variables to their values. Both of these components are common to the source and target language. The tps component denotes the thread-private state. In the source language, it is the program to be executed. In the target language, tps consists of the target program (labelled assembly-language instructions), of a program counter, and of the set of thread-local registers. We denote WHILE configurations by tuples of the form: and RISC configurations by tuples of the form: .
Trace Model and Trace relation. We consider traces that are (possibly infinite) sequences of configurations. The traces produced by a program are the sequences of local configurations that the program may encounter during execution, according to the evaluation semantics. Let be a source trace. The input projection is defined by (the tuple consisting of the access modes and the memory in the first state) and the output projection is defined by (the trace itself). Input/output projections are defined similarly for target traces.
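The following sketch renders this trace model with hypothetical record layouts (the concrete tps payloads of the WHILE and RISC languages are stand-ins here): a trace is a sequence of thread-local configurations, its input projection keeps the access modes and memory of the first configuration, and its output projection is the trace itself.

```python
# A sketch of thread-local configurations and the input/output projections,
# with hypothetical payloads standing in for the WHILE/RISC-specific parts.
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Config:
    tps: Any                           # thread-private state (program, or pc + registers)
    mds: Tuple[str, ...]               # access mode state for program variables
    mem: Tuple[Tuple[str, int], ...]   # global program variables -> values

def input_proj(trace):
    # access modes and memory of the first configuration
    return (trace[0].mds, trace[0].mem)

def output_proj(trace):
    # the whole trace itself
    return trace

trace = (Config("x := 1", ("x:NoRW",), (("x", 0),)),
         Config("skip",   ("x:NoRW",), (("x", 1),)))
assert input_proj(trace) == (("x:NoRW",), (("x", 0),))
assert output_proj(trace) == trace
```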
We take the trace relation to be the point-wise lifting of a secure refinement relation (Reference [74], Definition 6). Source and target configurations that are related coincide on the access mode and memory part (i.e., mds = mds’ and mem = mem’; Reference [74], Definition 4), so is simply the identity and coincides with .
VDNI as abstract noninterference. A program satisfies VDNI (Reference [74], Definition 2) if any two of its executions starting in low equivalent memories are related via a strong low bisimulation modulo modes (strong low bisimulation mm). Intuitively, a strong low bisimulation mm is a bisimulation that preserves low-equivalence. Preservation of VDNI is proved by Murray et al. [52] by showing that for every strong low-bisimulation mm for source threads, there exists a target strong low bisimulation mm such that if two source threads are related by , then the compiled threads are related by (Reference [52], Theorem 5.1).
The intuition for the encoding of VDNI as an instance of abstract noninterference is to model low equivalence through the operator , and bisimilarity through . More rigorously, = , where and are defined as follows:
For ,
where is the low-equivalence modulo mds (Reference [74], Definition 1).
For
where denotes a strong low bisimulation modulo modes. Similarly where
The relation is a simulation, and therefore CC holds. To apply Theorem 5.2 and conclude that whenever a source program satisfies , then satisfies = , it is sufficient for to satisfy Equation (1), that is,
for . If one is willing to unfold all definitions, then this amounts to showing that the set of traces "bisimilar" to coincides with the set of traces that are bisimilar to some and for some bisimilar to . Splitting the "coincides" (set equality) into the two directions of inclusion, the "" direction is immediate, while for the "" direction one has to prove some properties of , namely the ones in the definition of ([52, inlined above Theorem 5.1]), which entail preservation of low-equivalence as shown in [52, Theorem 5.1].
In summary, our framework makes it possible to precisely characterize the target noninterference properties that are implied by (trace-relating) correct compilation of source noninterfering programs. As we have shown, such properties are not necessarily as strong as desired. Crucially, the target noninterference property one gets for free for a given trace-relating correct compiler is a function of the trace relation under consideration. By considering more sophisticated trace relations, one might obtain more interesting noninterference properties in the target for free—but this would likely come at the expense of a more challenging trace-relating compiler correctness proof.
6 Trace-relating Secure Compilation
So far, we have studied compiler correctness criteria for whole, standalone programs. However, in practice, programs do not exist in isolation, but in a context where they interact with other programs, libraries, etc. In many cases, this context cannot be assumed to be benign and could instead behave maliciously to try to disrupt a compiled program.
Hence, in this section, we consider the following secure compilation scenario: A source program is compiled and linked with an arbitrary target-level context, i.e., one that may not be expressible as the compilation of a source context. Compiler correctness does not address this case, as it does not consider arbitrary target contexts, looking instead at whole programs (empty context [41]) or well-behaved target contexts that behave like source ones (as in compositional compiler correctness [33, 37, 56, 76]).
Summary of the work of Abate et al. [3]. To account for this scenario, Abate et al. [3] describe several secure compilation criteria based on the preservation of classes of (hyper)properties (e.g., trace properties, safety, hypersafety, hyperproperties) against arbitrary target contexts. For each of these criteria, they give an equivalent "property-free" criterion, analogous to the equivalence between and . For instance, their robust trace property preservation criterion () states that, for any trace property , if a source partial program plugged into any context satisfies , then the compiled program plugged into any target context satisfies . Their equivalent criterion to is , which states that for any trace produced by the compiled program, when linked with any target context, there is a source context that produces the same trace. Formally (writing to mean the whole program that results from linking partial program with context ) they define:
In the following, we adopt the notation to mean “robustly satisfies ,” i.e., satisfies irrespective of the contexts () it is linked with. Formally, , where is the same as before. Thus, we write more compactly:
All the criteria of Abate et al. [3] share this flavor of stating the existence of some source context that simulates the behavior of any given target context, with some variations depending on the class of (hyper)properties under consideration. For trace properties, they also have criteria that preserve safety properties plus their version of liveness properties. For hyperproperties, they have criteria that preserve hypersafety properties, subset-closed hyperproperties, and arbitrary hyperproperties. Finally, they define relational hyperproperties, which are relations between the behaviors of multiple programs for expressing, e.g., that a program always runs faster than another. For relational hyperproperties, they have criteria that preserve arbitrary relational properties, relational safety properties, relational hyperproperties, and relational subset-closed hyperproperties.
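The following sketch, over finite enumerations, is one way to make the shape of these criteria concrete; the programs, contexts, and behaviors functions are hypothetical stand-ins. Robust satisfaction universally quantifies over contexts, and the property-free criteria ask, for every target context and target trace of the compiled program, for some source context producing a related source trace (the existential that back-translation proofs have to witness).

```python
# A finite-model sketch of robust satisfaction and of the property-free
# flavor of the robust-preservation criteria.  "behaviors(ctx, prog)" is a
# hypothetical stand-in returning the finite set of traces of ctx linked
# with prog.

def robustly_satisfies(prog, contexts, behaviors, prop):
    # prog robustly satisfies prop: in every context, every trace is in prop
    return all(t in prop
               for ctx in contexts
               for t in behaviors(ctx, prog))

def property_free_criterion(src_prog, tgt_prog,
                            src_contexts, tgt_contexts,
                            src_behaviors, tgt_behaviors, related):
    # every target trace of the compiled program, in every target context,
    # is related to some source trace produced in some source context
    return all(any(related(s, t)
                   for cs in src_contexts
                   for s in src_behaviors(cs, src_prog))
               for ct in tgt_contexts
               for t in tgt_behaviors(ct, tgt_prog))
```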
Each category of criteria provides different kinds of security guarantees (confidentiality or integrity) for the code and data segments of programs. Roughly speaking, the security guarantees due to robust preservation of trace properties concern only the integrity of the program against the context, the guarantees of hyperproperties also concern data confidentiality, and the guarantees of relational hyperproperties may even concern code confidentiality. Naturally, these stronger guarantees are increasingly harder to enforce and prove.
All the criteria of Abate et al. [3] are stated in a setting where source and target traces are the same. In this section, we extend their results to the trace-relating setting, obtaining trinitarian views for secure compilation. There are many similarities with Section 2 that show up in the secure compilation setting, too, but also some crucial differences. As in Section 2, the application of or may lose the information that a property belongs to the class , or that a hyperproperty is subset-closed, which are both crucial for the equivalence with the property-free criterion of Abate et al. [3]. As in Section 2, we solve this problem by interpreting classes of properties as an abstraction of another class of properties induced by a closure operator. Differently from Section 2, the presence of adversarial contexts makes the criteria for subset-closed hyperproperties and trace properties distinct. Abate et al. [3] show that the criterion for robust preservation of hypersafety is distinct from robust safety preservation, and all criteria about classes of trace properties are distinct from their relational counterparts, e.g., robust preservation of relational safety and robust preservation of safety properties are different. We therefore further generalize the argument from Section 3.2 to safety hyperproperties as well as to relational hyperproperties.
Specifically, we provide a trinity for the preservation of trace properties and subset-closed hyperproperties (Section 6.1), of safety properties and hypersafety hyperproperties (Section 6.2), of hyperproperties (Section 6.3), and for 2-relational (hyper)properties (Section 6.4). We conclude the section by studying the relative expressiveness of these criteria (Section 6.5).
Robustness and Compositional Compilation. Before diving into the criteria for robust compilation, it is worth noting the relationship between these and compositional compiler correctness. Compositional compiler correctness () is a statement of compiler correctness for programs that are linked against some contexts. Unlike robustness, which imposes no constraints on the contexts, imposes conditions on the target contexts that compiled programs can be linked against: They need to be related (in ways that vary from work to work [38, 56]) to the source contexts [65]. As Patrignani and Garg [64] also point out, the notions of and of robust compilation are incomparable: Neither can be proven stronger than the other. This is not surprising, since robust compilation criteria are used to prove compiler security while is used to prove correctness. 15
The criteria we adopt could be generalized further by adding an extra parameter that qualifies the relation between source and target contexts. Such a general statement would let us express both and robust compilation by picking the correct extra parameter. However, we refrain from presenting such general statements, as the implications in terms of preservation of classes of (hyper)properties have not been studied for them.
6.1 Trace-relating Secure Compilation: Trace Properties and Subset-closed Hyperproperties
This section shows the simple generalization of to the trace-relating setting () and its corresponding trinitarian view (6.1). Then, it presents the trinitarian view for criteria that preserve subset-closed hyperproperties (6.2).
The trinity for robust trace property preservation is the straightforward adaptation of the concepts of Section 2 to the definitions of Abate et al. [3]. Intuitively, these criteria simply deal with partial programs instead of whole programs . Necessarily, these criteria then consider arbitrary program contexts linked with ; the universal quantification over and is tacit in the expression .
We can also generalize Section 2 to robust subset-closed hyperproperties (6.2). However, unlike the correct compilation case of Section 2, the equivalent property-free criterion () does not coincide with , but states the existence of a single source context for all the target traces produced by a program in a given context.
6.2 Trace-relating Secure Compilation: Safety and Hypersafety
In this section, we elaborate on the robust preservation of safety (6.3) and hypersafety (6.4) properties. Similarly to Section 3.2, we consider the trace model adopted by Abate et al. [3] to ease the presentation. Our starting point is the two equivalent criteria for preservation of robust satisfaction of all and only the safety properties [3],
where is a shorthand for .
differs from as it only quantifies over safety properties, and differs from as it quantifies over finite prefixes , rather than complete traces . This comes from the fact that safety properties can be characterized in terms of sets of bad prefixes (as in Definition 3.4). Unfolding , we can interpret as follows: If produces a trace that violates a specific safety property, namely, the one defined by , then there exists in which violates the same safety property, producing a trace but possibly distinct from .
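The bad-prefix reading of safety used in this interpretation can be sketched as follows; traces are finite tuples of events here only to keep the example executable, and the bad prefixes are hypothetical.

```python
# A sketch of the bad-prefix view of safety: a safety property is determined
# by a set of finite "bad" prefixes, and a trace violates it exactly when
# one of its finite prefixes is bad.

def prefixes(trace):
    return [trace[:i] for i in range(len(trace) + 1)]

def violates_safety(trace, bad_prefixes):
    return any(p in bad_prefixes for p in prefixes(trace))

bad = {("open", "open")}   # e.g., "never open a resource twice in a row"
assert violates_safety(("open", "open", "close"), bad)
assert not violates_safety(("open", "close", "open"), bad)
```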
Our generalization of to the trace-relating setting states that whenever produces a trace that violates a target safety property, there exists a source context in which violates the source interpretation of the property, i.e., its image through . The following theorem defines and its two equivalent formulations:
Exactly as in Section 3.2, Theorem 6.3 exploits the fact that
is a Galois connection between source properties and target safety properties and the argument generalizes to arbitrary closure operators on target properties (). More interestingly, we can further generalize this idea to hypersafety. Hypersafety lifts the idea of safety with another level of sets (just like hyperproperties do w.r.t. trace properties) to talk about multiple runs of the same program. Just like for safety, hypersafety is concerned with a set of bad prefixes (called ) that no program upholding the hypersafety property should extend. Formally, a hyperproperty is hypersafety if: In Theorem 6.4, we indeed exploit the following Galois connection between source subset-closed hyperproperties and target:
where and is the closure operator that maps an arbitrary target hyperproperty to the target hypersafety that best over-approximates . 16
We conclude this section with the following remark: The reader might wonder about extracting a "new" trace relation from the Galois connection and obtaining another formulation of . We note that this is not possible in general, as the class of safety properties, i.e., closed sets, is not necessarily a powerset and hence Lemma 2.7 cannot be applied.
6.3 Trace-relating Secure Compilation: Hyperproperties
We already mentioned that some properties of interest for security, e.g., possibilistic information flow, are not subset-closed [18]. In this section, we lift the results from Section 3.3 to the secure compilation setting. Once again, the trinity is weak, as the equivalence to requires an extra assumption.
It is therefore possible and correct to deduce a source obligation for a given target hyperproperty () when no information is lost in the composition . However, is a consequence of when no information is lost in composing in the other direction, .
6.4 Trace-relating Secure Compilation: 2-Relational (Hyper)properties
Finally, we turn to relational properties and hyperproperties. Relational hyperproperties, as defined by Abate et al. [3], are predicates on a sequence of behaviors; a sequence of programs has the relational hyperproperty if their behaviors collectively satisfy the predicate. Depending on the arity of the sequence, there exist different subclasses of relational hyperproperties; for simplicity, here we only study relational hyperproperties of arity 2. A key example of a relational hyperproperty is trace equivalence, which holds if two programs have identical behaviors.
All the trinities in this section follow the pattern of their non-relational counterparts. We first explain how one can get a Galois connection between source and target relational properties from a trace relation.
Given a trace relation , we can relate pairs of source traces with pairs of target traces point-wise,
Formally, this is , the product of the relation with itself. Therefore, by Lemma 2.7, it corresponds to a Galois connection between source and target relational properties (), which, with a slight abuse of notation,17 we still denote by
Explicitly, for and ,
and are then lifted to relational hyperproperties similarly to Definition 3.2. Explicitly, for and ,
Given a relational property and two programs , we write for
Given a relational hyperproperty , by , we mean
Next, we propose the trinity for 2-relational subset-closed hyperproperties, i.e., elements of that are closed under subsets. Exactly as in the case of subset-closed hyperproperties, the application of and may lose the information that a hyperproperty is subset-closed. To guarantee the equivalence of the three criteria, we compose the two mappings with a closure operator that we still denote by .
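As a small illustration of the point-wise product lifting defined above, the sketch below lifts a trace relation to pairs of traces and computes the existential image of a source relational property within a given finite universe of target trace pairs; the relation and the universe are hypothetical stand-ins.

```python
# A sketch of the product lifting of a trace relation and of the induced
# existential image on relational properties (sets of pairs of traces).

def rel_pair(rel, s_pair, t_pair):
    # pairs are related component-wise
    (s1, s2), (t1, t2) = s_pair, t_pair
    return rel(s1, t1) and rel(s2, t2)

def tau_relational(rel, src_rel_prop, tgt_pair_universe):
    # target pairs related to some source pair in the source property
    return {tp for tp in tgt_pair_universe
            if any(rel_pair(rel, sp, tp) for sp in src_rel_prop)}

# Toy example: the trace relation prefixes source traces with "t:".
rel = lambda s, t: t == "t:" + s
src_rel_prop = {("a", "b")}
universe = {("t:a", "t:b"), ("t:a", "t:c")}
assert tau_relational(rel, src_rel_prop, universe) == {("t:a", "t:b")}
```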
We now move to the class of relational safety properties, a notion that generalizes safety properties to relations on programs. Similarly to Theorem 6.3, quantifies over target relational safety properties, while quantifies over all source relational properties and composes with a closure operator that best approximates a relational property with a relational safety property.
Finally, we present the most general criterion: preservation of arbitrary 2-relational hyperproperties. As for the preservation of arbitrary hyperproperties, this (weak) trinity requires additional assumptions to hold, namely, that the Galois connection is an insertion or a reflection.
6.5 Relating the Secure Compilation Trinities
Figure 4 orders criteria referring to the same trace relation according to their relative strength. If a trinity entails another (denoted by ), then the former provides stronger security for a compilation chain than the latter.
Fig. 4.
The hypotheses of insertion and reflection mentioned in Theorem 6.9 and Theorem 6.5 are highlighted with the labels “Ins” and “Refl.” Recall that when composing with , we quantify over the whole class of source trace properties rather than only safety properties. This is represented by the blue background in . The trinity for the robust preservation of arbitrary trace properties is on the same blue background. Red and green backgrounds are reserved for subset-closed hyperproperties and arbitrary relational properties and serve the same purpose.
We now describe how to interpret the acronyms in Figure 4. All criteria start with , meaning that they refer to robust preservation (they are secure compilation criteria). Criteria for relational hyperproperties—here only arity 2 is shown for simplicity—contain . Next, criteria names spell the class of hyperproperties they preserve: for hyperproperties, for subset-closed hyperproperties, for hypersafety, for trace properties, and for safety properties. Finally, property-free criteria end with a , while property-full ones involving and end with . Thus, robust () subset-closed hyperproperty-preserving () compilation () is , robust () two-relational () safety-preserving () compilation () is , and so on.
7 Instances of Trace-relating Secure Compilation
This section presents instances of compilers that adopt our framework for secure compilation purposes. We provide three illustrative cases for compilers that, respectively, robustly preserve trace properties (Section 7.1), safety properties (Section 7.2), and hypersafety properties (Section 7.3). The last two examples are not novel instances we devise but rather existing work whose results we recount as instantiations of our framework.
7.1 An Instance of Trace-relating Robust Preservation of Trace Properties
This subsection illustrates trace-relating secure compilation when the set of target events is a strict superset of the set of source events.
The source and target languages used here extend the syntax of the source language of Section 4.3.1. Both languages have outputs of naturals, and the expressions that generate them: and . Additionally, the target has a different output action and its related expression ; this is the only difference between the languages. The extra events in the target model the ability of the target language to perform potentially dangerous operations (e.g., writing to the hard drive), which cannot be performed by the source language, and against which source-level reasoning can therefore offer no protection.
Both languages and compilation chains now deal with partial programs , contexts , and linking of those two to produce whole programs . In this setting, a whole program is the combination of a main expression to be evaluated and a set of function definitions (with distinct names) that can refer to their argument () symbolically and can be called by the main expression and by other functions. The set of functions of a whole program is the union of the functions of a partial program and a context; the latter also contains the main expression.
The extensions of the typing rules and the operational semantics for whole programs are unsurprising and therefore elided. The trace model also follows closely that of Section 4.3: It consists of a list of regular events (including the new outputs) terminated by a result event.18 A partial program and a context can be linked into a whole program when their functions satisfy the requirements mentioned above.
We define the homomorphic compiler () that translates each source construct into its target correspondent. Thus, the compiler never introduces the additional target instruction . Since it is straightforward, the formalization of the compiler is elided.
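A hypothetical rendering of such a homomorphic compiler is sketched below; the expression constructors are stand-ins for the elided syntax, and the point is only that the translation is structural and never emits the target-only output construct.

```python
# A hypothetical, structural rendering of the homomorphic compiler: each
# source construct maps to its target counterpart, and the target-only
# output construct ("out_extra" here) is never produced.

def compile_expr(e):
    tag = e[0]
    if tag == "nat":
        return e                                   # literals are unchanged
    if tag == "out":
        return ("out", compile_expr(e[1]))         # source output -> target output
    if tag == "plus":
        return ("plus", compile_expr(e[1]), compile_expr(e[2]))
    if tag == "call":
        return ("call", e[1], compile_expr(e[2]))  # function name kept as-is
    raise ValueError(f"unknown source construct: {tag}")

def compile_partial_program(funcs):
    # a partial program is a set of named function definitions
    return {name: compile_expr(body) for name, body in funcs.items()}

src = ("out", ("plus", ("nat", 1), ("call", "f", ("nat", 2))))
assert compile_expr(src) == src   # homomorphic: same shape, no extra outputs
```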
Relating Traces. In the present model, source and target traces differ only in the fact that the target draws (regular) events from a strictly larger set than the source, i.e., . A natural relation between source and target traces essentially maps a given target trace to the source trace obtained by erasing from it those events that exist only at the target level. This is reasonable, because only target contexts (not compiled programs ) can perform the extra target actions, as the compiler does not introduce them. Let indicate trace filtered to retain only those elements included in alphabet . We define the trace relation as:
In the opposite direction, a source trace is related to many target ones, as any target-only events can be inserted at any point in . The induced mappings for this relation are:
That is, the target guarantee of a source property is that the target has the same source-level behavior, sprinkled with arbitrary target-level behavior. Conversely, the source-level obligation of a target property is the aggregate of those source traces, all of whose target-level enrichments are in the target property.
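Because each target trace is related to exactly one source trace (its restriction to the source alphabet), membership in the target guarantee of a source property reduces to a filtering check, as in the following sketch; the event tags are hypothetical.

```python
# A sketch, over finite traces, of the filtering relation and the induced
# target guarantee of a source property.

SOURCE_ALPHABET = {"out_nat"}          # hypothetical source-level event tags

def restrict_to_source(t_trace):
    return tuple(ev for ev in t_trace if ev[0] in SOURCE_ALPHABET)

def related(s_trace, t_trace):
    # a target trace is related to the source trace obtained by erasing
    # target-only events
    return restrict_to_source(t_trace) == s_trace

def in_target_guarantee(t_trace, src_prop):
    # t is in the image of src_prop iff its source-level restriction is in it
    return restrict_to_source(t_trace) in src_prop

src_prop = {(("out_nat", 1), ("out_nat", 2))}
tgt = (("out_nat", 1), ("out_extra", 7), ("out_nat", 2))
assert in_target_guarantee(tgt, src_prop)
assert related(restrict_to_source(tgt), tgt)
```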
Since the languages are very similar, it is simple to prove that our compiler is secure according to the trace relation defined above.
7.2 An Instance of Trace-relating Robust Preservation of Safety Properties
I/O events are not the only kind of events that compilers consider. Especially in the setting of secure compilation, where a compartmentalized partial program interacts with a context, interaction traces are often used [3, 35, 59, 64]. Consider a language analogous to that of the previous section, where the context defines a set of functions and the program defines a different set . Interaction traces (generally) record the control flow of calls between these two sets via actions that are and [34]. These actions indicate a call to function with parameter and a return with return value . In case the context calls a function in (or returns to a function in ), the action is decorated with a (i.e., those actions are and ). Dually, the program calling a function in (or returning to it) generates an action decorated with a (i.e., those actions are and ).
Patrignani and Garg [64] consider precisely such a setting. Their languages are simple like those presented here but impure; their source has an ML-like heap and the target has a memory that is indexed by natural numbers and capabilities to protect addresses. Moreover, they define a compiler that preserves safety properties of source programs (i.e., it is in the sense of 6.3) by relying on the target capabilities. The interesting point, however, is that they also consider source and target traces to be distinct, since the two languages have different values. Concretely, the source has and and the target only has , plus in the source, heap addresses are abstract locations while in the target they are . Thus, to prove , they rely on a cross-language relation on values, which is lifted to trace actions, and then lifted point-wise to traces (analogously to what we have done in Sections 4.3, 4.4, and 7.1). To relate addresses, their cross-language relation is equipped with a partial bijection between source and target addresses; this bijection grows monotonically with every reduction step.
Besides defining a relation on traces (which is an instance of ), they also define a relation between source and target safety properties that supports concurrent programs. 19 Thus, they really provide an instantiation of that maps all safe source traces to the related target ones. This ensures that no additional target trace is introduced in the target property, and source safety properties are mapped to target safety ones by . Hence, their compiler is proven to generate code that respects , so they achieve a variation of from 6.3. Their proofs are based on standard techniques both for secure compilation (i.e., trace-based backtranslation [61]) and for correct compilation (i.e., forward/backward simulation [42]).
7.3 An Instance of Trace-relating Robust Preservation of Hypersafety Properties
Patrignani and Garg [63] study the preservation of hypersafety from the perspective of secure compilation. Again, their result can be interpreted in our setting. They consider reactive systems, where trace alphabets are partitioned into input actions and output actions , whose concatenation generates traces . We use the same notation as before and indicate such sequences as and , respectively. The set of target output actions includes an action that has no source counterpart (i.e., ) and whose output does not depend on internal state (thus, it cannot leak secrets). 20 By emitting whenever undesired inputs are fed to a compiled program (e.g., passing a when a is expected), hypersafety is preserved (as does not leak secrets) [63].
More formally, they assume a relation on actions that is total on the source actions and injective. From there, they define —which here corresponds to an instance of — that maps the set of valid source traces to the set of valid target traces (that now mention ) as follows:
where indicates that is an undesired input (intuitively, this is an information that can be derived from the set of source traces [63]).
Informally, given a set of source traces , generates all target traces that are related (point-wise) to a source trace (case ). Then (case ), it adds all traces () with interleavings of undesired input (third conjunct) followed by (first conjunct) as long as the interleavings split a trace that has already been mapped (second conjunct).
is an instance of that maps source hypersafety to target hypersafety (and therefore, safety to safety), thus our theory can be instantiated for the preservation of these classes of hyperproperties as well.
8 Related Work
We already discussed how our results relate to some existing work in correct compilation [41, 77] and secure compilation [3, 63, 64]. We also already mentioned that most of our definitions and results make no assumptions about the structure of traces. One result that partially relies on the structure of traces is 6.3, which refers to finite prefixes, suggesting that traces should be some sort of sequences of events (or states), as is customary when one wants to refer to safety properties [18]. Without a notion of finite prefix, only may look different, but both and are trace-agnostic, as in general safety properties can be defined as the closed sets of any topology on traces [58].
Even for reasoning about safety, hypersafety, or arbitrary hyperproperties, traces can therefore be values, sequences of program states or of input/output events, or even the recently proposed interaction trees [81]. In the latter case, we believe that the compilation from IMP to ASM proposed by Xia et al. [81] can be seen as an instance of , for the relation they call "trace equivalence."
Compilers Where Our Work Could Be Useful. Our work should be broadly applicable to understanding the guarantees provided by many verified compilers. For instance, Wang et al. [80] recently proposed a CompCert variant that compiles all the way down to machine code, and it would be interesting to see if the model at the end of Section 4.1 applies there, too. This and many other verified compilers [15, 36, 51, 73] beyond CakeML [77] deal with resource exhaustion, and it would be interesting to also apply the ideas of Section 4.2 to them.
Hur and Dreyer [33] devised a correct compiler from an ML language to assembly using a cross-language logical relation to state their CC theorem. They do not have traces, though were one to add them, the logical relation on values would serve as the basis for the trace relation and therefore their result would attain CC.
Switching to more informative traces capturing the interaction between the program and the context is often used as a proof technique for secure compilation [3, 34, 62]. Most of these results consider a cross-language relation, so they probably could be proved to attain one of the criteria from Figure 4.
Generalizations of Compiler Correctness. The compiler correctness definition of Morris [50] was already general enough to account for trace relations, since it considered a translation between the semantics of the source program and that of the compiled program, which he called "decode" in his diagram, reproduced in Figure 5 (left). And even some of the more recent compiler correctness definitions preserve this kind of flexibility [65]. While CC can be seen as an instance of a definition by Morris [50], we are not aware of any prior work that investigated the preservation of properties when the "decode translation" is neither the identity nor a bijection, and source properties need to be re-interpreted as target ones and vice versa.

Correct Compilation and Galois Connections. Melton et al. [47] and Sabry and Wadler [70] expressed a strong variant of compiler correctness using the diagram of Figure 5 (right). They require that compiled programs parallel the computation steps () of the original source programs, which can be proven by showing the existence of a decompilation map that makes the diagram commute, or equivalently, the existence of an adjoint for ( for both source and target). The "parallel" intuition can be formalized as an instance of CC. Take source and target traces to be finite or infinite sequences of program states (maximal trace semantics [19]), and relate them exactly as Melton et al. [47] and Sabry and Wadler [70] do.
Fig. 5.
Translation Validation. Translation validation is an important alternative to proving that all runs of a compiler are correct, as it can be more easily applied to realistic compilers. An interesting work on translation validation of security properties has recently been proposed by Namjoshi and Tabajara [53]. They can handle many security properties expressible in terms of automata as long as source and target attackers and the observable traces are the same.
Instantiating the definition of any of the presented criteria with a particular program, one obtains translation validation criteria, with the map describing the target property that is (robustly) satisfied once the translation is validated. For example, one can consider
While the proof technique proposed by Namjoshi and Tabajara [53] might be generalized for —as long as and can be expressed as one of the automata they can handle—it does not work for because of the existential quantifier in the conclusion.
Busi et al. [13] are instead considering translation validation criteria in the spirit of . Their preliminary work only allows equality as the trace relation, but it should be amenable to a generalization to the trace-relating setting similar to the one we presented in this work.
Proof Techniques. We believe existing proof techniques (beyond the simulations discussed in Section 4.3.2) that have been devised to prove compiler correctness can also be employed to prove that a compiler attains any of the presented criteria. For example, cross-language binary logical relations can be used to relate two terms of two different languages when they "behave the same" [12, 33, 71]. Additionally, they can also be used when multiple programs "behave the same" [66] in a multilanguage semantics setting [45]. Secure compilation results (which rely on the criteria of Section 6) can be proven using variations of the backtranslation proof technique [22, 57, 64]. Presenting these proof techniques is beyond the scope of this article, so we refer the interested reader to the work of Patrignani et al. [61].
9 Conclusion and Future Work
We have extended the property preservation view on compiler correctness to arbitrary trace relations, and we believe that this will be useful for understanding the guarantees various compilers provide. An open question is whether, given a compiler, there exists a most precise relation for which this compiler is correct. As mentioned in Section 1, every compiler is CC for some , but under which conditions is there a most precise relation? In practice, more precision may not always be better, though, as it may be at odds with compiler efficiency and may not align with more subjective notions of usefulness, leading to tradeoffs in the selection of suitable relations. Finally, another interesting direction for future work is studying whether the correspondence with Galois connections makes it easier to compose trace relations for different purposes, say, for a compiler whose target language has undefined behavior, resource exhaustion, and side-channels. In particular, are there ways to obtain complex relations by combining simpler ones in a way that eases the compiler verification burden?
Composition for Multipass Compilers. For now, we can already informally argue about the correctness of a multipass compiler, where each step is proved correct for a possibly different trace relation. Concretely, assume is a compilation chain from a source language to an intermediate language and from the intermediate language to a target language . 21 Assume given two relations between traces of these languages: and , such that each compiler is proven to be w.r.t. the expected trace relation: and .
Let us consider the source-to-target compiler that is derived from the composition of the two aforementioned compilers, so . In this case, we obtain the expected result: The correctness of the whole compiler is derived from the individual compiler correctness proofs for each step.
where .
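The composition argument can be sketched concretely over finite relations; relations are represented as sets of pairs, the traces are placeholders, and the check confirms that the existential property mapping of the composed relation agrees with the composition of the two mappings.

```python
# A finite-model sketch of composing trace relations: a source-to-intermediate
# and an intermediate-to-target relation compose into a source-to-target
# relation, and the induced existential property mappings compose accordingly.

def compose(rel_si, rel_it):
    # s is related to t iff some intermediate trace relates them
    return {(s, t) for (s, i1) in rel_si for (i2, t) in rel_it if i1 == i2}

def tau(rel, src_prop):
    # existential image of a property along a relation
    return {t for (s, t) in rel if s in src_prop}

rel_si = {("s1", "i1"), ("s2", "i2")}
rel_it = {("i1", "t1"), ("i2", "t2")}
rel_st = compose(rel_si, rel_it)
src_prop = {"s1"}
assert tau(rel_st, src_prop) == tau(rel_it, tau(rel_si, src_prop)) == {"t1"}
```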
It is unclear how to generalize this kind of composition to compilers that attain different criteria. For example, if preserves arbitrary hyperproperties, but preserves 2-relational safety properties, then what can we conclude for ? We leave investigating these interesting matters for future work.
Acknowledgments
We thank Akram El-Korashy and Amin Timany for participating in an early discussion about this work and the anonymous reviewers for their valuable feedback.
Footnotes
1
For simplicity, for now, we ignore separate compilation and linking, returning to it in Section 6.
2
Typesetting convention [60]: We use a font for elements, an , font for ones, and a , font for elements common to both languages.
3
Stated at the top of the CompCert file driver/Complements.v and discussed by Regehr [68].
4
Given the deterministic nature of our programs, we consider notions of noninterference that are often used in deterministic languages. We leave notions of noninterference in nondeterministic languages for future work.
5
While target traces are often “more concrete” than source ones, trace properties (which in Coq we represent as the function type ) are contravariant in and thus target properties correspond to the abstract domain.
6
In case of ambiguity with property satisfaction the class of will be made explicit.
7
At least one other symmetric generalization is possible: For defined by , if produces a trace that violates the target interpretation of , i.e., , then produces thus violating .
8
is the topological closure in the topology where safety properties coincide with the closed sets (see, e.g., Clarkson and Schneider [18] and Pasqua and Mastroeni [58]).
9
At least one generalization is possible: . In this case, holds unconditionally while the other two implications hold under the same, but swapped, hypotheses from Theorem 3.8.
10
Making injective is a matter of adding open and close parentheses actions in target traces.
11
The exact shape of inputs and outputs depends on the scenario. For instance, inputs can be initial memories and outputs trace semantics of programs as in Reference [27, Section 7], while for interactive programs one may want to consider streams like Clark and Hunt [17]. We only require the sets of input and output projections to be disjoint. Further information, such as the ordering of events, is part of the attacker/observer model or the declassification of noninterference itself.
12
To be precise, the original formulation of ANI by Giacobazzi and Mastroeni [27] includes a third parameter , which describes the maximal input variation that the attacker may control. Here, we omit (i.e., take it to be the identity) to simplify the presentation.
13
In each compilation step, source and target traces are drawn from the same set so can be applied to both source and target traces.
14
The interested reader will notice the difference from the previous category by comparing condition (1) of Definition 5.10 and condition (1) of Definition 5.8 by Barthe et al. [7].
15
We remark has been used to conclude security of compilation in the previously discussed work of Sison and Murray [74] (and in its predecessor [52]). However, there is a key difference in the “role” of contexts: In robust compilation criteria, contexts model attackers, while in Sison and Murray [74] contexts are other bits of compiled code. This treatment lets Sison and Murray [74] reason compositionally about the concurrently executing compiled code.
16
. See, e.g., Clarkson and Schneider [18] and Pasqua and Mastroeni [58].
17
Technically, we should write: .
18
Notice that the languages are strictly terminating.
19
They call those safety properties monitors, since they focus on safety [72] and indicate with and with .
20
Technically, they assume a set of actions, but for this analogy a single action suffices.
21
For the intermediate language, we use a , font.
A Proofs
Proof of Theorem 5.1. First, we show that is an ; the proof for is the same.
Monotonicity. is a composition of monotonic functions; hence, it is itself monotonic.
Idempotence. We have to show that for , , which, unfolding the definition, means
For the inclusion “,”
the inclusion holds, because and the equality comes from idempotency of .
For the inclusion “,”
the inclusion comes from by extensiveness of , and the equality from .
Extensiveness. We have to show that .
The first inclusion is due to extensiveness of , the second to being the upper adjoint of .
To prove the statement of the theorem, assume and with ; we have to show that .
By CC there exists and such that . As a preliminary, apply Lemma B.1 to the relations and deduce is injective. Notice also that by functionality and totality, of and of , and and a similar fact holds for and .
so .
We now show that if is surjective, i.e., injective, then .
Let ; we show that for some .
The source property is such that . We only need to show . Let ,
which shows and concludes the proof.
Proof of Theorem 5.2. Assume and with . We have to show that , for an arbitrary that satisfies the condition
By CC there exists and such that . As a preliminary, recall that Lemma B.1 ensures is injective. Moreover, notice that by functionality and totality, of , and .
so .
Proof of Theorem 5.3. Assume and with and satisfying the condition . We have to show that . By CC there exists and such that . As a preliminary, recall that Lemma B.1 ensures is injective. Moreover, notice that by functionality and totality, of , and .
so .
References
[1] Martín Abadi, Anindya Banerjee, Nevin Heintze, and Jon G. Riecke. 1999. A core calculus of dependency. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’99). New York, NY, 147–160. DOI:https://doi.org/10.1145/292540.292555
Carmine Abate, Roberto Blanco, Ştefan Ciobâcă, Adrien Durier, Deepak Garg, Catalin Hritcu, Marco Patrignani, Éric Tanter, and Jérémy Thibault. 2020. Trace-relating compiler correctness and secure compilation. In Proceedings of the 29th European Symposium on Programming: Programming Languages and Systems, Held as Part of the European Joint Conferences on Theory and Practice of Software. 1–28. DOI:
Carmine Abate, Roberto Blanco, Deepak Garg, Cătălin Hriţcu, Marco Patrignani, and Jérémy Thibault. 2019. Journey beyond full abstraction: Exploring robust property preservation for secure compilation. In Proceedings of the 32nd IEEE Computer Security Foundations Symposium (CSF’19). Retrieved from https://arxiv.org/abs/1807.04603.
Gilles Barthe, Sandrine Blazy, Benjamin Grégoire, Rémi Hutin, Vincent Laporte, David Pichardie, and Alix Trieu. 2020. Formal verification of a constant-time preserving C compiler. Proc. ACM Program. Lang. 4, POPL (2020), 7:1–7:30. DOI:https://doi.org/10.1145/3371075
Gilles Barthe, Benjamin Grégoire, and Vincent Laporte. 2018. Secure compilation of side-channel countermeasures: The case of cryptographic “constant-time.” In Proceedings of the 31st IEEE Computer Security Foundations Symposium (CSF’18). 328–343. DOI:
Lennart Beringer, Gordon Stewart, Robert Dockins, and Andrew W. Appel. 2014. Verified compilation for shared-memory C. In Proceedings of the 23rd European Symposium on Programming: Programming Languages and Systems, Held as Part of the European Joint Conferences on Theory and Practice of Software. 107–127. DOI:https://doi.org/10.1007/978-3-642-54833-8_7
Frédéric Besson, Sandrine Blazy, and Pierre Wilke. 2019. A verified Comp-Cert front-end for a memory model supporting pointer arithmetic and uninitialised data. J. Autom. Reason. 62, 4 (2019), 433–480. DOI:https://doi.org/10.1007/s10817-017-9439-z
William J. Bowman and Amal Ahmed. 2015. Noninterference for free. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming.
Qinxiang Cao, Lennart Beringer, Samuel Gruetter, Josiah Dodds, and Andrew W. Appel. 2018. VST-Floyd: A separation logic tool to verify correctness of C programs. J. Autom. Reason. 61, 1–4 (2018), 367–422. DOI:https://doi.org/10.1007/s10817-018-9457-5
Quentin Carbonneaux, Jan Hoffmann, Tahina Ramananandro, and Zhong Shao. 2014. End-to-end verification of stack-space bounds for C programs. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Michael F. P. O’Boyle and Keshav Pingali (Eds.). 270–281. DOI:https://doi.org/10.1145/2594291.2594301
David Clark and Sebastian Hunt. 2008. Non-interference for deterministic interactive programs. In Proceedings of the International Workshop on Formal Aspects in Security and Trust. Springer, 50–66.
P. Cousot and R. Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 238–252.
Patrick Cousot and Radhia Cousot. 1979. Systematic design of program analysis frameworks. In Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 269–282.
Vijay D’Silva, Mathias Payer, and Dawn Xiaodong Song. 2015. The correctness-security gap in compiler optimization. In Proceedings of the IEEE Symposium on Security and Privacy Workshops. 73–87. DOI:https://doi.org/10.1109/SPW.2015.33
Ronghui Gu, Zhong Shao, Jieung Kim, Xiongnan (Newman) Wu, Jérémie Koenig, Vilhelm Sjöberg, Hao Chen, David Costanzo, and Tahina Ramananandro. 2018. Certified concurrent abstraction layers. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, Jeffrey S. Foster and Dan Grossman (Eds.). 646–661. DOI:https://doi.org/10.1145/3192366.3192381
István Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Cristiano Giuffrida, Herbert Bos, and Erik van der Kouwe. 2016. TypeSan: Practical type confusion detection. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 517–528. DOI:https://doi.org/10.1145/2976749.2978405
Chung-Kil Hur and Derek Dreyer. 2011. A Kripke logical relation between ML and assembly. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Thomas Ball and Mooly Sagiv (Eds.). 133–146. DOI:https://doi.org/10.1145/1926385.1926402
Alan Jeffrey and Julian Rathke. 2005. Java Jr: Fully abstract trace semantics for a core java language. In Proceedings of the 14th European Symposium on Programming (Lecture Notes in Computer Science), Vol. 3444. 423–438. DOI:https://doi.org/10.1007/978-3-540-31987-0_29
Yannis Juglaret, Cătălin Hriţcu, Arthur Azevedo de Amorim, Boris Eng, and Benjamin C. Pierce. 2016. Beyond good and evil: Formalizing the security guarantees of compartmentalizing compilation. In Proceedings of the IEEE 29th Computer Security Foundations Symposium. 45–60. DOI:
Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, and Viktor Vafeiadis. 2015. A formal C memory model supporting integer-pointer casts. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 326–335. DOI:https://doi.org/10.1145/2737924.2738005
Jeehoon Kang, Yoonseung Kim, Chung-Kil Hur, Derek Dreyer, and Viktor Vafeiadis. 2016. Lightweight verification of separate compilation. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Retrieved from http://sf.snu.ac.kr/sepcompcert/.
Jeehoon Kang, Yoonseung Kim, Chung-Kil Hur, Derek Dreyer, and Viktor Vafeiadis. 2016. Lightweight verification of separate compilation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’16). Association for Computing Machinery, New York, NY, 178–190. DOI:https://doi.org/10.1145/2837614.2837642
Leslie Lamport and Fred B. Schneider. 1985. Formal foundation for specification and verification. In Distributed Systems: Methods and Tools for Specification, an Advanced Course. Springer-Verlag, 203–285. DOI:https://doi.org/10.1007/3-540-15216-4_15
A. Melton, D. A. Schmidt, and G. E. Strecker. 1986. Galois connections and computer science applications. In Proceedings of a Tutorial and Workshop on Category Theory and Computer Programming. 299–312. Retrieved from http://dl.acm.org/citation.cfm?id=20081.20099.
F. Lockwood Morris. 1973. Advice on structuring compilers and proving them correct. In Proceedings of the ACM Symposium on Principles of Programming Languages, Patrick C. Fischer and Jeffrey D. Ullman (Eds.). 144–152. DOI:https://doi.org/10.1145/512927.512941
Eric Mullen, Daryl Zuniga, Zachary Tatlock, and Dan Grossman. 2016. Verified peephole optimizations for CompCert. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. 448–461. DOI:https://doi.org/10.1145/2908080.2908109
Toby C. Murray, Robert Sison, Edward Pierzchalski, and Christine Rizkallah. 2016. Compositional verification and refinement of concurrent value-dependent noninterference. In Proceedings of the IEEE 29th Computer Security Foundations Symposium. IEEE Computer Society, 417–431. DOI:
Kedar S. Namjoshi and Lucas M. Tabajara. 2020. Witnessing secure compilation. In Proceedings of the International Conference on Verification, Model Checking, and Abstract Interpretation. Springer, 1–22.
David A. Naumann and Minh Ngo. 2019. Whither specifications as programs. In Proceedings of the International Symposium on Unifying Theories of Programming. Springer, 39–61. Retrieved from https://arxiv.org/abs/1906.03557.
Georg Neis, Chung-Kil Hur, Jan-Oliver Kaiser, Craig McLaughlin, Derek Dreyer, and Viktor Vafeiadis. 2015. Pilsner: A compositionally verified compiler for a higher-order imperative language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. 166–178. DOI:https://doi.org/10.1145/2784731.2784764
Max New, William J. Bowman, and Amal Ahmed. 2016. Fully abstract compilation via universal embedding. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming.
Michele Pasqua and Isabella Mastroeni. 2017. On topologies for (hyper)properties. In Joint Proceedings of the 18th Italian Conference on Theoretical Computer Science and the 32nd Italian Conference on Computational Logic co-located with the IEEE International Workshop on Measurements and Networking (IEEE M&N’17) (CEUR Workshop Proceedings), Dario Della Monica, Aniello Murano, Sasha Rubin, and Luigi Sauro (Eds.), Vol. 1949. 150–161. Retrieved from http://ceur-ws.org/Vol-1949/ICTCSpaper13.pdf.
Marco Patrignani and Deepak Garg. 2017. Secure compilation and hyperproperty preservation. In Proceedings of the 30th IEEE Computer Security Foundations Symposium. 392–404. DOI:
Marco Patrignani and Deepak Garg. 2019. Robustly safe compilation. In Proceedings of the 28th European Symposium on Programming: Programming Languages and Systems (ESOP’19). Retrieved from https://arxiv.org/abs/1804.00489.
Tahina Ramananandro, Zhong Shao, Shu-Chun Weng, Jérémie Koenig, and Yuchen Fu. 2015. A compositional semantics for verified separate compilation and linking. In Proceedings of the Conference on Certified Programs and Proofs. 3–14. DOI:https://doi.org/10.1145/2676724.2693167
Gabriel Scherer, Max S. New, Nick Rioux, and Amal Ahmed. 2018. FabULous interoperability for ML and a linear language. In Proceedings of the 21st International Conference on Foundations of Software Science and Computation Structures, Held as Part of the European Joint Conferences on Theory and Practice of Software (Lecture Notes in Computer Science), Christel Baier and Ugo Dal Lago (Eds.), Vol. 10803. Springer, 146–162. DOI:
Jaroslav Sevcík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter Sewell. 2013. CompCertTSO: A verified compiler for relaxed-memory concurrency. J. ACM 60, 3 (2013), 22:1–22:50. DOI:https://doi.org/10.1145/2487241.2487248
Robert Sison and Toby Murray. 2019. Verifying that a compiler preserves concurrent value-dependent information-flow security. In Proceedings of the 10th International Conference on Interactive Theorem Proving (LIPIcs), John Harrison, John O’Leary, and Andrew Tolmach (Eds.), Vol. 141. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 27:1–27:19. DOI:
Lau Skorstengaard, Dominique Devriese, and Lars Birkedal. 2019. StkTokens: Enforcing well-bracketed control flow and stack encapsulation using linear capabilities. Proc. ACM Program. Lang. 3, POPL (Jan. 2019), 19:1–19:28.
Gordon Stewart, Lennart Beringer, Santiago Cuellar, and Andrew W. Appel. 2015. Compositional CompCert. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 275–287. DOI:https://doi.org/10.1145/2676726.2676985
Yong Kiam Tan, Magnus O. Myreen, Ramana Kumar, Anthony Fox, Scott Owens, and Michael Norrish. 2019. The verified CakeML compiler backend. J. Funct. Program. 29 (2019).
Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek. 2012. Undefined behavior: What happened to my code? In Proceedings of the Asia-Pacific Workshop on Systems. 9. DOI:https://doi.org/10.1145/2349896.2349905
Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. 2013. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles. 260–275. DOI:https://doi.org/10.1145/2517349.2522728
Yuting Wang, Pierre Wilke, and Zhong Shao. 2019. An abstract stack based approach to verified compositional compilation to machine code. Proc. ACM Program. Lang. 3, POPL (2019), 62:1–62:30. DOI:https://doi.org/10.1145/3290375
Li-yao Xia, Yannick Zakowski, Paul He, Chung-Kil Hur, Gregory Malecha, Benjamin C. Pierce, and Steve Zdancewic. 2020. Interaction trees: Representing recursive and impure programs in Coq. Proc. ACM Program. Lang. 4, POPL (2020), 51:1–51:32. DOI:https://doi.org/10.1145/3371119
Jianzhou Zhao, Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic. 2012. Formalizing the LLVM intermediate representation for verified program transformations. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 427–440. Retrieved from http://www.cis.upenn.edu/stevez/papers/ZNMZ12.pdf.