Towards Reproducible Execution of Closed-Source Applications from Internet Archives

Mahadev Satyanarayanan, Computer Science Department, Carnegie Mellon University, USA, [email protected]
Jan Harkes, Computer Science Department, Carnegie Mellon University, USA, [email protected]
James Blakley, Computer Science Department, Carnegie Mellon University, USA, [email protected]

Olive enables execution of closed-source applications decades after their creation. With appropriate authentication and authorization, anyone on the Internet can execute any archived application with no more effort than a mouse click. User experience is good, even for an interaction-intensive application. Olive uses virtual machine (VM) technology to encapsulate legacy software, including the operating system and all layers above it. If the legacy hardware is already obsolete at curation time, an emulator for it on more modern hardware can be included within the VM image. This paper is an experience report on the decade-long evolution of this concept.

CCS Concepts: • Applied computing → System forensics; • Applied computing → Evidence collection, storage and analysis; • Computer systems organization → Architectures; • Software and its engineering → Operating systems; • Software and its engineering → Virtual machines; • Software and its engineering → Maintaining software;

Keywords: archival software, execution fidelity, hardware emulation, long-term preservation, VM streaming, edge computing, prefetching

ACM Reference Format:
Mahadev Satyanarayanan, Jan Harkes, and James Blakley. 2023. Towards Reproducible Execution of Closed-Source Applications from Internet Archives. In 2023 ACM Conference on Reproducibility and Replicability (ACM REP '23), June 27–29, 2023, Santa Cruz, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3589806.3600035

1 Software and the Scientific Method

The role of software in the scientific method is illustrated by a scholarly controversy in economics early in the 21st century [44]. In 2010, Reinhart and Rogoff published an analysis of economic data spanning many countries [49, 50]. Coming at the height of a global financial crisis, it had enormous impact on austerity measures worldwide. However, in 2013, Herndon et al. [34] refuted their findings by discovering an error in their calculations. They described the error as follows [48]:

“The Reinhart-Rogoff research is best known for its result that, across a broad range of countries and historical periods, economic growth declines dramatically when a country's level of public debt exceeds 90 per cent of gross domestic product.

⋅⋅⋅

When we performed accurate recalculations using their dataset, we found that, when countries’ debt-to-GDP ratio exceeds 90 per cent, average growth is 2.2 per cent, not -0.1 per cent.”

The controversy continues, but regardless of how it is eventually resolved, there is no denying the central role of software (in this case, a Microsoft Excel spreadsheet) in the original analysis, its refutation and its ongoing resolution.

In the Reinhart-Rogoff example, only three years had elapsed since the original publication of results. Imagine, however, that the recalculations were attempted by a researcher 50 years later. Would Microsoft Excel still be in use? If so, would that version accept the original data format? Would the calculations performed be identical in every respect (e.g., handling of rounding errors)? What if Microsoft goes out of business, and Windows ceases to be in use? Our growing dependence on proprietary software challenges the premise of reproducibility, which is the bedrock of science.

At the heart of the scientific method is the ability to precisely reproduce previous results. Lowering this barrier encourages independent validation. The lowest conceivable barrier is one-click execution, much like viewing a PDF document on a web page today. This is difficult to achieve for reasons explained in Section 2. Merely archiving source code and saying “Build it yourself” raises the barrier for future researchers to attempt independent validation. But even this option is not viable for software that is closed-source.

Today, proprietary software is pervasive in science and technology. Examples include data analysis tools to slice and dice raw data, visualization tools to zoom in or zoom out of results, CAD tools to view detailed designs of artifacts, digital twins that emulate engineered systems, and simulation models that use a variety of programming languages, supporting libraries and reference data sets. In many scientific and engineering endeavors, closed-source software plays a crucial role. While exclusive use of open source software in the interests of reproducibility is laudable, the reality is that use of closed-source software is sometimes unavoidable. This paper addresses the part of the reproducibility problem space that involves closed-source software in any aspect of its workflow. It is relevant to settings where there is even a single closed-source application in an otherwise open source workflow.

Since 2012, we have been exploring the creation of an Internet-wide archive of curated virtual machine (VM) images. Our goal is to enable anyone on the Internet, with suitable authentication and authorization, to execute any archived application with no more effort than a mouse click. User experience should be good, even for an interaction-intensive application. Archived executable content should be preserved “as is” forever. In other words, no security patches, bug fixes, or other modifications should be applied. Archived content should thus remain “bug compatible” with the original in perpetuity. Such an Internet-wide archive would be a valuable resource for science, engineering and intellectual property forensics (e.g., prior art discovery in patent litigation).

Our implementation has evolved over a decade in the context of a system called Olive. This paper is an experience report on the three phases of this evolution:

  • Olive2014: VM execution in the cloud or at the desktop.
  • vTube: prefetching of pages of a VM image over low bandwidth networks using machine learning.
  • Olive2022: VM execution on servers located at the edge of the Internet (i.e., “cloudlets” [53]).
Figure 1: Archival Strategy
Figure 2: Three-tier System Architecture [56]

2 Execution Fidelity

Precise reproduction of software execution, which we call execution fidelity, is a complex problem in which many moving parts must all be perfectly aligned for a solution. Preserving this alignment over space and time is difficult. Many things can change: the hardware, the operating system, dynamically linked libraries, configurations, user preferences, geographic location, execution timing, etc. Even a single change may hurt fidelity or completely break execution.

Unfortunately, the available mechanisms for enforcing execution fidelity are weak. Most software distribution today takes the form of install packages, typically in binary form but sometimes in source form. The act of installing a package involves checking for a wide range of dependencies, discovering missing components, and ensuring that the transitive closure of dependencies involving these components is addressed. Tools have been developed to simplify and partially automate these steps. However, the process still involves considerable skill and knowledge, remains failure-prone, and typically involves substantial time and effort.

These difficulties loom large to any researcher who attempts to re-validate old scientific results. Software install packages themselves are static content, and can be archived in a digital library using the same mechanisms that are used to archive scientific data. However, the chances of successfully installing and executing this software in the distant future are low. These challenges have long stymied efforts to archive executable content [15, 16, 41].

Our approach, therefore, is to pre-construct the transitive closure of dependencies and then freeze it. A small step in this direction is the use of static linking rather than dynamic linking. A bigger step is the use of containers, where even more of the dependencies are prebuilt and frozen. The extreme limit of this approach is construction of a VM image. In that case, the guest operating system and all layers above it are included in the transitive closure that is frozen. If the target hardware is already obsolete at the time the VM image is created, an emulator for it on more modern hardware can be included in the transitive closure that is frozen.

This approach is illustrated by Figure 1. Descending through the dependencies, layers 8 through 6 are the transitive closure that is frozen into a VM image. If an emulator for old hardware is also included, it appears as Layer 5 in Figure 1. The lower layers (4 through 1) represent the future environment in which the frozen dependencies are executed. Although these lower layers are very different in Olive2014 and Olive2022, layers 8 through 5 remain unchanged. In other words, VM images that were created for Olive2014 can be executed unmodified on Olive2022.

Some important tradeoffs are embodied in our approach. First and foremost, static capture of the transitive closure of dependencies bloats the size of archived content. This is, of course, why VM images tend to be large. Historically, dynamic linking became popular precisely because of the smaller memory footprints of applications. Archiving VM images deliberately sacrifices this benefit in return for greatly improved execution fidelity. A second tradeoff involves security vulnerabilities. Dynamic linking is attractive for system maintenance because a security patch only has to be applied once, to a single library. This efficiency is lost when an application is statically bound with all its dependencies. Now, each application has to be patched. However, if the goal is to preserve an accurate record of the vulnerability of the application for posterity, this shortcoming becomes a valuable feature.

Figure 1 only depicts the dependencies at a single node. If the application being archived is spread over multiple nodes of a distributed system, then the dependencies at each node would have to be captured in a VM similar to Figure 1. Further, there would need to be a “VM of VMs” that captures cross-node dependencies such as network addressing conventions, inter-node protocols and data exchange formats. Future work, beyond Olive2022, would be needed to implement an execution environment for such a “VM of VMs.” In principle, this concept could be built up hierarchically, though the value of such effort is not clear at this time. This paper focuses on archival preservation at a single node.

3 Why Hardware Virtualization?

In the context of Olive, virtualization refers specifically to hardware virtualization of the Intel x86 architecture, and the term “VM” refers to a virtualized x86 machine. Today, this hardware architecture is dominant and is efficiently virtualized using Intel's VT extensions. Olive benefits indirectly from the many efforts in academia and industry that are aimed at improving the performance and functionality of VM-based systems for cloud computing. As described in Section 6, Olive VMs can archive software written for other hardware architectures. That involves an additional layer of emulation that is nested within the x86 VM, and thus incurs additional runtime overhead. Specific benefits arise from our choice of x86 as the virtualization target rather than software virtualization alternatives such as the Java VM (JVM) [38] or the Dalvik VM [18].

First, the VM interface is compatible with legacy operating systems and their valuable ecosystems of applications. The ability to sustain these ecosystems without code modifications is a powerful advantage of VMs. The ecosystems supported by software virtualization tend to be much smaller. For example, a JVM is only valuable in supporting applications that compile to Java bytecode. In contrast, a VM is language-agnostic and OS-agnostic. In fact, a JVM can be part of a VM's ecosystem. Hardware virtualization can thus subsume software virtualization.

Second, a VM interface is narrow and stable relative to typical software interfaces. These attributes help to preserve execution fidelity over long periods of time. The stability of a VM interface arises from the fact that the hardware it emulates itself evolves very slowly and almost always in an upward-compatible manner. In contrast, the pliability of software results in more rapid evolution and obsolescence of interfaces. Keeping up with these changes requires high software maintenance effort. Pliability also leads to widening of narrow interfaces over time. Over time, the burden of sustaining a wide interface compromises execution fidelity.

This reasoning leads to an approach in which VMs play a role for archiving executable content that is analogous to the role played by a standardized document format such as PDF today. There may be many alternative paths to producing a VM image, but once produced, that image can be saved in an Internet library and executed on demand by anyone with appropriate access privileges.

Lightweight virtualization approaches such as Linux containers [14] and Docker [23] are possible alternatives to VMs. Unfortunately, they were not designed for long-term archiving and would require active maintenance of considerably more software infrastructure than a VM-based Olive. Our desire to preserve complete environments (both Linux and non-Linux, including Windows and MacOS) for decades-long periods favors the use of VMs. For long-term archiving and execution fidelity, there is no substitute for hardware virtualization.

4 Execution Site

The answer to the question “Where should an archived VM image be executed?” has changed over time relative to Figure 2. Back in the 2011–2012 timeframe when Olive was first conceived [54, 55], only Tier-3 (devices) and Tier-1 (the cloud) existed. Following standard practice in cloud computing, we launched VM instances at Tier-1 from VM images archived there. A remote desktop protocol was used for user interactions with this VM instance from a Tier-3 device. We use the generic term “RDP” for this protocol. It spans a wide range of possible implementations such as VNC [51], SPICE [37], PCoIP [11], VMware Blast [62], and Microsoft RDP.

The limitations of Tier-1 as the execution site quickly became apparent. These limitations are inherent to extending keyboard and mouse interactions over a WAN. Our early experiments [61] had suggested that consistent end-to-end latency below 150 ms would suffice for RDP to deliver a good user experience for text-oriented interactive applications. Our experience confirmed this expectation, but it also exposed the limitations of RDP for applications with rich graphics and deeply-immersive user interactions. In those cases, RDP protocol optimizations were not able to fully mask the high latency and jitter of WAN connectivity. It is for precisely this reason that cloud-based interactive applications such as Microsoft Office 365 have been re-implemented to avoid RDP. Unfortunately, modifying the interaction protocol to optimize Tier-1 to Tier-3 roundtrips is not an option for closed-source applications. The whole point is to preserve every aspect of the application unchanged.

Tier-3 was the only other option for execution site in the 2011–2012 timeframe. We created Olive2014 for streamed execution of a VM instance at Tier-3 from a VM image at Tier-1. The user experience is similar to streamed viewing of a video on YouTube. To avoid a long delay before execution begins, Olive2014 starts execution as soon as an initial set of pages arrives at Tier-3. If execution accesses parts of the VM image that have not yet been fetched, the VM equivalent of a page fault is generated and VM instance execution is paused. When the missing page has been fetched from Tier-1, execution resumes. We present details of this mechanism and our experience with it in Section 5.

It is well known that page fault handling over a high-latency network can slow execution unacceptably. To improve user experience under these conditions, we explored the use of machine learning to predict and prefetch the page accesses of VM-encapsulated applications. We summarize our learnings from this work in Section 7.

By 2020–2021, edge computing [52] (Tier-2 in Figure 2) had emerged. Commercial deployments of Tier-2 such as AWS Wavelength [63] and Microsoft Azure Stack Hub [1, 2] now exist. These enable new edge-native applications that are simultaneously bandwidth-hungry, latency-sensitive, and compute-intensive [58].

Although edge computing has emerged for reasons totally unrelated to reproducibility, the network proximity (i.e., low latency and high bandwidth) of Tier-3 to Tier-2 is a perfect fit for RDP. In other words, one can view RDP-based interaction with a VM-encapsulated legacy application as an edge-native application. This suggests the possibility of using Tier-2, rather than Tier-3, as the execution site for VM instances. Such a move has technical and legal/licensing advantages detailed in Section 8. We have therefore created Olive2022, a new implementation that embodies this strategy. We have confirmed that VM images that were created for Olive2014 can be used without any changes in Olive2022. Relative to Figure 1, layers 8 through 5 are unchanged, but layers 4 through 1 are different in Olive2022 and Olive2014.

Figure 3: Olive2014 Client Layers
Figure 4: Olive2014 Client Implementation

5 Olive2014

Figure 3 illustrates the abstract structure of an Olive2014 client. Layers 8 through 5 are unchanged from Figure 1. They are the transitive closure of dependencies frozen into a VM image and archived in the cloud. At the bottom (Layers 1 and 2) is standard Intel x86 hardware running Linux. Above this (Layer 3) is VMNetX, which is the heart of Olive2014. This component implements caching and prefetching of VM images over the Internet. VMNetX presents the illusion of a fully assembled VM image to the KVM/QEMU layer above (Layer 4), which virtualizes the x86 host hardware.

Figure 5: Format of Web Page in Olive Archive

Figure 4 shows how the abstract layers of Figure 3 are mapped to the implementation of an Olive2014 client. Layers 8 through 5 are encapsulated within the VM instance shown on the left. Layer 4 (KVM/QEMU) is explicitly shown in this figure, even though it is typically viewed as an integral part of Linux. Layer 3 (VMNetX) maps to a user-level cache manager and two persistent file caches (named “pristine” and “modified” in Figure 4).

Figure 6: Example Domain XML
Figure 7: Screenshots of Example Applications Archived in Olive2014

Execution begins with the master boot record (MBR) of the virtual disk image being fetched into the pristine cache. As the VM instance executes, it may access uncached parts of its VM image. VMNetX services the resulting cache misses via HTTP range requests to a standard Apache Web server. The “web page” in this case is a large archival file that contains all components of the VM image, including its disk image, its memory image, and its hardware configuration (i.e., “Domain XML”). Figure 5 shows its layout.
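To make the demand-fetch path concrete, the following sketch (illustrative only, not the actual VMNetX code) shows how a missing chunk of the virtual disk might be retrieved with an HTTP range request against the archival file layout of Figure 5. The URL, chunk size, and layout offsets are hypothetical:

```python
import requests

CHUNK = 64 * 1024  # hypothetical fetch granularity, in bytes
ARCHIVE_URL = "https://archive.example.org/olive/office-6.0/vm-image"  # hypothetical

def fetch_chunk(disk_offset, disk_base):
    """Fetch the chunk of the virtual disk that covers disk_offset.

    disk_base is the byte offset of the disk image within the archival
    file, which also contains the Domain XML and the memory image.
    """
    start = disk_base + (disk_offset // CHUNK) * CHUNK
    end = start + CHUNK - 1
    resp = requests.get(ARCHIVE_URL,
                        headers={"Range": f"bytes={start}-{end}"},
                        timeout=30)
    resp.raise_for_status()  # a successful partial fetch returns 206
    return resp.content
```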

The partitioning of pristine and modified VM state into separate caches makes it easier to ensure that a fresh launch of a VM instance always starts with the bit-exact VM image in the cloud. Missing state that is fetched from the cloud is placed in the pristine cache. As a VM instance executes, some of this state may be modified. If that modified state is written out to disk, it goes into the modified cache. No evictions are ever performed on the modified cache, and it only persists for the lifetime of its VM instance. The lifetime of the contents of the pristine cache is determined by standard LRU management. Since this cache is persistent, state cached by one execution of an archived application may benefit future executions of the same application at this client.
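A sketch of the resulting read and write paths, again illustrative rather than a transcription of VMNetX, makes the precedence between the two caches explicit:

```python
def read_block(block_id, modified_cache, pristine_cache):
    """Serve a guest read: modified state wins, then pristine, then the cloud."""
    if block_id in modified_cache:       # written earlier by this VM instance
        return modified_cache[block_id]
    if block_id in pristine_cache:       # fetched earlier, still unmodified
        return pristine_cache[block_id]
    data = fetch_chunk_for(block_id)     # stand-in for the HTTP range request shown above
    pristine_cache[block_id] = data      # persistent across instances, LRU-managed
    return data

def write_block(block_id, data, modified_cache):
    """Guest writes never touch the pristine cache or the archived image."""
    modified_cache[block_id] = data      # never evicted; discarded when the instance ends
```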

The Domain XML information shown in Figure 5 is used to precisely configure the VM instance that executes an archived application. This includes details such as the number of CPU cores, the amount of memory, and the peripheral devices. Figure 6 is an example of such a specification.
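As a rough illustration of the kind of detail involved, a minimal libvirt domain specification might look like the following. It is not the actual specification of Figure 6, and the name, resource sizes, and devices below are hypothetical:

```xml
<domain type='kvm'>
  <name>office-6.0</name>                        <!-- hypothetical -->
  <memory unit='MiB'>64</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='disk.qcow2'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <graphics type='vnc' port='-1'/>
  </devices>
</domain>
```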

6 Experience with Olive2014

Olive2014 was successful in delivering archived closed-source applications over the Internet. User experience was good even for highly interactive applications with rich graphics content. A collection of VM images for over 15 operating systems and applications was created [6]. These range in vintage from the early 1980s through 2013, and many are closed-source. A 2015 video that includes live demos of many of these applications can be found on YouTube [12].

Because of software licensing restrictions, access to this archival collection over the Internet is strictly limited. To give a taste of this collection, and to highlight the diverse content that can be archived in Olive, we briefly describe a few of the applications below. As mentioned earlier, VM images that were created for Olive2014 have all proved to be usable in Olive2022 also.

Microsoft Office 6.0: Figure 7(a) shows a screenshot of this VM, containing Word, Excel and PowerPoint for Windows 3.1. If Reinhart and Rogoff had published their controversial paper [49] in the 1993-94 timeframe, this is the VM that you would need to re-validate their results today.

NCSA Mosaic: As the world's first widely-used web browser dating back to 1992-93, Mosaic has a unique historical status. This VM, whose screenshot is shown in Figure 7(b), is also interesting for a second reason. The version of Mosaic that it encapsulates was written for the Apple MacOS 7.5 operating system on Motorola 68040 hardware. The VM also encapsulates Basilisk II, an open source hardware emulator for the Motorola 68040 on modern Intel x86 hardware running Linux. The bootable disk image of MacOS 7.5 with Mosaic is stored as a file in the virtual file system of the outer Linux guest. In spite of two levels of virtualization, performance is acceptable because modern hardware is so much faster than the original Apple hardware. Pointing the Mosaic browser at modern web sites is instructive. Since Mosaic predates web technologies such as JavaScript, HTTP 1.1, Cascading Style Sheets, and HTML5, it is unable to render content from modern web sites. It is, however, capable of rendering web pages from some older Internet sites.

Chaste 3.1: This computational biology VM (Figure 7(c)) illustrates the value of Olive in stably preserving the environment needed to build from source code. Chaste (Cancer, Heart and Soft Tissue Environment) is a simulation package for computationally demanding problems in biology and physiology that was developed at Oxford University. This particular release of Chaste was packaged with a paper that was published in March 2013 [43]. Less than two years after the paper was published, the source code could no longer be compiled on current Linux releases. Many source code changes are needed before compilation succeeds today. This problem will grow worse over time, and represents exactly the kind of barrier to entry for scientific reproducibility that was mentioned in Section 1. The Chaste VM contains a frozen Linux environment in which the Chaste code successfully compiles. It also contains example data that was published with the paper. Running Chaste on this data produces videos showing visualizations of certain muscle functions. With high confidence of success, a future researcher who wishes to explore a modification to the published software can edit the code in the VM, then compile and run it, and finally view the generated videos, all within a single VM instance. This early success can help the researcher to decide whether it is worth the effort to port the system to a modern software environment.

ChemCollective: This VM (Figure 7(d)) illustrates how Olive can be used to archive frozen snapshots of a cloud service in ready-to-execute form. The ChemCollective is a web-based service for teaching and self-learning chemistry. It contains a collection of virtual labs, scenario-based learning activities, tutorials, and concept tests. Teachers can use the content for pre-labs, for alternatives to textbook homework, and for in-class activities for individuals or teams. The live web site is constantly evolving, as new material is added and old material is updated. This VM represents a frozen snapshot of the web service at one point in time, and contains the complete static data of the web site, an application server, a web server, and a browser. The VM's hostname to IP mapping has been modified to redirect all ChemCollective references back to the local host. This ensures that ChemCollective links traversed by the browser within the VM will map to the frozen ChemCollective service within the VM rather than the live ChemCollective web site.
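For a Linux-based guest, one simple way to achieve such a redirection is a hosts-file override. The following sketch illustrates the idea, though it is not necessarily the exact mechanism used in the archived VM:

```
# /etc/hosts inside the guest: map the live site back to the frozen copy
127.0.0.1   chemcollective.org www.chemcollective.org
```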

Great American History Machine: This application was created in the late 1980s for the Andrew environment [45] at Carnegie Mellon University to teach 19th century and early 20th century American history [42]. The version of the application that was ported to Windows 3.1 by the University of Maryland in the early 1990s (Figure 7(d)) was used in history courses at many universities in the United States. This educational software used census and election data to convey important historical concepts such as the origins of the Civil War. Because of lack of financial resources to port the software to newer Windows platforms, it fell into disuse over time. No modern equivalent of this software exists today, but there have been numerous requests for its resurrection.

TurboTax 1997: This application for Windows 3.1 and Windows 95 was used by millions of Americans to prepare their 1997 tax returns. Since TurboTax is updated each year to reflect the current tax laws, a suite of TurboTax VMs from consecutive years can offer unique historical value. Imagine a class in political science, public policy or economics assigning students a project based on TurboTax versions that are ten years apart. By calculating the tax returns for hypothetical families with different sources and amounts of income, students can see for themselves the impact of tax code changes over time. Such active learning can transform the abstract topic of tax law into valuable real-world insights.

7 Prefetching for Last-Mile Networks

Olive2014 uses a pure demand-fetch policy for caching VM state. In other words, the VMNetX cache manager shown in Figure 4 only issues an HTTP range request when it receives a request (via the FUSE interface) for a missing part of a VM image. Last-mile networks, such as cellular wireless networks, expose the weakness of this approach. Their low bandwidth and high latency slow demand paging of Olive VMs and lead to an unacceptable user experience.

Prefetching parts of VM state in advance of demand can help in two ways. First, all or part of the cost of cache miss servicing is overlapped with client execution prior to the miss. Second, prefetching in bulk allows TCP windows to grow to optimal size and thus reduces the per-byte transfer cost. Unfortunately, it is well known that prefetching is a double-edged sword. Acting on incorrect prefetching hints can clog a network with junk, thereby hurting overall performance. It can also exacerbate buffer bloat [30].

Prefetching therefore has to be controlled so that it helps as much as possible, but never hurts. In practice, this translates into two key decisions: (1) choosing what state to predictively stream to minimize application performance hiccups during execution; and (2) choosing when and for how long to pause a VM for buffering. These decisions must factor in several criteria such as historical VM behavior, current VM behavior, current network bandwidth, and the unpredictable behavior of human users. Even when an archival VM is launched for use of a specific application, it is still an ensemble of software that may include multiple processes, threads, and code paths interacting in non-deterministic ways. Different users may vary in how they interact with the same application.

In spite of these challenges, our experiments show that prefetching is a clear win. Although each VM instance is unique in its access pattern, short stretches of accesses can be dynamically predicted with sufficient accuracy for prefetching. vTube is a derivative of Olive2014 that embodies this work. Experimental results from vTube have been presented in an earlier paper [7].

Figure 8: Clustering in Traces (Source: [7])

Using offline machine learning (ML), vTube performs fine-grained analysis of access traces from previous executions. Despite wide variance from execution to execution and from user to user, this analysis identifies short segments of repeatability called clusters (Figure 8). Clusters are exceptionally stable across multiple executions, and can be used as the basis of high-quality prefetching hints.

At runtime, an online algorithm continuously examines accesses to VM state and tries to detect clusters. When the VM demand-fetches part of a cluster, the vTube server identifies a set of clusters deemed necessary in the near future and sends it together with the accessed cluster. The process of identifying this cluster set is illustrated in Figure 9. A time horizon called the lookout window is used to bound the scope of prefetching. Within the lookout window, clusters are prefetched based on their size and likelihood of access.
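A much-simplified sketch of this selection logic is shown below. The thresholds, field names, and greedy policy are illustrative stand-ins rather than vTube's actual algorithm or parameters:

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    blocks: list        # identifiers of the VM-state blocks in this cluster
    size: int           # total size in bytes
    probability: float  # estimated likelihood of access in the near future

def select_prefetch_set(triggered, candidates, bandwidth_bytes_per_s,
                        lookout_window_s, min_probability=0.3):
    """Pick clusters to stream along with the demand-fetched cluster.

    Greedily take the most likely candidates whose cumulative transfer
    time still fits within the lookout window at the current bandwidth.
    """
    budget = bandwidth_bytes_per_s * lookout_window_s   # bytes transferable in the window
    chosen, spent = [triggered], triggered.size
    for c in sorted(candidates, key=lambda c: c.probability, reverse=True):
        if c.probability < min_probability:
            break
        if spent + c.size <= budget:
            chosen.append(c)
            spent += c.size
    return chosen
```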

Figure 9: Cluster Selection for Prefetching (Source: [7])

Prefetching can significantly improve user experience. Figure 10 shows sample runs of vTube for a VM encapsulating the game Riven over various last-mile networks. For each run, black continuous lines indicate periods of uninterrupted execution. Gray interruptions indicate when VM execution is stalled, either due to explicit buffering or demand misses. Qualitatively, user experience during these sessions is comparable to viewing video over a last-mile network. There is a significant initial period of buffering, after which execution is mostly free of hiccups. There are occasional periods when execution is paused for buffering.

Although vTube was successful as a research project, its functionality was never incorporated into Olive2014. By the time vTube was completed and evaluated, a number of technical and non-technical limitations of using Tier-3 as the execution site for Olive had emerged. Further, industry evolution in the Internet of Things (IoT) led to the emergence of Tier-2. As a result, Tier-3 no longer seems to be the optimal execution site for Olive VMs today. We discuss this most recent phase of Olive's evolution in the next section.

Figure 10: vTube User Experience on Riven (Source: [7])

8 Olive2022

8.1 Limitations of Tier-3 as VM Execution Site

From a user experience point of view, Tier-3 is the best possible site for execution of a VM instance. No other site offers such low latency for RDP interactions. However, user experience is not the only factor to consider in placement. We discuss below two other factors that came to dominate attention after Olive2014 was created.

In the decade from 2012 to 2022, many changes occurred in IT environments. First, users moved away from powerful desktops that could easily run VM instances to ultralight notebooks, tablets and smartphones [27, 28, 29]. Second, many newer Tier-3 devices are based on ARM hardware rather than x86 hardware. Third, as mentioned in Section 4, the advent of IoT has made Tier-2 a commercial reality.

Independent of this IT evolution, software vendors expressed reluctance to make their proprietary software available for one-click execution via Olive2014. There was serious concern about the possibility of lost revenue from software upgrades. Older versions of software would be readily available from an Internet archive, even when Tier-3 hardware is upgraded. Today, the Tier-3 hardware upgrade cycle implicitly drives software upgrades as well. Enforcement of software licensing constraints at Tier-3 is challenging because of the scale of the problem, the open-ended time horizon, the placement of Tier-3 devices behind Internet firewalls, and their management by users rather than professional IT staff.

8.2 Leveraging Tier-2

Edge computing offers a unique opportunity for Olive. Proper runtime choice of cloudlet can ensure network proximity between the user at Tier-3 and the VM instance at Tier-2 with which she is interacting via RDP (Figure 2). It can also ensure that hardware requirements, such as the presence of a powerful GPU, are met.

Unlike Tier-3, where equipment is typically owned and managed by users, Tier-2 is typically managed by professional IT staff. Today, telcos such as Verizon, Vodafone, and T-Mobile own and operate Mobile Edge Computing (MEC) services at Tier-2, based on Amazon AWS and Microsoft Azure. These services are effectively “bringing the cloud closer,” which is the whole point of edge computing.

Figure 11: Olive2022 Client Layers

From a scaling point of view, there will be far fewer Tier-2 entities than Tier-3 entities. Depending on the size of cloudlets, we expect fan-outs of 10¹ to 10³ to be typical. The orders-of-magnitude smaller scale and the difference in ownership together lower the perceived risk of software piracy. Hence, legal agreements for licensing are expected to be easier to achieve and enforce at Tier-2 than at Tier-3.

Figure 11 shows the abstract structure of an Olive2022 VM instance. Layers 8 through 5 are unchanged from Figures 1 and 3. Relative to Figure 3, VMNetX (Layer 3) is missing. It is not needed because the entire VM image is expected to already be present at the chosen cloudlet via advance provisioning. The only difference between an Olive VM instance at Tier-2 and one at Tier-1 is that RDP interactions traverse far fewer network hops. This shrinks the mean and tail latency, thus resulting in a better user experience.

8.3 Cross-tier Orchestration

The performance benefits of Tier-2 for Olive only apply if the choice of cloudlet is near-optimal at runtime relative to the current location and operating environment of the user's Tier-3 device. Sinfonia is an open source system that enables an application launched on a Tier-3 device to find and dynamically associate with its software back-end on a Tier-2 cloudlet [57]. This association is transient, and may involve launching of the back-end software on the chosen cloudlet. The association is typically stable for a few minutes to a few hours, and may be broken for many reasons: e.g., the app is terminated, the device moves by a large distance, the cloudlet becomes overloaded, etc. Sinfonia can then be used to find a new cloudlet. It can also be used to prepare for a seamless transfer of VM execution across cloudlets via a mechanism such as VM Handoff [32].

Befitting its role as a cross-tier mechanism, Sinfonia has code components that reside at Tier-1, Tier-2 and Tier-3. It returns a short list of plausible targets, much like hostname lookup in DNS. Each target is a VPN endpoint with a private IP address that leads to an application-specific backend on a cloudlet that is “good enough” to meet the app's stated requirements. The details of this discovery process have been described in an earlier paper [57]. The application can blindly pick any one of the offered cloudlets, or perform application-specific end-to-end runtime performance tests before selecting one of them. The unused backends are asynchronously garbage collected. The WireGuard VPN mechanism [24] is used to bind a private network from an application on a Tier-3 device to a target cloudlet. Figure 12 illustrates the final state at the end of the discovery process. The Olive2022 app is essentially an open source thin client. It consists of an RDP client combined with code to invoke Sinfonia for cloudlet discovery and binding.
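In outline, the thin client's control flow resembles the sketch below. The helper functions are hypothetical placeholders for Sinfonia's discovery and binding steps, not its actual API, and the viewer command is just one possible remote-desktop front end:

```python
import subprocess
import time

def probe_rtt(candidate):
    """Crude end-to-end probe: time one trivial round trip to the backend."""
    start = time.time()
    candidate.ping()                       # hypothetical placeholder
    return time.time() - start

def launch_olive_app(vm_image_id):
    """Illustrative Olive2022 thin-client flow (not the actual implementation)."""
    # 1. Ask Sinfonia for candidate cloudlets able to host this VM image;
    #    each candidate is offered as a WireGuard VPN endpoint.
    candidates = sinfonia_discover(vm_image_id)        # hypothetical placeholder

    # 2. Optionally probe the candidates and keep the most responsive one.
    best = min(candidates, key=probe_rtt)

    # 3. Bring up the WireGuard tunnel to the chosen backend.
    sinfonia_bind(best)                                 # hypothetical placeholder

    # 4. Start the remote-desktop viewer against the backend's private address
    #    (the default VNC display :0 maps to TCP port 5900).
    subprocess.run(["vncviewer", best.private_ip], check=True)
```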

Figure 12: Device-Cloudlet Association

8.4 VMs in a Container World

Sinfonia assumes that each cloudlet is a Kubernetes [4] compute cluster, and leverages the Prometheus resource monitoring mechanism of Kubernetes [5]. Kubernetes only deals with containers. It has no support for VMs. We use KubeVirt [3] to bridge this gap. KubeVirt is self-described as a technology that “provides a unified development platform where developers can build, modify, and deploy applications residing in both Application Containers as well as Virtual Machines in a common, shared environment.” In essence, KubeVirt is container-based software that enables VM instances to be launched, managed, and terminated.

Figure 13: Example VM Image Specification in KubeVirt

An Olive VM image is converted, without any changes to its internals, into “containerDisk” format: a compressed QEMU image in a single-layer Docker container. KubeVirt then enables creation of a VM instance from that image by defining a VirtualMachineInstance object for the Kubernetes cluster. Figure 13 gives an example of such a specification. KubeVirt is combined with the RDP mechanism virtvnc and a port-mapping discovery function unwebsockify into a single container. A custom Helm chart definition makes this a Kubernetes deployment.
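The packaging step itself is small. A containerDisk can be built along the following lines, where the file and image names are placeholders; the /disk/ destination and ownership follow the convention described in KubeVirt's containerDisk documentation:

```dockerfile
# Wrap an unmodified Olive qcow2 image as a single-layer containerDisk
FROM scratch
ADD --chown=107:107 olive-vm.qcow2 /disk/
```

The resulting image is pushed to a registry reachable from the cloudlet and referenced from a VirtualMachineInstance specification like the one in Figure 13.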

Although all of our archived Olive2014 images are usable with this approach, there are some potential challenges that we may encounter in the future. QEMU supports emulation of old hardware (e.g., IDE), but KubeVirt only exposes the more modern SATA and virtio devices. Very old DOS and Windows 3.1 virtual machines work fine with virtio because all their access is done through the BIOS, and BIOS emulation is supported by QEMU. Modern VM images that use UUIDs to identify disks also work fine. However, there may be VM images from an intermediate historical period that use IDE with hardcoded device names such as /dev/hda1. KubeVirt would need to be modified to support such images.

9 Future Work and Challenges

A decade of experience with Olive offers compelling evidence that one-click execution of closed-source applications from an Internet archive is a viable vision. Our work is at an exciting point in its evolution, but it is far from complete. The open source Olive2022 infrastructure is a good base to build upon for the future. However, the hard work of curating VM images for a large number of scientifically important proprietary applications from the past, and hosting them in an Internet archive with appropriate licensing safeguards for execution at Tier-2, remains to be done.

Software started becoming important to science and engineering by the late 1950s to early 1960s. There is over 60 years' worth of backlog to explore and curate. Obviously, this will have to be a community-wide effort, involving many institutions and individuals worldwide. The scope of work is far too large for a single organization or team of individuals, even with very substantial financial resources. An integral part of this effort will be the software licensing issues mentioned earlier. In addition, Olive2022 will have to address many difficult technical challenges as it evolves. We discuss some of these challenges below, fully acknowledging that there may be many more that we have not yet recognized.

Access to External Data Sets: The data to be processed by an archived application needs to be accessible to the guest environment of its VM instance. Today, this is done manually by copying data into a virtual disk of the VM instance. This is inconvenient, error-prone, and limits data to the size of the virtual disk.

A better solution would be to place the data in an external storage repository and make it available as a mountable device appropriate for the vintage of the guest OS. For example, the device may be a floppy disk in the case of an MS-DOS guest. For Windows 3.1, a CD-ROM device may be a better choice. For more modern systems, a USB storage device may be the right abstraction. Many implementation challenges will arise in bridging the storage metaphors of long-obsolete operating systems and modern storage systems.
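As a rough sketch of the idea, a curator testing a VM locally under QEMU could attach external data using a vintage-appropriate device. The paths and image names below are hypothetical, and this is not part of the Olive workflow:

```sh
# Package a data set as an ISO image and attach it as a CD-ROM, which a
# Windows 3.1 or Windows 95 era guest can read without additional drivers.
genisoimage -r -J -o dataset.iso ./dataset/
qemu-system-i386 -m 64 -hda guest-disk.qcow2 -cdrom dataset.iso

# For an MS-DOS guest, a blank 1.44 MB floppy image (populated with mtools,
# or formatted inside the guest) may be the better fit.
qemu-img create -f raw dataset.img 1440K
qemu-system-i386 -m 16 -hda dos-disk.qcow2 -drive file=dataset.img,if=floppy,format=raw
```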

Our current thinking is that a distributed file system such as AFS [46] or Lustre [39] would be optimal as the external storage repository. Especially in scientific computing, where data sizes can be very large and there is already a mandate to archive experimental data, the bringing together of data and archival compute needs to be simplified and streamlined.

Parallelism and Compute Clusters: The current Olive prototype can exploit multi-core parallelism, but it is not possible to change the number of cores available to the guest. We expect this to become a common requirement in the future, as VMs that were archived a long time ago are launched on modern many-core machines. Also relating to parallelism is the need to exploit cluster-level parallelism for large scientific applications. Today, this involves extensive manual configuration of multiple VMs using VLANs. This is an error-prone and slow workflow with poor reproducibility. One-click launch of an entire ensemble of VMs, correctly interconnected, would be a great simplification. As mentioned in Section 2, the ability to create a “VM of VMs” would be a valuable step towards this goal.

GPU Acceleration: Beyond the original motivation for graphics, the SIMD parallelism of GPUs has been leveraged by the scientific and engineering community for many computations in simulation, finite element modeling, and machine learning. Virtualizing GPUs has proven difficult because there is no standardized external interface for them. There have been many efforts at GPU virtualization [8, 33, 36, 59], but none has yet emerged dominant. While Olive2022 is able to handle static assignment of a GPU to a VM instance, a more dynamic use of GPUs will be needed in the future.

Dynamic Provisioning of Tier-2: As currently implemented, Olive2022 does not use the VMNetX capability for demand paging a VM over the Internet. Nor does it use the prefetching capability of vTube. The current assumption is that Tier-2 entities are statically provisioned, or dynamic provisioning can block until complete. Blocking dynamic provisioning is acceptable only if Tier-1 to Tier-2 bandwidth is high. Otherwise, VM instance launch times will be unacceptably high. These assumptions may have to be revisited in the future, especially if edge computing is used in contexts where Tier-1 to Tier-2 connectivity is poor. Although it is not clear yet whether the problem of scientific reproducibility would be relevant to such settings, the work already done in Olive2014 and vTube can be leveraged if the need arises. Significant implementation effort would still be needed for integration with Kubernetes and KubeVirt.

Containment of Vulnerabilities: As discussed earlier, security patches cannot be applied to archival VM images because that would violate the goal of “as is” preservation. Many new vulnerabilities will arise over decades. A mechanism for safe containment of VM execution will therefore be needed. This will be a challenge because the mechanism has to allow the necessary level of Internet access for an application to function at all, yet narrow the attack surface and contain the blast radius of infected VM instances.
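Because Olive2022 VM instances already run as Kubernetes pods via KubeVirt, one plausible starting point, sketched below with hypothetical names and labels, is a default-deny NetworkPolicy that admits only remote-desktop traffic from the thin-client path and blocks all other ingress and egress. Selective egress rules would then be opened only for destinations that an archived application genuinely needs:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: olive-vm-containment            # hypothetical
spec:
  podSelector:
    matchLabels:
      app: olive-vm                     # hypothetical label on the KubeVirt launcher pod
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: olive-rdp-gateway    # hypothetical: the virtvnc/unwebsockify container
      ports:
        - protocol: TCP
          port: 5900                    # VNC port exposed by the VM pod
  egress: []                            # deny all egress from the archived VM
```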

GUIs and Graphics-Intensive Applications: At its most rudimentary, the graphics hardware of a VM instance is a simple bit-mapped display. If the archived application is of much older vintage than the current execution platform, the resolution and performance capabilities of the display expected by the application can easily be emulated on modern hardware. However, if hardware improvements flatten out over time, this assumption may become harder to sustain for archived applications of recent vintage. In cloud computing, this problem is addressed by para-virtualization of the display, i.e., installing a special display driver in the guest environment to help sustain graphics quality and speed. So far, Olive has completely avoided the path of para-virtualization because of the effort involved and to more faithfully preserve the unmodified attributes of the archived application. This may not be viable for applications of recent vintage that are graphics-intensive.

10 Olive in Context

The unique problem that Olive addresses is one-click execution of closed-source archived applications over the Internet. As discussed in Section 1, this can be a valuable tool for lowering the barrier to scientific reproducibility. However, it is by no means the only tool that will be needed. Olive thus complements, but does not duplicate, many other efforts towards improving scientific reproducibility.

Emulation-as-a-Service Infrastructure (EaaSI) at Yale University aims to resurrect obsolete software programs, which can then be used to access and study digital collections at Yale and elsewhere. Rather than trying to deliver one-click access to closed-source applications over the Internet, it uses a “library reading room model.” These are trusted physical spaces where concerns such as software piracy can be ignored. Within these trusted physical spaces, VM encapsulation is used as in the case of Olive. A key limitation of this approach is the need for a user to physically travel to the trusted space. Consistent with the ethos of the Internet, Olive extends the reach of the library to wherever in the world the user is located.

RunMyCode.org [60] is a cloud-based service that enables authors to create companion web pages for published scientific papers. The service accepts code written in C++, FORTRAN, MATLAB, R and RATS. An IT staff team associated with the cloud service performs safety and executability checks on submitted code before it is accepted. Once a companion web page is created, users can submit scripts that use code from that page. These scripts are executed on a cluster in the cloud, and the results are returned to the user. Olive aims at a much lower level of abstraction and focuses on preserving executability of closed-source code over a timescale of decades.

ReproZip [13] aims to simplify the re-creation of computations described in research publications. This tool automatically captures the provenance of an experiment and creates a package of all its library dependencies as well as its workflow specification. The package can then be disseminated or archived. ReproZip is specifically designed for Linux environments. Olive's use of VM encapsulation may offer a way to extend ReproZip to non-Linux settings.

There has been considerable effort by the scientific community in creating workflow management software. Examples include the ISI Pegasus framework [21], the IPython interactive shell [47], the Taverna Workbench [35], the Sumatra management tool [19, 20], the Galaxy tool suite [10, 31], the Madagascar platform [26], VisTrails [9], and verifiable visualizations [25]. There has also been significant effort in creating data sharing tools and repositories such as Dexy [22], Duraspace [40], and DataVerse [17]. Although these efforts do not overlap with Olive, they may be able to leverage its functionality. For example, a workflow tool could be extended to produce a snapshot of its state as an Olive VM. This could be useful for dissemination, and to serve as a permanent easy-to-run marker in that workflow.

As mentioned earlier, it may be helpful to think of an Olive VM as similar to a PDF file in document production. One uses tools such as LaTeX, Microsoft Word or Google Docs for authoring. The evolution of the document can be captured using CVS or Git, or the internal change-tracking mechanisms of Word and Google Docs. However, a landmark version of the document can also be saved in a PDF file for convenient one-click viewing. Undoubtedly, Olive will only be one tool of many that will be needed on the path to reproducibility.

11 Conclusion

Executable content ranging from simulation models to visualization tools plays an increasingly important role in scientific research. The ability to archive these artifacts for posterity would be an important transformative step. Imagine being able to reach back across time to execute the simulation model of a long-dead scientist on new data that you have just acquired. What do the results suggest? Would they have changed the conclusions of that scientist? Although you aren't quite bringing the scientist back to life, you are collaborating with that person in a way that is not possible unless a capability such as Olive is available to you.

As software grows in significance in science and engineering, lowering the barrier to reproduction of previous results becomes increasingly valuable. The lowest conceivable barrier is one-click execution, much like viewing a PDF document on a web page today. Although difficult to achieve for reasons explained in this paper, we have shown how it can be successfully implemented. We look forward to working with other teams in the ACM Emerging Interest Group for Reproducibility and Replicability to advance scholarship in science and engineering through Olive.

ACKNOWLEDGMENTS

We thank the anonymous reviewers for their thoughtful and constructive comments, which have helped to improve the paper. Many individuals who are not listed as authors of this paper have played important roles in the conception, implementation, and decade-long evolution of Olive. We list them below in chronological order of involvement in the project:

  • Vas Bala inspired the concept of an Internet-wide archive of VM images, and organized a workshop at IBM Research in April 2011 to explore this concept with a diverse group of stakeholders. He was the earliest champion of the Olive vision.
  • Gloriana St. Clair and Erika Linke immediately saw the potential of the Olive vision for the library community and were co-PIs in the creation of Olive2014.
  • Benjamin Gilbert was the implementor of Olive2014, as well as the primary curator of the archival applications for it.
  • Daniel Ryan assisted Benjamin Gilbert in curating Olive2014 applications.
  • Vint Cerf was an early articulator of the problem addressed by Olive, and worked tirelessly to evangelize the need for viable solutions. His April 2015 keynote talk to the Internet Information Preservation Conference entitled “Digital Vellum: Interacting with Digital Objects over Centuries” included live demos of Olive2014 [12].

The creation of Olive2014 was supported by the Sloan Foundation and the Institute for Museum and Library Services. The creation of Olive2022 was supported by the National Science Foundation under grant CNS-2106862. The work towards Olive2022 was done in the CMU Living Edge Lab, which is supported by Intel, ARM, Vodafone, Deutsche Telekom, CableLabs, Crown Castle, InterDigital, Seagate, Microsoft, the VMware University Research Fund, and the Conklin Kistler family fund. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring entity or the U.S. government.

REFERENCES

  • [n. d.]. Azure private multi-access edge compute (MEC). Last accessed February 7, 2023.
  • [n. d.]. Azure Stack Hub overview. Last accessed February 7, 2023.
  • [n. d.]. Building a Virtualization API for Kubernetes. Last accessed February 17, 2023.
  • [n. d.]. Kubernetes. Last accessed February 16, 2023.
  • [n. d.]. Prometheus. Last accessed February 17, 2023.
  • [n. d.]. Virtual Machines in Our Collection. Last accessed February 11, 2023.
  • Yoshihisa Abe, Roxana Geambasu, Kaustubh Joshi, Andres Lagar-Cavilla, and Mahadev Satyanarayanan. 2013. vTube: Efficient Streaming of Virtual Appliances Over Last-Mile Networks. In Proceedings of the ACM Symposium on Cloud Computing. Santa Clara, CA.
  • Ardalan Amiri Sani, Kevin Boos, Shaopu Qin, and Lin Zhong. 2014. I/O Paravirtualization at the Device File Boundary. In Proceedings of ACM ASPLOS.
  • Louis Bavoil, Steven P. Callahan, Patricia J. Crossno, Juliana Freire, Carlos E. Scheidegger, Claudio T. Silva, and Huy T. Vo. 2005. VisTrails: Enabling Interactive Multiple-View Visualizations. In Proceedings of IEEE Visualization. 135–142.
  • D. Blankenberg, G. von Kuster, E. Bouvier, B. Baker, E. Afgan, N. Stoler, B. Rebolledo-Jaramillo, The Galaxy Team, J. Taylor, and A. Nekrutenko. 2014. Dissemination of scientific software with Galaxy ToolShed. Genome Biology 15 (February 2014), 403.
  • Louis Casanovas and Edy Kristianto. 2017. Comparing RDP and PcoIP protocols for desktop virtualization in VMware enviroment. In 2017 5th International Conference on Cyber and IT Service Management (CITSM).
  • Vint Cerf and Mahadev Satyanarayanan. 2015. Digital Vellum: Interacting with Digital Objects over Centuries. Keynote Talk: Internet Information Preservation Conference (IIPC GA2015).
  • Fernando Chirigati, Dennis Shasha, and Juliana Freire. 2013. ReproZip: Using Provenance to Support Computational Reproducibility. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance. Lombard, Illinois.
  • Containers2015 [n. d.]. Linux Containers. https://linuxcontainers.org/. Accessed on January 9, 2015.
  • P. Conway. [n. d.]. Preservation in the Digital World. Originally published March 1996. Last accessed May 28, 2023.
  • P. Conway. 2010. Preservation in the Age of Google: Digitization, Digital Preservation, and Dilemmas. Library Quarterly 80, 1 (2010).
  • Merce Crosas. 2011. The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data. D-Lib Magazine 17, 1/2 (January/February 2011).
  • Dalvik2011 [n. d.]. Dalvik (software). Last accessed May 28, 2023.
  • A.P. Davison. 2012. Automated capture of experiment context for easier reproducibility in computational research. Computing in Science and Engineering 14 (2012).
  • A.P. Davison, M. Mattioni, D. Samarkanov, and B. Teleczuk. 2014. Sumatra: A Toolkit for Reproducible Research. In Implementing Reproducible Research, V. Stodden, F. Leisch, and R.D. Peng (Eds.). Chapman and Hall/CRC, 57–79.
  • Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John Good, Anastasia Laity, Joseph C. Jacob, and Daniel S. Katz. 2005. Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Scientific Programming Journal 13 (2005).
  • dexy [n. d.]. dexy: The Most Powerful, Flexible Documentation Tool Ever. Last accessed May 29, 2023.
  • Docker2015 [n. d.]. Build, Ship, and Run Any App, Anywhere. https://www.docker.com/. Accessed on January 9, 2015.
  • Jason A. Donenfeld. 2017. WireGuard: Next Generation Kernel Network Tunnel. In Proceedings of the 24th Annual Network and Distributed System Security Symposium, (NDSS) (San Diego, CA).
  • T. Etiene, C. Scheidegger, L. Nonato, M. Kirby, and C. Silva. 2009. Verifiable Visualization for Isosurface Extraction. IEEE Transactions on Visualization and Computer Graphics 15, 6 (2009), 1227–1234.
  • Sergey Fomel, Paul Sava, Ioan Vlad, Yang Liu, and Vladimir Bashkardin. 2013. Madagascar: open-source software project for multidimensional data analysis and reproducible computational experiments. Journal of Open Research Software 1, 1 (2013).
  • Gartner. 2017. Gartner Says Worldwide Device Shipments Will Decline 0.3 Percent in 2017. Last accessed February 14, 2023.
  • Gartner. 2020. Gartner Forecasts Worldwide Device Shipments to Decline 14% in 2020 Due to Coronavirus Impact. Last accessed February 14, 2023.
  • Gartner. 2023. Gartner Says Worldwide PC Shipments Declined 28.5% in Fourth Quarter of 2022 and 16.2% for the Year. Last accessed February 14, 2023.
  • Jim Gettys and Kathleen Nichols. 2012. Bufferbloat: Dark Buffers in the Internet. Communications of the ACM 55, 1 (January 2012).
  • J. Goecks, A. Nekrutenko, J. Taylor, and The Galaxy Team. 2010. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 11, 8 (August 2010).
  • Kiryong Ha, Yoshihisa Abe, Tom Eiszler, Zhuo Chen, Wenlu Hu, Brandon Amos, Rohit Upadhyaya, Padmanabhan Pillai, and Mahadev Satyanarayanan. 2017. You Can Teach Elephants to Dance: Agile VM Handoff for Edge Computing. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing.
  • J. G. Hansen. 2007. Blink: Advanced Display Multiplexing for Virtualized Applications. In Proceedings of ACM Network and Operating System Support for Digital Audio and Video (NOSSDAV).
  • Thomas Herndon, Michael Ash, and Robert Pollin. 2013. Does High Public Debt Stifle Economic Growth? A Critique of Reinhart and Rogoff. Working Paper 322. Political Economy Research Institute, University of Massachusetts Amherst. Last accessed May 28, 2023.
  • D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn. 2006. Taverna: a tool for building and running workflows of services. Nucleic Acids Research 34 (2006), 729–732.
  • H. Andres Lagar-Cavilla, Niraj Tolia, Mahadev Satyanarayanan, and Eyal de Lara. 2007. VMM-Independent Graphics Acceleration. In Proceedings of the 3rd International Conference on Virtual Execution Environments. San Diego, CA.
  • Yuqing Lan and Hao Xu. 2014. Research on technology of desktop virtualization based on SPICE protocol and its improvement solutions. Frontiers of Computer Science 8, 6 (2014).
  • Tim Lindholm and Frank Yellin. 1999. The Java Virtual Machine Specification (2nd Edition). Prentice Hall.
  • Lustre [n. d.]. Lustre File System. http://lustre.org. Accessed June 19, 2014.
  • markow2012 2012. DuraSpace Offers DuraCloud Access To Internet2 Members. http://duraspace.org/node/1268.
  • B. Matthews, A. Shaon, J. Bicarreguil, and C. Jones. 2010. A Framework for Software Preservation. The International Journal of Digital Curation 5, 1 (June 2010).
  • David W. Miller and John Modell. 1988. Teaching United States History with the Great American History Machine. Historical Methods: A Journal of Quantitative and Interdisciplinary History 21, 3 (1988), 121–134. Last accessed May 28, 2023.
  • Gary R. Mirams, Christopher J. Arthurs, Miguel O. Bernabeu, Rafel Bordas, Jonathan Cooper, Alberto Corrias, Yohan Davit, Sara-Jane Dunn, Alexander G. Fletcher, Daniel G. Harvey, Megan E. Marsh, James M. Osborne, Pras Pathmanathan, Joe Pitt-Francis, James Southern, Nejib Zemzemi, and David J. Gavaghan. 2013. Chaste: An Open Source C++ Library for Computational Physiology and Biology. PLoS Computational Biology 9, 3 (March 2013).
  • Peter Monaghan. 2013. ‘They Said at First That They Hadn't Made a Spreadsheet Error, When They Had’. The Chronicle of Higher Education (April 2013). Last accessed May 28, 2023.
  • James H. Morris, Mahadev Satyanarayanan, Michael H. Conner, John H. Howard, David S. Rosenthal, and F. Donelson Smith. 1986. Andrew: A Distributed Personal Computing Environment. Communications of the ACM 29, 3 (1986).
  • openafs [n. d.]. OpenAFS. http://openafs.org. Accessed June 19, 2014.
  • Fernando Perez and Brian E. Granger. 2007. IPython: a System for Interactive Scientific Computing. Computing in Science and Engineering 9, 3 (May 2007), 21–29.
  • Robert Pollin and Michael Ash. 2013. Austerity after Reinhart and Rogoff. Financial Times (April 2013). Last accessed May 28, 2023.
  • Carmen M. Reinhart and Kenneth S. Rogoff. 2010. Growth in a Time of Debt. American Economic Review 100, 2 (May 2010), 573–78.
  • Carmen M. Reinhart and Kenneth S. Rogoff. 2010. Growth in a Time of Debt. Working Paper 15639. National Bureau of Economic Research. Last accessed May 28, 2023.
  • T. Richardson, Q. Stafford-Fraser, K.R. Wood, and A. Hopper. 1998. Virtual network computing. IEEE Internet Computing 2, 1 (1998).
  • Mahadev Satyanarayanan. 2017. The Emergence of Edge Computing. IEEE Computer 50, 1 (January 2017).
  • Mahadev Satyanarayanan, Paramvir Bahl, Ramón Caceres, and Nigel Davies. 2009. The Case for VM-Based Cloudlets in Mobile Computing. IEEE Pervasive Computing 8, 4 (October-December 2009).
  • Mahadev Satyanarayanan, Vasanth Bala, Gloriana St. Clair, and Erika Linke. 2011. Collaborating with executable content across space and time. In Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom2011).
  • Mahadev Satyanarayanan, Vasanth Bala, Gloriana St. Clair, and Erika Linke. 2014. Collaborating with executable content across space and time. ICST Transactions on Collaborative Computing 1, 1 (May 2014).
  • Mahadev Satyanarayanan, Wei Gao, and Brandon Lucia. 2019. The Computing Landscape of the 21st Century. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications (HotMobile ’19). Santa Cruz, CA.
  • Mahadev Satyanarayanan, Jan Harkes, Jim Blakley, Marc Meunier, Govindarajan Mohandoss, Kiel Friedt, Arun Thulasi, Pranav Saxena, and Brian Barritt. 2022. Sinfonia: Cross-tier orchestration for edge-native applications. Frontiers in the Internet of Things 1 (October 2022).
  • Mahadev Satyanarayanan, Guenter Klas, Marco Silva, and Simone Mangiante. 2019. The Seminal Role of Edge-Native Applications. In Proceedings of the 2019 IEEE International Conference on Edge Computing (EDGE). Milan, Italy.
  • L. Shi, H. Chen, and J. Sun. 2009. vCUDA: GPU Accelerated High Performance Computing in Virtual Machines. In IEEE International Symposium on Parallel & Distributed Processing.
  • Victoria Stodden, Christophe Hurlin, and Christophe Perignon. 2012. RunMyCode.org: a novel dissemination and collaboration platform for executing published computational results. In Analyzing and Improving Collaborative eScience with Social Networks (eSoN 12); Workshop with IEEE e-Science 2012. Chicago, IL, USA. Also available at SSRN. Last accessed May 29, 2023.
  • Niraj Tolia, David G Andersen, and Mahadev Satyanarayanan. 2006. Quantifying interactive user experience on thin clients. IEEE Computer 39, 3 (2006), 46–52.
  • VMware. [n. d.]. VMware Blast Extreme Display Protocol in VMware Horizon. Last accessed May 28, 2023.
  • Vodafone Press Release. [n. d.]. Vodafone uses AWS Wavelength to launch first Multi-access Edge Computing services in European region. Originally published June 2021. Last accessed May 28, 2023.

This work is licensed under a Creative Commons Attribution International 4.0 License.

ACM REP '23, June 27–29, 2023, Santa Cruz, CA, USA

© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0176-4/23/06.
DOI: https://doi.org/10.1145/3589806.3600035