Parallel Metric Tree Embedding Based on an Algebraic View on Moore-Bellman-Ford

A metric tree embedding of expected stretch $\alpha \ge 1$ maps a weighted $n$-node graph $G = (V, E, \operatorname{\omega })$ to a weighted tree $T = (V_T, E_T , \operatorname{\omega }_T)$ with $V \subseteq V_T$ such that, for all $v,w \in V$, $\operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, T),$ and . Such embeddings are highly useful for designing fast approximation algorithms as many hard problems are easy to solve on tree instances. However, to date, the best parallel $(\operatorname{polylog}n)$-depth algorithm that achieves an asymptotically optimal expected stretch of $\alpha \in \operatorname{O}(\log n)$ requires $\operatorname{\Omega }(n^2)$ work and a metric as input.

In this article, we show how to achieve the same guarantees using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m^{1+\varepsilon })$ work, where $m = |E|$ and $\varepsilon \gt 0$ is an arbitrarily small constant. Moreover, one may further reduce the work to $\operatorname{\tilde{O}}(m + n^{1+\varepsilon })$ at the expense of increasing the expected stretch to $\operatorname{O}(\varepsilon ^{-1} \log n)$.

Our main tool in deriving these parallel algorithms is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a variety of previous “Moore-Bellman-Ford-like” algorithms, to be of independent interest and discuss it in depth. In our tree embedding algorithm, we leverage it to provide efficient query access to an approximate metric that allows sampling the tree using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m)$ work.

We illustrate the generality and versatility of our techniques by various examples and a number of additional results. Specifically, we (1) improve the state of the art for determining metric tree embeddings in the Congest model, (2) determine a $(1 + \hat{\varepsilon })$-approximate metric regarding the distances in a graph $G$ in polylogarithmic depth and $\operatorname{\tilde{O}}(n(m+n^{1 + \varepsilon }))$ work, and (3) improve upon the state of the art regarding the $k$-median and the buy-at-bulk network design problems.

1 INTRODUCTION

In many graph problems the objective is closely related to distances in the graph. Prominent examples are shortest path problems, minimum weight spanning trees, a plethora of Steiner-type problems [27], the traveling salesman, finding a longest simple path, and many more.

If approximation is viable or mandatory, a successful strategy is to approximate the distance structure of the weighted graph $G$ by a simpler graph $G^{\prime }$, where “simpler” can mean fewer edges, smaller degrees, being from a specific family of graphs, or any other constraint making the considered problem easier to solve. One then proceeds to solve a related instance of the problem on $G^{\prime }$ and map the solution back to $G$, yielding an approximate solution to the original instance. Naturally, this requires a mapping with bounded impact on the objective value.

A standard tool is metric embedding, mapping $G = (V, E, \operatorname{\omega })$ to $G^{\prime } = (V^{\prime }, E^{\prime }, \operatorname{\omega }^{\prime })$, such that $V \subseteq V^{\prime }$ and $\operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, G^{\prime }) \le \alpha \operatorname{dist}(v, w, G)$ for some $\alpha \ge 1,$ referred to as stretch.¹ An especially convenient class of metric embeddings are metric tree embeddings, plainly because very few problems are hard to solve on tree instances. The utility of tree embeddings originates in the fact that, despite their extremely simple topology, it is possible to randomly construct an embedding of any graph $G$ into a tree $T$ so that the expected stretch satisfies $\alpha \in \operatorname{O}(\log n)$ [23]. By linearity of expectation, this ensures an expected approximation ratio of $\operatorname{O}(\log n)$ for most problems; repeating the process $\log (\varepsilon ^{-1})$ times and taking the best result, one obtains an $\operatorname{O}(\log n)$-approximation with probability at least $1 - \varepsilon$.

A substantial advantage of tree embeddings lies in the simplicity of applying the machinery once they are computed: Translating the instance on $G$ to one on $T$, solving the instance on $T$, and translating the solution back tends to be extremely efficient and highly parallelizable; we demonstrate this in Sections 9 and 10. Note also that the embedding can be computed as a preprocessing step, which is highly useful for online approximation algorithms [23]. Hence, a low-depth small-work parallel algorithm in the vein of Fakcharoenphol, Rao, and Talwar [23] (FRT) gives rise to fast and efficient parallel approximations for a large class of graph problems. Unfortunately, the tradeoff between depth and work achieved by state-of-the-art parallel algorithms for this purpose is suboptimal. Concretely, all algorithms of $\operatorname{polylog}n$ depth use $\operatorname{\Omega }(n^2)$ work, whereas we are not aware of any stronger lower bound than the trivial $\operatorname{\Omega }(m)$ work bound.²

Our Contribution. Our main contribution is to reduce the amount of work for sampling from the FRT distribution—a random distribution of tree embeddings—to $\operatorname{\tilde{O}}(m^{1+\varepsilon })$ while maintaining $\operatorname{polylog}n$ depth. This article is organized in two parts. The first establishes the required techniques:

Our key tool is an algebraic interpretation of Moore-Bellman-Ford-like (MBF-like) algorithms described in Section 2. As our framework subsumes a large class of known algorithms and explains them from a different perspective—we demonstrate this using numerous examples in Section 3—we consider it to be of independent interest.
Section 4 proposes a sampling technique for embedding a graph $G$ in which $d$-hop distances $(1 + \hat{\varepsilon })$ approximate exact distances into a complete graph $H$, where $H$ has polylogarithmic Shortest Path Diameter (SPD) and preserves $G$-distances $(1 + \hat{\varepsilon })^{\operatorname{O}(\log n)}$ approximately.
We devise an oracle that answers MBF-like queries by efficiently simulating an iteration of an MBF-like algorithm on $H$ in Section 5. It uses only the edges of $G$ and polylogarithmic overhead, resulting in $\operatorname{\tilde{O}}(dm)$ work with respect to $G$ (i.e., subquadratic work) per iteration; We use $d \in \operatorname{polylog}n$.

The second part applies our techniques and establishes our results:

A first consequence of our techniques is that we can query the oracle with All-Pairs Shortest Paths (APSP) to determine w.h.p.³ a $(1 + \operatorname{o}(1))$-approximate metric on $G$ using $\operatorname{\tilde{O}}(nm^{1+\varepsilon })$ work and $\operatorname{polylog}n$ depth. We discuss this in Section 6.
Theorem 6.2 ($(1 + \operatorname{o}(1))$-Approximate Metric).Given a weighted graph $G = (V, E, \operatorname{\omega })$ and a constant $\varepsilon \gt 0$, we can w.h.p. compute, using $\operatorname{\tilde{O}}(n(m+n^{1+\varepsilon }))$ work and $\operatorname{polylog}n$ depth, a $(1 + 1 / \operatorname{polylog}n)$-approximate metric of $\operatorname{dist}(\cdot , \cdot , G)$ on $V$ .
Preprocessing the graph by computing a sparse spanner, one can get close to quadratic work at the expense of a constant approximation ratio.
Theorem 6.3 ($\operatorname{O}(1)$-Approximate Metric).For a weighted graph $G = (V, E, \operatorname{\omega })$ and a constant $\varepsilon \gt 0$, we can w.h.p. compute an $\operatorname{O}(1)$-approximate metric of $\operatorname{dist}(\cdot ,\cdot ,G)$ on $V$ using $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ work and $\operatorname{polylog}n$ depth.
In Section 7, we show that, for any constant $\varepsilon \gt 0$, there is a randomized parallel algorithm of depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ (w.h.p.) that samples from a metric tree embedding of expected stretch $\operatorname{O}(\log n)$. This follows from the preceding techniques and the fact that sampling from the FRT distribution is MBF-like.
Corollary 7.10. Given the weighted incidence list of a graph $G$ and an arbitrary constant $\varepsilon \gt 0$, we can w.h.p. sample from a tree embedding of expected stretch $\operatorname{O}(\log n)$ using depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ .
Applying the spanner construction of Baswana and Sen [9] as a preprocessing step, the work can be reduced to $\operatorname{\tilde{O}}(m + n^{1+\varepsilon })$ at the expense of stretch $\operatorname{O}(\varepsilon ^{-1} \log n)$.
Corollary 7.11. Suppose we are given the weighted incidence list of a graph $G$. Then, for any constant $\varepsilon \gt 0$ and any , we can w.h.p. sample from a tree embedding of $G$ of expected stretch $\operatorname{O}(k \log n)$ using depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m + n^{1 + 1/k + \varepsilon })$ .
Our techniques allow us to improve over previous distributed algorithms computing tree embeddings in the Congest [41] model. We reduce the best-known round complexity for sampling from a tree embedding of expected stretch $\operatorname{O}(\log n)$ from $\operatorname{\tilde{O}}(n^{1/2 + \varepsilon } + \operatorname{D}(G))$, where $\varepsilon \gt 0$ is an arbitrary constant and $\operatorname{D}(G)$ is the unweighted hop diameter of $G$, to $(n^{1/2} + \operatorname{D}(G)) n^{\operatorname{o}(1)}$. This is detailed in Section 8.
Theorem 8.1. There is a randomized distributed algorithm that w.h.p. samples from a metric tree embedding of expected stretch $\operatorname{O}(\log n)$ in $\min \lbrace (\sqrt {n} + \operatorname{D}(G)) n^{o(1)} , \operatorname{\tilde{O}}(\operatorname{SPD}(G)) \rbrace$ rounds of the Congest model.
We illustrate the utility of our main results by providing efficient approximation algorithms for the $k$-median and buy-at-bulk network design problems. Blelloch et al. [14] devise polylogarithmic depth parallel algorithms based on FRT embeddings for these problems assuming a metric as input. We provide polylogarithmic depth parallel algorithms for the more general case where the metric is given implicitly by $G$, obtaining more work-efficient solutions for a wide range of parameters. The details are given in Sections 9 and 10, respectively.
Theorem 9.2. For any fixed constant $\varepsilon \gt 0$, w.h.p., an expected $\operatorname{O}(\log k)$-approximation to $k$-median on a weighted graph can be computed using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m^{1+\varepsilon } + k^3)$ work.
Theorem 10.2. For any constant $\varepsilon \gt 0$, w.h.p., an expected $\operatorname{O}(\log n)$-approximation to the buy-at-bulk network design problem can be computed using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(\min \lbrace m+n(k+n^{\varepsilon }), n^2 \rbrace) \subseteq \operatorname{\tilde{O}}(n^2)$ work.

Section 11 concludes this article.

Our Approach. The algorithm of Khan et al. [30], formulated for the Congest model [41], gives rise to an $\operatorname{\tilde{O}}(\operatorname{SPD}(G))$-depth parallel algorithm sampling from the FRT distribution. The SPD is the maximum, over all $v,w \in V$, of the minimum hop-length of a shortest $v$-$w$-path. Intuitively, $\operatorname{SPD}(G)$ captures the number of iterations of MBF-like algorithms in $G$: Each iteration updates distances until the $(\operatorname{SPD}(G) + 1)$-th iteration does not yield new information. Unfortunately, $\operatorname{SPD}(G) = n - 1$ is possible, so a naive application of this algorithm results in poor performance.

A natural idea is to reduce the number of iterations by adding “shortcuts” to the graph. Cohen [17] provides an algorithm of depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1+\varepsilon })$ that computes a $(d, \hat{\varepsilon })$-hop set with $d \in \operatorname{polylog}n$: This is a set $E^{\prime }$ of additional edges such that $\operatorname{dist}(v, w, G) \le \operatorname{dist}^d(v, w, G^{\prime }) \le (1 + \hat{\varepsilon }) \operatorname{dist}(v, w, G)$ for all $v, w \in V$, where $\hat{\varepsilon }\in 1 / \operatorname{polylog}n$ and $\operatorname{dist}^d(v, w, G^{\prime })$ is the minimum weight of a $v$-$w$-path with at most $d$ edges in $G$ augmented with $E^{\prime }$. Note carefully that $\varepsilon$ is different from $\hat{\varepsilon }$. In other words, Cohen computes a metric embedding with the additional property that polylogarithmically many MBF-like iterations suffice to determine $(1 + 1 / \operatorname{polylog}n)$-approximate distances.

The course of action might now seem obvious: Run Cohen's algorithm, then run the algorithm by Khan et al. on the resulting graph for $d \in \operatorname{polylog}n$ rounds and conclude that the resulting output corresponds to a tree embedding of the original graph $G$ of stretch $\operatorname{O}((1 + 1 / \operatorname{polylog}n) \log n) = \operatorname{O}(\log n)$. Alas, this reasoning is flawed: Constructing FRT trees crucially relies on the fact that the distances form a metric (i.e., satisfy the triangle inequality). An approximate triangle inequality for approximate distances is insufficient since the FRT construction relies on the subtractive form of the triangle inequality; that is, $\operatorname{dist}(v,w,G^{\prime }) - \operatorname{dist}(v,u,G^{\prime }) \le \operatorname{dist}(w,u,G^{\prime })$ for arbitrary $u,v,w \in V$.

Choosing a different hop set does not solve the problem. Hop sets guarantee that $d$-hop distances approximate distances, but any hop set that fulfills the triangle inequality on $d$-hop distances has to reduce the SPD to at most $d$ (i.e., yield exact distances):

Observation 1.1. Let $G$ be a graph augmented with a $(d, \hat{\varepsilon })$-hop set.⁴ If $\operatorname{dist}^d(\cdot , \cdot , G)$ is a metric, then $\operatorname{dist}^d(\cdot , \cdot , G)=\operatorname{dist}(\cdot , \cdot , G)$, i.e., $\operatorname{SPD}(G) \le d$.

Proof. Let $\pi$ be a shortest $u$-$v$-path in $G$. Since $\operatorname{dist}^d(\cdot , \cdot , G)$ fulfills the triangle inequality,

\begin{equation} \operatorname{dist}(u,v,G)\le \operatorname{dist}^d(u, v, G) \le \sum _{\lbrace u_1, u_2\rbrace \in \pi } \operatorname{dist}^d(u_1, u_2, G) \le \sum _{\lbrace u_1, u_2\rbrace \in \pi } \operatorname{\omega }(u_1, u_2) = \operatorname{dist}(u, v, G). \end{equation} (1)

We overcome this obstacle by embedding $G^{\prime }$ into a complete graph $H$ on the same node set that $(1 + \operatorname{o}(1))$-approximates distances in $G$ and fulfills $\operatorname{SPD}(H) \in \operatorname{polylog}n$. In other words, where Cohen preserves distances exactly and ensures existence of approximately shortest paths with few hops, we preserve distances approximately but guarantee that we obtain exact shortest paths with few hops. This yields a sequence of embeddings:

Start with the original graph $G$,
augment $G$ with a $(d, 1 / \operatorname{polylog}n)$-hop set [17], yielding $G^{\prime }$, and
modify $G^{\prime }$ to ensure a small SPD, resulting in $H$ (Section 4).

Unfortunately, this introduces a new obstacle: As $H$ is complete, we cannot explicitly compute $H$ without incurring $\operatorname{\Omega }(n^2)$ work.

MBF-like Algorithms. This is where our novel perspective on MBF-like algorithms comes into play. We can simulate an iteration of any MBF-like algorithm on $H$ using only the edges of $G^{\prime }$ and polylogarithmic overhead, resulting in an oracle for MBF-like queries on $H$. Since $\operatorname{SPD}(H) \in \operatorname{polylog}n$, the entire algorithm runs in polylogarithmic time and with a polylogarithmic work overhead with respect to $G^{\prime }$.

In an iteration of an MBF-like algorithm,

the information stored at each node is propagated to its neighbors,
each node aggregates the received information, and
optionally filters out irrelevant parts.

For example, in order for each node to determine the $k$ nodes closest to it, each node stores node–distance pairs (initially only themselves at distance 0) and then iterates the following steps:

communicate the node–distance pairs to the neighbors (distances uniformly increased by the corresponding edge weight),
aggregate the received values by picking the node-wise minimum, and
discard all but the pairs corresponding to the $k$ closest sources.

It is well-known [3, 39, 43] that distance computations can be performed by multiplication with the (weighted) adjacency matrix $A$ over the min-plus semiring (see Definition A.2 in Appendix A). For instance, if $B = A^h$ with $h \ge \operatorname{SPD}(G)$, then $b_{vw} = \operatorname{dist}(v,w,G)$. In terms of $\mathcal {S}_{\min ,+}$, propagation is the “multiplication” with an edge weight and aggregation is “summation.” The $(i+1)$-th iteration results in $x^{(i+1)} = r^V A x^{(i)}$, where $r^V$ is the (node-wise) filter and $x \in M^V$ the node values. Both $M$ and $M^V$ form semimodules: A semimodule supports scalar multiplication (propagation) and provides a semigroup (representing aggregation). Compare Definition A.3 in Appendix A—over $\mathcal {S}_{\min ,+}$.

In other words, in an $h$-iteration MBF-like algorithm, each node determines its part of the output based on its $h$-hop distances to all other nodes. However, for efficiency reasons, various algorithms [4, 7, 8, 29, 33–35] compute only a subset of these distances. The role of the filter is to remove the remaining values to allow for better efficiency. The core feature of an MBF-like algorithm is that filtering is compatible with propagation and aggregation: If a node discards information and then propagates it, the discarded parts must be “uninteresting” at the receiving node as well. We model this using a congruence relation on the node states; filters pick a suitable (efficiently encodable) representative of the node state's equivalence class.

Constructing FRT Trees. This helps us to sample from the FRT distribution as follows. First, we observe that an MBF-like algorithm can acquire the information needed to represent an FRT tree. Second, we can simulate any MBF-like algorithm on $H$—without explicitly storing $H$—using polylogarithmic overhead and MBF-like iterations on $G^{\prime }$. The previously mentioned sampling technique decomposes the vertices and edges of $H$ into $\Lambda \in \operatorname{O}(\log n)$ levels. We may rewrite the adjacency matrix of $H$ as $A_H = \bigoplus _{\lambda = 0}^\Lambda P_{\lambda } A_{\lambda }^d P_{\lambda }$, where $\oplus$ is the “addition” of functions induced by the semimodule, $P_{\lambda }$ is a projection on nodes of at least level $\lambda$, and $A_{\lambda }$ is a (slightly stretched) adjacency matrix of $G^{\prime }$. We are interested in $r^V A_H^h x^{(0)}$—$h$ iterations on the graph $H$ followed by applying the node-wise filter $r^V$. The key insight is that the congruence relation allows us to apply intermediate filtering steps without changing the outcome as filtering does not change the equivalence class of a state. Hence, we may compute $(r^V \bigoplus _{\lambda = 0}^\Lambda P_{\lambda } (r^V A_{\lambda })^d P_{\lambda })^h x^{(0)}$ instead. This repeated application of $r^V$ keeps the intermediate results small, ensuring that we can perform multiplication with $A_{\lambda }$ with $\operatorname{\tilde{O}}(|E| + |E^{\prime }|) \subseteq \operatorname{\tilde{O}}(m^{1+\varepsilon })$ work. Since $d \in \operatorname{polylog}n$, $\Lambda \in \operatorname{O}(\log n)$, and each $A_\lambda$ accounts for $|E|+|E^{\prime }|$ edges, this induces only polylogarithmic overhead with respect to iterations in $G^{\prime }$, yielding a highly efficient parallel algorithm of depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$.

1.1 Related Work

We confine the discussion to undirected graphs.

Classical Distance Computations. The earliest—and possibly also most basic—algorithms for Single-Source Shortest Paths (SSSP) computations are Dijkstra's algorithm [21] and the MBF algorithm [11, 24, 40]. From the perspective of parallel algorithms, Dijkstra's algorithm performs excellently in terms of work, requiring $\operatorname{\tilde{O}}(m)$ computational steps, but suffers from being inherently sequential, processing one vertex at a time.

Algebraic Distance Computations. The MBF algorithm can be interpreted as a fixed-point iteration $Ax^{(i+1)} = Ax^{(i)}$, where $A$ is the adjacency matrix of the graph $G$ and “addition” and “multiplication” are replaced by $\min$ and $+$, respectively. This structure is known as the min-plus semiring (a.k.a. tropical semiring) (compare Section 1.2), which is a well-established tool for distance computations [3, 39, 43]. From this point of view, $\operatorname{SPD}(G)$ is the number of iterations until a fixed point is reached. MBF thus has depth $\operatorname{\tilde{O}}(\operatorname{SPD}(G))$ and work $\operatorname{\tilde{O}}(m\operatorname{SPD}(G))$, where small $\operatorname{SPD}(G)$ are possible.

One may overcome the issue of large depth entirely by performing the fixed-point iteration on the matrix by setting $A^{(0)} := A$ and iterating $A^{(i+1)} := A^{(i)} A^{(i)}$; after $\lceil \log \operatorname{SPD}(G) \rceil \le \lceil \log n \rceil$ iterations, a fixed point is reached [19]. The final matrix then has as entries exactly the pairwise node distances and the computation has polylogarithmic depth. This comes at the cost of $\operatorname{\Omega }(n^3)$ work (even if $m \ll n^2$) but is as work-efficient as $n$ instances of Dijkstra's algorithm for solving APSP in dense graphs without incurring depth $\operatorname{\Omega }(n)$.

Mohri [39] solved various shortest-distance problems using the $\mathcal {S}_{\min ,+}$ semiring and variants thereof. While Mohri's framework is quite general, our approach is different in crucial aspects:

Mohri uses an individual semiring for each problem and then solves it by a general algorithm. Our approach, on the other hand, is more generic as well as easier to use: We use off-the-shelf semirings—usually just $\mathcal {S}_{\min ,+}$—and combine them with appropriate semimodules carrying problem-specific information. Further problem-specific customization happens in the definition of a congruence relation on the semi-ring; it specifies which parts of a node's state can be discarded because they are irrelevant for the problem. We demonstrate the modularity and flexibility of the approach by various examples in Section 3, which cover a large variety of distance problems.
In our framework, node states are semimodule elements and edge weights are semiring elements; hence, there is no multiplication of node states. Mohri's approach, however, does not make this distinction and hence requires the introduction of an artificial “multiplication” of node states.
Mohri's algorithm can be interpreted as a generalization of Dijkstra's algorithm [21] because it maintains a queue and, in each iteration, applies a relaxation technique to the dequeued element and its neighbors. This strategy is inherently sequential; to the best of our knowledge, we are the first to present a general algebraic framework for distance computations that exploits the implicit parallelism of the MBF algorithm.
In Mohri's approach, choosing the global queueing strategy is not only an integral part of an algorithm, but it also simplifies the construction of the underlying semirings as one may rule that elements are processed in a “convenient” order. Our framework is flexible enough to achieve counterparts even of Mohri's more involved results without such assumptions; concretely, we propose a suitable semiring for solving the $k$-Shortest Distance Problem ($k$-SDP) and the $k$-Distinct-Shortest Distance Problem ($k$-DSDP) in Section 3.3.

Approximate Distance Computations. As metric embeddings reproduce distances only approximately, we may base them on approximate distance computation in the original graph. Using rounding techniques and embedding $\mathcal {S}_{\min ,+}$ into a polynomial ring, this enables us to use fast matrix multiplication to speed up the aforementioned fixed-point iteration $A^{(i+1)} := A^{(i)} A^{(i)}$ [43]. This reduces the work to $\operatorname{\tilde{O}}(n^{\omega })$ at the expense of only $(1 + \operatorname{o}(1))$-approximating distances, where $\omega \lt 2.3729$ [32] denotes the fast matrix-multiplication exponent. However, even if the conjecture that $\omega = 2$ holds true, this technique must result in $\operatorname{\Omega }(n^2)$ work simply because $\operatorname{\Omega }(n^2)$ pairwise distances are computed.

Regarding SSSP, there was no work-efficient low-depth parallel algorithm for a long time, even when allowing approximation. This was referred to as the “sequential bottleneck”: Matrix-matrix multiplication was inefficient in terms of work, while sequentially exploring (shortest) paths resulted in depth $\operatorname{\Omega }(\operatorname{SPD}(G))$. Klein and Subramanian [31] showed that depth $\operatorname{\tilde{O}}(\sqrt {n})$ can be achieved with $\operatorname{\tilde{O}}(m\sqrt {n})$ work, beating the $n^2$ work barrier with sublinear depth in sparse graphs. As an aside, similar bounds were later achieved for exact SSSP computations by Shi and Spencer [42].

In a seminal paper, Cohen [17] proved that SSSP can be $(1+\operatorname{o}(1))$-approximated with depth $\operatorname{polylog}n$ and near-optimal $\operatorname{\tilde{O}}(m^{1+\varepsilon })$ work for any constant choice of $\varepsilon \gt 0$; her approach is based on the aforementioned hop set construction. Similar guarantees can be achieved deterministically. Henziger et al. [29] focus on Congest algorithms, which can be interpreted in our framework to yield hop sets $(1 + 1 / \operatorname{polylog}n)$-approximating distances for $d \in 2^{\operatorname{O}(\sqrt {\log n})} \subset n^{\operatorname{o}(1)}$ and can be computed using depth $2^{\operatorname{O}(\sqrt {\log n}) }\subset n^{\operatorname{o}(1)}$ and work $m 2^{\operatorname{O}(\sqrt {\log n})}\subset m^{1+\operatorname{o}(1)}$. In a recent breakthrough, Elkin and Neiman obtained hop sets with substantially improved tradeoffs [22], both for the parallel setting and the Congest model. On the negative side, Abboud et al. [1] proved principal limitations of hop sets by providing lower bounds on the tradeoffs between the parameters.

Our embedding technique is formulated independently from the underlying hop set construction, whose performance is reflected in the depth and work bounds of our algorithms. While the improvements by Elkin and Neiman do not enable us to achieve a work bound of $m^{1+\operatorname{o}(1)}$ when sticking to our goals of depth $\operatorname{polylog}n$ and expected stretch $\operatorname{O}(\log n)$, they can be used to obtain better tradeoffs between the parameters.

Metric Tree Embeddings. When metrically embedding into a tree, it is, in general, impossible to guarantee a small stretch. For instance, when the graph is a cycle with unit edge weights, it is impossible to embed it into a tree without having at least one edge with stretch $\operatorname{\Omega }(n)$. However, on average, the edges in this example are stretched by a constant factor only, justifying the hope that one may be able to randomly embed into a tree such that, for each pair of nodes, the expected stretch is small. A number of elegant algorithms [4, 7, 8, 23] compute tree embeddings, culminating in the one by Fakcharoenphol, Rao, and Talwar [23] (FRT) that achieves stretch $\operatorname{O}(\log n)$ in expectation. This stretch bound is optimal in the worst case, as illustrated by expander graphs [8]. Mendel and Schwob show how to sample from the FRT distribution in $\operatorname{O}(m \log ^3 n)$ steps [37]. This upper bound has recently been improved: Blelloch et al. present an algorithm that requires time $\operatorname{O}(m \log n)$ w.h.p. Both algorithms match the trivial $\Omega (m)$ lower bound up to polylogarithmic factors. However, their approach relies on a pruned version of Dijkstra's algorithm for distance computations and hence does not lead to a low-depth parallel algorithm.

Several parallel and distributed algorithms compute FRT trees [14, 26, 30]. These algorithms and ours have in common that they represent the embedding by Least Element (LE) lists, which were first introduced in Cohen [16, 18]. In the parallel case, the state-of-the-art solution due to Blelloch et al. [14] achieves $\operatorname{O}(\log ^2 n)$ depth and $\operatorname{O}(n^2 \log n)$ work. However, Blelloch et al. assume the input to be given as an $n$-point metric, where the distance between two points can be queried at constant cost. Note that our approach is more general, as a metric can be interpreted as a complete weighted graph of SPD 1; a single MBF-like iteration reproduces the result by Blelloch et al. Moreover, this point of view shows that the input required to achieve subquadratic work must be a sparse graph. Furthermore, Blelloch et al. determine LE lists in $\operatorname{O}(D \log n)$ depth and $\operatorname{O}(W \log n)$ work, where $D$ and $W$ are the depth and work of an SSSP computation that is used as a black box [12]. However, to date only approximate SSSP algorithms simultaneously achieve $W\in \operatorname{\tilde{O}}(m)$ and $D\in \operatorname{polylog}n$; because calling an approximate SSSP algorithm multiple times does not result in approximate distances that respect the triangle inequality, this approach cannot be used to efficiently determine an FRT-style embedding. For graph inputs, we are not aware of any metric tree embedding algorithm achieving $\operatorname{polylog}n$ depth and a nontrivial work bound (i.e., not incurring the $\operatorname{\Omega }(n^3)$ work caused by relying on matrix-matrix multiplication).

In the distributed setting, Khan et al. [30] show how to compute LE lists in $\operatorname{O}(\operatorname{SPD}(G) \log n)$ rounds in the Congest model [41]. On the lower bound side, trivially $\operatorname{\Omega }(\operatorname{D}(G))$ rounds are required, where $\operatorname{D}(G)$ is the maximum hop distance (i.e., ignoring weights) between nodes. However, even if $\operatorname{D}(G) \in \operatorname{O}(\log n)$, $\operatorname{\tilde{\Omega }}(\sqrt {n}),$ rounds are necessary [20, 26]. Extending the algorithm by Khan et al. [26], it is shown how to obtain a round complexity of $\operatorname{\tilde{O}}(\min \lbrace n^{1/2+\varepsilon },\operatorname{SPD}(G)\rbrace + \operatorname{D}(G))$ for any $\varepsilon \gt 0$ at the expense of increasing the stretch to $\operatorname{O}(\varepsilon ^{-1}\log n)$. We partly build on these ideas; specifically, the construction in Section 4 can be seen as a generalization of the key technique from Khan et al. [26]. As detailed in Section 8, our framework subsumes these algorithms and can be used to improve on the result from Khan et al. [26]: Leveraging further results [29, 35], we obtain a metric tree embedding with expected stretch $\operatorname{O}(\log n)$ that is computed in $\min \lbrace n^{1/2+\operatorname{o}(1)} + \operatorname{D}(G)^{1+\operatorname{o}(1)}, \operatorname{\tilde{O}}(\operatorname{SPD}(G))\rbrace$ rounds.

1.2 Notation and Preliminaries

We consider weighted, undirected graphs $G = (V, E, \operatorname{\omega })$ without loops or parallel edges with nodes $V$, edges $E$, and edge weights . Unless specified otherwise, we set $n := |V|$ and $m := |E|$. For an edge $e = \lbrace v,w\rbrace \in E$, we write $\operatorname{\omega }(v, w) := \operatorname{\omega }(e)$, $\operatorname{\omega }(v, v) := 0$ for $v \in V$, and $\operatorname{\omega }(v,w) := \infty$ for $\lbrace v,w\rbrace \notin E$. We assume that the ratio between maximum and minimum edge weight is polynomially bounded in $n$ and that each edge weight and constant can be stored with sufficient precision in a single register.⁵ We assume that $G$ is connected and given in the form of an adjacency list.

Let $p \subseteq E$ be a path. $p$ has $|p|$ hops and weight $\operatorname{\omega }(p) := \sum _{e \in p} \operatorname{\omega }(e)$. For the nodes $v, w \in V$ let $\operatorname{P}(v, w, G)$ denote the set of paths from $v$ to $w$ and $\operatorname{P}^h(v, w, G)$ the set of such paths using at most $h$ hops. We denote by $\operatorname{dist}^h(v, w, G) := \min \lbrace \operatorname{\omega }(p) \mid p \in \operatorname{P}^h(v, w, G) \rbrace$ the minimum weight of an $h$-hop path from $v$ to $w$, where $\min \emptyset := \infty$; the distance between $v$ and $w$ is $\operatorname{dist}(v,w,G):=\operatorname{dist}^n(v,w,G)$. The shortest path hop distance between $v$ and $w$ is $\operatorname{hop}(v, w, G) := \min \lbrace |p| \mid p\in \operatorname{P}(v, w, G) \wedge \operatorname{\omega }(p) = \operatorname{dist}(v, w, G) \rbrace$; $\operatorname{MHSP}(v, w, G) := \lbrace p \in \operatorname{P}^{\operatorname{hop}(v,w,G)}(v, w, G) \mid \operatorname{\omega }(p) = \operatorname{dist}(v, w, G) \rbrace$ denotes all min-hop shortest paths from $v$ to $w$. Finally, the Shortest Path Diameter (SPD) of $G$ is $\operatorname{SPD}(G) := \max \lbrace \operatorname{hop}(v, w, G) \mid v,w \in V \rbrace$, and is the unweighted hop diameter of $G$.

We sometimes use $\min$ and $\max$ as binary operators, assume , and define, for a set $N$ and , $\binom{N}{k} := \lbrace M \subseteq N \mid |M| = k \rbrace$ and denote by $\operatorname{id}:N \rightarrow N$ the identity function. Furthermore, we use weak asymptotic notation hiding polylogarithmic factors in $n$: $\operatorname{O}(f(n) \operatorname{polylog}(n)) = \operatorname{\tilde{O}}(f(n))$, etc.

Model of Computation. We use an abstract model of parallel computation similar to those used in circuit complexity; the goal here is to avoid distraction by details such as read or write collisions or load balancing issues typical to PRAM models, noting that these can be resolved with (at most) logarithmic overheads. The computation is represented by a Directed Acyclic Graph (DAG) with constantly bounded maximum indegree, where nodes represent words of memory that are given as input (indegree 0) or computed out of previously determined memory contents (non-zero indegree). Words are computed with a constant number of basic instructions (e.g., addition, multiplication, comparison, etc.); here, we also allow for the use of independent randomness. For simplicity, a memory word may hold any number computed throughout the algorithm. As pointed out earlier, $\operatorname{O}(\log n)$-bit words suffice for our purpose.

An algorithm defines, given the input, the DAG and how the nodes’ content is computed as well as which nodes represent the output. Given an instance of the problem, the work is the number of nodes of the corresponding DAG and the depth is its longest path. Assuming that there are no read or write conflicts, the work is thus (proportional to) the time required by a single processor (of uniform speed) to complete the computation, whereas the depth lower-bounds the time required by an infinite number of processors. Note that the DAG may be a random graph because the algorithm may use randomness, implying that work and depth may be random variables. When making probabilistic statements, we require that they hold for all instances (i.e., the respective probability bounds are satisfied after fixing an arbitrary instance).

Probability. A claim holds w.h.p. if it occurs with a probability of at least $1 - n^{-c}$ for any fixed choice of ; $c$ is a constant for the purposes of $\operatorname{O}$-notation. We use the following basic statement frequently and implicitly throughout this article:

Lemma 1.2. Let $\mathcal {E}_1, \dots , \mathcal {E}_k$ be events occurring w.h.p., and $k \in \operatorname{poly}n$. $\mathcal {E}_1 \cap \dots \cap \mathcal {E}_k$ occurs w.h.p.

Proof. We have $k \le an^b$ for fixed and choose that all $\mathcal {E}_i$ occur with a probability of at least $1 - n^{-c^{\prime }}$ with $c^{\prime } = c + b + \log _n a$ for some fixed $c \ge 1$. The union bound yields

(2)

hence $\mathcal {E}_1 \cap \dots \cap \mathcal {E}_k$ occurs w.h.p. as claimed.

Hop Sets. A graph $G = (V, E, \operatorname{\omega })$, contains a $(d, \hat{\varepsilon })$-hop set if

\begin{equation} \forall v,w \in V:\quad \operatorname{dist}^d(v, w, G) \le (1 + \hat{\varepsilon }) \operatorname{dist}(v, w, G); \end{equation} (3)

(i.e., if its $d$-hop distances are a $(1 + \hat{\varepsilon })$-approximation of its distances). This definition is based on Cohen [ 17], who describes how to efficiently add edges to $G$ to establish this property.

Distance Metrics. The min-plus semiring , also referred to as the tropical semiring, forms a semiring (i.e., a ring without additive inverses; see Definition A.2 in Appendix A). Unless explicitly stated otherwise, we associate $\oplus$ and $\odot$ with the addition and multiplication of the underlying ring throughout the article; in this case, we use $a \oplus b := \min \lbrace a,b\rbrace$ and $a \odot b := a+b$. Observe that $\infty$ and 0 are the neutral elements with respect to $\oplus$ and $\odot$, respectively. We sometimes write $x \in \mathcal {S}_{\min ,+}$ instead of to refer to the elements of a semiring. Furthermore, we follow the standard convention to occasionally leave out $\odot$ and give it priority over $\oplus$ (e.g., interpret $ab \oplus c$ as $(a \odot b) \oplus c$ for all $a,b,c \in \mathcal {S}_{\min ,+}$).

The min-plus semiring is a well-established tool to determine pairwise distances in a graph via the distance product (see, e.g., [3, 39, 43]). Let $G = (V, E, \operatorname{\omega })$ be a weighted graph and let $A \in \mathcal {S}_{\min ,+}^{V \times V}$ be its adjacency matrix $A$, given by

\begin{equation} (a_{vw}) := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}0 & \text{if }v=w\\ \operatorname{\omega }(v,w) & \text{if }\lbrace v,w\rbrace \in E\\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (4)

Throughout this article, the operations involved in matrix addition and multiplication are the operations of the underlying semiring; that is, for square matrices $A, B$ with row and column index set $V$ we have

\begin{align} (A \oplus B)_{vw} &= \min \lbrace a_{vw}, b_{vw} \rbrace \text{ and} \end{align} (5)

\begin{align} (AB)_{vw} &= \min _{u \in V} \lbrace a_{vu} + b_{uw} \rbrace . \end{align} (6)

The distance product $A^h$ corresponds to $h$-hop distances (i.e., $(A^h)_{vw} = \operatorname{dist}^h(v, w, G))$ [ 3]. In particular, this corresponds to the exact distances between all pairs of nodes for $h \ge \operatorname{SPD}(G)$.

2 MBF-LIKE ALGORITHMS

The MBF algorithm [11, 24, 40] is both fundamental and elegant. In its classical form, it solves the SSSP problem: In each iteration, each node communicates its current upper bound on its distance to the source node $s$ (initially $\infty$ at all nodes but $s$) plus the corresponding edge weight to its neighbors, which then keep the minimum of the received values and their previously stored one. Iterating $h$ times determines all nodes’ $h$-hop distances to $s$.

Over the years, numerous algorithms emerged that use similar iterative schemes for distributing information [4, 7, 8, 23, 29, 33–35]. It is natural to ask for a characterization that captures all of these algorithms. In this section, we propose such a characterization: the class of MBF-like algorithms. The common denominator of these algorithms is the following:

An initial state vector $x^{(0)} \in M^V$ contains information initially known to each node.
In each iteration, each node first propagates information along all incident edges.
All nodes then aggregate the received information. This and the previous step are precisely the same as updating the state vector $x^{(i)}$ by the matrix-vector product $x^{(i+1)} = A x^{(i)}$ over the min-plus semiring.
Finally, irrelevant information is filtered out before moving on to the next iteration.

As a concrete example consider $k$-Source Shortest Paths ($k$-SSP), the task of determining for each node the list of its $k$ closest nodes. To this end, one needs to consider all nodes as sources (i.e., run the multi-source variant of the classic MBF algorithm with all nodes as sources). Nodes store values in , so that in iteration $i$ each node $v \in V$ can store for all $w \in V$. Initially, $x^{(0)}_{vw}$ is 0 if $v = w$ and $\infty$ everywhere else (the 0-hop distances).⁶ Propagating these distances over an edge of weight $\operatorname{\omega }(e)$ means uniformly increasing them by $\operatorname{\omega }(e)$. During aggregation, each node picks, for each target node, the smallest distance reported so far. This is costly since each node might learn non-$\infty$ distances values for all other nodes. To increase efficiency, we filter out, in each iteration and at each node, all source–distance pairs but the $k$ pairs with smallest distance. This reduces the amount of work per iteration from $\operatorname{\tilde{\Theta }}(mn)$ to $\operatorname{\tilde{\Theta }}(mk)$.

The filtering step generalizes from classic MBF to an MBF-like algorithm, with the goal of reducing work. The crucial characteristics exploited by this idea are the following:

Propagation and aggregation are interchangeable. It makes no difference whether two pieces of information are propagated separately or as a single aggregated piece of information.
Filtering or not filtering after aggregation has no impact on the correctness (i.e., the output) of an algorithm, only on its efficiency.

In this section, we formalize this approach for later use in more advanced algorithms. To this end, we develop a characterization of MBF-like algorithms in Sections 2.1–2.3 and establish basic properties in Section 2.4. We demonstrate that our approach applies to a wide variety of known algorithms in Section 3. In order to maintain self-containment without obstructing presentation, basic algebraic definitions are given in Appendix A.

2.1 Propagation and Aggregation

Let $M$ be the set of node states, i.e., the possible values that an MBF-like algorithm can store at a vertex. We represent propagation of $x \in M$ over an edge of weight by $s \odot x$, where , and aggregation of $x,y \in M$ at some node by $x \oplus y$, where $\oplus :M \times M \rightarrow M$; the discussion of filtering is deferred to Section 2.2. Concerning the aggregation of information, we demand that $\oplus$ is associative and has a neutral element $\bot \in M$ encoding “no available information,” hence $(M, \oplus)$ is a semigroup with neutral element $\bot$. Furthermore, we require for all and $x,y \in M$ (note that we “overload” $\oplus$ and $\odot$):

\begin{equation} 0 \odot x = x\\ \end{equation} (7)

\begin{equation} \infty \odot x = \bot \\ \end{equation} (8)

\begin{equation} s \odot (x \oplus y) = (s \odot x) \oplus (s \odot y) \\ \end{equation} (9)

\begin{equation} (s \oplus t) \odot x = (s \odot x) \oplus (t \odot x) \\ \end{equation} (10)

\begin{equation} (s \odot t) \odot x = s \odot (t \odot x) . \end{equation} (11)

Our requirements are quite natural: Equations ( 7) and ( 8) state that propagating information over zero distance (e.g., keeping it at a vertex) does not alter it and that propagating it infinitely far away (i.e., “propagating” it over a nonexisting edge) means losing it, respectively. Note that 0 and $\infty$ are the neutral elements with respect to $\odot$ and $\oplus$ in $\mathcal {S}_{\min ,+}$. Equation ( 9) says that propagating aggregated information is equivalent to aggregating propagated information (along identical distances), Equation ( 10) means that propagating information over a shorter of two edges is equivalent to moving it along both edges and then aggregating it (information “becomes obsolete” with increasing distance), and Equation ( 11) states that propagating propagated information can be done in a single step.

Altogether, this is equivalent to demanding that $\mathcal {M} = (M, \oplus , \odot)$ is a zero-preserving semimodule (see Definition A.3 in Appendix A) over $\mathcal {S}_{\min ,+}$. A straightforward choice of $\mathcal {M}$ is the direct product of $|V|$ copies of , which is suitable for most of the applications we consider.

Definition 2.1 (Distance Map). The distance map semimodule is given by, for all $s \in \mathcal {S}_{\min ,+}$ and $x,y \in \mathcal {D}$,

\begin{equation} (x \oplus y)_v := x_v \oplus y_v = \min \lbrace x_v, y_v\rbrace \\ \end{equation} (12)

\begin{equation} (s \odot x)_v := s \odot x_v = s + x_v \end{equation} (13)

where $\bot := (\infty ,\ldots ,\infty)^\top \in \mathcal {D}$ is the neutral element with respect to $\oplus$.

Corollary 2.2. $\mathcal {D}$ is a zero-preserving semimodule over $\mathcal {S}_{\min ,+}$ with zero $\bot = (\infty , \dots , \infty)^\top$ by Lemma A.4.

Distance maps can be represented by only storing the non-$\infty$ distances (and their indices from $V$). This is of interest when there are few non-$\infty$ entries, which can be ensured by filtering (see below). In the following, we denote by $|x|$ the number of non-$\infty$ entries of $x \in \mathcal {D}$. The following lemma shows that this representation allows for efficient aggregation.

Lemma 2.3. Suppose $x_1, \dots , x_n \in \mathcal {D}$ are stored in lists of index–distance pairs as above. Then $\bigoplus _{i=1}^n x_i$ can be computed with $\operatorname{O}(\log n)$ depth and $\operatorname{O}(\sum _{i=1}^n |x_i| \log n)$ work.

Proof. We sort $\bigcup _{i=1}^n x_i$ in ascending lexicographical order. This can be done in parallel with $\operatorname{O}(\log (\sum _{i=1}^n|x_i|)) \subseteq \operatorname{O}(\log n)$ depth and $\operatorname{O}(\sum _{i=1}^n |x_i| \log n)$ work [2]. Then we delete each pair for which the next smaller pair has the same index; the resulting list thus contains, for each $v \in V$ for which there is a non-$\infty$ value in some list $x_i$, the minimum such value. As this operation is easy to implement with $\operatorname{O}(\log n)$ depth and $\operatorname{O}(\sum _{i=1}^n |x_i| \log n)$ work, the claim follows.

While $\mathcal {S}_{\min ,+}$ and $\mathcal {D}$ suffice for most applications and are suitable to convey our ideas, it is sometimes necessary to use a different semiring. We elaborate on this in Section 3. Hence, rather than confining the discussion to semimodules over $\mathcal {S}_{\min ,+}$, in the following we make general statements about an arbitrary semimodule $\mathcal {M} = (M, \oplus , \odot)$ over an arbitrary semiring $\mathcal {S} = (S, \oplus , \odot)$ wherever it does not obstruct the presentation. It is, however, helpful to keep $\mathcal {S} = \mathcal {S}_{\min ,+}$ and $\mathcal {M} = \mathcal {D}$ in mind.

2.2 Filtering

MBF-like algorithms achieve efficiency by maintaining and propagating—instead of the full amount of information nodes are exposed to—only a filtered (small) representative of the information they obtained. Our goal in this section is to capture the properties that a filter must satisfy to not affect output correctness. We start with a congruence relation, an equivalence relation compatible with propagation and aggregation, on $\mathcal {M}$. A filter $r:\mathcal {M} \rightarrow \mathcal {M}$ is a projection mapping all members of an equivalence class to the same representative within that class; compare Definition 2.6.

Definition 2.4 (Congruence Relation). Let $\mathcal {M} = (M, \oplus , \odot)$ be a semimodule over the semiring $\mathcal {S}$ and $\sim$ an equivalence relation on $M$. We call $\sim$ a congruence relation on $\mathcal {M}$ if and only if

\begin{equation} \forall s \in \mathcal {S}, \forall x,x^{\prime } \in \mathcal {M}:\quad x \sim x^{\prime } \Rightarrow sx \sim sx^{\prime } \\ \end{equation} (14)

\begin{equation} \forall x,x^{\prime },y,y^{\prime } \in \mathcal {M}:\quad x \sim x^{\prime } \wedge y \sim y^{\prime } \Rightarrow x \oplus y \sim x^{\prime } \oplus y^{\prime }. \end{equation} (15)

A congruence relation induces a quotient semimodule.

Observation 2.5. Denote by $[x]$ the equivalence class of $x \in \mathcal {M}$ under the congruence relation $\sim$ on the semimodule $\mathcal {M}$. Set ${M}/_{\sim } := \lbrace [x] \mid x \in \mathcal {M} \rbrace$. Then ${\mathcal {M}}/_{\sim } := ({M}/_{\sim }, \oplus , \odot)$ is a semimodule with the operations $[x] \oplus [y] := [x\oplus y]$ and $s \odot [x] := [sx]$.

An MBF-like algorithm performs efficient computations by implicitly operating on this quotient semimodule (i.e., on suitable, typically small, representatives of the equivalence classes). Such representatives are obtained in the filtering step using a representative projection, also referred to as a filter. We refer to this step as filtering since, in all our applications and examples, it discards a subset of the available information that is irrelevant to the problem at hand.

Definition 2.6 (Representative Projection). Let $\mathcal {M} = (M, \oplus , \odot)$ be a semimodule over the semiring $\mathcal {S}$ and $\sim$ a congruence relation on $\mathcal {M}$. Then $r:\mathcal {M} \rightarrow \mathcal {M}$ is a representative projection with respect to $\sim$ if and only if

\begin{equation} \forall x \in \mathcal {M}:\quad x \sim r(x) \\ \end{equation} (16)

\begin{equation} \forall x,y \in \mathcal {M}:\quad x \sim y \Rightarrow r(x) = r(y). \end{equation} (17)

Observation 2.7. A representative projection is a projection (i.e., $r^2 = r)$.

In the following, we typically first define a suitable projection $r$; this projection in turn defines equivalence classes $[x]:=\lbrace y \in \mathcal {M} \mid r(x)=r(y) \rbrace$. The following lemma is useful when we need to show that equivalence classes defined this way yield a congruence relation (i.e., are suitable for MBF-like algorithms).

Lemma 2.8. Let $\mathcal {M}$ be a semimodule over the semiring $\mathcal {S}$, let $r:\mathcal {M} \rightarrow \mathcal {M}$ be a projection, and for $x,y \in \mathcal {M}$, let $x \sim y :\Leftrightarrow r(x) = r(y)$. Then $\sim$ is a congruence relation with representative projection $r$ if:

\begin{equation} \forall s \in \mathcal {S}, \forall x,x^{\prime } \in \mathcal {M}:\quad r(x) = r(x^{\prime }) \Rightarrow r(sx) = r(sx^{\prime })\text{, and} \\ \end{equation} (18)

\begin{equation} \forall x,x^{\prime },y,y^{\prime } \in \mathcal {M}:\quad r(x) = r(x^{\prime }) \wedge r(y) = r(y^{\prime }) \Rightarrow r(x \oplus y) = r(x^{\prime } \oplus y^{\prime }). \end{equation} (19)

Proof. Obviously, $\sim$ is an equivalence relation, and $r$ fulfills Equations (16) and (17). Conditions (2.8) and (2.9) directly follow from the preconditions of the lemma.

An MBF-like algorithm has to behave in a compatible way for all vertices in that each vertex follows the same propagation, aggregation, and filtering rules. This induces a semimodule structure on the (possible) state vectors of the algorithm in a natural way.

Definition 2.9 (Power Semimodule). Given a node set $V$ and a zero-preserving semimodule $\mathcal {M} = (M, \oplus , \odot)$ over the semiring $\mathcal {S}$, we define $\mathcal {M}^V = (M^V, \oplus , \odot)$ by applying the operations of $\mathcal {M}$ coordinatewise (i.e., $\forall v,w\in M^V, \forall s \in \mathcal {S}$):

\begin{equation} (x \oplus y)_v := x_v\oplus y_v\text{ and} \\ \end{equation} (20)

\begin{equation} (s \odot x)_v := s \odot x_v. \end{equation} (21)

Furthermore, by $r^V$ we denote the componentwise application of a representative projection $r$ of $\mathcal {M}$,

\begin{equation} (r^V x)_v := r(x_v). \end{equation} (22)

This induces the equivalence relation $\sim$ on $\mathcal {M}$ via $x \sim y$ if and only if $x_v \sim y_v$ for all $v \in V$.

Observation 2.10. $\mathcal {M}^V$ is a zero-preserving semimodule over $\mathcal {S}$ and $\bot ^V := (\bot ,\dots ,\bot)^\top \in \mathcal {M}^V$ is its neutral element with respect to $\oplus$, where $\bot$ is the neutral element of $\mathcal {M}$. The equivalence relation $\sim$ induced by $r^V$ is a congruence relation on $\mathcal {M}^V$ with representative projection $r^V$.

2.3 The Class of MBF-like Algorithms

The following definition connects the properties introduced and motivated earlier:

Definition 2.11 (MBF-like Algorithm). A MBF-like algorithm $\mathcal {A}$ is determined by

a zero-preserving semimodule $\mathcal {M}$ over a semiring $\mathcal {S}$,
a congruence relation on $\mathcal {M}$ with representative projection $r:\mathcal {M} \rightarrow \mathcal {M}$, and
initial values $x^{(0)} \in \mathcal {M}^V$ for the nodes (which may depend on the input graph).

On a graph $G$ with adjacency matrix $A$, $h$ iterations of $\mathcal {A}$ determine

\begin{equation} \mathcal {A}^h(G) := x^{(h)} := r^V A^h x^{(0)}. \end{equation} (23)

Since $\mathcal {A}$ reaches a fixed point after at most $i = \operatorname{SPD}(G) \lt n$ iterations (i.e., a state where $x^{(i+1)} = x^{(i)}$), we abbreviate $\mathcal {A}(G) := A^n(G)$.

Note that the definition of the adjacency matrix $A \in \mathcal {S}^{V \times V}$ depends on the choice of the semiring $\mathcal {S}$. For the standard choice of $\mathcal {S} = \mathcal {S}_{\min ,+}$, which suffices for all our core results, we define $A$ in Equation (4); examples using different semirings and the associated adjacency matrices are discussed in Sections 3.2–3.4.

The $(i+1)$-th iteration of an MBF-like algorithm $\mathcal {A}$ determines $x^{(i+1)} := r^V A x^{(i)}$ (propagate, aggregate, and filter). Thus, $h$ iterations yield $(r^V A)^h x^{(0)}$, which we show to be identical to $r^V A^h x^{(0)}$ in Corollary 2.17 of Section 2.4.

2.4 Preserving State-Equivalence Across Iterations

As motivated earlier, MBF-like algorithms filter intermediate results; a representative projection $r^V$ determines a small representative of each node state. This maintains efficiency: Nodes propagate and aggregate only small representatives of the relevant information instead of the full amount of information they are exposed to. However, as motivated in, for example, Section 2.2, filtering is only relevant regarding efficiency, but not the correctness of MBF-like algorithms.

In this section, we formalize this concept in the following steps.

We introduce the functions needed to iterate MBF-like algorithms without filtering (i.e., multiplications with [adjacency] matrices). These Simple Linear Functions (SLFs) are a proper subset of the linear⁷ functions on $\mathcal {M}^V$.
The next step is to observe that SLFs are well-behaved with respect to the equivalence classes ${\mathcal {M}^V}/_{\sim }$ of node states.
Equivalence classes of SLFs mapping equivalent inputs to equivalent outputs yield the functions required for the study of MBF-like algorithms. These form a semiring of (a subset of) the functions on ${\mathcal {M}^V}/_{\sim }$.
Finally, we observe that $r^V \sim \operatorname{id}$, formalizing the concepts of “operating on equivalence classes of node states” and “filtering being optional with respect to correctness.”

An SLF $f$ is “simple” in the sense that it corresponds to matrix-vector multiplications; thatis, it maps $x \in \mathcal {M}^V$ such that $(f(x))_v$ is a linear combination of the coordinates $x_w$, $w \in V$, of $x$.

Definition 2.12 (Simple Linear Function). Let $\mathcal {M}$ be a semimodule over the semiring $\mathcal {S}$. Each matrix $A \in \mathcal {S}^{V \times V}$ defines an SLF $A:\mathcal {M}^V \rightarrow \mathcal {M}^V$ (and vice versa) by

\begin{equation} A(x)_v := (Ax)_v = \bigoplus _{w \in V} a_{vw} x_w. \end{equation} (24)

Thus, each iteration of an MBF-like algorithm is an application of an SLF given by an adjacency matrix followed by an application of the filter $r^V$. In the following, fix a semiring $\mathcal {S}$, a semimodule $\mathcal {M}$ over $\mathcal {S}$, and a congruence relation $\sim$ on $\mathcal {M}$. Furthermore, let $F$ denote the set of SLFs (i.e., matrices $A \in \mathcal {S}^{V \times V}$), each defining a function $A:\mathcal {M}^V \rightarrow \mathcal {M}^V$.

Example 2.13 (Non-Simple Linear Function). We remark that not all linear functions on $\mathcal {M}^V$ are SLFs. Choose $V = \lbrace 1,2\rbrace$, $\mathcal {S} = \mathcal {S}_{\min ,+}$, and $\mathcal {M} = \mathcal {D}$. Consider $f:\mathcal {M}^V \rightarrow \mathcal {M}^V$ given by

\begin{equation} f\binom{(x_{11}, x_{12})}{(x_{21}, x_{22})} := \binom{(x_{11} \oplus x_{12}, \infty)}{\bot }. \end{equation} (25)

While $f$ is linear, $f(x)_1$ is not a linear combination of $x_1$ and $x_2$. Hence, $f$ is not an SLF.

Let $A,B \in F$ be SLFs. Denote by $A(x) \mapsto Ax$ the application of the SLF $A$ to the argument $x \in \mathcal {M}^V$. Furthermore, we write $(A \oplus B)(x) \mapsto A(x) \oplus B(x)$ and $(A \circ B)(x) \mapsto A(B(x))$ for the addition and concatenation of SLFs, respectively. We proceed to Lemma 2.14, in which we show that matrix addition and multiplication are equivalent to the addition and concatenation of SLF functions, respectively. It follows that the SLFs form a semiring that is isomorphic to the matrix semiring of SLF matrices. Hence, we may use $A(x)$ and $Ax$ interchangeably in the following.

Lemma 2.14. $\mathcal {F} := (F, \oplus , \circ)$, where $\oplus$ denotes the addition of functions and $\circ$ their concatenation, is a semiring. Furthermore, $\mathcal {F}$ is isomorphic to the matrix-semiring over $\mathcal {S}$; that is, for all $A,B \in F$ and $x\in \mathcal {M}^V$,

\begin{equation} (A \oplus B)(x) = (A \oplus B)x\text{ and} \\ \end{equation} (26)

\begin{equation} (A \circ B)(x) = ABx. \end{equation} (27)

Proof. Let $A, B \in F$ and $x \in \mathcal {M}^V$ be arbitrary. Regarding Equations (26) and (27), observe that we have

\begin{equation} (A \oplus B)x = Ax \oplus Bx = A(x) \oplus B(x) = (A \oplus B)(x)\text{ and}\\ \end{equation} (28)

\begin{equation} ABx = A(Bx) = A(B(x)) = (A \circ B)(x), \end{equation} (29)

respectively; addition and concatenation of SLFs are equivalent to addition and multiplication of their respective matrices. It follows that $\mathcal {F}$ is isomorphic to the matrix semiring $(\mathcal {S}^{V \times V}, \oplus , \odot)$ and hence $\mathcal {F}$ is a semiring as claimed.

Recall that MBF-like algorithms project node states to appropriate equivalent node states. SLFs correspond to matrices and (adjacency) matrices correspond to MBF-like iterations. Hence, it is important that SLFs are well-behaved with respect to the equivalence classes ${\mathcal {M}^V}/_{\sim }$ of node states. Lemma 2.15 states that this is the case (i.e., that $Ax \sim Ax^{\prime }$ for all $x^{\prime } \in [x]$).

Lemma 2.15. Let $A \in F$ be an SLF. Then we have, for all $x, x^{\prime } \in \mathcal {M}^V$,

\begin{equation} x \sim x^{\prime } \quad \Rightarrow \quad Ax \sim Ax^{\prime }. \end{equation} (30)

Proof. First, for , let $x_1, \dots , x_k, x^{\prime }_1, \dots , x^{\prime }_k \in \mathcal {M}$ be such that $x_i \sim x^{\prime }_i$ for all $1 \le i \le k$. We show that for all $s_1, \dots , s_k \in \mathcal {S}$ it holds that

\begin{equation} \bigoplus _{i=1}^k s_i x_i \sim \bigoplus _{i=1}^k s_i x^{\prime }_i. \end{equation} (31)

We argue that Equation ( 31) holds by induction over $k$. For $k = 1$, the claim trivially follows from Equation ( 14). Regarding $k \ge 2$, suppose the claim holds for $k - 1$. Since $x_k \sim x_k^{\prime }$, we have that $s_k x_k \sim s_k x_k^{\prime }$ by ( 2.8). The induction hypothesis yields $\bigoplus _{i=1}^{k-1} s_i x_i \sim \bigoplus _{i=1}^{k-1} s_i x_i^{\prime }$. Hence,

\begin{equation} \bigoplus _{i=1}^k s_i x_k = \left(\bigoplus _{i=1}^{k-1} s_i x_i \right) \oplus s_k x_k \stackrel{(2.9)}{\sim } \left(\bigoplus _{i=1}^{k-1} s_i x^{\prime }_i \right) \oplus s_k x^{\prime }_k = \bigoplus _{i=1}^k s_i x^{\prime }_k. \end{equation} (32)

As for the original claim, let $v \in V$ be arbitrary and note that we have

\begin{equation} (Ax)_v = \bigoplus _{w \in V} a_{vw} x_v \stackrel{(2.25)}{\sim } \bigoplus _{w \in V} a_{vw} x^{\prime }_v = (Ax^{\prime })_v. \end{equation} (33)

Due to Lemma 2.15, each SLF $A \in F$ not only defines a function $A:\mathcal {M}^V \rightarrow \mathcal {M}^V$, but also a function $A:{\mathcal {M}^V}/_{\sim } \rightarrow {\mathcal {M}^V}/_{\sim }$ with $A[x] := [Ax]$ ($A[x]$ does not depend on the choice of the representant $x^{\prime } \in [x]$). This is important since MBF-like algorithms implicitly operate on ${\mathcal {M}^V}/_{\sim }$ and because they do so using adjacency matrices, which are SLFs. As a natural next step, we rule for SLFs $A,B \in F$ that

\begin{equation} A \sim B \quad :\Leftrightarrow \quad \forall x \in \mathcal {M}^V:Ax \sim Bx; \end{equation} (34)

that is, that they are equivalent if and only if they yield equivalent results when presented with the same input. This yields equivalence classes ${F}/_{\sim } = \lbrace [A] \mid A \in F \rbrace$. This implies, by Equation ( 34), that $[A][x] := [Ax]$ is well-defined. In Theorem 2.16, we show that the equivalence classes of SLFs with respect to summation and concatenation form a semiring ${\mathcal {F}}/_{\sim }$. As MBF-like algorithms implicitly work on ${\mathcal {M}^V}/_{\sim }$, we obtain with ${\mathcal {F}}/_{\sim }$ precisely the structure that may be used to manipulate the state of MBF-like algorithms, which we leverage throughout this article.

Theorem 2.16. Each $[A] \in {F}/_{\sim }$ defines an SLF on ${\mathcal {M}^V}/_{\sim }$. Furthermore, ${\mathcal {F}}/_{\sim } := ({F}/_{\sim }, \oplus , \circ)$, where $\oplus$ denotes the addition and $\circ$ the concatenation of functions, is a semiring of SLFs on ${\mathcal {M}^V}/_{\sim }$ with

\begin{equation} {}[A] \oplus [B] = [A \oplus B]\text{ and} \\ \end{equation} (35)

\begin{equation} {}[A] \circ [B] = [AB]. \end{equation} (36)

Proof. As argued earlier, for any $A\in F$, $[A]\in {F}/_{\sim }$ is well-defined on ${\mathcal {M}^V}/_{\sim }$ by Lemma 2.15. Equations (35) and (36) follow from Equations (26) and (27), respectively:

\begin{equation} {}[A \oplus B][x] = [(A \oplus B)x] \stackrel{(2.20)}{=} [(A \oplus B)(x)] = ([A] \oplus [B])([x]) \\ \end{equation} (37)

\begin{equation} {}[AB][x] = [ABx] \stackrel{(2.21)}{=} [(A \circ B)(x)] = [A \circ B]([x]). \end{equation} (38)

To see that $[A]$ is linear, let $s \in \mathcal {S}$ and $x,y \in \mathcal {M}^V$ be arbitrary and compute

\begin{eqnarray} [A][x] \oplus [A][y] = [Ax] \oplus [Ay] = [Ax \oplus Ay] = [A(x \oplus y)] = [A][x \oplus y]\nonumber\\ = [A]([x] \oplus [y])\text{ and} \end{eqnarray} (39)

\begin{equation} [A](s[x]) = [A(sx)] = [s(Ax)] = s[Ax] = s[A][x]. \end{equation} (40)

This implies that ${\mathcal {F}}/_{\sim }$ is a semiring of linear functions. As each function $[A]$ is represented by multiplication with (any) SLF $A^{\prime }\in [A]$, $[A]$ is an SLF.

The following corollary is a key property used throughout this article. It allows us to apply filter steps whenever convenient. We later use this to simulate MBF-like iterations on an implicitly represented graph whose edges correspond to entire paths in the original graph. This is efficient only because we have the luxury of applying intermediate filtering repeatedly without affecting the output.

Corollary 2.17 ($r^V \sim \operatorname{id}$). For any representative projection $r$ on $\mathcal {M}$, we have $r^V \sim \operatorname{id}; that is,$ for any SLF $A \in F$ it holds that

\begin{equation} r^V A \sim A r^V \sim A. \end{equation} (41)

In particular (as promised in Section 2.3) for any MBF-like algorithm $\mathcal {A}$, we have

\begin{equation} \mathcal {A}^h(G) \stackrel{(2.17)}{=} r^V A^h x^{(0)} \stackrel{(2.35)}{=} (r^V A)^h x^{(0)}. \end{equation} (42)

Finally, we stress that both the restriction to SLFs and the componentwise application of $r$ in $r^V$ are crucial for Corollary 2.17.

Example 2.18 (Non-Simple Linear Functions break Corollary 2.17). Consider $V$, $\mathcal {M}$, and $f$ from Example 2.13. If $r(x)=(x_1,\infty)$ for all $x\in \mathcal {M}$, we have that

\begin{equation} r^Vf\binom{(2,1)}{\bot }=\binom{(1,\infty)}{\bot }\not\sim \binom{(2,\infty)}{\bot }=fr^V\binom{(2,1)}{\bot }, \end{equation} (43)

implying that $r^Vf \not\sim fr^V$.

Example 2.19 (Non–Component-Wise Filtering Breaks Corollary 2.17). Consider $V = \lbrace 1,2\rbrace$, $\mathcal {S} = \mathcal {S}_{\min ,+}$, and $\mathcal {M} = \mathcal {D}$. Suppose $f$ is the SLF given by $fx := ({{x_1 \oplus x_2}\atop{\bot }})$ and $r^V(x) := ({{x_1}\atop{\bot }})$; that is, $r^V$ is not a component-wise application of some representative projection $r$ on $\mathcal {M}$, but still a representative projection on $\mathcal {M}^V$. Then we have that

\begin{equation} r^V f \binom{(2, \infty)}{(1, \infty)} = r^V \binom{(1, \infty)}{\bot } = \binom{(1, \infty)}{\bot } \not\sim \binom{(2, \infty)}{\bot } = f \binom{(2, \infty)}{\bot } = fr^V \binom{(2, \infty)}{(1, \infty)}, \end{equation} (44)

again implying that $r^Vf \not\sim fr^V$.

3 A COLLECTION OF MBF-LIKE ALGORITHMS

For the purpose of illustration and to demonstrate the generality of our framework, we show that a variety of standard algorithms are MBF-like algorithms; due to the machinery established earlier, this is a trivial task in many cases. In order to provide an unobstructed view on the machinery— and since this section is not central to our contributions—we defer proofs to Appendix B.

We demonstrate that some more involved distributed algorithms in the Congest model have a straightforward and compact interpretation in our framework in Section 8. They compute metric tree embeddings based on the FRT distribution; we present them alongside an improved distributed algorithm based on the other results of this work.

MBF-like algorithms are specified by a zero-preserving semimodule $\mathcal {M}$ over a semiring $\mathcal {S}$, a representative projection of a congruence relation on $\mathcal {M}$, initial states $x^{(0)}$, and the number of iterations $h$ (compare Definition 2.11). While this might look like a lot, typically, a standard semiring and semimodule can be chosen; the general-purpose choices of $\mathcal {S} = \mathcal {S}_{\min ,+}$ and $\mathcal {M} = \mathcal {D}$ (see Definition 2.1 and Corollary 2.2) or $\mathcal {M} = \mathcal {S}_{\min ,+}$ (every semiring is a zero-preserving semimodule over itself) usually are up to the task. Refer to Sections 3.2 and 3.3 for examples that require different semirings. However, even in these cases, the semirings and semimodules specified in Sections 3.2 and 3.3 can be reused. Hence, all that is left to do in most cases is to pick an existing semiring and semimodule, choose , and specify a representative projection $r$.

3.1 MBF-like Algorithms over the Min-Plus Semiring

We demonstrate that the min-plus semiring $\mathcal {S}_{\min ,+}$ (a.k.a. the tropical semiring) is the semiring of choice to capture many standard distance problems. Note that we also use $\mathcal {S}_{\min ,+}$ in our core result (i.e., for sampling FRT trees). For the sake of completeness, first recall the adjacency matrix $A$ of the weighted graph $G$ in the semiring $\mathcal {S}_{\min ,+}$ from Equation (4) and the distance-map semimodule $\mathcal {D}$ from Definition 2.1; consider the initialization $x^{(0)} \in \mathcal {D}^V$ with

\begin{equation} x^{(0)}_{vw} := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}0 & \text{if $v = w$ and} \\ \infty & \text{otherwise,} \end{array}\right. } \end{equation} (45)

and observe that the entries of

\begin{equation} x^{(h)} := A^h x^{(0)} \end{equation} (46)

correspond to the $h$-hop distances in $G$:

Lemma 3.1. For and $x^{(h)}$ from Equation (46), we have

\begin{equation} x^{(h)}_{vw} = \operatorname{dist}^h(v,w,G). \end{equation} (47)

It is well-known that the min-plus semiring can be used for distance computations [3, 39, 43]. Nevertheless, for the sake of completeness, we prove Lemma 3.1 in terms of our notation in Appendix B.

As a first example, we turn our attention to source detection. It generalizes all examples covered in this section, saving us from proving each one of them correct; well-established examples like SSSP and APSP follow. Source detection was introduced by Lenzen and Peleg [36]. Note, however, that we include a maximum considered distance $d$ in the definition.

Example 3.2 (Source Detection [36]). Given a weighted graph $G = (V,E,\operatorname{\omega })$, sources $S \subseteq V$, hop and result limits , and a maximum distance , $(S,h,d,k)$-source detection is the following problem: For each $v \in V$, determine the $k$ smallest elements of $\lbrace (\operatorname{dist}^h(v,s,G), s) \mid s \in S, \operatorname{dist}(v,s,G) \le d \rbrace$ with respect to lexicographical ordering, or all of them if there are fewer than $k$.

Source detection is solved by $h$ iterations of an MBF-like algorithm with $\mathcal {S} = \mathcal {S}_{\min ,+}$, $\mathcal {M} = \mathcal {D}$,

\begin{equation} r(x)_v \mapsto { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}x_v & \text{if $v \in S$, $x_v \le d$, and $x_v$ is among $k$ smallest entries of $x$ (ties broken by index),} \\ \infty & \text{otherwise,} \end{array}\right. } \end{equation} (48)

and $x^{(0)}_{vv} = 0$ if $v \in S$ and $x^{(0)}_{vw} = \infty$ in all other cases.

Since it may not be obvious that $r$ is a representative projection, we prove it in Appendix B.

Example 3.3 (SSSP). Single-Source Shortest Paths (SSSP) requires us to determine the $h$-hop distance to $s \in V$ for all $v \in V$. It is solved by an MBF-like algorithm with $\mathcal {S} = \mathcal {M} = \mathcal {S}_{\min ,+}$, $r = \operatorname{id}$, and $x^{(0)}_s = 0$, $x^{(0)}_v = \infty$ for all $v \ne s$.

Equivalently, one may use $(\lbrace s\rbrace , h, \infty , 1)$-source detection, effectively resulting in $\mathcal {M} = \mathcal {S}_{\min ,+}$. When only storing the non-$\infty$ entries, only the $s$-entry is relevant; however, the vertex ID of $s$ is stored as well—and $r = \operatorname{id}$, too.

Example 3.4 ($k$-SSP). The $ k$-SSP requires us to determine, for each node, the $k$ closest nodes in terms of the $h$-hop distance $\operatorname{dist}^h(\cdot ,\cdot ,G)$. It is solved by an MBF-like algorithm, as it corresponds to $(V, h, \infty , k)$-source detection.

Example 3.5 (APSP). All-Pairs Shortest Paths (APSP) is the task of determining the $h$-hop distance between all pairs of nodes. It is solved by an MBF-like algorithm because we can use $(V,h,\infty ,n)$-source detection, resulting in $\mathcal {M} = \mathcal {D}$, $r = \operatorname{id}$, and $x^{(0)}$ from Equation (45).

Example 3.6 (MSSP). In the Multi-Source Shortest Paths (MSSP) problem, each node is looking for the $h$-hop distances to all nodes in a designated set $S \subseteq V$ of source nodes. This is solved by the MBF-like algorithm for $(S,h,\infty ,|S|)$-source detection.

Example 3.7 (Forest Fires). The nodes in a graph $G$ form a distributed sensor network, the edges represent communication channels, and edge weights correspond to distances. Our goal is to detect, for each node $v$, if there is a node $w$ on fire within distance $\operatorname{dist}(v,w,G) \le d$ for some , where every node initially knows if it is on fire. As a suitable MBF-like algorithm, pick $h = n$, $\mathcal {S} = \mathcal {M} = \mathcal {S}_{\min ,+}$,

\begin{equation} r(x) \mapsto { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}x & \text{if $x \le d$ and} \\ \infty & \text{otherwise,} \end{array}\right. } \end{equation} (49)

and $x^{(0)}_v = 0$ if $v$ is on fire and $x^{(0)}_v = \infty$ otherwise.

Example 3.7 can be handled differently by using $(S,n,d,1)$-source detection, where $S$ are the nodes on fire. This also reveals the closest node on fire, whereas the solution from Example 3.7 works in anonymous networks. One can interpret both solutions as instances of SSSP with a virtual source $s \notin V$ that is connected to all nodes on fire by an edge of weight 0. This, however, requires a simulation argument and additional reasoning if the closest node on fire is to be determined.

3.2 MBF-like Algorithms over the Max-Min Semiring

Some problems require using a semiring other than $\mathcal {S}_{\min ,+}$. As an example, consider the Widest Path Problem (WPP), also referred to as the bottleneck shortest path problem: Given two nodes $v$ and $w$ in a weighted graph, find a $v$-$w$-path maximizing the lightest edge in the path. More formally, we are interested in the widest-path distance between $v$ and $w$:

Definition 3.8 (Widest-Path Distance). Given a weighted graph $G = (V, E, \operatorname{\omega })$, a path $p$ has width $\operatorname{width}(p) := \min \lbrace \operatorname{\omega }(e) \mid e \in p \rbrace$. The $h$-hop widest-path distance between $v,w \in V$ is

\begin{equation} \operatorname{width}^h(v, w, G) := \max _{p \in \operatorname{P}^h(v,w,G)} \lbrace \operatorname{width}(p)\rbrace . \end{equation} (50)

We abbreviate $\operatorname{width}(v, w, G) := \operatorname{width}^n(v, w, G)$.

An application of the WPP are trust networks: The nodes of a graph are entities, and an edge $\lbrace v,w\rbrace$ of weight $0 \lt \operatorname{\omega }(v,w) \le 1$ encodes that $v$ and $w$ trust each other with $\operatorname{\omega }(v,w)$. Assuming trust to be transitive, $v$ trusts $w$ with $\max _{p \in \operatorname{P}(v,w,G)} \min _{e \in p} \operatorname{\omega }(e) = \operatorname{width}(v,w,G)$. The WPP requires a semiring supporting the $\max$ and $\min$ operations:

Definition 3.9 (Max-Min Semiring). We refer to as the max-min semiring.

Lemma 3.10. $\mathcal {S}_{\max ,\min }$ is a semiring with neutral elements 0 and $\infty$.

Proof in Appendix B.

Corollary 3.11. $\mathcal {S}_{\max ,\min }$ is a zero-preserving semimodule over itself. Furthermore, we have that with, for all and ,

\begin{equation} (x \oplus y)_v := \max \lbrace x_v, y_v \rbrace \\ \end{equation} (51)

\begin{equation} (s \odot x)_v := \min \lbrace s, x_v \rbrace \end{equation} (52)

is a zero-preserving semimodule over $\mathcal {S}_{\max ,\min }$ with zero $\bot = (0, \dots , 0)^\top$ by Lemma A.4.

As adjacency matrix of $G = (V, E, \operatorname{\omega })$ with respect to $\mathcal {S}_{\max ,\min }$, we propose $A \in \mathcal {S}_{\max ,\min }^{V \times V}$ with

\begin{equation} (a_{vw}) := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\infty & \text{if $v = w$,} \\ \operatorname{\omega }(v,w) & \text{if $\lbrace v,w\rbrace \in E$, and} \\ 0 & \text{otherwise.} \end{array}\right. } \end{equation} (53)

This is a straightforward adaptation of the adjacency matrix with respect to $\mathcal {S}_{\min ,+}$ in Equation ( 4). As an initialization, $x^{(0)} \in \mathcal {W}^V$ in which each node knows the trivial path of unbounded width to itself, but nothing else, is given by

\begin{equation} x^{(0)}_{vw} := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\infty & \text{if $v = w$ and} \\ 0 & \text{otherwise.} \end{array}\right. } \end{equation} (54)

Then,

multiplications with $A$ (i.e., $h$ iterations) yield

\begin{equation} x^{(h)} := A^h x^{(0)} \end{equation} (55)

which corresponds to the $h$-hop widest-path distance:

Lemma 3.12. Given $x^{(h)}$ from Equation (55), we have

\begin{equation} x^{(h)}_{vw} = \operatorname{width}^h(v, w, G). \end{equation} (56)

Proof in Appendix B.

Example 3.13 (Single-Source Widest Paths). Single-Source Widest Paths (SSWP) asks for, given a weighted graph $G = (V, E, \operatorname{\omega })$, a designated source node $s \in V$, and , the $h$-hop widest-path distance $\operatorname{width}^h(s, v, G)$ for every $v \in V$. It is solved by an MBF-like algorithm with $\mathcal {S} = \mathcal {M} = \mathcal {S}_{\max ,\min }$, $r = \operatorname{id}$, and $x^{(0)}_s = \infty$ and $x^{(0)}_v = 0$ for all $v \ne s$.

Example 3.14 (All-Pairs Widest Paths). All-Pairs Widest Paths (APWP) asks for, given $G = (V, E, \operatorname{\omega })$ and , $\operatorname{width}^h(v, w, G)$ for all $v,w \in V$. APWP is MBF-like; it is solved by choosing $\mathcal {S} = \mathcal {S}_{\max ,\min }$, $\mathcal {M} = \mathcal {W}$, $r = \operatorname{id}$, and $x^{(0)}$ from Equation (54) by Lemma 3.12.

Example 3.15 (Multi-Source Widest Paths). In the Multi-Source Widest Paths (MSWP) problem, each node is looking for the $h$-hop widest path distance to all nodes in a designated set $S \subseteq V$ of source nodes. This is solved by the same MBF-like algorithm as for APWP in Example 3.14 when changing $x^{(0)}$ to $x^{(0)}_{vw} = \infty$ if $v = w \in S$ and $x^{(0)}_{vw} = 0$ otherwise.

3.3 MBF-like Algorithms over the All-Paths Semiring

Mohri discusses $k$-SDP, where each $v \in V$ is required to find the $k$ shortest paths to a designated source node $s \in V$, in the light of his algebraic framework for distance computations [39]. Our framework captures this application as well but requires a different semiring than $\mathcal {S}_{\min ,+}$: While $\mathcal {S}_{\min ,+}$ suffices for many applications (see Section 3.1), it cannot distinguish between different paths of the same length. This is a problem in the $k$-SDP because there may be multiple paths of the same length among the $k$ shortest.

Observation 3.16. No semimodule $\mathcal {M}$ over $\mathcal {S}_{\min ,+}$ can overcome this issue: The left-distributive law (A.9) requires, for all $x \in \mathcal {M}$ and $s, s^{\prime } \in \mathcal {S}_{\min ,+}$, that $sx \oplus s^{\prime }x = (s \oplus s^{\prime })x$. Consider different paths $\pi \ne \pi ^{\prime }$ ending in the same node with $\operatorname{\omega }(\pi) = s = s^{\prime } = \operatorname{\omega }(\pi ^{\prime })$. With respect to $\mathcal {S}_{\min ,+}$ and $\mathcal {M}$, the left-distributive law yields $sx \oplus s^{\prime }x = \min \lbrace s, s^{\prime }\rbrace \odot x$; that is, propagating $x$ over $\pi$, over $\pi ^{\prime }$, or over both and then aggregating must be indistinguishable in the case of $s = s^{\prime }$.

This does not mean that the framework of MBF-like algorithms cannot be applied, but rather it indicates that the toolbox needs a more powerful semiring than $\mathcal {S}_{\min ,+}$. The motivation of this section is to add such a semiring, the all-paths semiring $\mathcal {P}_{\min ,+}$, to the toolbox. Having established $\mathcal {P}_{\min ,+}$, the advantages of the previously established machinery are available: pick a semimodule (or use $\mathcal {P}_{\min ,+}$ itself) and define a representative projection. We demonstrate this for $k$-SDP and a variant.

The basic concept of $\mathcal {P}_{\min ,+}$ is simple: remember paths instead of adding up “anonymous” distances. Instead of storing the sum of the traversed edges’ weight, store the string of edges. We also add the ability to remember multiple paths into the semiring. This includes enough features in $\mathcal {P}_{\min ,+}$; we do not require dedicated semimodules for $k$-SDP, and we use the fact that $\mathcal {P}_{\min ,+}$ is a zero-preserving semimodule over itself.

We begin the technical part with a convenient representation of paths: Let $P \subset V^+$ denote the set of non-empty, loop-free, directed paths on $V$, denoted as tuples of nodes. Furthermore, let $\circ \subseteq P^2$ be the relation of concatenable paths defined by

\begin{equation} (v_1, \dots , v_k) \circ (w_1, \dots , w_\ell) \quad :\Leftrightarrow \quad v_k = w_1. \end{equation} (57)

By abuse of notation, when and if its operands are concatenable, we occasionally use $\circ$ as the concatenation operator. Furthermore, we use $\lbrace (\pi ^1, \pi ^2) \mid \pi = \pi ^1 \circ \pi ^2 \rbrace$ as a shorthand for the rather cumbersome $\lbrace (\pi ^1, \pi ^2) \mid \pi ^1, \pi ^2 \in P \wedge \pi ^1 \circ \pi ^2 \wedge \text{$\pi $ is the concatenation of $\pi ^1$ and $\pi ^2$} \rbrace$ to iterate over all two-splits of $\pi$. We call a path $\pi$ valid with respect to $G$ if $\pi \in \operatorname{P}(G)$ and invalid with respect to $G$ otherwise.

As motivated earlier, the all-paths semiring can store multiple paths. We represent this using vectors in storing a non-$\infty$ weight for every encountered path and $\infty$ for all paths not encountered so far. This can be efficiently represented by implicitly leaving out all $\infty$ entries.

Definition 3.17 (All-Paths Semiring). We call the all-paths semiring, where $\oplus$ and $\odot$ are defined, for all $\pi \in P$ and $x,y \in \mathcal {P}_{\min ,+}$, by

\begin{equation} (x \oplus y)_\pi := \min \lbrace x_\pi , y_\pi \rbrace \text{ and} \\ \end{equation} (58)

\begin{equation} (x \odot y)_\pi := \min \lbrace x_{\pi ^1} + y_{\pi ^2} \mid \pi = \pi ^1 \circ \pi ^2 \rbrace . \end{equation} (59)

We say that $x$ contains $\pi$ (with weight $x_\pi$) if and only if $x_\pi \lt \infty$.

Summation picks the smallest weight associated with each path in either operand; multiplication $(x \odot y)_\pi$ finds the lightest estimate for $\pi$ composed of two-splits $\pi = \pi ^1 \circ \pi ^2$, where $\pi ^1$ is picked from $x$ and $\pi ^2$ from $y$. Observe that $\mathcal {P}_{\min ,+}$ supports upper bounds on path lengths; we do not, however, use this feature. Intuitively, $\mathcal {P}_{\min ,+}$ stores all encountered paths with their exact weights; in this mindset, summation corresponds to the union and multiplication to the concatenability-obeying Cartesian product of the paths contained in $x$ and $y$.

Lemma 3.18. $\mathcal {P}_{\min ,+}$ is a semiring with neutral elements

\begin{equation} 0 := (\infty , \dots , \infty)^\top \text{ and} \\ \end{equation} (60)

\begin{equation} 1_\pi := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}0 & \text{if $\pi = (v)$ for some $v \in V$ and} \\ \infty & \text{otherwise} \end{array}\right. } \end{equation} (61)

with respect to $\oplus$ and $\odot$, respectively.

Proof in Appendix B.

Corollary 3.19. $\mathcal {P}_{\min ,+}$ is a zero-preserving semimodule over itself.

Computations on a graph $G = (V, E, \operatorname{\omega })$ with respect to $\mathcal {P}_{\min ,+}$ require (this is a generalization of Equation (4)) an adjacency matrix $A \in \mathcal {P}_{\min ,+}^{V \times V}$ defined by

\begin{equation} (a_{vw})_\pi := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}1_\pi & \text{if $v = w$,} \\ \operatorname{\omega }(v,w) & \text{if $\pi = (v,w)$, and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (62)

On the diagonal, $a_{vv} = 1_{\pi }$ contains exactly the zero-hop paths of weight 0; all nontrivial paths are “unknown” in $a_{vv}$ (i.e., accounted for with an infinite weight). An entry $a_{vw}$ with $v \ne w$ contains, if present, only the edge $\lbrace v,w\rbrace$, represented by the path $(v,w)$ of weight $\operatorname{\omega }(v,w)$; all other paths are not contained in $a_{vw}$. An initialization where each node $v$ knows only about the zero-hop path $(v)$ is represented by the vector $x^{(0)} \in \mathcal {P}_{\min ,+}^V$ with

\begin{equation} \left(x^{(0)}_v \right)_\pi := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}0 & \text{if $\pi = (v)$ and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (63)

Then,

multiplications of $x^{(0)}$ with $A$ (i.e., $h$ iterations) yield $x^{(h)}$ with

\begin{equation} x^{(h)} := A^h x^{(0)}. \end{equation} (64)

As expected, $x^{(h)}_v$ contains exactly the $h$-hop paths beginning in $v$ with their according weights:

Lemma 3.20. Let $x^{(h)}$ be defined as in Equation (63), with respect to the graph $G = (V, E, \operatorname{\omega })$. Then, for all $v \in V$ and $\pi \in P$

\begin{equation} \left(x^{(h)}_v \right)_\pi = { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\operatorname{\omega }(\pi) & \text{if $\pi \in \operatorname{P}^h(v, \cdot , G)$ and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (65)

Proof in Appendix B.

With the all-paths semiring $\mathcal {P}_{\min ,+}$ established, we turn to the $k$-SDP, our initial motivation for adding $\mathcal {P}_{\min ,+}$ to the toolbox of MBF-like algorithms in the first place.

Definition 3.21 ($k$-Shortest Distance Problem [39]). Given a graph $G = (V, W, \operatorname{\omega })$ and a designated source vertex $s \in V$, the $k$-SDP asks: For each node $v \in V$ and considering all $v$-$s$-paths, what are the weights of the $k$ lightest such paths? In the $k$-DSDP, the path weights have to be distinct.

Observe that with the preceding definitions of $A$ and $x^{(h)}$, we always associate a path $\pi$ with either its weight $\operatorname{\omega }(\pi)$ or with $\infty$; in particular, invalid paths always are associated with $\infty$. Formally, $G$ induces a subsemiring of $\mathcal {P}_{\min ,+}$. In addition to being an interesting observation, these properties are required for the representative projections defined later $(r$ breaks for the $k$-DSDP when facing inconsistent non-$\infty$ values for the same path) so we formalize them. Let $G = (V, E, \operatorname{\omega })$ be a graph and let be the restriction of to exact path weights and $\infty$:

\begin{equation} x_\pi \in \lbrace \operatorname{\omega }(\pi), \infty \rbrace ; \end{equation} (66)

recall that $\operatorname{\omega }(\pi) = \infty$ for all paths $\pi$ invalid with respect to $G$.

Definition 3.22 (Graph-Induced All-Paths Semiring). Let $G$ be a graph and as above. Then we refer to $\mathcal {P}_{\min ,+}(G) := (D(G), \oplus , \odot)$ as the all-paths semiring induced by $G$, where $\oplus$ and $\odot$ are the same as in Definition 3.17.

The next step is to show that $\mathcal {P}_{\min ,+}(G)$ is a semiring.⁸

Lemma 3.23. $\mathcal {P}_{\min ,+}(G)$ is a semiring.

Proof in Appendix B.

Corollary 3.24. $\mathcal {P}_{\min ,+}(G)$ is a zero-preserving semimodule over itself.

Observe that we have $A \in \mathcal {P}^{V \times V}_{\min ,+}(G)$ as well as $x^{(0)} \in \mathcal {P}^V_{\min ,+}(G)$. It follows that Lemma 3.20 holds for $\mathcal {P}_{\min ,+}(G)$ as much as it does for $\mathcal {P}_{\min ,+}$. Furthermore, observe that the restriction to $\mathcal {P}_{\min ,+}(G)$ happens implicitly, simply by starting with the preceding initialization. There is no information about $\mathcal {P}_{\min ,+}(G)$ that needs to be distributed in the graph in order to run an MBF-like algorithm over $\mathcal {P}_{\min ,+}(G)$.

In order to solve the $k$-SDP, we require a representative projection that reduces the abundance of paths stored in an unfiltered $x^{(h)}$ to the relevant ones. Relevant in this case simply means to keep the $k$ shortest $v$-$s$-paths in $x^{(h)}_v$. In order to formalize this, let $P(v,w,x)$ denote, for $x \in \mathcal {P}_{\min ,+}(G)$ and $v,w \in V$, the set of all $v$-$w$-paths contained in $x$:

\begin{equation} P(v,w,x) := \lbrace \pi \in P \mid \text{$\pi $ is a $v$-$w$-path with $x_{\pi }\ne \infty $} \rbrace . \end{equation} (67)

Order $P(v,w,x)$ ascendingly with respect to the weights $x_\pi$, breaking ties using lexicographical order on $P$. Then let $P_k(v,w,x)$ denote the set of the first (at most) $k$ entries of that sequence:

\begin{equation} P_k(v,w,x) := \lbrace \pi \mid \text{$(x_{\pi }, \pi)$ is among the $k$ smallest of $\lbrace (x_{\pi ^{\prime }}, \pi ^{\prime }) \mid \pi ^{\prime } \in P(v,w,x) \rbrace $} \rbrace . \end{equation} (68)

We define the (representative, see below) projection $r:\mathcal {P}_{\min ,+}(G) \rightarrow \mathcal {P}_{\min ,+}(G)$ by

\begin{equation} r(x)_\pi \mapsto { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}x_\pi & \text{if $\pi \in P_k(v,s,x)$ for some $v \in V$ and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (69)

If $x_\pi = r(x)_\pi$, we say that $r$ keeps $\pi$ and otherwise that $r$ discards $\pi$. The projection $r$ keeps, for each $v \in V$, exactly the $k$ shortest $v$- $s$-paths contained in $x$. Following the standard approach (see Lemma 2.8) we define vectors $x,y \in \mathcal {P}_{\min ,+}$ to be equivalent if and only if their entries for $P_k(\cdot ,s,x)$ do not differ:

\begin{equation} \forall x,y \in \mathcal {P}_{\min ,+}(G):\quad x \sim y \quad :\Leftrightarrow \quad r(x) = r(y). \end{equation} (70)

Lemma 3.25. $\sim$ is a congruence relation on $\mathcal {P}_{\min ,+}(G)$ with representative projection $r$.

Proof in Appendix B.

Observe that $r$ is defined to maintain the $k$ shortest $v$-$s$-paths for all $v \in V$, potentially storing $k|V|$ paths instead of just $k$. Intuitively, one could argue that $r^V x^{(h)}_v$ only needs to contain $k$ paths since they all start in $v$, which is what the algorithm should actually be doing. This objection is correct in that this is what actually happens when running the algorithm with initialization $x^{(0)}$: By Lemma 3.20, $x^{(h)}_v$ contains the $h$-hop shortest paths starting in $v$ and $r$ removes all that do not end in $s$ or are too long. On the other hand, the objection is flawed. In order for $r$ to behave correctly with respect to all $x \in \mathcal {P}_{\min ,+}$, especially those less nicely structured than $x^{(h)}_v$ where all paths start at $v,$ we must define $r$ as it is; otherwise, the proof of Lemma 3.25 fails for mixed starting-node inputs.

Example 3.26 ($k$-Shortest Distance Problem). $k$-SDP (compare Definition 3.21) is solved by an MBF-like algorithm $\mathcal {A}$ with $\mathcal {S} = \mathcal {M} = \mathcal {P}_{\min ,+}(G)$, the representative projection and congruence relation defined in Equations (69) and (70), the choices of $A$ and $x^{(0)}$ from Equations (62) and (63), and $h = \operatorname{SPD}(G)$ iterations.

By Lemma 3.20 and due to $h = \operatorname{SPD}(G)$, $x^{(h)}_v$ contains all paths that start in $v$, associated with their weights. Since $\mathcal {A}^h(G) = r^V x^{(h)}$, by definition of $r$ in Equation (69), $(r^V x^{(h)})_v = r(x^{(h)}_v)$ contains the subset of those paths that have the $k$ smallest weights and start in $v$ (i.e., precisely what $k$-SDP asks for).

We remark that solving a generalization of $k$-SDP looking for the $k$ shortest $h$-hop distances is straightforward using $h$ iterations. Furthermore, note that our approach reveals the actual paths along with their weights.

Example 3.27 ($k$-Distinct-Shortest Distance Problem). $k$-DSDP from Definition 3.21 can be solved analogously to $k$-SDP in Example 3.26.

In order for this to work, the definition of $P_k(v,w,x)$ in Equation (68) needs to be adjusted. For each of the $k$ smallest weights in $x$, the modified $\bar{P}_k(v, w, x)$ contains only one representative: the path contained in $x$ of that weight that is first with respect to lexicographical order on $P$. This results in

\begin{equation} \bar{P}^{\prime }_k(v,w,x) := \lbrace \pi \mid \text{$x_\pi $ is among the $k$ smallest of $\lbrace x_{\pi ^{\prime }} \mid \pi ^{\prime } \in P(v,w,x) \rbrace $} \rbrace \text{ and}\\ \end{equation} (71)

\begin{equation} \bar{P}_k(v,w,x) := \lbrace \pi \mid \text{$\pi $ is lexicographically first of $\lbrace \pi ^{\prime } \in \bar{P}^{\prime }_k(v,w,x) \mid x_{\pi ^{\prime }} = x_\pi \rbrace $} \rbrace . \end{equation} (72)

The proof of Lemma 3.25 works without modification when replacing Equation ( 68) with Equations ( 71)–( 72).

3.4 MBF-like Algorithms over the Boolean Semiring

A well-known semiring is the Boolean semiring $\mathcal {B} = (\lbrace 0,1\rbrace , \vee , \wedge)$. By Lemma A.4, $\mathcal {B}^V$ is a zero-preserving semimodule over $\mathcal {B}$. It can be used to check for connectivity in a graph⁹ using the adjacency matrix

\begin{equation} (a_{vw}) := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}1 & \text{if $v=w$ or $\lbrace v,w\rbrace \in E$ and} \\ 0 & \text{otherwise} \end{array}\right. } \end{equation} (73)

together with initial values

\begin{equation} x^{(0)}_{vw} := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}1 & \text{if $v=w$ and} \\ 0 & \text{otherwise} \end{array}\right. } \end{equation} (74)

indicating that each node $v \in V$ is connected to itself. An inductive argument reveals that

\begin{equation} \left(A^h x^{(0)} \right)_{vw} = 1 \quad \Leftrightarrow \quad \operatorname{P}^h(v,w,G) \ne \emptyset . \end{equation} (75)

Example 3.28 (Connectivity). Given a graph, we want to check which pairs of nodes are connected by paths of at most $h$ hops. This is solved by an MBF-like algorithm using $\mathcal {S} = \mathcal {B}$, $\mathcal {M} = \mathcal {B}^V$, $r = \operatorname{id}$, and $x^{(0)}$ from Equation (74). This example easily generalizes to single-source and multi-source connectivity variants.

4 THE SIMULATED GRAPH

In order to sample from a tree embedding of the graph $G$, we need to determine LE lists (compare Section 7) for a random permutation of the nodes. These are the result of an MBF-like algorithm using $\mathcal {S}_{\min ,+}$ and $\mathcal {D}$; its filter $r$ ensures that $|r(x^{(i)})_v| \in \operatorname{O}(\log n)$ w.h.p. for all $i$ (i.e., that intermediate results are small). This allows for performing an iteration with $\operatorname{\tilde{O}}(m)$ work. However, doing so requires $\operatorname{SPD}(G)$ iterations, which in general can be as large as $n - 1$, conflicting with our goal of polylogarithmic depth.

To resolve this problem, we reduce the SPD, accepting a slight increase in stretch. The first step is to use Cohen's $(d, 1 / \operatorname{polylog}n)$-hop set [17]: a small number of additional (weighted) edges for $G$, such that for all $v,w \in V$, $\operatorname{dist}^d(v,w,G^{\prime }) \le (1 + \hat{\varepsilon }) \operatorname{dist}(v,w,G)$, where $G^{\prime }$ is $G$ augmented with the additional edges and $\hat{\varepsilon }\in 1 / \operatorname{polylog}n$. Her algorithm is sufficiently efficient in terms of depth, work, and number of additional edges. Yet our problem is not solved: The $d$-hop distances in $G^{\prime }$ only approximate distances (compare Observation 1.1), but constructing FRT trees critically depends on the triangle inequality and thus on the use of exact distances.

In this section, we resolve this issue. After augmenting $G$ with the hop set, we embed it into a complete graph $H$ on the same node set so that $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$, keeping the stretch limited. Where hop sets preserve distances exactly and ensure the existence of approximately shortest paths with few hops, $H$ preserves distances approximately but guarantees that we obtain exact shortest paths with few hops. Note that explicitly constructing $H$ causes $\operatorname{\Omega }(n^2)$ work; we circumnavigate this obstacle in Section 5 with the help of the machinery developed in Section 2.

Since our construction requires to first add the hop set to $G$, assume for the sake of presentation that $G$ already contains a $(d, \hat{\varepsilon })$-hop set for fixed and throughout this section. We begin our construction of $H$ by sampling levels for the vertices $V$: Every vertex starts at level 0. In step $\lambda \ge 1$, each vertex in level $\lambda - 1$ is raised to level $\lambda$ with probability $\frac{1}{2}$. We continue until the first step $\Lambda + 1$ where no node is sampled. $\operatorname{\lambda }(v)$ refers to the level of $v \in V$. We define the level of an edge $e \in E$ as $\operatorname{\lambda }(e) := \min \lbrace \operatorname{\lambda }(v) \mid v \in e \rbrace$, the minimal level of its incident vertices; as $H$ will be a complete graph, we thus have $\operatorname{\lambda }(\lbrace v,w\rbrace) = \min \lbrace \operatorname{\lambda }(v),\operatorname{\lambda }(w)\rbrace$ for each $v,w\in V$, $v\ne w$.

Lemma 4.1. W.h.p., $\Lambda \in \operatorname{O}(\log n)$.

Proof. For , $v \in V$ has $\operatorname{\lambda }(v) \lt c \log n$ with probability $1 - (\frac{1}{2})^{c \log n} = 1 - n^{-c}$ (i.e., w.h.p.). Lemma 1.2 yields that all nodes have a level of $\operatorname{O}(\log n)$ w.h.p. and the claim follows.

The idea is to use the levels in the following way. We devise a complete graph $H$ on $V$. An edge of $H$ of level $\lambda$ is weighted with the $d$-hop distance between its endpoints in $G$—a $(1 + \hat{\varepsilon })$-approximation of their exact distance because $G$ contains a $(d, \hat{\varepsilon })$-hop set by assumption and multiplied with a penalty of $(1 + \hat{\varepsilon })^{\Lambda - \lambda }$. This way, high-level edges are “more attractive” for shortest paths because they receive smaller penalties.

Definition 4.2 (Simulated graph $H$). Let $G = (V, E, \operatorname{\omega })$ be a graph that contains a $(d, \hat{\varepsilon })$-hop set with levels sampled as above. We define the complete graph $H$ as

\begin{gather} H := \left(V, \binom{V}{2}, \operatorname{\omega }_\Lambda \right) \end{gather} (76)

\begin{gather} \operatorname{\omega }_\Lambda (\lbrace v,w\rbrace) \mapsto (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(v,w)} \operatorname{dist}^d(v, w, G). \end{gather} (77)

We formalize the notion of high-level edges being “more attractive” than low-level paths: In $H$, any min-hop shortest path between two nodes of level $\lambda$ is exclusively comprised of edges of level $\lambda$ or higher; no min-hop shortest path's level locally decreases. Therefore, all min-hop shortest paths can be split into two subpaths, the first of monotonically increasing and the second of monotonically decreasing levels.

Lemma 4.3. Consider $v,w \in V$, $\lambda = \operatorname{\lambda }(v,w)$, and $p \in \operatorname{MHSP}(v, w, H)$. Then all edges of $p$ have level at least $\lambda$.

Proof. The case $\lambda = 0$ is trivial. Consider $1 \le \lambda \le \Lambda$ and, for the sake of contradiction, let $q$ be a nontrivial maximal subpath of $p$ containing only edges of level strictly less than $\lambda$. Observe that $q \in \operatorname{MHSP}(v^{\prime }, w^{\prime }, H)$ for some $v^{\prime },w^{\prime }\in V$ with $\operatorname{\lambda }(v^{\prime }),\operatorname{\lambda }(w^{\prime }) \ge \lambda$. We have

\begin{equation} \operatorname{\omega }_\Lambda (q) \ge (1 + \hat{\varepsilon })^{\Lambda - (\lambda - 1)} \operatorname{dist}(v^{\prime }, w^{\prime }, G). \end{equation} (78)

However, the edge $e = \lbrace v^{\prime },w^{\prime }\rbrace$ has level $\operatorname{\lambda }(v^{\prime },w^{\prime })\ge \lambda$ and weight

\begin{equation} \operatorname{\omega }_\Lambda (e) \le (1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v^{\prime }, w^{\prime }, G) \le (1 + \hat{\varepsilon })^{\Lambda - (\lambda - 1)} \operatorname{dist}(v^{\prime }, w^{\prime }, G) \le \operatorname{\omega }_\Lambda (q) \end{equation} (79)

by construction. Since $|q|$ is maximal and $\operatorname{\lambda }(v^{\prime }), \operatorname{\lambda }(w^{\prime }) \ge \lambda$, $q$ can only be a single edge of level $\lambda$ or higher, contradicting the assumption.

Knowing that edge levels in min-hop shortest paths are first monotonically increasing and then monotonically decreasing, the next step is to limit the number of hops spent on each level.

Lemma 4.4. Consider vertices $v$ and $w$ of $H$ with $\operatorname{\lambda }(v), \operatorname{\lambda }(w) \ge \lambda$. Then w.h.p., one of the following statements holds:

\begin{gather} \operatorname{hop}(v, w, H) \in \operatorname{O}(\log n)\text{ or} \end{gather} (80)

\begin{gather} \forall p \in \operatorname{MHSP}(v, w, H) \exists e \in p:\operatorname{\lambda }(e) \ge \lambda +1. \end{gather} (81)

Proof. Condition on the event $\mathcal {E}_{V_\lambda }$ that $V_\lambda \subseteq V$ is the set of nodes with level $\lambda$ or higher (with level $\lambda + 1$ not yet sampled). Let $H_\lambda := (V_\lambda , ({{V_\lambda }\atop{2}}), \operatorname{\omega }_\lambda)$ with $\operatorname{\omega }_\lambda (\lbrace v,w\rbrace) \mapsto (1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v, w, G)$ denote the subgraph of $H$ spanned by $V_\lambda$ and capped at level $\lambda$.

Consider $p \in \operatorname{MHSP}(v, w, H_\lambda)$. Observe that independently for all $u \in V_\lambda$, and hence for all $e \in p$. This probability holds independently for every other edge of $p$. If $|p| \ge 2c \log _{4/3} n$ for some choice of , the probability that $p$ contains no edge of level $\lambda + 1$ or higher is bounded from above by $(\frac{3}{4})^{|p|/2} \le (\frac{3}{4})^{c \log _{4/3} n} = n^{-c}$, so $p$ contains such an edge w.h.p.

Fix a single arbitrary $p \in \operatorname{MHSP}(v, w, H_\lambda)$. Let $\mathcal {E}_p$ denote the event that $p$ fulfills $|p| \in \operatorname{O}(\log n)$ or contains an edge of level $\lambda + 1$ or higher; as argued earlier, $\mathcal {E}_p$ occurs w.h.p. Note that we cannot directly apply the union bound to deduce a similar statement for all $q \in \operatorname{MHSP}(v, w, H_\lambda)$: There are more than polynomially many $v$-$w$-paths. Instead, we argue that if $\mathcal {E}_p$ holds, it follows that all $q \in \operatorname{MHSP}(v, w, H)$ must behave as claimed.

To show that all $q \in \operatorname{MHSP}(v, w, H)$ fulfill Equations (80) or (81) under the assumption that $\mathcal {E}_p$ holds, first recall that $q$ only uses edges of level $\lambda$ or higher by Lemma 4.3. Furthermore, observe that $\operatorname{\omega }_\Lambda (q) \le \operatorname{\omega }_\Lambda (p)$, as $q$ is a shortest path with respect to $\operatorname{\omega }_\Lambda$. If $q$ contains an edge of level $\lambda + 1$ or higher, Equation (81) holds for $q$. Otherwise, we have $\operatorname{\omega }_\lambda (q) = \operatorname{\omega }_\Lambda (q)$, and distinguish two cases:

Case 1 ($|p| \in \operatorname{O}(\log n)$): We have

\begin{equation} \operatorname{\omega }_\Lambda (p) \le \operatorname{\omega }_\lambda (p) \le \operatorname{\omega }_\lambda (q) = \operatorname{\omega }_\Lambda (q), \end{equation} (82)

so $\operatorname{\omega }_\Lambda (q) = \operatorname{\omega }_\Lambda (p)$ and $|q| \le |p| \in \operatorname{O}(\log n)$ follows from $q \in \operatorname{MHSP}(v, w, H)$.
Case 2 ($p$ contains an edge of level $\lambda + 1$ or higher): This yields $\operatorname{\omega }_\Lambda (p) \lt \operatorname{\omega }_\lambda (p)$, implying

\begin{equation} \operatorname{\omega }_\Lambda (p) \lt \operatorname{\omega }_\lambda (p) \le \operatorname{\omega }_\lambda (q) = \operatorname{\omega }_\Lambda (q), \end{equation} (83)

which contradicts $q \in \operatorname{MHSP}(v, w, H)$.

So far, we condition on $\mathcal {E}_{V_\lambda }$. In order to remove this restriction, let $\mathcal {E}_{vw}$ denote the event that Equation (80) or Equation (81) holds for $v,w \in V$. The above case distinction shows that for an arbitrary . We conclude that

(84)

(85)

(86)

(87)

\begin{equation} = 1 - n^{-c}, \end{equation} (88)

which is the statement of the lemma.

We argue above that any min-hop shortest path in $H$ traverses every level at most twice, Lemma 4.4 states that each such traversal, w.h.p., only has a logarithmic number of hops, and Lemma 4.1 asserts that, w.h.p., there are only logarithmically many levels. Together, this means that min-hop shortest paths in $H$ have $\operatorname{O}(\log ^2 n)$ hops w.h.p. Additionally, our construction limits the stretch of shortest paths in $H$ as compared to $G$ by $(1 + \hat{\varepsilon })^{\Lambda + 1}$(i.e., by $(1 + \hat{\varepsilon })^{\operatorname{O}(\log n)}$) w.h.p.

Theorem 4.5. W.h.p., $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$ and, for all $v,w \in V$,

\begin{equation} \operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, H) \le (1 + \hat{\varepsilon })^{\operatorname{O}(\log n)} \operatorname{dist}(v, w, G). \end{equation} (89)

Proof. Fix a level $\lambda$. Any fixed pair of vertices of level $\lambda$ or higher fulfills, w.h.p., Equation (80) or (81) by Lemma 4.4. Since there are at most $({{n}\atop{2}})$ such pairs, w.h.p., all of them fulfill Equation (80) or (81) by Lemma 1.2.

Let $\mathcal {E}_{\log }$ denote the event that there is no higher level than $\Lambda \in \operatorname{O}(\log n)$, which holds w.h.p. by Lemma 4.1. Furthermore, let $\mathcal {E}_\lambda$ denote the event that all pairs of vertices of level $\lambda$ or higher fulfill Equation (80) or (81), which holds w.h.p. as argued above. Then $\mathcal {E} := \mathcal {E}_{\log } \cap \mathcal {E}_0 \cap \dots \cap \mathcal {E}_\Lambda$ holds w.h.p. by Lemma 1.2.

Condition on $\mathcal {E}$; in particular, no min-hop shortest path whose edges all have the same level has more than $\operatorname{O}(\log n)$ hops. Consider some min-hop shortest path $p$ in $H$. By Lemma 4.3, $p$ has two parts: The edge level monotonically increases in the first and monotonically decreases in the second part. Hence, $p$ can be split up into at most $2 \Lambda - 1$ segments, in each of which all edges have the same level. As this holds for all min-hop shortest paths, we conclude that $\operatorname{SPD}(H) \in \operatorname{O}(\Lambda \log n) \subseteq \operatorname{O}(\log ^2 n)$ w.h.p., as claimed.

As for Inequality (4.14), recall that $H$ is constructed from $G = (V, E, \operatorname{\omega })$ and that $G$ contains a $(d, \hat{\varepsilon })$-hop set. For all $v,w \in V$, we have

\begin{equation} \operatorname{dist}(v,w,H) \le \operatorname{\omega }_\Lambda (v, w) \le (1 + \hat{\varepsilon })^\Lambda \operatorname{dist}^d(v, w, G) \le (1 + \hat{\varepsilon })^{\Lambda + 1} \operatorname{dist}(v, w, G) \end{equation} (90)

by construction of $H$. Recalling that $\Lambda \in \operatorname{O}(\log n)$ due to $\mathcal {E}$ completes the proof.

We use Cohen's construction to obtain a $(d, \hat{\varepsilon })$-hop set with $\hat{\varepsilon }\in 1 / \operatorname{polylog}n$, where the exponent of $\operatorname{polylog}n$ is under our control [17]. A sufficiently large exponent yields $(1 + \hat{\varepsilon })^{\operatorname{O}(\log n)} \subseteq e^{\hat{\varepsilon }\operatorname{O}(\log n)} \subseteq e^{1 / \operatorname{polylog}n} = 1 + 1 / \operatorname{polylog}n$, upper-bounding Inequality (4.14) by

\begin{equation} \operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, H) \in (1 + 1 / \operatorname{polylog}n) \operatorname{dist}(v, w, G) \subseteq (1 + \operatorname{o}(1)) \operatorname{dist}(v, w, G). \end{equation} (91)

To wrap things up: Given a weighted graph $G$, we augment $G$ with a $(d, 1 / \operatorname{polylog}n)$-hop set. After that, the $d$-hop distances in $G$ approximate the actual distances in $G$, but these approximations may violate the triangle inequality. We fix this by embedding into $H$, using geometrically sampled node levels and an exponential penalty on the edge weights with decreasing levels. Since $H$ is a complete graph, explicitly constructing it is prohibitively costly in terms of work. The next section shows how to avoid this issue by efficiently simulating MBF-like algorithms on $H$.

5 AN ORACLE FOR MBF-LIKE QUERIES

Given a weighted graph $G$ and $\hat{\varepsilon }\in 1 / \operatorname{polylog}n$, Section 4 introduces a complete graph $H$ that $(1 + \operatorname{o}(1))$-approximates the distances of $G$ and w.h.p. has a polylogarithmic SPD using a $(d, \hat{\varepsilon })$-hop set. $H$ would solve our problem, but we cannot explicitly write $H$ into memory as this requires an unacceptable $\operatorname{\Omega }(n^2)$ work.

Instead, we dedicate this section to an oracle that answers MBF-like queries; that is, to an oracle that, given a weighted graph $G$, an MBF-like algorithm $\mathcal {A},$ and a number of iterations $h$, returns $\mathcal {A}^h(H)$. Note that while the oracle can answer distance queries in polylogarithmic depth (when, e.g., queried by SSSP, $k$-SSP, or APSP), MBF-like queries are more general (compare Section 3) and allow for more work-efficient algorithms (as in Sections 6 and 7). The properties of MBF-like algorithms discussed in Section 2 allow the oracle to internally work on $G$ and simulate iterations of $\mathcal {A}$ on $H$ using $d$ (i.e., polylogarithmically many) iterations on $G$.

Throughout this section, we denote by $A_G$ and $A_H$ the adjacency matrices of $G$ and $H$, respectively. Furthermore, we fix the semiring to be $\mathcal {S}_{\min ,+}$, since we explicitly calculate distances; generalizations to other semirings are possible but require appropriate generalizations of adjacency matrices and hence obstruct presentation.

We establish this section's results in two steps: Section 5.1 derives a representation of $A_H$ in terms of $A_G$, which is then used to efficiently implement the oracle in Section 5.2. The oracle is used to approximate the metric of $G$ in Section 6 and to construct an FRT tree in Section 7, both with polylogarithmic depth.

5.1 Decomposing $H$

The idea is to simulate each iteration of an MBF-like algorithm $\mathcal {A}$ on $H$ using $d$ iterations on $G$. This is done for each level $\lambda \in \lbrace 0, \dots , \Lambda \rbrace$ in parallel. For level $\lambda$, we run $\mathcal {A}$ for $d$ iterations on $G$ with edge weights scaled up by $(1 + \hat{\varepsilon })^{\Lambda - \lambda }$, where the initial vector is obtained by discarding all information at nodes of level smaller than $\lambda$. Afterward, we again discard everything stored at vertices with a level smaller than $\lambda$. Since $(A_G^d)_{vw} = \operatorname{dist}^d(v,w,G)$, this ensures that we propagate information between nodes $v,w \in V$ with $\operatorname{\lambda }(v,w) = \lambda$ with the corresponding edge weight while discarding any exchange between nodes with $\operatorname{\lambda }(v,w) \lt \lambda$ (which is handled by the respective parallel run). While we also propagate information between $v$ and $w$ if $\operatorname{\lambda }(v,w) \gt \lambda$ (over too long a distance because edge weights are scaled by $(1 + \hat{\varepsilon })^{\Lambda - \lambda } \gt (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(v,w)}$) the parallel run for $\operatorname{\lambda }(v,w)$ correctly propagates values. Therefore, aggregating the results of all levels (i.e., applying $\oplus$, the source-wise minimum) and applying $r^V$ completes the simulation of an iteration of $\mathcal {A}$ on $H$.

This approach resolves two complexity issues. First, we multiply (polylogarithmically often) with $A_G$, which, as opposed to the dense $A_H,$ has $\operatorname{O}(m)$ non-$\infty$ entries only. Second, Corollary 2.17 shows that we are free to filter using $r^V$ at any time, keeping the entries of intermediate state vectors small.

We formalize the preceding intuition. Recall that

\begin{equation} (A_H)_{vw} = \operatorname{\omega }_\Lambda (v,w) = (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(v,w)} \operatorname{dist}^d (v, w, G) = (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(v,w)}(A_G^d)_{vw}. \end{equation} (92)

For $\lambda \in \lbrace 0, \dots , \Lambda \rbrace$, denote by $P_\lambda$ the $\mathcal {M}^V$-projection to coordinates $V_\lambda := \lbrace v \in V \mid \operatorname{\lambda }(v) \ge \lambda \rbrace$:

\begin{equation} (P_\lambda x)_v := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}x_v & \text{if $\operatorname{\lambda }(v) \ge \lambda $ and} \\ \bot & \text{otherwise.} \end{array}\right. } \end{equation} (93)

Observe that $P_\lambda$ is an SLF on $\mathcal {M}^V$, where $(P_\lambda)_{vw} = 0$ if $v = w \in V_\lambda$ and $(P_\lambda)_{vw} = \infty$ otherwise. This gives us the tools to decompose $A_H$ as motivated earlier.

Lemma 5.1. With $(A_\lambda)_{vw} := (1 + \hat{\varepsilon })^{\Lambda - \lambda } (A_G)_{vw}$ (with respect to multiplication in , not $\odot$), we have

\begin{equation} A_H = \bigoplus _{\lambda = 0}^\Lambda P_\lambda A_\lambda ^d P_\lambda . \end{equation} (94)

Proof. Since $(A^d_G)_{vw} = \operatorname{dist}^d(v, w, G)$, it holds that $(A_\lambda ^d)_{vw} = (1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v, w, G)$. Therefore,

\begin{equation} (A_\lambda ^d P_\lambda)_{vw} = \min _{u \in V} \left\lbrace (A_\lambda ^d)_{vu} + (P_\lambda)_{uw}\right\rbrace = { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}(1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v, w, G) & \text{if $w \in V_\lambda $ and} \\ \infty & \text{otherwise,} \end{array}\right. } \end{equation} (95)

and hence

\begin{equation} (P_\lambda A_\lambda ^d P_\lambda)_{vw} = \min _{u \in V} \left\lbrace (P_\lambda)_{vu} + (A_\lambda ^d P_\lambda)_{uw} \right\rbrace = { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}(1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v, w, G) & \text{if $v, w \in V_\lambda $ and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (96)

We conclude that

\begin{equation} \ \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda A_\lambda ^d P_\lambda \right)_{vw} = \min \left\lbrace (1 + \hat{\varepsilon })^{\Lambda - \lambda } \operatorname{dist}^d(v, w, G)\,\big |\,\lambda \in \lbrace 0,\ldots ,\lambda (v,w)\rbrace \right\rbrace \\ \end{equation} (97)

\begin{equation} = (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(v,w)} \operatorname{dist}^d(v, w, G) \\ \end{equation} (98)

Having decomposed $A_H$, we continue with $\mathcal {A}^h(H)$, taking the freedom to apply filters intermediately. For all , we have

\begin{equation} A_H^h \stackrel{(5.3)}{=} \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda A_\lambda ^d P_\lambda \right)^h \stackrel{(2.35)}{\sim } \left(r^V \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda \right)\right)^h r^V, \end{equation} (99)

and hence

\begin{equation} \mathcal {A}^h(H) = r^V A_H^h x^{(0)} \stackrel{(2.11), (5.8)}{=} \left(r^V \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda \right)\right)^h r^V x^{(0)}. \end{equation} (100)

Observe that we can choose $h = \operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$ w.h.p. by Theorem 4.5 and recall that $d \in \operatorname{polylog}n$. Overall, this allows us to determine $\mathcal {A}(H)$ with polylogarithmic depth and $\operatorname{\tilde{O}}(m)$ work provided we can implement the individual steps (see below) at this complexity.

5.2 Implementing the Oracle

The oracle determines iterations of $\mathcal {A}$ on $H$ using iterations on $G$ while only introducing a polylogarithmic overhead with respect to iterations in $G$. With the decomposition from Lemma 5.1 at hand, it can be implemented as follows.

Given a state vector $x^{(i)} \in \mathcal {M}^V$, simulate one iteration of $\mathcal {A}$ on $H$ for edges of level $\lambda$; that is, determine $y_\lambda := P_\lambda (r^V A_\lambda)^d P_\lambda x^{(i)}$ by

discarding entries at nodes of a level smaller than $\lambda$,
running $d$ iterations of $\mathcal {A}$ with distances stretched by $(1 + \hat{\varepsilon })^{\Lambda - \lambda }$ on $G$, applying the filter after each iteration, and
again discarding entries at nodes with levels smaller than $\lambda$.

After running this procedure in parallel for all $0 \le \lambda \le \Lambda$, perform the $\oplus$-operation and apply the filter; that is, for each node $v \in V$ determine $x^{(i+1)}_v = r(\bigoplus _{\lambda = 0}^\Lambda y_\lambda)_v$.

Theorem 5.2 (Oracle). Consider an MBF-like algorithm $\mathcal {A}$ using a semimodule $\mathcal {M}$ over the semiring $\mathcal {S}_{\min ,+}$, the representative projection $r:\mathcal {M} \rightarrow \mathcal {M}$, and the initialization $x^{(0)} \in \mathcal {M}^V$. If we can, for all $1 \le f \le d$, $1 \le i \le h$, and $0 \le \lambda \le \Lambda$, where we may assume $\Lambda \in \operatorname{O}(\log n)$,

compute $r^V x^{(0)}$ from $x^{(0)}$ with depth $D$ and work $W$,
determine $r^V A_\lambda y$ from any intermediate state vector $y = (r^V A_\lambda)^{f-1} P_\lambda x^{(i-1)}$—corresponding to the $f$th iteration with respect to $A_\lambda$ starting at state $x^{(i-1)}$—with depth $D$ and work $W$, and
compute $r^V (\bigoplus _{\lambda = 0}^\Lambda y_\lambda)$ from the individual $y_\lambda = (P_\lambda r^V A_\lambda ^d P_\lambda) x^{(i-1)}$, $0 \le \lambda \le \Lambda$ (reflecting aggregation over all levels to complete a simulated iteration in $H)$using depth $D_\oplus$ and work $W_\oplus$,

then we can w.h.p.

determine $\mathcal {A}^h(H)$ using $\operatorname{O}((dW \log n + W_\oplus)h) \subseteq \operatorname{\tilde{O}}((dW + W_\oplus)h)$ work and a depth of $\operatorname{O}((dD + D_\oplus)h)$ and thus
calculate $\mathcal {A}(H)$ using $\operatorname{O}((dW \log n + W_\oplus) \log ^2 n) \subseteq \operatorname{\tilde{O}}(dW + W_\oplus)$ work and a depth of $\operatorname{O}((dD + D_\oplus) \log ^2 n) \subseteq \operatorname{\tilde{O}}(dD + D_\oplus)$.

Proof. Condition on $\Lambda \in \operatorname{O}(\log n)$ and $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$; both events occur w.h.p. by Lemma 4.1 and Theorem 4.5. By Equation (100), we have to compute

\begin{equation} \mathcal {A}^h(H) = \left(r^V \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda \right) \right)^h r^V x^{(0)}. \end{equation} (101)

Concerning $P_\lambda$, note that we can evaluate $(P_\lambda y)_{v \in V}$ lazily; that is, we can determine whether $(P_\lambda y)_v$ evaluates to $\bot$ or to $y_v$ only if it is accessed. Thus, the total work and depth required increase by at most a constant factor due to all applications of $P_\lambda$. Together with the prerequisites, this means that $(r^V A_\lambda P_\lambda)y$ can be determined in $\operatorname{O}(W)$ work and $\operatorname{O}(D)$ depth and that evaluating $P_\lambda (r^V A_\lambda)^d P_\lambda y$ sequentially in $d$ requires $\operatorname{O}(dW)$ work and $\operatorname{O}(dD)$ depth.

The set of summands of $\bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda y$ can be determined using $\operatorname{O}(\Lambda dW)$ work and $\operatorname{O}(dD)$ depth since this is independent for each $\lambda$. Performing the aggregation and applying the filter is possible in $D_\oplus$ depth and $W_\oplus$ work by assumption. We arrive at $\operatorname{O}(\Lambda dW + W_\oplus)$ work and $\operatorname{O}(dD + D_\oplus)$ depth for determining $x^{(i)} = r^V \bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda x^{(i-1)}$ from $x^{(i-1)}$.

Sequentially iterating this $h$ times to determine $\mathcal {A}^h(H)$ increases work and depth by a factor of $h$, yielding $\operatorname{O}((\Lambda dW + W_\oplus)h)$ work and $\operatorname{O}((dD + D_\oplus)h)$ depth. Computing $r^V x^{(0)}$ requires work $W$ and depth $D$ by the prerequisites and does not change the asymptotic complexity accumulated so far. We arrive at $\operatorname{O}((dW \log n + W_\oplus)h)$ work and $\operatorname{O}((dD + D_\oplus)h)$ depth, which is the first claim; $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$ yields the second claim. As we only condition on two events that occur w.h.p., this concludes the proof by Lemma 1.2.

6 APPROXIMATE METRIC CONSTRUCTION

As a consequence of the machinery in Section 5, we can efficiently determine approximate metrics. In fact, our metric approximations are fast enough to improve the state of the art regarding the FRT embedding from $\operatorname{\tilde{O}}(n^3)$ to $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ work when combined with a result of Blelloch et al. [14]; see below.

An approximate metric is stronger than approximate distances (i.e., stronger than hop sets) as it requires consistency with the triangle inequality:

Definition 6.1 (Approxmiate Metric). Consider and a metric on $V$. We refer to $d^{\prime }$ as $\alpha$-approximate metric of $d$ if

$d^{\prime }$ is a metric on $V$ and
for all $v, w \in V$,

\begin{equation} d(v, w) \le d^{\prime }(v, w) \le \alpha d(v, w). \end{equation} (102)

We can determine a $(1 + \operatorname{o}(1))$-approximate metric of $\operatorname{dist}(\cdot ,\cdot ,G)$ for an arbitrary graph $G$ by querying the oracle with APSP on $H$ using polylogarithmic depth and $\operatorname{\tilde{O}}(nm^{1+\varepsilon })$ work. This is much more work-efficient on sparse graphs than the naive approach using $\operatorname{O}(n^3 \log n)$ work (squaring the adjacency matrix $\lceil \log _2 n \rceil$ times) for obtaining $\operatorname{dist}(\cdot ,\cdot ,G)$ exactly.

Theorem 6.2 ($(1 + \operatorname{o}(1))$-Approximate Metric). Given a weighted graph $G = (V, E, \operatorname{\omega })$ and a constant $\varepsilon \gt 0$, we can w.h.p. compute, using $\operatorname{\tilde{O}}(n(m+n^{1+\varepsilon }))$ work and $\operatorname{polylog}n$ depth, a $(1 + 1 / \operatorname{polylog}n)$-approximate metric of $\operatorname{dist}(\cdot , \cdot , G)$ on $V$.

Proof. First augment $G$ with a $(d, 1 / \operatorname{polylog}n)$-hop set using $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ work and $\operatorname{polylog}n$ depth with $d \in \operatorname{polylog}n$ using Cohen's hop set construction [17]. The resulting graph has $\operatorname{\tilde{O}}(m+n^{1+\varepsilon })$ edges. An iteration of APSP (compare Example 3.5) incurs $\operatorname{O}(\log n)$ depth and $\operatorname{O}(\delta _v n \log n)$ work at a node $v$ of degree $\delta _v$ by Lemma 2.3. Hence, $D \in \operatorname{O}(\log n)$ depth and $W \in \operatorname{O}(\sum _{v \in V} \delta _v n \log n) \subseteq \operatorname{\tilde{O}}(n(m+n^{1+\varepsilon }))$ work suffice for an entire iteration. Aggregation and filtering over the individual $y_\lambda$, $0 \le \lambda \le \Lambda \in \operatorname{O}(\log n)$, takes $D_\oplus \in \operatorname{O}(\log n)$ depth and $W_\oplus \in \operatorname{O}(n^2 \log ^2 n)$ work. The trivial filter $r^V = \operatorname{id}$ does not induce any overhead. By Theorem 5.2, we can w.h.p. simulate $\operatorname{SPD}(H)$ iterations of APSP on $H$ using $\operatorname{\tilde{O}}(n(m+n^{1+\varepsilon }))$ work and $\operatorname{\tilde{O}}(1)$ depth. By Theorem 4.5 and Equation (91), this yields a metric which $(1 + 1 / \operatorname{polylog}n)$-approximates $\operatorname{dist}(\cdot , \cdot , G)$.

Using the sparse spanner algorithm of Baswana and Sen [9], we can obtain a metric with a different work–approximation tradeoff. Note that this is near-optimal in terms of work due to the trivial lower bound of $\operatorname{\Omega }(n^2)$ for writing down the solution.

Theorem 6.3 ($\operatorname{O}(1)$-Approximate Metric). For a weighted graph $G = (V, E, \operatorname{\omega })$ and a constant $\varepsilon \gt 0$, we can w.h.p. compute an $\operatorname{O}(1)$-approximate metric of $\operatorname{dist}(\cdot ,\cdot ,G)$ on $V$ using $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ work and $\operatorname{polylog}n$ depth.

Proof. Baswana and Sen show how to compute a $(2k - 1)$-spanner of $G = (V, E, \operatorname{\omega })$; that is, $E^{\prime } \subseteq E$ such that $G^{\prime } := (V, E^{\prime }, \operatorname{\omega })$ fulfills, for all $v,w \in V$,

\begin{equation} \operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, G^{\prime }) \le (2k - 1) \operatorname{dist}(v, w, G), \end{equation} (103)

using $\operatorname{\tilde{O}}(1)$ depth and $\operatorname{\tilde{O}}(m)$ work. Baswana and Sen argue that $|E^{\prime }| \in \operatorname{O}(kn^{1 + 1/k})$ in expectation [ 9], and we obtain $|E^{\prime }| \in \operatorname{O}(kn^{1 + 1/k} \log n) \subseteq \operatorname{\tilde{O}}(kn^{1 + 1/k})$ w.h.p., as, for example, argued in Appendix A of Becker et al. [ 10]. Furthermore, without loss of generality $k \in \operatorname{O}(\log n)$, because $kn^{1/k} = k2^{\log n / k}$ starts growing beyond that point. This results in $|E^{\prime }| \in \operatorname{\tilde{O}}(n^{1 + 1/k})$ w.h.p.

We compute an $\operatorname{O}(1)$-approximate metric as follows. (1) Compute a $(2k - 1)$-spanner for $k = \lceil 1 / \varepsilon \rceil$. This is possible within the given bounds on work and depth, yielding $|E^{\prime }| \in \operatorname{\tilde{O}}(n^{1+1/k}) = \operatorname{\tilde{O}}(n^{1+\varepsilon })$ edges w.h.p. and a stretch that is constant with respect to $n$ and $m$. (2) Apply Theorem 6.2 to $G^{\prime } := (V, E^{\prime }, \operatorname{\omega })$ and $\varepsilon$. This induces $\operatorname{\tilde{O}}(1)$ depth and $\operatorname{\tilde{O}}(n^{2 + \varepsilon })$ work.

By construction, the resulting metric has stretch $(2k - 1)(1 + \operatorname{o}(1)) \subseteq \operatorname{O}(1)$.

Blelloch et al. [14] show how to construct an FRT tree from a metric using $\operatorname{O}(n^2)$ work and $\operatorname{O}(\log ^2 n)$ depth. Combining this with Theorem 6.3 enables us to w.h.p. construct an FRT tree from a graph $G$ using polylogarithmic depth and $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ work. This already improves upon the state of the art of using $\operatorname{\tilde{O}}(n^3)$ work to compute $\operatorname{dist}(\cdot ,\cdot ,G)$ exactly and then applying the algorithm of Blelloch et al. [14]. We can, however, achieve this even more efficiently on sparse graphs: Constructing FRT trees is an MBF-like algorithm and solving the problem directly—using the oracle—reduces the work to $\operatorname{\tilde{O}}(m^{1+\varepsilon })$; this is the goal of Section 7.

7 FRT CONSTRUCTION

Given a weighted graph $G$, determining a metric that $\operatorname{O}(1)$-approximates $\operatorname{dist}(\cdot ,\cdot ,G)$ (using polylogarithmic depth and $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ work) is straightforward; see Theorem 6.3. The oracle is queried with the MBF-like APSP algorithm, implicitly enjoying the benefits of the SPD-reducing sampling technique of Section 4. In this section, we show that collecting the information required to construct FRT trees—LE lists—is an MBF-like algorithm; that is, a query that can be directly answered by the oracle. Since collecting LE lists is more work-efficient than APSP, this leads to our main result: w.h.p. sampling from the FRT distribution using polylogarithmic depth and $\operatorname{\tilde{O}}(m^{1+\varepsilon })$ work.

We begin with a formal definition of metric (tree) embeddings in general and the FRT embedding in particular in Section 7.1, proceed to show that the underlying algorithm is MBF-like (Section 7.2) and that all intermediate steps are sufficiently efficient in terms of depth and work (Section 7.3), and present our main results in Section 7.4. Section 7.5 describes how to retrieve the original paths in $G$ that correspond to the edges of the sampled FRT tree.

7.1 Metric Tree Embeddings

We use this section to introduce the (distribution over) metric tree embeddings of Fakcharoenphol, Rao, and Talwar, referred to as FRT embedding, which has expected stretch $\operatorname{O}(\log n)$ [23].

Definition 7.1 (Metric Embedding). Let $G = (V, E, \operatorname{\omega })$ be a graph. A metric embedding of stretch $\alpha$ of $G$ is a graph $G^{\prime } = (V^{\prime }, E^{\prime }, \operatorname{\omega }^{\prime })$, such that $V \subseteq V^{\prime }$ and

\begin{equation} \forall v,w \in V:\quad \operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, G^{\prime }) \le \alpha \operatorname{dist}(v, w, G), \end{equation} (104)

for some

. If $G^{\prime }$ is a tree, we refer to it as metric tree embedding. For a random distribution of metric embeddings $G^{\prime }$, we require $\operatorname{dist}(v,w,G) \le \operatorname{dist}(v,w,G^{\prime })$ and define the expected stretch as

(105)

We show how to efficiently sample from the FRT distribution for the graph $H$ introduced in Section 4. As $H$ is an embedding of $G$ with a stretch in $1 + \operatorname{o}(1)$, this results in a tree embedding of $G$ of stretch $\operatorname{O}(\log n)$. Khan et al. [30] show that a suitable representation of (a tree sampled from the distribution of) the FRT embedding [23] can be constructed as follows.

Choose $\beta \in [1,2)$ uniformly at random.
Choose uniformly at random a total order of the nodes (i.e., a uniformly random permutation). In the following, $v \lt w$ means that $v$ is smaller than $w$ with respect to to this order.
Determine for each node $v \in V$ its LE list: This is the list obtained by deleting from $\lbrace (\operatorname{dist}(v,w,H), w) \mid w \in V \rbrace$ all pairs $(\operatorname{dist}(v,w,H), w)$ for which there is some $u \in V$ with $\operatorname{dist}(v,u,H) \le \operatorname{dist}(v,w,H)$ and $u \lt w$. Essentially, $v$ learns, for every distance $d$, the smallest node within distance at most $d$, i.e., $\min \lbrace w \in V \mid \operatorname{dist}(v,w,G) \le d \rbrace$.
Denote by $\operatorname{\omega }_{\min } := \min _{e \in E}\lbrace \operatorname{\omega }(e)\rbrace$ and $\operatorname{\omega }_{\max } := \max _{e \in E}\lbrace \operatorname{\omega }(e)\rbrace$ the minimum and maximum edge weight, respectively; recall that $\operatorname{\omega }_{\max }/\operatorname{\omega }_{\min } \in \operatorname{poly}n$ by assumption. From the LE lists, determine for each $v \in V$ and distance $\beta 2^i \in [\operatorname{\omega }_{\min }/2, 2\operatorname{\omega }_{\max }]$, , the node $v_i := \min \lbrace w \in V \mid \operatorname{dist}(v,w,H) \le \beta 2^i\rbrace$. Without loss of generality, we assume that $i \in \lbrace 0, \dots , k\rbrace$ for $k \in \operatorname{O}(\log n)$ (otherwise, we shift the indices of the nodes $v_i$ accordingly). Hence, for each $v \in V$, we obtain a sequence of nodes $(v_0, v_1,\dots , v_k)$. $(v_0, v_1, \dots , v_k)$ is the leaf corresponding to $v = v_0$ of the tree embedding, $(v_1, \dots , v_k)$ is its parent, and so on; the root is $(v_k)$. The edge from $(v_i, \dots , v_k)$ to $(v_{i+1}, \dots , v_k)$ has weight $\beta 2^i$.

We refer to Ghaffari and Lenzen [26] for a more detailed summary.

The preceding procedure implicitly specifies a random distribution over tree embeddings with expected stretch $\operatorname{O}(\log n)$ [23], which we call the FRT distribution. We refer to following the procedure 1–4 as sampling from the FRT distribution. Once the randomness is fixed (i.e., steps 1–2 are completed), the tree resulting from steps 3–4 is unique; we refer to them as constructing an FRT tree.

The next lemma shows that step 4 (i.e., constructing the FRT tree from the LE lists) is easy.

Lemma 7.2. Given LE lists of length $\operatorname{O}(\log n)$ for all vertices, the corresponding FRT tree can be determined using $\operatorname{O}(n \log ^3 n)$ work and $\operatorname{O}(\log ^2 n)$ depth.

Proof. Determining $\operatorname{\omega }_{\max }$, $\operatorname{\omega }_{\min }$, and the range of indices $i$ is straightforward at this complexity, as is sorting of each node's list in ascending order with respect to distance. Note that in each resulting list of distance–node pairs, the nodes are strictly decreasing in terms of the random order on the nodes, and each list ends with an entry for the minimal node. For each node $v$ and entry $(d,u)$ in its list in parallel, we determine the values of $i\in \lbrace 0,\ldots ,k\rbrace$ such that $u$ is the smallest node within distance $\beta 2^i$ of $v$. This is done by reading the distance value $d^{\prime }$ of the next entry of the list (using $d^{\prime }=\beta 2^k+1$ if $(d,u)$ is the last entry) and writing to memory $v_i=u$ for each $i$ satisfying that $d\le \beta 2^i \lt d^{\prime }$. Since $\operatorname{\omega }_{\max }/\operatorname{\omega }_{\min }\in \operatorname{poly}n$, this has depth $\operatorname{O}(\log n)$ and a total work of $\operatorname{O}(n\log ^2 n)$.

Observe that we computed the list $(v_0,\ldots ,v_k)$ for each $v\in V$. Recall that the ancestors of the leaf $(v_0, \dots , v_k)$ are determined by its $k$ suffixes. It remains to remove duplicates wherever nodes share a parent. To this end, we sort the list (possibly with duplicates) of $(k+1)n\in \operatorname{O}(n\log n)$ suffixes (each with $\operatorname{O}(\log n)$ entries) lexicographically, requiring $\operatorname{O}(n\log ^3 n)$ work and depth $\operatorname{O}(\log ^2 n)$, as comparing two suffixes requires depth and work $\operatorname{O}(\log n)$. Then duplicates can be removed by comparing each key to its successor in the sorted sequence, taking another $\operatorname{O}(n\log ^2 n)$ work and $\operatorname{O}(\log n)$ depth.

Note that tree edges and their weights are encoded implicitly as the parent of each node is given by removing the first node from the list, and the level of a node (and thus the edge to its parent) is given by the length of the list representing it. If required, it is thus trivial to determine, for example, an adjacency list with $\operatorname{O}(n\log ^2 n)$ work and depth $\operatorname{O}(\log ^2 n)$. Overall, we spent $\operatorname{O}(n\log ^3 n)$ work at $\operatorname{O}(\log ^2 n)$ depth.

7.2 Computing LE Lists is MBF-like

Picking $\beta$ is trivial and choosing a random order of the nodes can be done w.h.p. by assigning to each node a string of $\operatorname{O}(\log n)$ uniformly and independently chosen random bits. Hence, in the following, we assume this step to be completed, without loss of generality, resulting in a random assignment of the vertex IDs $\lbrace 1, \dots , n\rbrace$. It remains to establish how to efficiently compute LE lists.

We establish that LE lists can be computed by an MBF-like algorithm (compare Definition 2.11) using the parameters in Definition 7.3; the claim that Equations (106) and (107) define a representative projection and a congruence relation is shown in Lemma 7.5.

Definition 7.3. For constructing LE lists, use the semiring $\mathcal {S} = \mathcal {S}_{\min ,+}$ and the distance map $\mathcal {M} = \mathcal {D}$ from Definition 2.1 as zero-preserving semimodule. For all $x \in \mathcal {D}$, define

\begin{gather} r(x)_v := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\infty & \text{$\exists w \lt v:\ x_w \le x_v$ and} \\ x_v & \text{otherwise, and} \end{array}\right. } \end{gather} (106)

\begin{gather} x \sim y \quad :\Leftrightarrow \quad r(x) = r(y) \end{gather} (107)

as representative projection and congruence relation, respectively. As initialization $x^{(0)} \in \mathcal {D}^V$ use

\begin{equation} x^{(0)}_{vw} := { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}0 & \text{if $v=w$ and} \\ \infty & \text{otherwise.} \end{array}\right. } \end{equation} (108)

Hence, $r(x)$ is the LE list of $v \in V$ if $x_w = \operatorname{dist}(v,w,H)$ for all $w \in V,$ and we consider two lists equivalent if and only if they result in the same LE list. This allows us to prepare the proof that retrieving LE lists can be done by an MBF-like algorithm in the following lemma. It states that filtering keeps the relevant information: If a node–distance pair is dominated by an entry in a distance map, the filtered distance map also contains a—possibly different—dominating entry.

Lemma 7.4. Consider arbitrary $x,y \in \mathcal {D}$, $v \in V$, and . Then

\begin{equation} \exists w \lt v:x_w \le s \quad \Leftrightarrow \quad \exists w \lt v:r(x)_w \le s \end{equation} (109)

Proof. Observe that the necessity “$\Leftarrow$” is trivial. As for sufficiency “$\Rightarrow$,” suppose that there is $w \lt v$ such that $x_w \le s$. If $r(x)_w = x_w$, we are done. Otherwise, there must be some $u \lt w \lt v$ satisfying $x_u \le x_w \le s$. Since $|V|$ is finite, an inductive repetition of the argument yields that there is some $w^{\prime } \lt v$ with $r(x)_{w^{\prime }} = x_{w^{\prime }} \le s$.

Equipped with this lemma, we can prove that $\sim$ is a congruence relation on $\mathcal {D}$ with representative projection $r$. We say that a node–distance pair $(v, d)$ dominates $(v^{\prime }, d^{\prime })$ if and only if $v \lt v^{\prime }$ and $d \le d^{\prime }$; in the context of $x \in \mathcal {D}$, we say that $x_w$ dominates $x_v$ if and only if $(w, x_w)$ dominates $(v, x_v)$.

Lemma 7.5. The equivalence relation $\sim$ from Equation (107) of Definition 7.3 is a congruence relation. The function $r$ from Equation (106) Definition 7.3 is a representative projection with respect to $\sim$.

Proof. Trivially, $r$ is a projection (i.e., $r^2(x) = r(x)$ for all $x \in \mathcal {D}$). By Lemma 2.8, it hence suffices to show that Equations (18) and (19) hold. In order to do that, let $s \in \mathcal {S}_{\min ,+}$ be arbitrary, and $x,x^{\prime },y,y^{\prime } \in \mathcal {D}$ such that $r(x) = r(x^{\prime })$ and $r(y) = r(y^{\prime })$. As we have $x_v \le x_w \Leftrightarrow s + x_v \le s + x_w$ for all $v,w \in V$, Equation (18) immediately follows from Equation (109).

Regarding Equation (19), we show that

\begin{equation} r(x \oplus y) = r(r(x) \oplus r(y)) \end{equation} (110)

which implies Equation ( 19) due to $r(x \oplus y) = r(r(x) \oplus r(y)) = r(r(x^{\prime }) \oplus r(y^{\prime })) = r(x^{\prime } \oplus y^{\prime })$. Let $v \in V$ be an arbitrary vertex and observe that $(x \oplus y)_v$ is dominated if and only if

\begin{equation} \exists w \lt v:\quad (x \oplus y)_w \le (x \oplus y)_v\\ \end{equation} (111)

\begin{equation} \Leftrightarrow \quad \exists w \lt v:\quad \min \lbrace x_w, y_w\rbrace \le (x \oplus y)_v \\ \end{equation} (112)

\begin{equation} \Leftrightarrow \quad \exists w \lt v:\quad x_w \le (x \oplus y)_v \vee y_w \le (x \oplus y)_v \\ \end{equation} (113)

\begin{equation} \stackrel{(7.6)}{\Leftrightarrow }\quad \exists w \lt v:\quad r(x)_w \le (x \oplus y)_v \vee r(y)_w \le (x \oplus y)_v. \end{equation} (114)

In order to show Equation ( 110), we distinguish two cases.

Case 1 ($(x \oplus y)_v$ is dominated): By Definition 7.3, we have $r(x \oplus y)_v = \infty$. Additionally, we know that $(r(x) \oplus r(y))_v=\min \lbrace r(x)_v,r(y)_v\rbrace \ge \min \lbrace x_v,y_v\rbrace =(x\oplus y)_v$ must be dominated due to Equation (114), and hence $r(r(x) \oplus r(y))_v = \infty = r(x \oplus y)_v$.
Case 2 ($(x \oplus y)_v$ is not dominated): This means that, by Definition 7.3, $r(x \oplus y)_v = (x \oplus y)_v = \min \lbrace x_v, y_v \rbrace$. Furthermore, the negation of Equation (114) holds; that is, $\forall w\lt v:\min \lbrace r(x)_w, r(y)_w \rbrace \gt (x \oplus y)_v = \min \lbrace x_v, y_v\rbrace$. Assuming without loss of generality that $x_v\le y_v$ (the other case is symmetric), we have that $x_v = (x \oplus y)_v = r(x \oplus y)_v$ and that $x_v = r(x)_v = (r(x) \oplus r(y))_v$, where $x_v = r(x)_v$ is implied by Equation (109) because $r(x)_w \ge \min \lbrace r(x)_w, r(y)_w\rbrace \gt \min \lbrace x_v, y_v\rbrace = x_v$ for any $w \lt v$. Thus, $\forall w\lt v:(r(x)\oplus r(y))_w \gt (r(x) \oplus r(y))_v$, yielding by applying Definition 7.3 once more that

\begin{equation} r(r(x) \oplus r(y))_v = (r(x)\oplus r(y))_v = x_v = r(x \oplus y)_v. \end{equation} (115)

Altogether, this shows Equation (110) and, as demonstrated above, implies Equation (19).

Having established that determining LE lists can be done by an MBF-like algorithm allows us to apply the machinery developed in Sections 2–5. Next, we establish that LE list computations can be performed efficiently, which we show by bounding the length of LE lists.

7.3 Computing LE Lists is Efficient

Our course of action is to show that LE list computations are efficient using Theorem 5.2 (i.e., the oracle theorem). The purpose of this section is to prepare the lemmas required to apply Theorem 5.2. We stress that the key challenge is to perform each iteration in polylogarithmic depth; this allows us to determine $\mathcal {A}(H)$ in polylogarithmic depth due to $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$. To this end, we first establish the length of intermediate LE lists to be logarithmic w.h.p. (Lemma 7.6). This permits us to apply $r^V$ and determine the matrix-vector multiplication with $A_{\lambda }$ (the scaled version of $A_G$, the adjacency matrix of $G$ from Section 5) in a sufficiently efficient manner (Lemmas 7.7 and 7.8). Section 7.4 plugs these results into Theorem 5.2 to establish our main result.

We remark that LE lists are known to have length $\operatorname{O}(\log n)$ w.h.p. throughout intermediate computations [26, 30], assuming that LE lists are assembled using $h$-hop distances. Lemma 7.6, while using the same key argument, is more general since it makes no assumption about $x$ except for its independence of the random node order; we need the more general statement due to our decomposition of $A_H$.

Recall that by $|x|$ we denote the number of non-$\infty$ entries of $x \in \mathcal {D}$ and that we only need to keep the non-$\infty$ entries in memory. Lemma 7.6 shows that any LE list $r(x) \in \mathcal {D}$ has length $|r(x)| \in \operatorname{O}(\log n)$ w.h.p., provided that $x$ does not depend on the random node ordering. Observe that, in fact, the lemma is quite powerful as it suffices that there is any $y \in [x]$ that does not depend on the random node ordering: as $r(x)=r(y)$, then $|r(x)|=|r(y)|\in \operatorname{O}(\log n)$ w.h.p.

Lemma 7.6. Let $x \in \mathcal {D}$ be arbitrary but independent of the random order of the nodes. Then $|r(x)| \in \operatorname{O}(\log n)$ w.h.p.

Proof. Order the non-$\infty$ values of $x$ by ascending distance, breaking ties independently of the random node order. Denote for $i \in \lbrace 1, \dots , |x| \rbrace$ by $v_i \in V$ the $i$th node with respect to this order (i.e., $x_{v_i}$ is the $i$th smallest entry in $x)$. Furthermore, denote by $X_i$ the indicator variable which is 1 if $v_i \lt v_j$ for all $j \in \lbrace 1, \dots , i-1 \rbrace$ and 0 otherwise. As the node order and $x$ are independent, we obtain . For $X := \sum _{i=1}^{|x|} X_i$, this implies

(116)

Observe that $X_i$ is independent of $\lbrace X_1, \dots , X_{i-1} \rbrace$, as whether $v_i \lt v_j$ for all $j \lt i$ is independent of the internal order of the set $\lbrace v_1, \dots , v_{i-1} \rbrace$. We conclude that all $\lbrace X_1, \dots , X_{|x|} \rbrace$ are independent; this can be checked by inductively verifying that for any possible assignment $(b_1, \dots , b_k) \in \lbrace 0,1\rbrace ^k$. Applying Chernoff's bound yields $X \in \Theta (\log n)$ w.h.p. As , this concludes the proof.

Hence, filtered, possibly intermediate LE lists $r(x)$ w.h.p. comprise $\operatorname{O}(\log n)$ entries. We proceed to show that, under these circumstances, $r(x)$ can be computed efficiently.

Lemma 7.7. Let $x \in \mathcal {D}$ be arbitrary. Then $r(x)$ can be computed using $\operatorname{O}(|r(x)| \log n)$ depth and $\operatorname{O}(|r(x)| |x|)$ work.

Proof. We use one iteration per non-$\infty$ entry of $r(x)$. In each iteration, the smallest non-dominated entry of $x_v$ is copied to $r(x)_v,$ and all entries of $x$ dominated by $x_v$ are marked as dominated. This yields $|r(x)|$ iterations as follows:

Initialize $r(x) \leftarrow \bot$. Construct a balanced binary tree on the non-$\infty$ elements of $x$ and identify its leaves with their indices $v \in V$ ($\operatorname{O}(\log n)$ depth and $\operatorname{O}(|x|)$ work).
Find the element with the smallest node index $v$ with respect to the random node order whose corresponding leaf is not marked as discarded by propagating the minimum up the tree ($\operatorname{O}(\log n)$ depth and $\operatorname{O}(|x|)$ work). Set $r(x)_v \leftarrow x_v$.
Mark each leaf $w$ for which $x_v \le x_w$, including $v$, as discarded ($\operatorname{O}(1)$ depth and $\operatorname{O}(|x|)$ work).
If there are non-discarded leaves ($\operatorname{O}(\log n)$ depth and $\operatorname{O}(|x|)$ work), continue at step 2.

Note that for each $w \ne v$ for which the corresponding node is discarded, we have $r(x)_w = \infty$. On the other hand, by construction, we have for all $v$ for which we stored $r(x)_v=x_v$ that there is no $w \in V$ satisfying both $x_w \le x_v$ and $w \lt v$. Thus, the computed list is indeed $r(x)$.

The depth and work bounds follow from the above bounds on the complexities of the individual steps and by observing that, in each iteration, we add a distinct index–value pair (with non-$\infty$ value) to the list that after termination equals $r(x)$.

Any intermediate result used by the oracle is of the form $r^V A_\lambda y$ with

\begin{equation} y = (r^V A_\lambda)^f P_\lambda x^{(h)}, \end{equation} (117)

where $x^{(h)} = r^V A_H^h x^{(0)}$ is the intermediate result of $h$ iterations on $H$, $\lambda \in \lbrace 0, \dots , \Lambda \rbrace$ is a level, and $(r^V A_\lambda)^f P_\lambda$ represents another $f$ iterations in $G$ with edge weights stretched according to level $\lambda$. The oracle uses this to simulate the $(h+1)$-th iteration in $H$ (compare Section 5 and Theorem 5.2 in particular).

Lemma 7.8. Consider $x^{(0)} \in \mathcal {D}^V$ from Equation (108). For arbitrary , we can w.h.p.

determine $r^V x^{(0)}$ from $x^{(0)}$ using $\operatorname{O}(n)$ work and $\operatorname{O}(1)$ depth,
compute $r^V A_\lambda y$ from $y$ as defined in Equation (117) using $W \in \operatorname{O}(m \log ^2 n)$ work and $D \in \operatorname{O}(\log ^2 n)$ depth, and
compute $r^V (\bigoplus _{\lambda = 0}^\Lambda y_\lambda)$ from the individual $y_\lambda = (P_\lambda r^V A_\lambda ^d P_\lambda) x^{(i)}$, $0 \le \lambda \le \Lambda$, with $W_\oplus \in \operatorname{O}(n \log ^4 n)$ work and $D_\oplus \in \operatorname{O}(\log ^2 n)$ depth.

Proof. We establish the claims in order.

Regarding the first claim, observe that $r^V x^{(0)} = x^{(0)}$. Hence, we can copy $x^{(0)}$ using constant depth and $\operatorname{O}(\sum _{v \in V} |x^{(0)}_v|) = \operatorname{O}(n)$ work.
As for the second claim, we expand $x^{(h)}$ in Equation (117) and remove all intermediate filtering steps, obtaining

\begin{equation} y \stackrel{(5.8)}{=} (r^V A_\lambda)^f P_\lambda \left(r^V \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda (r^V A_\lambda)^d P_\lambda \right)\right)^h r^V x^{(0)} \end{equation} (118)

\begin{equation} \ \stackrel{(2.35)}{=} r^V \underbrace{A_\lambda ^f P_\lambda \left(\bigoplus _{\lambda = 0}^\Lambda P_\lambda A_\lambda ^d P_\lambda \right)^h x^{(0)}}_{=: y^{\prime }}. \end{equation} (119)

The key observation is that, since the random order of $V$ only plays a role for $r$ and we removed all intermediate applications of $r^V,$ $y^{\prime }$ does not depend on that order. Hence, we may apply Lemma 7.6, which yields for each $v \in V$ that $|y_v| = |r(y^{\prime }_v)| \in \operatorname{O}(\log n)$ w.h.p. Condition on $|y_v| \in \operatorname{O}(\log n)$ for all $v \in V$ in the following, which happens w.h.p. by Lemma 1.2.
Regarding the computation of $r^V A_\lambda y$, we first compute each $(A_\lambda y)_v$ in parallel for all $v \in V$. By Lemma 2.3 and because $|y_v| \in \operatorname{O}(\log n)$, this can be done using $\operatorname{O}(\log n)$ depth and work

\begin{equation} \operatorname{O}\left(\sum _{v \in V} \sum _{{{\scriptsize {\begin{array}{c}w \in V \\ \lbrace v,w\rbrace \in E\end{array}}}}} |y_w| \log n \right) \subseteq \operatorname{O}\left(\sum _{\lbrace v,w\rbrace \in E} \log ^2 n \right) = \operatorname{O}(m \log ^2 n). \end{equation} (120)

Here, we use that propagation with respect to $\mathcal {D}$—uniformly increasing weights—requires, due to $|y_v| \in \operatorname{O}(\log n)$, no more than $\operatorname{O}(1)$ depth and $\operatorname{O}(m \log n)$ work and is thus dominated by aggregation. To bound the cost of computing $r^V A_\lambda y$ from $A_\lambda y$, observe that we have

\begin{equation} \left| (A_\lambda y)_v \right| \in \operatorname{O}\left(\sum _{{{\scriptsize {\begin{array}{c}w \in V \\ \lbrace v,w\rbrace \in E\end{array}}}}} |y_w| \right). \end{equation} (121)

Hence, by Lemma 7.7 and due to $|y_v| \in \operatorname{O}(\log n)$, we can compute $(r^V A_\lambda y)_v$ in parallel for all $v \in V$ using $\operatorname{O}(\log ^2 n)$ depth and

\begin{equation} \operatorname{O}\left(\sum _{v \in V} |(A_\lambda y)_v| \log n \right) \stackrel{(7.18)}{\subseteq } \operatorname{O}\left(\sum _{v \in V} \sum _{{{\scriptsize {\begin{array}{c}w \in V \\ \lbrace v,w\rbrace \in E\end{array}}}}} |y_w| \log n \right)\\ \end{equation} (122)

\begin{equation} \subseteq \operatorname{O}\left(\sum _{\lbrace v,w\rbrace \in E} \log ^2 n \right)\\ \end{equation} (123)

\begin{equation} \subseteq \operatorname{O}\left(m \log ^2 n \right) \end{equation} (124)

work. All operations are possible using $D \in \operatorname{O}(\log ^2 n)$ depth and $W \in \operatorname{O}(m \log ^2 n)$ work. As we condition only on an event that occurs w.h.p., this concludes the proof of the second claim.
Regarding the last claim, condition on logarithmic length of all LE lists (i.e., on $|(y_{\lambda })_v| \in \operatorname{O}(\log n)$ for all $0 \le \lambda \le \Lambda$). We can compute $\bigoplus _{\lambda = 0}^\Lambda (y_{\lambda })_v \in \mathcal {D}$, the aggregation for a single vertex $v$, using $\operatorname{O}(\sum _{\lambda = 0}^\Lambda |(y_{\lambda })_v| \log n) = \operatorname{O}(\log ^3 n)$ work and $\operatorname{O}(\log n)$ depth by Lemma 2.3. As the work bounds the length of the resulting list, we can determine $r(\bigoplus _{\lambda = 0}^\Lambda (y_{\lambda })_v)$ using $\operatorname{O}(\log ^4 n)$ work and $\operatorname{O}(\log ^2 n)$ depth by Lemma 7.7. Doing this in parallel for all $v \in V$ yields $W_\oplus \in \operatorname{O}(n \log ^4 n)$ work and $D_\oplus \in \operatorname{O}(\log ^2 n)$ depth. As we condition on two events that occur w.h.p., this concludes the last claim.

7.4 Metric Tree Embedding in Polylogarithmic Time and Near-Linear Work

Determining LE lists on $H$ yields a probabilistic tree embedding of $G$ with expected stretch $\operatorname{O}(\log n)$ (Section 7.1), is the result of an MBF-like algorithm (Section 7.2), and each iteration of this algorithm is efficient (Theorem 5.2 and Section 7.3). We assemble these pieces in Theorem 7.9, which relies on $G$ containing a suitable hop set. Corollaries 7.10 and 7.11 remove this assumption by invoking known algorithms to establish this property first. Note that Theorem 7.9 serves as a blueprint yielding improved tree embedding algorithms when provided with improved hop set constructions.

Theorem 7.9. Suppose we are given the weighted incidence list of a graph $G = (V, E, \operatorname{\omega })$ satisfying for some and that $\operatorname{dist}^d(v,w,G) \le \alpha \operatorname{dist}(v,w,G)$ for all $v,w \in V$. Then, w.h.p., we can sample from a tree embedding of $G$ of expected stretch $\operatorname{O}(\alpha ^{\operatorname{O}(\log n)} \log n)$ with depth $\operatorname{O}(d \log ^4 n) \subset \operatorname{\tilde{O}}(d)$ and work $\operatorname{O}(m(d + \log n) \log ^5 n) \subset \operatorname{\tilde{O}}(md)$.

Proof. We first w.h.p. compute the LE lists of $H$. To this end, by Lemma 7.8, we may apply Theorem 5.2 with parameters $D \in \operatorname{O}(\log ^2 n)$, $W \in \operatorname{O}(m \log ^2 n)$, $D_\oplus \in \operatorname{O}(\log ^2 n)$, and $W_\oplus \in \operatorname{O}(n \log ^4 n)$; we arrive at depth $\operatorname{O}(d \log ^4 n)$ and work $\operatorname{O}(m(d + \log n) \log ^5 n)$. As shown in Fakcharoenphol et al. [23], the FRT tree $T$ represented by these lists has expected stretch $\operatorname{O}(\log n)$ with respect to the distance metric of $H$. By Theorem 4.5, w.h.p. $\operatorname{dist}(v,w,G) \le \operatorname{dist}(v,w,H) \le \alpha ^{\operatorname{O}(\log n)} \operatorname{dist}(v,w,G)$ and hence

\begin{equation} \operatorname{dist}(v,w,G) \le \operatorname{dist}(v,w,T) \in \operatorname{O}\left(\alpha ^{\operatorname{O}(\log n)} \log n \operatorname{dist}(v,w,G) \right) \end{equation} (125)

in expectation (compare Definition 7.1). Observe that, by Lemma 7.2, explicitly constructing the FRT tree is possible within the stated bounds.

As stated earlier, we require $G$ to contain a $(d, 1 / \operatorname{polylog}n)$-hop set with $d \in \operatorname{polylog}n$ in order to achieve polylogarithmic depth. We also need to determine such a hop set using $\operatorname{polylog}n$ depth and near-linear work in $m$ and that it does not significantly increase the problem size by adding too many edges. Cohen's hop sets [17] meet all these requirements, yielding the following corollary.

Corollary 7.10. Given the weighted incidence list of a graph $G$ and an arbitrary constant $\varepsilon \gt 0$, we can w.h.p. sample from a tree embedding of expected stretch $\operatorname{O}(\log n)$ using depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$.

Proof. We apply the hop set construction by Cohen [17] to $G = (V, E, \operatorname{\omega })$ to w.h.p. determine an intermediate graph $G^{\prime }$ with vertices $V$ and an additional $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ edges. The algorithm guarantees $\operatorname{dist}^d(v,w,G) \le \alpha \operatorname{dist}(v,w,G^{\prime })$ for $d \in \operatorname{polylog}n$ and $\alpha \in 1 + 1 / \operatorname{polylog}n$ (where the $\operatorname{polylog}n$ term in $\alpha$ is under our control) and has depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$. Choosing $\alpha \in 1 + \operatorname{O}(1 / \log n)$ and applying Theorem 7.9, the claim follows.

Adding a hop set to $G$, embedding the resulting graph in $H$, and sampling an FRT tree on $H$ is a three-step sequence of embeddings of $G$. Still, in terms of stretch, the embedding of Corollary 7.10 is—up to a factor in $1 + \operatorname{o}(1)$—as good as directly constructing an FRT tree of $G$: (1) Hop sets do not stretch distances. (2) By Theorem 4.5 and Equation (91), $H$ introduces a stretch of $1 + 1 / \operatorname{polylog}n$. (3) Together, this ensures that the expected stretch of the FRT embedding with respect to $G$ is $\operatorname{O}(\log n)$.

It is possible to reduce the work at the expense of an increased stretch by first applying the spanner construction by Baswana and Sen [9]:

Corollary 7.11. Suppose we are given the weighted incidence list of a graph $G$. Then, for any constant $\varepsilon \gt 0$ and any , we can w.h.p. sample from a tree embedding of $G$ of expected stretch $\operatorname{O}(k \log n)$ using depth $\operatorname{polylog}n$ and work $\operatorname{\tilde{O}}(m + n^{1 + 1/k + \varepsilon })$.

Proof. The algorithm of Baswana and Sen [9] computes a $(2k - 1)$-spanner of $G = (V, E, \operatorname{\omega })$; that is, a subgraph $G^{\prime } = (V, E^{\prime }, \operatorname{\omega })$ satisfying for all $v,w \in V$ that $\operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, G^{\prime }) \le (2k - 1) \operatorname{dist}(v, w, G)$ using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m)$ work. We argue in the proof of Theorem 6.3 that $|E^{\prime }| \in \operatorname{\tilde{O}}(n^{1 + 1/k})$ w.h.p. The claim follows from applying Corollary 7.10 to $G^{\prime }$.

7.5 Reconstructing Paths from Virtual Edges

Given that we only deal with distances and not with paths in the FRT construction, there is one concern: Consider an arbitrary graph $G = (V, E, \operatorname{\omega })$, its augmentation with a hop set resulting in $G^{\prime }$, which is then embedded into the complete graph $H$ and finally into an FRT tree $T = (V_T, E_T, \operatorname{\omega }_T)$. How can an edge $e \in E_T$ of weight $\operatorname{\omega }_T(e)$ be mapped to a path $p$ in $G$ with $\operatorname{\omega }(p) \le \operatorname{\omega }_T(H)$? Note that this question has to be answered in polylogarithmic depth and without incurring too much memory overhead. Our purpose is not to provide specifically tailored data structures, but we propose a three-step approach that maps edges in $T$ to paths in $H$, edges in $H$ to paths in $G$, and finally edges from $G^{\prime }$ to paths in $G$.

Concerning a tree edge $e \in E_T$, observe that $e$ maps back to a path $p$ of at most $\operatorname{SPD}(H)$ hops in $H$ with $\operatorname{\omega }_H(p) \le 3 \operatorname{\omega }_T(e)$ as follows. First, to keep the notation simple, identify each tree node—given as tuple $(v_i, \dots , v_j)$—w ith its “leading” node $v_i \in V$; in particular, each leaf has $i = 0$ and is identified with the node in $V$ that is mapped to it. A leaf $v_0$ has an LE entry $(\operatorname{dist}(v_0, v_1, H), v_1),$ and we can trace the shortest $v_0$-$v_1$-path in $H$ based on the LE lists (nodes locally store the predecessor of shortest paths just like in APSP). Moreover, $\operatorname{dist}(v_i, v_{i+1}, H) \le \operatorname{\omega }_T(v_i, v_{i+1})$ (i.e., we may map the tree edge back to the path without incurring larger cost than in $T$). If $i \gt 0$, $v_i$ and $v_{i+1}$ are inner nodes. Choose an arbitrary leaf $v_0$ that is a common descendant (this choice can, e.g., be fixed when constructing the tree from the LE list without increasing the asymptotic bounds on depth or work). We then can trace shortest paths from $v_0$ to $v_i$ and from $v_0$ to $v_{i+1}$ in $H$, respectively. The cost of their concatenation is $\operatorname{dist}(v_0, v_i, H) + \operatorname{dist}(v_0, v_{i+1}, H) \le \beta 2^i + \beta 2^{i+1} = 3(\beta 2^i) = 3\operatorname{\omega }_T(v,w)$ by the properties of LE lists and the FRT embedding. Note that, due to the identification of each tree node with its “leading” graph node, paths in $T$ map to concatenable paths in $H$.

Regarding the mapping from edges in $H$ to paths in $G$, recall that we compute the LE lists of $H$ by repeated application of the operations $r^V$, $\oplus$, $P_\lambda$, and $A_\lambda$ with $0 \le \lambda \le \Lambda$. Observe that $r^V$, $\oplus$, and $P_\lambda$ discard information; that is, distances to nodes that do not make it into the final LE lists and therefore are irrelevant to routing. $A_\lambda$, on the other hand, is an MBF step. Thus, we may store the necessary information for backtracing the induced paths at each node; specifically, we can store, for each iteration $h \in \operatorname{O}(\log ^2 n)$ with respect to $H$, each of the intermediate $d$ iterations in $G$, and each $\lambda \in \operatorname{O}(\log n)$, the state vector $y$ of the form in Equation (117) in a lookup table. This requires $\operatorname{\tilde{O}}(d)$ memory and efficiently maps edges of $H$ to $d$-hop paths in $G$—or rather to $d$-hop paths in $G^{\prime }$ if we construct $H$ after augmenting $G$ to $G^{\prime }$ using a hop set.

Mapping edges of $G^{\prime }$ to edges in $G$ depends on the hop set. Cohen [17] does not discuss this in her article, but her hop set edges can be efficiently mapped to paths in the original graph by a lookup table: Hop set edges either correspond to a shortest path in a small cluster or to a cluster that has been explored using polylogarithmic depth. Regarding other hop set algorithms, we note that many techniques constructing hop set edges using depth $D$ allow for reconstruction of corresponding paths at depth $\operatorname{O}(D)$ (i.e., that polylogarithmic-depth algorithms are compatible analogously to Cohen's hop sets). For instance, this is the case for the hop set construction by Henziger et al. [29], which we leverage in Section 8.3.

8 DISTRIBUTED FRT CONSTRUCTION

Distributed algorithms for constructing FRT-type tree embeddings in the Congest model are covered by our framework as well. In the following, we recap two existing algorithms [26, 30] (our framework allows doing this in a very compact way), and we improve upon the state of the art, reducing a factor of $n^\varepsilon$ in the currently best-known round complexity for expected stretch $\operatorname{O}(\log n)$ [26] to $n^{o(1)}$. We use the hop set of Henzinger et al. [29] instead of Cohen's [17], because it is compatible with the Congest model. Note that replacing the hop set is straightforward since our theorems in the previous sections are formulated with respect to generic $(d, \hat{\varepsilon })$-hop sets.

The Congest Model. We refer to Peleg [41] for a formal definition of the Congest model but briefly outline its core aspects. The Congest model is a model of computation that captures distributed computations performed by the nodes of a graph, where communication is restricted to its edges. Each node is initialized with a unique ID of $\operatorname{O}(\log n)$ bits, knows the IDs of its adjacent nodes along with the weights of the corresponding incident edges, and “its” part of the input (in our case the input is empty); each node has to compute “its” part of the output (in our case, as detailed in Section 7.1, its LE list). Computations happen in rounds, and we are interested in how many rounds it takes for an algorithm to complete. In each round, each node does the following:

Perform finite, but otherwise arbitrary local computations.
Send a message of $\operatorname{O}(\log n)$ bits to each neighboring node.
Receive the messages sent by neighbors.

Recall that, by assumption, edge weights can be encoded using $\operatorname{O}(\log n)$ bits; that is, an index–distance pair can be encoded in a single message.

Overview. Throughout this section, let $G = (V, E, \operatorname{\omega })$ be a weighted graph and denote, for any graph $G$, by its adjacency matrix according to Equation (4). Fix the semiring $\mathcal {S} = \mathcal {S}_{\min ,+}$ and the zero-preserving semimodule $\mathcal {M} = \mathcal {D}$ from Definition 2.1, as well as $r$, $\sim$, and $x^{(0)}$ as given in Definition 7.3.

Sections 8.1 and 8.2 briefly summarize the distributed FRT algorithms by Kahn et al. [30] and Ghaffari and Lenzen [26], respectively. We use these preliminaries, our machinery, and a distributed hop set construction due to Henziger et al. [29] in Section 8.3 to propose an algorithm that reduces a multiplicative overhead of $n^\varepsilon$ in the round complexity of [26] to $n^{o(1)}$.

8.1 The Algorithm by Khan et al.

In our terminology, the algorithm of Khan et al. [30] performs $\operatorname{SPD}(G)$ iterations of the MBF-like algorithm for collecting LE lists implied by Definition 7.3; that is,

\begin{equation} r^V A_G^{\operatorname{SPD}(G)} x^{(0)} \stackrel{(2.35)}{=} \left(r^V A_G \right)^{\operatorname{SPD}(G)} x^{(0)}. \end{equation} (126)

It does so in $\operatorname{SPD}(G) + 1$ iterations by initializing $x^{(0)}$ as in Equation ( 108) and iteratively computing $x^{(i+1)} := r^V A_G x^{(i)}$ until a fixed point is reached (i.e., until $x^{(i+1)} = x^{(i)}$. As $(r^V A_G)^i x^{(0)} = r^V A_G^i x^{(0)}$). Lemma 7.6 shows, that w.h.p. $|x^{(i)}_v| \in \operatorname{O}(\log n)$ for all $0 \le i \le \operatorname{SPD}(G)$ and all $v \in V$. Therefore, $v \in V$ can w.h.p. transmit $x^{(i)}_v$ to all of its neighbors using $\operatorname{O}(\log n)$ messages, and, upon reception of its neighbors’ lists, locally compute $x^{(i+1)}_v$. Thus, each iteration takes $\operatorname{O}(\log n)$ rounds w.h.p., implying the round complexity of $\operatorname{O}(\operatorname{SPD}(G) \log n)$ w.h.p. shown in Khan et al. [ 30].

8.2 The Algorithm by Ghaffari and Lenzen

The strongest lower bound regarding the round complexity for constructing a (low-stretch) metric tree embedding of $G$ in the Congest model is $\operatorname{\tilde{\Omega }}(\sqrt {n} + \operatorname{D}(G))$ [20, 26]. If $\operatorname{SPD}(G) \gg \max \lbrace \operatorname{D}(G), \sqrt {n}\rbrace$, one may thus hope for a solution that runs in $\operatorname{\tilde{o}}(\operatorname{SPD}(G))$ rounds. For any , in Ghaffari and Lenzen [26], it is shown that expected stretch $\operatorname{O}(\varepsilon ^{-1}\log n)$ can be achieved in $\operatorname{\tilde{O}}(n^{1/2 + \varepsilon } + \operatorname{D}(G))$ rounds; below, we summarize this algorithm.

The strategy is to first determine the LE lists of a constant-stretch metric embedding of (the induced submetric of) an appropriately sampled subset of $V$. The resulting graph is called the skeleton spanner, and its LE lists are then used to jump-start the computation on the remaining graph. When sampling the skeleton nodes in the right way, stretching non-skeleton edges analogously to Section 4, and fixing a shortest path for each pair of vertices, w.h.p. all of these paths contain a skeleton node within a few hops. Ordering skeleton nodes before non-skeleton nodes with respect to the random ordering implies that each LE list has a short prefix accounting for the local neighborhood, followed by a short suffix containing skeleton nodes only. This is due to the fact that skeleton nodes dominate all non-skeleton nodes for which the respective shortest path passes through them. Hence, no node has to learn information that is further away than $d_S\in \operatorname{\tilde{O}}(\sqrt {n})$, an upper bound on the number of hops when a skeleton node is encountered on a shortest path that holds w.h.p.

The Graph $H$ . In Ghaffari and Lenzen [26], $G$ is embedded into $H$ and an FRT tree is sampled on $H$, where $H$ is derived as follows. Abbreviate $\ell := \lceil \sqrt {n} \rceil$. For a sufficiently large constant $c$, sample $\lceil c \ell \log n \rceil$ nodes uniformly at random; call this set $S$. Define the skeleton graph

\begin{equation} G_S := (S, E_S, \operatorname{\omega }_S)\text{, where}\\ \end{equation} (127)

\begin{equation} E_S := \left\lbrace \lbrace s,t\rbrace \in \binom{S}{2} \mid \operatorname{dist}^\ell (s, t, G) \lt \infty \right\rbrace \text{ and}\\ \end{equation} (128)

\begin{equation} \operatorname{\omega }_S(s, t) \mapsto \operatorname{dist}^{\ell }(s, t, G). \end{equation} (129)

Then w.h.p. $\operatorname{dist}(s, t, G_S) = \operatorname{dist}(s, t, G)$ for all $s,t \in S$ (Lemma 4.6 of [ 33]). For $k \in \Theta (\varepsilon ^{-1})$, construct a $(2k-1)$-spanner

\begin{equation} G^{\prime }_S := (S, E^{\prime }_S, \operatorname{\omega }_S) \end{equation} (130)

of the skeleton graph $G_S$ that has $\operatorname{\tilde{O}}(\ell ^{1+1/k}) \subseteq \operatorname{\tilde{O}}(n^{1/2+\varepsilon })$ edges w.h.p. (Lemma 4.9 of [ 33]). Define

\begin{equation} H := (V, E_H, \operatorname{\omega }_H)\text{, where} \\ \end{equation} (131)

\begin{equation} E_H := E^{\prime }_S \cup E\text{, and} \\ \end{equation} (132)

\begin{equation} \operatorname{\omega }_H(e) \mapsto { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\operatorname{\omega }_S(e) & \text{if $e \in E^{\prime }_S$ and} \\ (2k - 1) \operatorname{\omega }(e) & \text{otherwise.} \end{array}\right. } \end{equation} (133)

By construction, $G$ embeds into $H$ with a stretch of $2k - 1$ w.h.p.; that is, $\operatorname{dist}(v, w, G) \le \operatorname{dist}(v, w, H) \le (2k - 1) \operatorname{dist}(v, w, G)$. Computing an FRT tree $T$ of $H$ of expected stretch $\operatorname{O}(\log n)$ thus implies that $G$ embeds into $T$ with expected stretch $\operatorname{O}(k \log n) = \operatorname{O}(\varepsilon ^{-1} \log n)$.

FRT Trees of $H$ . Observe that min-hop shortest paths in $H$ contain only a single maximal subpath consisting of spanner edges, where the maximal subpaths of non-spanner edges have at most $\ell$ hops w.h.p. This follows analogously to Lemma 4.4 with two levels and a sampling probability of $\operatorname{\tilde{\Theta }}(1 / \ell)$. Assuming $s \lt v$ for all $s \in S$ and $v \in V \setminus S$ (we discuss this below) for each $v \in V$ and each entry $(w, \operatorname{dist}(v, w, H))$ of its LE list, w.h.p. there is a min-hop shortest $v$-$w$-path with a prefix of $\ell$ non-spanner edges followed by a shortest path in $G^{\prime }_S$. This entails that w.h.p.

\begin{equation} r^V A_H^{\operatorname{SPD}(H)} x^{(0)} = r^V A_{G,2k-1}^{\ell } A_{G^{\prime }_S}^{|S|} x^{(0)} = r^V A_{G,2k-1}^{\ell } \underbrace{\left(r^V A_{G^{\prime }_S}^{|S|} x^{(0)} \right)}_{=: \bar{x}^{(0)}}, \end{equation} (134)

where $A_{G,s}$ is $A_G$ with entries stretched by a factor of

and we extend $A_{G^{\prime }_S}$ to be a $V \times V$ matrix by setting $(A_{G^{\prime }_S})_{vw} = \infty$ if $v\ne w\in V\setminus S$ and $(A_{G^{\prime }_S})_{vv}=0$ for $v\in V\setminus S$.

In order to construct an FRT tree, suppose we have sampled uniform permutations of $S$ and $V \setminus S$ and a random choice of $\beta$. We extend the permutations to a permutation of $V$ by ruling that, for all $s \in S$ and $v \in V \setminus S$, we have $s \lt v$, fulfilling the above assumption. Lemma 4.9 of Ghaffari and Lenzen [26] shows that the introduced dependence between the topology of $H$ and the resulting permutation on $V$ does not increase the expected stretch of the embedding beyond $\operatorname{O}(\log n)$. The crucial advantage of this approach lies in the fact that now the LE lists of nodes in $S$ may be used to jump-start the construction of LE lists for $H$, in accordance with Equation (134).

The Algorithm. In Ghaffari and Lenzen [26], it is shown that LE lists of $H$ can be determined quickly in the Congest model as follows.

Some node $v_0$ starts by broadcasting $k$ and a random choice of $\beta$, constructing a BFS tree on the fly. Upon receipt, each node generates a random ID of $\operatorname{O}(\log n)$ bits which is unique w.h.p. Querying the amount of nodes with an ID of less than some threshold via the BFS tree, $v_0$ determines the bottom $\ell$ node IDs via binary search; these nodes form the set $S$ and satisfy the assumption that went into Equation (134). All of these operations can be performed in $\operatorname{\tilde{O}}(\operatorname{D}(G))$ rounds.
The nodes in $S$ determine $G^{\prime }_S$, which is possible in $\operatorname{\tilde{O}}(\operatorname{D}(G) + \ell ^{1 + 1/k}) \subseteq \operatorname{\tilde{O}}(\operatorname{D}(G) + n^{1/2 + \varepsilon })$ rounds, such that all $v \in V$ learn $E^{\prime }_S$ and $\operatorname{\omega }_S$ [26, 33]. After that, $G^{\prime }_S$ is global knowledge and each $v \in V$ can locally compute $\bar{x}^{(0)}_v$.
Subsequently, nodes w.h.p. determine their component of $r^V A_{G,2k-1}^\ell \bar{x}^{(0)} = (r^V A_{G,2k-1})^\ell \bar{x}^{(0)}$ via $\ell$ MBF-like iterations of

\begin{equation} \bar{x}^{(i+1)} := r^V A_{G,2k-1} \bar{x}^{(i)}. \end{equation} (135)

Here, one exploits that, for all $i$, $|\bar{x}^{(i)}_v| \in \operatorname{O}(\log n)$ w.h.p. by Lemma 7.6,¹⁰ and thus each iteration can be performed by sending $\operatorname{O}(\log n)$ messages over each edge (i.e., in $\operatorname{O}(\log n)$ rounds); the entire step thus requires $\operatorname{\tilde{O}}(\ell) \subseteq \operatorname{\tilde{O}}(n^{1/2})$ rounds.

Together, this w.h.p. implies the round complexity of $\operatorname{\tilde{O}}(n^{1/2 + \varepsilon } + \operatorname{D}(G))$ for an embedding of expected stretch $\operatorname{O}(\varepsilon ^{-1}\log n)$.

8.3 Achieving Stretch $\operatorname{O}(\log n)$ in Near-Optimal Time

The multiplicative overhead of $n^{\varepsilon }$ in the round complexity is due to constructing and broadcasting the skeleton spanner $G^{\prime }_S$. We can improve upon this by relying on hop sets, just as we do in our parallel construction. Henziger et al. [29] show how to compute an $(n^{\operatorname{o}(1)}, \operatorname{o}(1))$-hop set of the skeleton graph in the Congest model using $n^{1/2 + \operatorname{o}(1)} + \operatorname{D}(G)^{1 + \operatorname{o}(1)}$ rounds.

Our approach is similar to the one outlined in Section 8.2. The key difference is that we replace the use of a spanner by combining a hop set of the skeleton graph with the construction from Section 4; using the results from Section 5, we can then efficiently construct the LE lists on $S$ to jump-start the construction of LE lists for all nodes.

The Graph $H$ . Let $\ell$, $c$, and the skeleton graph $G_S = (S, E_S, \operatorname{\omega }_S)$ be defined as in Section 8.2 and Equations (127)–(128), w.h.p. yielding $\operatorname{dist}(s, t, G_S) = \operatorname{dist}(s, t, G)$ for all $s,t \in S$. Suppose for all $s,t\in S$, we know approximate weights $\operatorname{\omega }^{\prime }_S(s,t)$ with

\begin{equation*} \operatorname{dist}(s,t,G)\le \operatorname{\omega }^{\prime }_S(s,t)\in (1+o(1))\operatorname{\omega }_S(s,t) \end{equation*}

(our algorithm has to rely on an approximation to meet the stated round complexity) and add an $(n^{\operatorname{o}(1)}, \operatorname{o}(1/\log n))$-hop set to $G_S$ using the construction of Henzinger et al. [ 29]. Together, this results in a graph

\begin{equation} G^{\prime }_S := (S, E^{\prime }_S, \operatorname{\omega }^{\prime }_S), \end{equation} (136)

where $E^{\prime }_S$ contains the skeleton edges $E_S$ and some additional edges, and w.h.p. it holds for all $s,t \in S$ that

\begin{equation} \operatorname{dist}(s, t, G_S) \le \operatorname{dist}^d(s, t, G^{\prime }_S) \in (1 + o(1/\log n)) \operatorname{dist}(v, w, G_S) \end{equation} (137)

for some $d \in n^{\operatorname{o}(1)}$ and $\operatorname{dist}(v,w,G)\le \operatorname{dist}(v,w,G_S)\in (1+o(1))\operatorname{dist}(v,w,G)$. Next, embed $G^{\prime }_S$ into $H_S$ as in Section 4, yielding node and edge levels $\operatorname{\lambda }(e) \in \lbrace 0, \dots , \Lambda \rbrace$:

\begin{equation} H_S := \left(S, \binom{S}{2}, \operatorname{\omega }_{H_S} \right)\text{ with} \\ \end{equation} (138)

\begin{equation} \operatorname{\omega }_{H_S}(\lbrace s,t\rbrace) \mapsto (1 + \hat{\varepsilon })^{\Lambda - \operatorname{\lambda }(s,t)} \operatorname{dist}^d(s, t, G^{\prime }_S) \end{equation} (139)

with $d$ as above, $\hat{\varepsilon }\in \operatorname{o}(1 / \log n)$. By Theorem 4.5, w.h.p. we have that $\operatorname{SPD}(G) \in \operatorname{O}(\log ^2 n)$ and for all $s,t\in S$ that

\begin{equation} \operatorname{dist}(s,t,G)\le \operatorname{dist}(s,t,G_S)\le \operatorname{dist}(s,t,H_S) \in (1+o(1)) \operatorname{dist}(s, t, G_S), \end{equation} (140)

which is bounded from above by $\alpha \operatorname{dist}(s,t,G)$ for some $\alpha \in 1 + \operatorname{o}(1)$. Analogously to Equations ( 8.6)–( 8.8), define

\begin{equation} H := (V, E_H, \operatorname{\omega }_H),\text{ where} \\ \end{equation} (141)

\begin{equation} E_H := E \cup \binom{S}{2}\text{, and} \\ \end{equation} (142)

\begin{equation} \operatorname{\omega }_H(e) \mapsto { \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\operatorname{\omega }_{H_S}(e) & \text{if $e \in \binom{S}{2}$ and} \\ \alpha \operatorname{\omega }_G(e) & \text{otherwise.} \end{array}\right. } \end{equation} (143)

By construction, we thus have

\begin{equation} \forall v,w \in V:\quad \operatorname{dist}(v,w,G) \le \operatorname{dist}(v,w,H) \le \alpha \operatorname{dist}(v,w,G) \in (1 + \operatorname{o}(1)) \operatorname{dist}(v,w,G) \end{equation} (144)

w.h.p.

FRT Trees of $H$ . Analogously to Section 8.2, assume that the node IDs of $S$ are ordered before those of $V \setminus S$; then min-hop shortest paths in $H$ contain a single maximal subpath of edges in $E_{H_S}$. To determine the LE lists for $H$, we must therefore compute

\begin{equation} r^V A_H^{\operatorname{SPD}(H)} x^{(0)} = \left(r^V A_{G,\alpha } \right)^{\ell } \underbrace{\left(r^V A_{H_S} \right)^{\operatorname{SPD}(H_S)} x^{(0)}}_{=: \bar{x}^{(0)}}, \end{equation} (145)

where $A_{G,\alpha }$ is given by multiplying each entry of $A_G$ by the above-mentioned factor of $\alpha$, and $A_{H_S}$ is extended to an adjacency matrix on the node set $V$ as in Section 8.2.

The Algorithm. We determine the LE lists of $H$ as follows, adapting the approach from Ghaffari and Lenzen [26] outlined in Section 8.2.

A node $v_0$ starts the computation by broadcasting a random choice of $\beta$. The broadcast is used to construct a BFS tree, nodes generate distinct random IDs of $\operatorname{O}(\log n)$ bits w.h.p., and $v_0$ figures out the ID threshold of the bottom $c\ell$ nodes $S$ with respect to the induced random ordering. This can be done in $\operatorname{\tilde{O}}(\operatorname{D}(G))$ rounds.
Each skeleton nodes $s\in S$ computes $\operatorname{\omega }^{\prime }_S(s,t)$ as above for all $t\in S$, using the $(1 + 1 / \log ^2 n)$-approximate $(S,\ell ,|S|)$-detection algorithm given in Lenzen and Patt-Shamir [35]. This takes $\operatorname{\tilde{O}}(\ell +\ell)=\operatorname{\tilde{O}}(n^{1/2})$ rounds.
Run the algorithm of Henzinger et al. [29] to compute an $(n^{\operatorname{o}(1)}, \operatorname{o}(1))$-hop set of $G_S^{\prime }$, in the sense that nodes in $S$ learn their incident weighted edges. This takes $n^{1/2 + \operatorname{o}(1)} + \operatorname{D}(G)^{1 + \operatorname{o}(1)}$ rounds.
Next, we (implicitly) construct $H_S$. To this end, nodes in $S$ locally determine their level and broadcast it over the BFS tree, which takes $\operatorname{O}(|S| + \operatorname{D}(G)) \subset \operatorname{\tilde{O}}(\sqrt {n} + \operatorname{D}(G))$ rounds; thus, $s \in S$ knows the level of $\lbrace s,t\rbrace \in E_{H_S}$ for each $t \in S$.
To determine $\bar{x}^{(0)}$, we follow the same strategy as in Theorem 5.2; that is, we simulate matrix-vector multiplication with $A_{H_S}$ via matrix-vector multiplications with $A_{G^{\prime }_S}$. Hence, it suffices to show that we can efficiently perform a matrix-vector multiplication $A_{G^{\prime }_S} x$ for any $x$ that may occur during the computation—applying $r^V$ is a local operation and thus free—assuming each node $v \in V$ knows $x_v$ and its row of the matrix.
Since multiplications with $A_{G^{\prime }_S}$ only affects lists at skeleton nodes, this can be done by local computations once all nodes know $x_s$ for each $s \in S$. As before, $|x_s| \in \operatorname{O}(\log n)$ w.h.p., so $\sum _{s \in S} |x_s| \in \operatorname{O}(|S| \log n)\subset \operatorname{\tilde{O}}(\sqrt {n})$ w.h.p. We broadcast these lists over the BFS tree of $G$, taking $\operatorname{\tilde{O}}(\sqrt {n} + \operatorname{D}(G))$ rounds per matrix-vector multiplication. Due to $\operatorname{SPD}(H_S) \in \operatorname{\tilde{O}}(\log ^2 n),$ by Theorem 4.5, this results in a round complexity of $\operatorname{\tilde{O}}(n^{1/2 + \operatorname{o}(1)} + \operatorname{D}(G)^{1 + \operatorname{o}(1)})$.
Applying $r^V A_{G,\alpha }^{\ell }$ is analogous to step 3 in Section 8.2 and takes $\operatorname{\tilde{O}}(\ell) \subseteq \operatorname{\tilde{O}}(n^{1/2})$ rounds.

Altogether, this yields a round complexity of $n^{1/2 + \operatorname{o}(1)} + \operatorname{D}(G)^{1 + \operatorname{o}(1)}$. Combining this result with the algorithm by Khan et al. [30], which terminates quickly if $\operatorname{SPD}(G)$ is small, yields the following result.

Theorem 8.1. There is a randomized distributed algorithm that w.h.p. samples from a metric tree embedding of expected stretch $\operatorname{O}(\log n)$ in $\min \lbrace (\sqrt {n} + \operatorname{D}(G)) n^{o(1)} , \operatorname{\tilde{O}}(\operatorname{SPD}(G)) \rbrace$ rounds of the Congest model.

9 $k$-MEDIAN

In this section, we turn to the $k$-median problem, an application considered by Blelloch et al. [14], and show how their results are improved by applying our techniques. The contribution is that we work on a weighted graph $G$ that only implicitly provides the distance metric $\operatorname{dist}(\cdot ,\cdot ,G)$; Blelloch et al. require a metric providing constant-time query access. Our solution is more general, as any finite metric defines a complete graph of SPD 1, whereas determining exact distances in graphs (by known techniques) requires $\operatorname{\Omega }(\operatorname{SPD}(G))$ depth. The use of hop sets, however, restricts us to polynomially bounded edge-weight ratios.

Definition 9.1 ($k$-Median). In the $k$-median problem we are given a weighted graph $G = (V, E, \operatorname{\omega })$ and an integer . The task is to determine $F \subseteq V$ with $|F| \le k$ that minimizes

\begin{equation} \sum _{v \in V} \operatorname{dist}(v, F, G), \end{equation} (146)

where $\operatorname{dist}(v, F, G) := \min \lbrace \operatorname{dist}(v, f, G) \mid f \in F \rbrace$ is the distance of $v$ to the closest member of $F$.

Blelloch et al. [14] solve the following problem: Given a metric with constant-time query access, determine an expected $\operatorname{O}(\log k)$-approximation of $k$-median using $\operatorname{O}(\log ^2 n)$ depth and $\operatorname{\tilde{O}}(nk + k^3)$ work for $k \ge \log n$; the special case of $k \lt \log n$ admits an $\operatorname{\tilde{O}}(n)$-work solution of the same depth [15]. Below, we show how to determine an expected $\operatorname{O}(\log k)$-approximation of $k$-median on a weighted graph using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m^{1+\varepsilon } + k^3)$ work.

The algorithm of Blelloch et al. [14] essentially comprises three steps:

Use a parallel version of a sampling technique due to Mettu and Plaxton [38]. It samples candidates $Q$, such that $|Q| \in \operatorname{O}(k)$ and there is $F \subseteq Q$ that $\operatorname{O}(1)$-approximates $k$-median.
Sample an FRT tree regarding the submetric spanned by $Q$. Normalize the tree to a binary tree (required by the next step); this is possible without incurring too much overhead with respect to the depth of the tree [14].
Run an $\operatorname{O}(k^3)$-work dynamic programming algorithm to solve the tree instance optimally without using any Steiner nodes. This yields an $\operatorname{O}(\log k)$-approximate solution on the original metric due to the expected stretch from the FRT embedding.

We keep the overall structure but modify steps 1–2, resulting in the following algorithm:

The sampling step generates $\operatorname{O}(k)$ candidate points $Q$.
It requires $\operatorname{O}(\log \frac{n}{k})$ iterations and maintains a candidate set $U$ that initially contains all points. In each iteration, $\operatorname{O}(\log n)$ candidates $S$ are sampled, and a constant fraction of vertices in $U$, those closest to $S$, is removed [14].
The key to adapting this procedure to graphs lies in efficiently determining $\operatorname{dist}(u, S, G)$ for all $u \in U$ (this would be trivial with constant-time query access to the metric). We achieve this by sampling after embedding in $H$ from Section 4, which only costs a factor of $(1 + \operatorname{o}(1))$ in approximation, regardless of $k$. By Theorem 4.5, we only require $\operatorname{O}(\log ^2 n)$ iterations of the MBF-like algorithm from Example 3.7 (for $d = \infty$) to determine each node's distance to the closest vertex in $S$ w.h.p. Hence, we require polylogarithmic depth and $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ work for this step.
Since $|U|$ decreases by a constant factor in each iteration and we have $\operatorname{O}(\log n)$ iterations, we require a total of $\operatorname{\tilde{O}}(m^{1 + \varepsilon })$ work and polylogarithmic depth, including the costs for determining Cohen's hop set [17].
Sample an FRT tree on the submetric spanned by $Q$.
To compute the embedding only on $Q$, set $x^{(0)}_{vv} = 0$ if $v \in Q$ and $x^{(0)}_{vw} = \infty$ everywhere else. Consider only the LE lists of nodes in $Q$ when constructing the tree.
As we are limited to polynomially bounded edge-weight ratios, our FRT trees have logarithmic depth. We normalize to a binary tree using the same technique as Blelloch et al. [14].
The $\operatorname{\tilde{O}}(k^3)$-work polylogarithmic-depth dynamic-programming algorithm of Blelloch et al. can be applied without modification.

W.h.p., we arrive at an expected $\operatorname{O}(\log k)$-approximation of $k$-median:

Theorem 9.2. For any fixed constant $\varepsilon \gt 0$, w.h.p., an expected $\operatorname{O}(\log k)$-approximation to $k$-median on a weighted graph can be computed using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m^{1+\varepsilon } + k^3)$ work.

We remark that, analogously to Corollary 7.11, one can first compute a sparse spanner to reduce the work to $\operatorname{\tilde{O}}(m+n^{1+\varepsilon }+k^3)$.

10 BUY-AT-BULK NETWORK DESIGN

In this section, we reduce the work of the approximation algorithm for the buy-at-bulk network design problem given by Blelloch et al. [14] that requires $\operatorname{O}(n^3 \log n)$ work and $\operatorname{O}(\log ^2 n)$ depth w.h.p. while providing the same asymptotic approximation guarantees. Blelloch et al. transform the input graph $G$ into a metric that allows constant-time query access on which they sample an FRT embedding, hence their work is dominated by solving APSP.

Replacing the APSP routine in the algorithm Blelloch et al. with our $\operatorname{O}(1)$-approximate metric from Theorem 6.3 (and keeping the rest of the algorithm in place) directly reduces the work to $\operatorname{\tilde{O}}(n^{2+\varepsilon })$ while incurring $\operatorname{polylog}n$ depth. However, using our result from Section 7 to sample an FRT without the detour over the metric, we can guarantee a stronger work bound of $\operatorname{\tilde{O}}(\min \lbrace m^{1+\varepsilon } + kn, n^2 \rbrace) \subseteq \operatorname{\tilde{O}}(n^2)$, which achieves the same depth. The use of hop sets, however, restricts us to polynomially bounded edge ratios (or our solution loses efficiency).

Definition 10.1 (Buy-at-Bulk Network Design). In the buy-at-bulk network design problem, one is given a weighted graph $G = (V, E, \operatorname{\omega })$, demands for $1 \le i \le k$, and a finite set of cable types , $1 \le i \le \ell$, where the cable of type $i$ incurs costs $c_i \operatorname{\omega }(e)$ when purchased for edge $e$ (multiple cables of the same type can be bought for an edge). The goal is to find an assignment of cable types and multiplicities to edges minimizing the total cost, such that the resulting edge capacities allow us to simultaneously route $d_i$ units of (distinct) flow from $s_i$ to $t_i$ for all $1 \le i \le k$.

Andrews showed that the buy-at-bulk network design problem is hard to approximate better than with factor $\log ^{1/2 - \operatorname{o}(1)} n$ [5]. Blelloch et al. [14] give an expected $\operatorname{O}(\log n)$-approximation w.h.p. using $\operatorname{polylog}n$ depth and $\operatorname{O}(n^3\log n)$ work for the buy-at-bulk network design problem. It is a straightforward parallelization of the algorithm by Awerbuch and Azar [6]. Our tools allow for a more work-efficient parallelization of this algorithm, as the work of the implementation by Blelloch et al. is dominated by solving APSP to determine the distance metric of the graph; we achieve the same approximation guarantee as Blelloch et al. using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(n^2)$ work. We propose the following modification of the approach of Blelloch et al.

Metrically embed $G$ into a tree $T = (V_T, E_T, \operatorname{\omega }_T)$ with expected stretch $\operatorname{O}(\log n)$. As the objective is linear in the edge weights, an optimal solution in $G$ induces a solution in $T$ whose expected cost is by at most a factor $\operatorname{O}(\log n)$ larger.
$\operatorname{O}(1)$-approximate on $T$: For $e \in E_T$, pick the cable of type $i$ that minimizes $c_i \lceil d_e / u_i \rceil$, where $d_e$ is the accumulated flow on $e$, see [14]).
Map the tree solution back to $G$, increasing the cost by a factor of $\operatorname{O}(1)$.

Combining these steps yields an $\operatorname{O}(\log n)$-approximation. Using Corollary 7.11 (for constant $\varepsilon$ and $k=\lceil \varepsilon ^{-1}\rceil$), the first step has $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(m+n^{1+\varepsilon })$ work; for the second step, Blelloch et al. discuss an algorithm of $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(n + k)$ work.

Concerning the third step, recall that each tree edge $\lbrace v,w\rbrace$ maps back to a path $p$ of at most $\operatorname{SPD}(H)$ hops in $H$ with $\operatorname{\omega }(p) \le 3 \operatorname{\omega }_T(v,w),$ as argued in Section 7.5. Using this observation, we can map the solution on $T$ back to one in $H$ whose cost is at most by factor 3 larger. Assuming suitable data structures are used, this operation has depth $\operatorname{polylog}n$ and requires $\operatorname{\tilde{O}}(\min \lbrace k,n\rbrace)$ work w.h.p., where we exploit that $\operatorname{SPD}(H) \in \operatorname{O}(\log ^2 n)$ w.h.p. by Theorem 4.5 and the fact that $T$ has depth $\operatorname{O}(\log n)$, implying that the number of edges in $T$ with non-zero flow is bounded by $\operatorname{O}(\min \lbrace k,n\rbrace \log n)$.

Finally, we map back from $H$ to $G^{\prime }$ ($G$ augmented with hop set edges) and then to $G$. This can be handled with depth $\operatorname{polylog}n$ and $\operatorname{\tilde{O}}(n)$ work for a single edge in $H$ because edges in $H$ and hop set edges in $G^{\prime }$ correspond to polylogarithmically many edges in $G^{\prime }$ and at most $n$ edges in $G$, respectively. The specifics depend on the hop set, and, again, we assume that suitable data structures are in place (see Section 7.5). Since we deal with $\operatorname{\tilde{O}}(\min \lbrace k,n\rbrace)$ edges in $H$, mapping back the edges yields $\operatorname{\tilde{O}}(\min \lbrace kn,n^2\rbrace)$ work in total. Together with the computation of the hop set, we have $\operatorname{\tilde{O}}(\min \lbrace m+n^{1+\varepsilon }, n^2 \rbrace + \min \lbrace kn,n^2\rbrace) = \operatorname{\tilde{O}}(\min \lbrace m+n(k+n^{\varepsilon }), n^2 \rbrace) \subseteq \operatorname{\tilde{O}}(n^2)$ work.

Theorem 10.2. For any constant $\varepsilon \gt 0$, w.h.p., an expected $\operatorname{O}(\log n)$-approximation to the buy-at-bulk network design problem can be computed using $\operatorname{polylog}n$ depth and $\operatorname{\tilde{O}}(\min \lbrace m+n(k+n^{\varepsilon }), n^2 \rbrace) \subseteq \operatorname{\tilde{O}}(n^2)$ work.

11 CONCLUSION

In this work, we show how to sample from an FRT-style distribution of metric tree embeddings at low depth and near-optimal work, provided that the maximum ratio between edge weights is polynomially bounded. While we consider the polylogarithmic factors too large for our algorithm to be of practical interest, this result motivates the search for solutions that achieve low depth despite having work comparable to the currently best-known sequential bound of $\operatorname{O}(m \log n)$ w.h.p. [13]. Concretely, better hop set constructions could readily be plugged into our machinery to yield improved bounds, and one may seek to reduce the number of logarithmic factors incurred by the remaining construction.

Our second main contribution is an algebraic interpretation of MBF-like algorithms, reducing the task of devising and analyzing such algorithms to the following recipe:

Pick a suitable semiring $\mathcal {S}$ and semimodule $\mathcal {M}$ over $\mathcal {S}$.
Choose a filter $r$ and initial values $x^{(0)} \in \mathcal {M}^V$ so that $r^V A^h x^{(0)}$ is the desired output.
Verify that $r$ induces a congruence relation on $\mathcal {M}$.
Leverage (repeated use of) $r^V$ to ensure that iterations can be implemented efficiently.

As can be seen by the example of our metric tree embedding algorithm, further steps may be required to control the number of iterations $h$; concretely, we provide an embedding into a complete graph of small SPD and an oracle allowing for efficient MBF-like queries. Nevertheless, we believe that our framework unifies and simplifies the interpretation and analysis of MBF-like algorithms, as illustrated by the examples listed in Sections 3 and the discussion of distributed tree embeddings in Section 8. Therefore, we hope that our framework will be of use in the design of further efficient MBF-like algorithms in the future.

Parallel Metric Tree Embedding Based on an Algebraic View on Moore-Bellman-Ford Parallel Metric Tree Embedding Based on an Algebraic View on Moore-Bellman-Ford

1 INTRODUCTION

1.1 Related Work

1.2 Notation and Preliminaries

2 MBF-LIKE ALGORITHMS

2.1 Propagation and Aggregation

2.2 Filtering

2.3 The Class of MBF-like Algorithms

2.4 Preserving State-Equivalence Across Iterations

3 A COLLECTION OF MBF-LIKE ALGORITHMS

3.1 MBF-like Algorithms over the Min-Plus Semiring

3.2 MBF-like Algorithms over the Max-Min Semiring

3.3 MBF-like Algorithms over the All-Paths Semiring

3.4 MBF-like Algorithms over the Boolean Semiring

4 THE SIMULATED GRAPH

5 AN ORACLE FOR MBF-LIKE QUERIES

5.1 Decomposing $H$

5.2 Implementing the Oracle

6 APPROXIMATE METRIC CONSTRUCTION

7 FRT CONSTRUCTION

7.1 Metric Tree Embeddings

7.2 Computing LE Lists is MBF-like

7.3 Computing LE Lists is Efficient

7.4 Metric Tree Embedding in Polylogarithmic Time and Near-Linear Work

7.5 Reconstructing Paths from Virtual Edges

8 DISTRIBUTED FRT CONSTRUCTION

8.1 The Algorithm by Khan et al.

8.2 The Algorithm by Ghaffari and Lenzen

8.3 Achieving Stretch $\operatorname{O}(\log n)$ in Near-Optimal Time

9 $k$-MEDIAN

10 BUY-AT-BULK NETWORK DESIGN

11 CONCLUSION

APPENDIX

A ALGEBRAIC FOUNDATIONS

B DEFERRED PROOFS

REFERENCES

Footnotes