We consider the classical problem of maximizing a monotone submodular function subject to a cardinality constraint, which, due to its numerous applications, has recently been studied in various computational models. We introduce a clean multiplayer model that lies between the offline and streaming models, and study it through the lens of one-way communication complexity. Our model captures the streaming setting (by considering a large number of players), and, in addition, two-player approximation results for it translate into the robust setting. We present tight one-way communication complexity results for our model, which, due to the connections just mentioned, have multiple implications in the data stream and robust settings.
Even for just two players, a prior information-theoretic hardness result implies that no approximation factor above 1/2 can be achieved in our model if only queries to feasible sets (i.e., sets respecting the cardinality constraint) are allowed. We show that the possibility of querying infeasible sets can actually be exploited to beat this bound, by presenting a tight 2/3-approximation taking exponential time and an efficient 0.514-approximation. To the best of our knowledge, this is the first example where querying a submodular function on infeasible sets leads to provably better results. Through the link to the (non-streaming) robust setting mentioned previously, both of these algorithms improve on the current state of the art for robust submodular maximization, showing that approximation factors beyond 1/2 are possible. Moreover, exploiting the link of our model to streaming, we settle the approximability of streaming algorithms by presenting a tight hardness result showing that no \((1/2+\varepsilon)\)-approximation is possible, based on the construction of a new family of coverage functions. This improves on a prior 0.586 hardness result and matches, up to an arbitrarily small margin, the best-known approximation algorithm.
1 Introduction
A set function \(f:2^W \rightarrow \mathbb {R}\) over a finite ground set W is submodular if
\begin{align*} f(v\mid X) \ge f(v\mid Y) \qquad \mbox{for all $X \subseteq Y \subseteq W$ and $v\in W \setminus Y$,} \end{align*}
where, for a subset \(S\subseteq W\) and an element \(v\in W\), the value \(f(v\mid S) = f(S \cup \lbrace v\rbrace) - f(S)\) is the marginal contribution of v with respect to S. The definition of submodular functions captures the natural property of diminishing returns, and submodular functions have a rich history in optimization with numerous applications (e.g., see the book by Schrijver [45]).
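To make the definition concrete, the following minimal Python sketch (our illustration; the toy ground set and covered points are arbitrary) implements a coverage function, a canonical non-negative monotone submodular function, and checks the diminishing-returns property on a small instance.

```python
# Coverage function: f(S) = number of points covered by the sets indexed by S.
# Coverage functions are non-negative, monotone, and submodular.
def f(S, sets):
    covered = set()
    for v in S:
        covered |= sets[v]
    return len(covered)

def marginal(v, S, sets):
    # f(v | S) = f(S + v) - f(S), the marginal contribution of v with respect to S.
    return f(S | {v}, sets) - f(S, sets)

sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}
X, Y, v = set(), {1}, 0                               # X is a subset of Y, v not in Y
assert marginal(v, X, sets) >= marginal(v, Y, sets)   # diminishing returns
```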
Already in 1978, Nemhauser et al. [42] analyzed the following algorithm, which we refer to as Greedy, for selecting the most valuable set \(S\subseteq W\) of cardinality at most k:
(i)
Initially, let \(S = \varnothing\).
(ii)
For \(i=1, \ldots , k\): choose any \(v\in \arg \max _{w\in W} f(w\mid S)\) and set \(S = S\cup \lbrace v\rbrace\).
In other words, the algorithm greedily picks in each iteration an element with the largest marginal contribution with respect to the already selected elements S. Assuming that f is non-negative (\(f(X) \ge 0\) for all \(X\subseteq W\)) and monotone (\(f(X) \le f(Y)\) if \(X\subseteq Y\)), Nemhauser et al. [42] showed that Greedy returns a \((1-1/e)\)-approximate solution. Moreover, the approximation guarantee of \(1-1/e\) is known to be tight [26, 41].
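Greedy itself is only a few lines of code. The following Python sketch (ours) assumes a value oracle `f` for a non-negative monotone submodular function and a ground set `W` given as a Python set.

```python
def greedy(W, k, f):
    # In each of the k iterations, add an element with the largest
    # marginal contribution f(w | S) = f(S + w) - f(S).
    S = set()
    for _ in range(min(k, len(W))):
        v = max(W - S, key=lambda w: f(S | {w}) - f(S))
        S.add(v)
    return S
```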
In recent years, submodular function maximization has found several applications in problems related to data science and machine learning, including feature selection, sensor placement, and image collection summarization [3, 5, 20, 21, 29, 35, 46]. These applications are often modeled as a maximization of a non-negative and monotone submodular function. For example, if we wish to summarize an image collection, we would like to select k images that cover different topics, and this objective can be modeled as a (non-negative and monotone) submodular function. Although Greedy gives the best possible guarantee for solving this problem in traditional computing (when the entire instance is accessible to the algorithm at all times), the requirements stipulated by modern applications, often involving huge datasets, make such algorithms inadequate.
This motivates, together with the inherent theoretical interest, the study of submodular function maximization in new models of computation. Indeed, in recent years, there has been substantial interest in submodular function maximization with respect to limited memory (so-called data stream algorithms) [2, 4, 12, 28, 33, 43], robustness [11, 34, 39, 40, 44], parallel computation (in the map-reduce model) [10, 19, 38], and most recently with respect to adaptivity [6, 7, 8, 16, 17, 22, 23, 24, 25]. In each of these models, the central benchmark problem has been the basic cardinality-constrained problem studied in the work of Nemhauser et al. [42], namely that of finding a set \(S \subseteq W\) of cardinality k that maximizes \(f(S)\), where f is a non-negative and monotone submodular function. We refer to this problem as Max-Card-k.
Although tight algorithms are known for Max-Card-k (and even for the more general problem where the cardinality constraint is replaced by a matroid) in the map-reduce model [10, 36] and the adaptive model [6, 22, 25], it has remained an open problem to give tight results for the data stream and robust settings. In this article, we resolve this question for data stream algorithms and make progress on the robust problem. These results are obtained by considering the one-way communication complexity of Max-Card-k, the study of which highlights several interesting aspects of submodular functions. We first discuss our results in the clean communication model, then give more detail about the connection to data stream and robust algorithms.
1.1 One-Way Communication Complexity of Max-Card-k
We first study the one-way communication complexity of Max-Card-k in the presence of two players. An informal description of the model is as follows (see Section 2 for the formal definition). The first player Alice has access only to a subset \(V_A \subseteq W\) of the ground set, and the second player Bob has access to \(V_B \subseteq W\) with \(V_A\cap V_B=\varnothing\). In the first phase, Alice can query the value of a submodular objective function f on any subset of her elements; then, at the end of the phase, she can send an arbitrary message to Bob based on the information that she has. In the second phase, Bob gets the message of Alice and the elements of \(V_B\). He can then query f on any subset of the elements, and his objective is to produce a subset of \(V_A \cup V_B\) of size at most k that approximately maximizes f among all such subsets.
A trivial protocol that allows Bob to always output the optimal solution is for Alice to send all the elements in \(V_A\), and for Bob to then output \(\arg \max _{S \subseteq V_A \cup V_B: |S| \le k} f(S)\). Although this protocol has an optimal approximation guarantee of 1, it has a very large communication complexity since it requires Alice to send all the elements of \(V_A\), which may be as many as \(N = |W|\) elements.
A protocol of lower communication complexity is for Alice to calculate her “optimal” solution \(S_A = \arg \max _{S \subseteq V_A: |S| \le k} f(S)\) and send only those (at most k many) elements in \(S_A\) to Bob. Bob then outputs either his “optimal” solution \(S_B = \arg \max _{S \subseteq V_B: |S| \le k} f(S)\) or \(S_A\), whichever set attains the larger value. It is not hard to see that this protocol has an approximation guarantee of \(1/2\)—that is, that \(\max (f(S_A), f(S_B))\ge 1/2 \cdot \max _{S \subseteq V_A \cup V_B: |S| \le k} f(S)\) for any \(V_A, V_B \subseteq W\).
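The following sketch (our illustration) spells this protocol out; `best_subset` brute-forces the arg max and therefore takes exponential time, which is irrelevant here since we only track communication.

```python
from itertools import combinations

def best_subset(V, k, f):
    # arg max of f over all subsets of V of cardinality at most k.
    return max((set(c) for r in range(min(k, len(V)) + 1)
                for c in combinations(V, r)), key=f)

def half_protocol(V_A, V_B, k, f):
    S_A = best_subset(V_A, k, f)   # Alice's message: at most k elements
    S_B = best_subset(V_B, k, f)   # Bob's local optimum
    return max(S_A, S_B, key=f)    # value at least OPT / 2
```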
The preceding examples indicate a natural tradeoff between the amount of communication and the approximation guarantee, with the central question being to understand the optimal relationship between these two quantities. If one further restricts Alice to only query the submodular function on sets of cardinality at most k, then the hardness result in the work of Norouzi-Fard et al. [43] for streaming algorithms implies that, in any (potentially randomized) protocol with an approximation guarantee of \((1/2+\varepsilon)\), Alice must send a message of length \(\Omega (\varepsilon N/k)\). In other words, under this restriction on Alice, the two basic protocols described earlier achieve the optimal tradeoff up to lower-order terms (in this, as in previous work, we think of \(k \ll N\)).
Restricting Alice to evaluate f only on sets of cardinality at most k may appear like a mere technical assumption to make the arguments of Norouzi-Fard et al. [43] work, especially since, to the best of our knowledge, there are no known examples where querying the submodular function on infeasible sets leads to provably better results. Perhaps surprisingly, we prove this intuition wrong and give a protocol that crucially exploits the possibility to query the value of infeasible sets.
We present this protocol in Section 4.1, where we also show that we can further reduce the message size of Alice down to \(O(k \log (k)/\varepsilon)\) elements while still obtaining an approximation guarantee of \(2/3-\varepsilon\). By allowing Alice to query f on sets of cardinality larger than k, we can thus improve the approximation guarantee of \(1/2\) to \(2/3\) while still maintaining an (almost) linear-sized message in k. In Section 4.2, we further show that the guarantee of \(2/3\) is tight in the following strong sense. In any protocol that achieves a better guarantee, Alice must send a message of roughly the same size as the trivial protocol mentioned previously that achieves an approximation guarantee of 1.
The fact that we can beat the approximation guarantee of \(1/2\) using little communication in the presence of two players gives hope that a similar result may hold for many players, and more generally in the streaming model, which can be thought of as having one player per element. The definition of the p-player setting, with \(p \ge 2\), is the natural generalization of the two-player setting. Informally (again, see Section 2 for the formal definition), the i-th player receives a private subset \(V_i \subseteq W\) of the ground set, and upon reception of a message from the previous player, she computes and sends a message to the following player. Finally, the last player’s task is to output a subset of \(V_1 \cup V_2 \cup \cdots \cup V_p\) of cardinality at most k that approximately maximizes f among all such subsets.
One can observe that the streaming algorithm of Kazemi et al. [33] yields, for any integer \(p\ge 2\), a p-player protocol that has an approximation guarantee of \((1/2 - \varepsilon)\) and where each player sends a message consisting of at most \(O(k/\varepsilon)\) elements. Moreover, the obtained protocol only queries f on sets of size at most k and is thus tight with respect to such protocols (even in the two-player setting). Similar to the two-player case, the naturally arising question is whether protocols querying f on sets of cardinality larger than k can improve over this approximation guarantee. Our most technical result shows that this is not the case as p (and k) tends to infinity.
The proof of the preceding theorem is given in Section 5. It is based on a new construction of a family of coverage functions that hides the optimal solution while guaranteeing that no solution that does not contain elements from the optimal solution can provide an approximation significantly better than \(1/2\). This result immediately implies a tight hardness result in the streaming model that we explain in the next section.
1.2 Applications to Data Stream and Robustness
One key reason for the success of communication complexity is that results for the models it motivates, which are in their own right interesting models capturing the essence of tradeoffs involving message sizes, are often widely applicable to other models of computation. This is also the case for submodular functions. As we show in the following, our results yield both new hardness and algorithmic results in the context of data streams and robustness.
Data Stream Algorithms. We first discuss the well-known and direct connection to data stream algorithms. In the data stream model, the elements of the (unknown) ground set arrive one element at a time rather than being available all at once, and the algorithm is restricted to only use a small amount of memory. A semi-streaming algorithm is an algorithm for this model whose memory size has only a nearly linear dependence on the parameter k (the output size) and at most a logarithmic dependence on the size of the ground set. The goal is to output, at the end of the stream, a subset of the elements in the stream of cardinality at most k that approximately maximizes f among all such subsets.
The first result in this setting was given by Chakrabarti and Kale [15], who described a semi-streaming algorithm for Max-Card-k with an approximation guarantee of \(1/4\). Badanidiyuru et al. [4] later proposed a different semi-streaming algorithm that provides a better approximation ratio of \((1/2 - \varepsilon)\) and maintains at most \(O(k \log (k)/\varepsilon)\) elements in memory. This memory footprint was recently improved by Kazemi et al. [33], who obtained the same approximation guarantee while only maintaining at most \(O(k/\varepsilon)\) elements in memory.
The last two algorithms share the approximation guarantee of \(1/2-\varepsilon\). This was improved to \(1-1/e - \varepsilon\) by Agrawal et al. [1], but only under the assumption that the elements of the stream arrive in a uniformly random order. In contrast, Huang et al. [30] showed that without this assumption, one cannot obtain an approximation ratio better than \(2 - \sqrt {2} \approx 0.586\) (improving over a previous inapproximability result of \(1-1/e\) due to McGregor and Vu [37]). Hence, prior to the current work, it remained an open question whether the approximation guarantee of \(1/2-\varepsilon\) obtained by the state-of-the-art algorithms is optimal. However, a direct consequence of Theorem 1.3 settles this question. Specifically, any algorithm that achieves a better approximation guarantee than \(1/2\) must (up to a polynomial factor in k) essentially store all the elements of the stream.
The proof of the preceding theorem is almost immediate given Theorem 1.3 and the well-known connection between data stream algorithms and one-way communication. Thus, it is deferred to Appendix A.1.
Robust Submodular Function Maximization. The work on algorithms for Max-Card-k has been partially motivated by the desire to extract small summaries of huge datasets. In many settings, the extracted summary is also required to be robust. In other words, the quality of the summary should degrade by as little as possible when some elements of the ground set are removed. Such removals may arise for many reasons, such as failures of nodes in a network or user preferences that the model failed to account for; they could even be adversarial in nature. Recently, this topic has attracted special attention due to its importance for privacy and fairness. Robust summaries enable us to remove sensitive data without incurring much loss in performance, giving us the ability to protect personal information (the right to be forgotten) and avoid biases (e.g., gender, measurement, and design biases).
The first attempts to design algorithms that generate robust summaries assumed that the summary is simply a set of size k, and the algorithm should guarantee that the value of this set is competitive against the best possible such set even when some elements are deleted (from both the ground set and the solution set). Naturally, this objective makes sense only when the number d of deleted elements is significantly smaller than k. Accordingly, Orlin et al. [44] provided the first constant-factor (0.387) approximation for this problem for \(d = o(\sqrt {k})\), and Bogunovic et al. [11] improved the restriction on the number of deletions to \(d = o(k)\) while keeping the approximation guarantee unchanged.
More recent works studied a more general variant of the preceding problem where an algorithm consists of two procedures: a summary procedure and a query procedure. The summary procedure first generates a summary \(M\subseteq W\) of the ground set W with few but typically more than k elements, without knowing the elements \(D\subseteq W\) to be deleted; after this, the set D is revealed, and the query procedure returns a solution set \(S_D\subseteq M\setminus D\) with \(|S_D|\le k\). The goal is for the final output set \(S_D\) to be competitive against the best subset of size k in the ground set without D, for any (worst-case) choice of D. More formally, such a robust algorithm is said to have an approximation guarantee of \(\alpha\) if
\begin{align*} \mathbb {E} \left[ f(S_D) \right] \ge \alpha \cdot \max _{Z \subseteq W\setminus D, |Z| \le k} f(Z) \qquad \forall D\subseteq W \text{ with } |D|\le d. \end{align*}
This problem is usually referred to as robust submodular maximization.
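To pin down the interface of such a two-procedure algorithm, consider the following deliberately naive Python sketch (ours): the summary procedure keeps the entire ground set, which a real algorithm avoids (e.g., the summaries of Kazemi et al. contain \(O(k + d \log k/\varepsilon ^2)\) elements), and the query procedure simply runs Greedy on the surviving elements.

```python
def summary_procedure(W, k, d, f):
    # Naive summary: keep everything; built before the deletions are known.
    return set(W)

def query_procedure(M, D, k, f):
    # The deleted set D is revealed only now; pick S_D inside M \ D.
    S, survivors = set(), M - D
    for _ in range(min(k, len(survivors))):
        v = max(survivors - S, key=lambda w: f(S | {w}) - f(S))
        S.add(v)
    return S
```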
The state-of-the-art result for robust submodular maximization is a \((1/2-\varepsilon)\)-approximation algorithm due to Kazemi et al. [34], whose summaries contain \(O(k + d \log k/\varepsilon ^2)\) elements. This result improved over previous results by Mirzasoleiman et al. [39] and Mitrovic et al. [40]. It should be noted that all these results enjoy a semi-streaming summary procedure, and by Theorem 1.4, the approximation ratio of \(1/2 - \varepsilon\) guaranteed by some of them is basically the best possible as long as the summary procedure remains a semi-streaming algorithm.
We present the first algorithms for robust submodular maximization whose approximation guarantee is better than \(1/2\). We do this via the following theorem, which shows that one can convert most natural two-player protocols for Max-Card-k into algorithms for robust submodular maximization. The proof of this theorem is based on a technique of Mirzasoleiman et al. [39], and we defer both this proof and a fully formal statement of the theorem to Appendix A.2.
As all the protocols we use to prove our results in this article obey the natural properties required by Theorem 1.5, one can combine this theorem with Theorem 1.1 to get the following corollary.
Analogous to Theorem 1.1, one can reduce the summaries to \(O(dk \log (k) /\varepsilon)\) many elements while guaranteeing an approximation factor of \(2/3-\varepsilon\). Unfortunately, the protocol used to prove Theorem 1.1 uses exponential time, and thus Corollary 1.6 is mostly of theoretical value. Nevertheless, we show that even when requiring efficient procedures, the factor of \(1/2\) can be beaten while only using linear message size.
The last theorem is proved in Section 6. Combining this theorem with Theorem 1.5 yields the following result for robust submodular maximization.
2 Formal Statement of Model and Results
In this section, we formally present the model that we assume in this article and restate in a formal way the results that we prove for this model. We begin by discussing the model for the two-player setting. It is natural to formulate a simple model for this setting in which Alice forwards some elements to Bob, and then Bob can access only these elements and the elements he receives directly. All the protocols we present fit into this simple model. However, one could imagine more involved protocols in which Alice passes coded information about the elements she received rather than simply forwarding a subset of these elements. To make our impossibility results apply also to protocols of this kind, we formulate in the following a somewhat more involved model in which the message sent from Alice to Bob is an arbitrary string of bits. We note that there is no unique “right” way to cast the problem we consider into a model, and one can think of multiple natural ways to do so, each corresponding to a different intuitive viewpoint. Fortunately, it seems that our results are mostly independent of the particular formulation used (up to minor changes in the exact bounds), and thus we chose a model that we believe is both intuitive and allows for a nice presentation of the results. Nevertheless, for completeness, we present in Appendix B a sketch of an alternative model that we also found attractive.
An instance of our model consists both of global information known upfront to both Alice and Bob, and private information that is available only to either Alice or Bob. The global information includes the upper bound k on the size of the solution (which is a positive integer), a ground set W of elements, and a partition of W into two disjoint sets \(W_A\) and \(W_B\). One should think of the sets \(W_A\) and \(W_B\) as all elements that Alice and Bob, respectively, could potentially get. We denote by \(V_A \subseteq W_A\) the set of elements that Alice actually gets, and by \(V_B \subseteq W_B\) the set of elements that Bob actually gets. Both of these sets are private information available only to their respective players. Finally, the instance also includes a non-negative monotone submodular function \(f:2^W \rightarrow {\mathbb {R}_{\ge 0}}\) defined over all the subsets of W. Alice has access to this function through an oracle that can evaluate f on any set \(S \subseteq W_A\) (in other words, given such a set S, the oracle returns \(f(S)\)). Bob, in contrast, has access to f through a more powerful oracle that can evaluate f on any subset of W. Intuitively, the reason for the difference between the powers of the oracles is that Alice only needs to evaluate sets consisting of elements that she might get, whereas Bob must also be able to evaluate f on subsets that include elements sent by Alice (nevertheless, one can observe that the oracle of Bob does not leak information about the elements of \(W_A\) that Alice actually got, i.e., the elements that ended up in \(V_A\)). The objective of Alice and Bob is to find a set \(S \subseteq V_A \uplus V_B\) maximizing f among all such sets of size at most k (here, and throughout the article, \(\uplus\) represents a disjoint union, i.e., the union of disjoint sets).
A communication protocol \(\mathsf {P}= (\mathcal {A}_A, \mathcal {A}_B)\) for this model consists of two (possibly randomized) algorithms for Alice and Bob. The protocol proceeds in two phases. In the first phase, the algorithm \(\mathcal {A}_A\) of Alice computes a message m for Bob based on the global information and the private information available to Alice. Then, in the second phase, the algorithm \(\mathcal {A}_B\) of Bob computes an output set based on (i) the global information, (ii) the private information available to Bob, and (iii) the message m received from Alice. Formally, the communication complexity of protocol \(\mathsf {P}\) is the maximum length in bits of the message m, where the maximum is taken over all the possible inputs and the randomness of the algorithms. However, since the message m in our protocols consists mostly of elements that Alice sends to Bob, we state the communication complexity of these protocols, for simplicity, in elements instead of bits. The real communication complexity of these protocols in bits is larger than the stated bound in elements, but only by a logarithmic factor.
We can now restate our results for the two-player model in a more formal way. Note that the first of these theorems uses the \(\widetilde{O}\) notation, which suppresses poly-logarithmic terms, and the second of these theorems refers by N to the size of the ground set W.
Let us now explain how the preceding model can be generalized to the p-player setting for \(p \ge 2\). In this setting, the ground set W is partitioned into p disjoint sets \(W_1, W_2, \ldots\,, W_p\) rather than just two, and the global information available to all the players is again (i) the upper bound k on the size of the solutions, (ii) the ground set W, and (iii) the partition of this ground set.
Every player also has private information. In particular, the private information available to player \(i \in [p]\) (recall that \([p]\) is a shorthand for the set \(\lbrace 1, 2, \ldots , p\rbrace\)) is a subset \(V_i \subseteq W_i\) and an oracle that can evaluate the objective function f on every subset of \(\bigcup _{j = 1}^i W_j\). The objective of the players is to find a set \(S \subseteq \bigcup _{i = 1}^p V_i\) maximizing f among all such sets of size at most k.
A communication protocol \(\mathsf {P}= (A_1, A_2, \ldots\,, A_p)\) for this p-player model consists of p (possibly randomized) algorithms for the p players. The protocol proceeds in p phases. In the first phase, the algorithm \(A_1\) of the first player computes a message \(m_1\) based on the global information and the private information available to this player. The next \(p - 2\) phases are devoted to players 2 up to \(p - 1\). In particular, in phase \(i \in \lbrace 2, 3, \ldots\,, p - 1\rbrace\), the algorithm \(A_i\) of player i computes a message \(m_i\) based on the global information, the private information available to this player, and the message \(m_{i - 1}\) produced by the previous player. Finally, in the last phase, the algorithm \(A_p\) of the last player computes an output set based on the global information, the private information available to this player, and the message \(m_{p - 1}\) produced by the penultimate player. The communication complexity of the protocol \(\mathsf {P}\) is the maximum length in bits of any one of the messages \(m_1, m_2, \dots , m_{p - 1}\), where, like in the two-player model, the maximum is taken over all the possible inputs and the randomness of the algorithms.
We can now restate our result for the p-player model in a more formal way. Recall that \(N = |W|\), and for \(p\in \mathbb {Z}_{\ge 0}\), let \(H_p = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p}\) be the p-th harmonic number.
3 Preliminaries: The INDEX and CHAINp(n) Problems
The impossibility results that we prove in this article are based on reductions from problems that are known to require high communication complexity. The first of these problems is the well-known INDEX problem. In this two-player problem, Alice gets a string \(x \in \lbrace 0, 1\rbrace ^n\) of n bits and can then send a message to Bob. Bob gets the message of Alice and an index \(t\in [n]\), and based on these two pieces of information alone should output the value of \(x_t\). Clearly, Bob can produce the correct answer with probability \(1/2\) by outputting a random bit. However, it is known that Bob cannot guarantee any larger constant probability of success, unless the message he gets from Alice is of linear (in n) size (e.g., see [9, 31]).
The second problem we reduce from is CHAINp(n), a multiplayer generalization of INDEX recently introduced by Cormode et al. [18], which is closely related to the Pointer Jumping problem (see [14]). In CHAINp(n), the index p indicates the number of players and n is a parameter that regulates the size of the bit string given to each player. The definition is as follows. There are p players \(P_1, P_2, \ldots , P_p\). For every \(i \in [p - 1]\), player \(P_i\) has as input a bit string \(x^{i} \in \lbrace 0,1\rbrace ^n\) of length n, and for every \(i \in \lbrace 2, 3, \ldots , p\rbrace\), player \(P_i\) (also) has as input an index \(t^i\in \lbrace 1, 2, \ldots , n\rbrace\) (note that the convention in this terminology is that the superscript of a string/index indicates the player receiving it). Furthermore, it is promised that either \(x^{i}_{t^{i + 1}} = 0\) for all \(i \in [p-1]\) or \(x^{i}_{t^{i + 1}} = 1\) for all these i values. We refer to these cases as the 0-case and 1-case, respectively. The objective of the players in CHAINp(n) is to decide whether the input instance belongs to the 0-case or the 1-case.
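As a concrete toy example (with hypothetical data), the following snippet builds a CHAIN3(4) instance in the 1-case and verifies the promise.

```python
p, n = 3, 4
x = {1: [0, 1, 0, 1],   # x^1, input of player P_1
     2: [0, 0, 0, 1]}   # x^2, input of player P_2
t = {2: 2,              # t^2, input of player P_2 (1-based index)
     3: 4}              # t^3, input of player P_3

# Promise: the bits x^i at position t^{i+1} agree for all i in [p - 1].
bits = [x[i][t[i + 1] - 1] for i in range(1, p)]
assert bits == [1, 1]   # this instance is in the 1-case
```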
In CHAINp(n), we are interested in the communication complexity of a one-way protocol that guarantees a success probability of at least \(2/3\). Such a protocol \(\mathsf {P}= (\mathcal {A}_1, \mathcal {A}_2, \ldots , \mathcal {A}_p)\) consists of p (possibly randomized) algorithms corresponding to the p players. The protocol proceeds in p phases. In phase \(i \in [p - 1]\), the algorithm \(\mathcal {A}_i\) of player i computes a message \(m_i\) based on the input of this player and the message \(m_{i - 1}\) computed by \(\mathcal {A}_{i - 1}\) in the previous phase (unless \(i = 1\), in which case the computation done by \(\mathcal {A}_1\) depends only on the input of player 1). In the last phase, algorithm \(\mathcal {A}_p\) of player p decides between the 0-case and the 1-case based on the input of player p and the message \(m_{p - 1}\). The communication complexity of the protocol is defined as the maximum size (in bits) of any one of the messages \(m_1, m_2, \ldots , m_{p - 1}\), where the maximum is taken over all the possible inputs and the randomness of the protocol’s algorithms. Furthermore, the success probability of the protocol is the probability that the case indicated by \(\mathcal {A}_p\) matches the real case of the input instance.
Note that CHAINp(n) is indeed a generalization of the INDEX problem since the last problem is equivalent to CHAIN2(n). In the work of Cormode et al. [18], the following communication complexity lower bound was shown for CHAINp(n).
Moreover, the following stronger result, for a restricted range of p, was announced in the work of Cormode et al. [18] without proof.
We highlight that the preceding lower bounds are both for the total number of bits communicated and not the maximum message size. Because there are p messages, this immediately translates to lower bounds on the maximum message size of \(\Omega (n/p^3)\) and \(\Omega (n/p^2)\), respectively. For completeness, we show in Appendix C how proofs of standard results for the INDEX problem can be adapted to obtain the following impossibility result for CHAINp(n), which provides a lower bound of \(\Omega (n/p^2)\) on the maximum message size without restrictions on the range of p.
4 Two-Player Submodular Maximization
In this section, we consider Max-Card-k in the two-player model while ignoring the computational cost—that is, we are only interested here in the relationship between the communication complexity and the approximation guarantee that can be obtained for this problem. In the following, we restate the formal theorems that we prove in the section. The proofs of these theorems can be found in Sections 4.1 and 4.2, respectively. In a nutshell, the two theorems show together that an approximation guarantee of \({2}/{3}\) is tight for the problem under a natural assumption on the communication complexity.
Theorem 1.1.For every\(\varepsilon \gt 0\), there exists a two-player protocol for Max-Card-k with an approximation guarantee of\((2/3 - \varepsilon)\)whose communication complexity is\(\widetilde{O}(k/\varepsilon)\)elements. Moreover, there exists such a protocol achieving an approximation guarantee of\(2/3\)whose communication complexity is\(O(k^2)\)elements.
Theorem 1.2.For every\(\varepsilon \in (0, 1/4)\), any two-player (randomized) protocol with an approximation guarantee of\((2/3 + \varepsilon)\)must have a communication complexity of\(\Omega (\frac{N\varepsilon }{k})\)bits in the regime\(k \ge \varepsilon ^{-1}\).
4.1 Algorithms for Two Players
In this section, we prove Theorem 1.1. For that purpose, let us present Protocol 1, which is a protocol for Max-Card-k in the two-player model that uses exponential computation. In this protocol, Alice finds for every \(i \in \lbrace 0, 1, \ldots\,, 2k\rbrace\) the maximum value subset \(S_i\) of \(V_A\) of size at most i and forwards all the sets she has found to Bob. Then, Bob finds the best solution over the elements that Alice has sent and \(V_B\).
It is easy to see that Protocol 1 always outputs a feasible set; moreover, the number of elements Alice sends to Bob is \(O(k^2)\) because she sends \(2k + 1\) sets of size at most \(2k\) each. Thus, to prove that Protocol 1 obeys all the properties guaranteed by the second part of Theorem 1.1, it remains to show that it produces a \(2/3\)-approximation, which is our main objective in the rest of this section.
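The following Python sketch (ours; exponential time, as permitted in this section) implements Protocol 1. Note that the sets \(S_i\) with \(i > k\) are precisely the queries to infeasible sets that the protocol exploits.

```python
from itertools import combinations

def best_subset(V, size, f):
    # arg max of f over subsets of V of cardinality at most `size`.
    return max((set(c) for r in range(min(size, len(V)) + 1)
                for c in combinations(V, r)), key=f)

def protocol_1(V_A, V_B, k, f):
    # Alice: for i = 0, 1, ..., 2k, her best subset S_i of size at most i.
    S = [best_subset(V_A, i, f) for i in range(2 * k + 1)]
    M = set().union(*S) | set(V_B)   # O(k^2) forwarded elements, plus V_B
    # Bob: best feasible solution over the elements available to him.
    return best_subset(M, k, f)
```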
Let us denote by \(\mathcal {O}\) a subset of \(V_A \cup V_B\) of size at most k maximizing f among all such subsets, and let \(\mathrm{OPT} = f(\mathcal {O})\). Additionally, let \(M = V_B \cup \bigcup _{i = 1}^{2k} S_{i}\) be the set of elements that Bob either receives from Alice or receives directly. Note that M is also the set of elements in which Bob looks for \(\widehat{S}\). Using this notation, we can now describe the intuitive idea behind our first observation.
Our analysis of Protocol 1 is based on two sets \(S_{k - |\mathcal {O} \cap M|}\) and \(S_{2(k - |\mathcal {O} \cap M|)}\). Observe that one candidate for \(S_{k - |\mathcal {O} \cap M|}\) is the part of \(\mathcal {O}\) that Alice got and did not forward to Bob. Thus, we know that \(S_{k - |\mathcal {O} \cap M|}\) is at least as valuable as \(\mathcal {O} \setminus M\). The following observation formalizes this fact.
Despite the fact that \(S_{k - |\mathcal {O} \cap M|}\) is at least as valuable as \(\mathcal {O} \setminus M\), it is not clear to what extent the values of the two sets “overlap” (more formally, how far the value of their union is from the sum of their individual values). If the overlap is large, then this means that \(S_{k - |\mathcal {O} \cap M|}\) is a good replacement for \(\mathcal {O} \setminus M\), and thus Bob can construct a good solution by combining \(S_{k - |\mathcal {O} \cap M|}\) with \(\mathcal {O} \cap M\). In contrast, if the overlap between \(S_{k - |\mathcal {O} \cap M|}\) and \(\mathcal {O} \setminus M\) is small, then they can be combined into a single set of large value, which guarantees a large value for \(S_{2(k - |\mathcal {O} \cap M|)}\). Thus, there is a tradeoff between the values of the sets \(\widehat{S}\) and \(S_{2(k - |\mathcal {O} \cap M|)}\). Lemma 4.2 formally captures this tradeoff.
If \(f(\widehat{S})\) is large, then we are done. Otherwise, the previous lemma guarantees that \(S_{2(k - |\mathcal {O} \cap M|)}\) is a very valuable set. Although this set might be infeasible (unless \(|\mathcal {O} \cap M| \ge k / 2\)), its value can be exploited by adding half of this set to \(\mathcal {O} \cap M\). The following lemma gives the lower bound on \(f(\widehat{S})\) that can be obtained in this way.
We are now ready to prove the approximation guarantee of Protocol 1 (and thus complete the proof of the second part of Theorem 1.1).
To prove also the first part of Theorem 1.1, we need to reduce the number of elements forwarded from Alice to Bob by Protocol 1. This can be done by applying geometric grouping to the sizes of the sets in \(\lbrace S_i\rbrace _{i=1}^{2k}\). More precisely, Alice only forwards the sets \(S_i\) for either \(i = 0\), \(i= \lfloor (1+\varepsilon)^j \rfloor\), or \(i= 2\lfloor (1+\varepsilon)^j \rfloor\) for some integer \(0 \le j \le \log _{1+\varepsilon } k\), where \(\varepsilon\) is the parameter from the theorem. This reduces the number of elements forwarded to \(\widetilde{O}(k/\varepsilon)\), and it is not difficult to argue that the preceding analysis of the approximation ratio still works after this reduction, but its guarantee becomes worse by a factor of \(1-O(\varepsilon)\). A formal proof of this can be found in Appendix D.
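A short snippet (our illustration) computes which indices survive the geometric grouping; their number is \(O(\log (k)/\varepsilon)\), so Alice forwards \(\widetilde{O}(k/\varepsilon)\) elements in total.

```python
import math

def forwarded_indices(k, eps):
    # i = 0, floor((1+eps)^j), and 2*floor((1+eps)^j)
    # for integers 0 <= j <= log_{1+eps}(k).
    idx = {0}
    j_max = int(math.log(k, 1 + eps)) if k > 1 else 0
    for j in range(j_max + 1):
        b = math.floor((1 + eps) ** j)
        idx.update({b, 2 * b})
    return sorted(idx)

print(len(forwarded_indices(100, 0.5)))   # O(log(k)/eps) indices in total
```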
4.2 Hardness of Approximation for Two Players
In this section, we prove the impossibility result stated in Theorem 1.2. We do that by using a reduction from a problem known as the INDEX problem, which is presented in Section 3. The same section also states an impossibility result for a generalization of this problem (Theorem 3.3), which in the context of the INDEX problem implies that any protocol guaranteeing a success probability of at least \(2/3\) for this problem must have a communication complexity of at least \(n / 144\).
Our plan in this section is to assume the existence of a protocol named \({PRT}\) for Max-Card-k in the two-player model with an approximation guarantee of \(2/3 + \varepsilon\), and show that this leads to a protocol \({PRT_{\text{INDEX}}}\) for the INDEX problem whose communication complexity depends on the communication complexity of \({PRT}\). This allows us to translate the communication complexity lower bound for protocols for INDEX to a communication complexity lower bound for \({PRT}\).
Before getting to the protocol \({PRT_{\text{INDEX}}}\) mentioned earlier, let us first present a simpler protocol for the INDEX problem, which is given as Protocol 2 and is used as a building block for \({PRT_{\text{INDEX}}}\). Protocol 2 refers to n possible objective functions that we denote by \(f_1, f_2, \ldots , f_n\). (Recall that n is the length of the string that Alice receives in the INDEX problem.) To define these functions, we first need to define a set of n other functions. Let \(W^{\prime } = \lbrace w\rbrace \cup \lbrace v_i \mid i\in [n]\rbrace\). For every \(i\in [n]\), we define \(g_i:2^{W^{\prime }} \rightarrow {\mathbb {R}_{\ge 0}}\) as follows, where S is an arbitrary subset of \(W^{\prime }\):
The multilinear extension of \(g_i\) is the function \(G_i:[0, 1]^{W^{\prime }} \rightarrow {\mathbb {R}_{\ge 0}}\) defined by \(G_i(y) = \mathbb {E} \left[ g_i({\mathcal {R}}(y)) \right]\), where \({\mathcal {R}}(y)\) is a random subset of \(W^{\prime }\) including every element \(v \in W^{\prime }\) with probability \(y_v\), independently.3 In the context of \(G_i\), given an element \(v \in W^{\prime }\), we occasionally use the notation \(\mathbb {1}_v\) to denote the characteristic vector of the singleton set \(\lbrace v\rbrace\)—that is, the vector in \([0, 1]^{W^{\prime }}\) containing 1 in the v-coordinate and 0 in all other coordinates.
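Since the multilinear extension is an expectation, it can be estimated by sampling. The following Monte Carlo sketch (ours; the sample count is arbitrary) makes this concrete.

```python
import random

def multilinear_extension(f, y, ground, samples=10_000):
    # Estimates G(y) = E[f(R(y))], where R(y) contains each element v
    # of the ground set independently with probability y[v].
    total = 0.0
    for _ in range(samples):
        R = {v for v in ground if random.random() < y[v]}
        total += f(R)
    return total / samples
```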
Let us now define the ground set \(W = W_A \mathbin ⊍W_B\), where \(W_A = \lbrace u_i^j \mid i\in [n] \text{ and } j\in [k-1]\rbrace\) and \(W_B = \lbrace w\rbrace\). Then, for every \(i\in [n]\), the function \(f_i:2^W \rightarrow {\mathbb {R}_{\ge 0}}\) is defined as
We begin the analysis of Protocol 2 with the following lemma, which shows that the objective function this protocol passes to \({PRT}\) has all the necessary properties. The proof of this lemma is simple and technical, and thus we defer it to Appendix E.1. In a nutshell, it shows by a straightforward case analysis that \(g_i\) is non-negative, monotone, and submodular, then argues that the fact that \(g_i\) has these properties implies that \(f_i\) has them too.
Our next step is analyzing the output distribution of Protocol 2.
At this point, we are ready to present the promised algorithm \({PRT_{\text{INDEX}}}\), which simply executes \(\lceil 2\varepsilon ^{-1} \rceil\) parallel copies of Protocol 2, then outputs \(x_t = 1\) if and only if at least one of the executions returned this answer.
Using the last corollary, we can now complete the proof of Theorem 1.2.
5 Hardness for Many Players
In this section, we prove that in the case of many players, any protocol with reasonable communication complexity has an approximation guarantee upper bounded by an expression that tends to \(1/2\) as the number of players tends to infinity. Specifically, we show the following (where, for \(p\in \mathbb {Z}_{\ge 0}\), \(H_p = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p}\) is the p-th harmonic number).
Theorem 1.3.For every\(\varepsilon \gt 0\), anyp-player (randomized) protocol for Max-Card-k with an approximation guarantee of
must have a communication complexity of\(\Omega (\tfrac{N\varepsilon }{p^3})\). Furthermore, this is true even in the special case in which the objective functionfis a coverage function and\(k=p\).
We highlight that a (weighted) coverage function \(f:2^V\rightarrow \mathbb {R}_{\ge 0}\) is defined as follows. There is a finite universe U with non-negative weights \(a:U\rightarrow \mathbb {R}_{\ge 0}\), and \(V\subseteq 2^U\) is a family of subsets of U. Then, for any \(S\subseteq V\), we have \(f(S) = \sum _{u\in \cup _{v\in S} v} a(u)\). We also remark that our hardness construction applies to the related maximum set coverage problem. In that problem, the stream consists of N subsets \(S_1, S_2, \ldots , S_N\) of some universe U, and each \(S_i\) is encoded as the list of elements in that set. In other words, the submodular function f is given explicitly by the sets of the underlying universe. Prior work showed that even in this setting, any streaming algorithm with a better approximation guarantee than \((1-1/e)\) requires memory \(\Omega (N)\) [37]. Our techniques also apply to this setting, and hence we improve the hardness factor for the maximum set coverage problem to the tight factor \(1/2\).
The heart of the proof of the preceding theorem is the construction of a family \(\mathcal {F}\) of submodular coverage functions on a common ground set W, partitioned into sets \(W_1, \ldots , W_p\), one for each player. All the sets \(W_i\) have the same cardinality, which we denote by n, and thus \(N = |W| = n\cdot p\). The family \(\mathcal {F}\) contains a weighted coverage function \(f_{o_1, \ldots , o_p}\) for every \(o_1\in W_1, o_2 \in W_2, \ldots , o_p\in W_p\). The intuition is that \(\lbrace o_1, \ldots , o_p\rbrace\) will be the “hidden” optimal solution for \(f_{o_1, \ldots , o_p}\) when we set \(k=p\).
For the hardness result, there are two crucial properties that the construction should satisfy:
•
Indistinguishability: The i-th player should not be able to obtain any information about \(o_i\) by querying the submodular function on subsets of \(W_1 \cup W_2 \cup \cdots \cup W_i\).
•
Value gap: The value of the solution \(\lbrace o_1, \ldots , o_p\rbrace\) is roughly twice the value of any solution of cardinality \(k=p\) that does not contain any of these elements.
The first property intuitively ensures that the players must use much communication to identify the special elements \(\lbrace o_1, \ldots , o_p\rbrace\), and the second property implies that if they fail to do so, then the last player can only output a \(1/2\)-approximate solution. The following lemma formalizes these two properties that our family \(\mathcal {F}\) satisfies.
Equipped with the preceding lemma, we prove Theorem 1.3 in Section 5.5 by a rather direct reduction from the CHAINp(n) problem. We note that the reduction is similar to the one presented in Section 4.2 for the two-player case.
The core part of this section is the construction of \(\mathcal {F}\) and the proof of Lemma 5.1. The outline is as follows. We first give an intuitive description of the main ideas in Section 5.1. The family \(\mathcal {F}\) is then formally defined in Section 5.2. Finally, the value gap and indistinguishability properties of Lemma 5.1 are proved in Section 5.3 and Section 5.4, respectively.
5.1 Intuitive Description of Our Construction
In this section, we highlight our main ideas for constructing the family \(\mathcal {F}\) satisfying the properties of Lemma 5.1. We do so by presenting three families of coverage functions \(\mathcal {H}\), \(\mathcal {G}\), and finally \(\mathcal {F}\). Family \(\mathcal {H}\) is a natural adaptation of coverage functions that have previously appeared in hardness constructions (e.g., see [37]). We then highlight our main ideas for overcoming issues with those functions by first refining \(\mathcal {H}\) to \(\mathcal {G}\) and then by refining \(\mathcal {G}\) to obtain our final construction \(\mathcal {F}\).
To convey the intuition, we work with unweighted coverage functions. However, to provide a clean and concise technical presentation later on, we use weighted coverage functions to formally realize the construction plan described here.
The First Attempt: Family \(\mathcal {H}\). The construction of the family \(\mathcal {H}= \lbrace h_{o_1, \ldots , o_p} \mid o_1 \in W_1, \ldots , o_p\in W_p\rbrace\) is inspired by the coverage functions constructed in the NP-hardness result of Feige [26]. In those coverage functions, every element corresponds to a subset of the underlying universe of size \(|U|/p\). Furthermore, the optimal solution \(\lbrace o_1, \ldots , o_p\rbrace\) forms a disjoint cover of U, whereas any other element behaves like a random subset of the universe of size \(|U|/p\).
Inspired by this, we let \(h_{o_1, \ldots , {o_{p}}} \in \mathcal {H}\) be the coverage function where
•
the subsets of U corresponding to \(o_1,\ldots , o_p\) form a partition of equal-sized sets (i.e., of size \(|U|/p\) each);
•
every other element corresponds to a randomly selected subset of U of size \(|U|/p\).
Although the preceding definition is randomized, we assume for the sake of simplicity in this overview that the value of a subset equals its expected value. This can intuitively be achieved by selecting the underlying universe U to be large enough so as to ensure concentration. For a subset \(S \subseteq W \setminus \lbrace o_1, \ldots , o_p\rbrace\), we thus have that \(h_{o_1, \ldots , o_p}(S)\) equals the expected number of elements of U covered by \(|S|\) random subsets of cardinality \(|U|/p\). Hence,
\begin{align*} h_{o_1, \ldots , o_p}(S) = |U| \cdot \left(1 - \left(1 - \frac{1}{p}\right)^{|S|}\right), \end{align*}
which is at least \((1-1/e) |U|\) if \(|S| =p\). This already highlights the first issue of the construction: the value gap between the optimal solution \(\lbrace o_1, \ldots , o_p\rbrace\), whose value is \(|U|\), and a solution disjoint from this optimal solution is only \(1-1/e\), and we need it to approach \(1/2\) as p tends to infinity.
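The numbers behind this observation are easy to check; in the following sketch (ours, with arbitrary parameters), `blind_value` is the expected value of a size-p solution that avoids the hidden optimum.

```python
U_size, p = 10**6, 100
opt_value = U_size                              # {o_1, ..., o_p} covers all of U
blind_value = U_size * (1 - (1 - 1 / p) ** p)   # p random sets of size |U|/p
print(blind_value / opt_value)                  # ~0.634, i.e., roughly 1 - 1/e
```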
The second and perhaps more significant issue is the indistinguishability. First, we can observe that the value of any subset \(S\subseteq W_1\) only depends on \(|S|\), and thus the selection of \(o_1\in W_1\) is indistinguishable when querying the submodular function restricted to \(W_1\). However, the same does not hold for \(o_2\) when querying the submodular function restricted to the set \(W_1 \cup W_2\). To see this, note that \(o_1\) and \(o_2\) are the only elements of \(W_1 \cup W_2\) whose corresponding subsets of U are disjoint. In other words, \(\lbrace o_1, o_2\rbrace\) is the unique maximizer of \(\max _{S \subseteq W_1 \cup W_2: |S| = 2} h_{o_1, \ldots , o_p}(S)\), and \(o_2\) can thus be identified by querying the submodular function on \(W_1 \cup W_2\). A natural idea for addressing this issue is to make all elements in \(W_2\), and not only \(o_2\), correspond to subsets of U that are disjoint from the subset corresponding to \(o_1\). Making this modification for all \(W_2, \ldots , W_p\) results in the refined family \(\mathcal {G}\) that we now describe. We note that a similar approach was used in the work of Kapralov [32] to guarantee indistinguishability.
The First Refinement: Family \(\mathcal {G}\). Motivated by the idea to make every element in \(W_i\) correspond to a subset of U disjoint from the subsets of \(o_1, \ldots , o_{i-1}\), we define the family \(\mathcal {G}= \lbrace g_{o_1, \ldots , o_p} \mid o_1 \in W_1, \ldots , o_p\in W_p\rbrace\) of coverage functions. Specifically, we let \(g_{o_1, \ldots , o_p} \in \mathcal {G}\) be the coverage function where
•
the subsets of U corresponding to \(o_1,\ldots , o_p\) form a partition of equal-sized sets (i.e., of size \(|U|/p\) each);
•
for \(i=1, \ldots , p\), every element in \(W_i \setminus \lbrace o_i\rbrace\) corresponds to a randomly selected subset of U of size \(|U|/p\) that is disjoint from the subsets corresponding to \(o_1, \ldots , o_{i-1}\).
The preceding description of \(\mathcal {G}\) is given in a way that highlights the changes compared to \(\mathcal {H}\). Another equivalent definition of \(g_{o_1, \ldots , o_p}\) is that it is the coverage function where
•
the elements of \(W_1\) form random subsets of U of size \(|U|/p\);
•
for \(i=2, \ldots , p\), every element in \(W_i\) corresponds to a randomly selected subset of U of size \(|U|/p\) that is disjoint from the subsets corresponding to \(o_1, \ldots , o_{i-1}\).
From this viewpoint, it is clear that we now have the indistinguishability property of Lemma 5.1. Indeed, for \(i\in [p]\), the only subsets of U that depend on \(o_i\) in the preceding construction are those corresponding to elements in \(W_{i+1}, \ldots , W_p\). It follows that the value of a subset \(S \subseteq W_1 \cup \cdots \cup W_i\), which is a function of the subsets of U corresponding to the elements in S, is independent of the selection of \(o_i\).
Having verified indistinguishability, let us consider the value gap. First, note that we still have that the optimal solution \(\lbrace o_1, \ldots , o_p\rbrace\) covers the whole universe and thus has value \(|U|\). Now consider a set \(S \subseteq W \setminus \lbrace o_1, \ldots , o_p\rbrace\). It will be instructive to first consider the case when \(S = \lbrace v_1, \ldots , v_p\rbrace\) with \(v_i \in W_i\) for all \(i \in [p]\)—that is, S contains exactly one element from each of the sets \(W_i\). Abbreviating \(g_{o_1, \ldots , o_p}\) by g, we have in this case that
\begin{align} g(S) = \sum _{i=1}^p g(v_i \mid \lbrace v_1, \ldots , v_{i-1}\rbrace) = \frac{p+1}{2p} \cdot |U| . \end{align}
Hence, for sets \(S \subseteq W \setminus \lbrace o_1, \ldots , o_p\rbrace\) that contain one element from each \(W_i\), we have a value gap that approaches the desired constant \(1/2\) as p tends to infinity. The issue is that there are other subsets of \(W \setminus \lbrace o_1, \ldots , o_p\rbrace\) of significantly higher value. To see this, note that any element \(v\in W_1 \setminus \lbrace v_1, o_1\rbrace\) has a marginal value with respect to \(\lbrace v_1, \ldots , v_{p-1}\rbrace\) that is much higher than the marginal value of \(v_p\) with respect to the same set. In particular, the value of a subset of \(W_1 \setminus \lbrace o_1\rbrace\) of cardinality p is the same for functions in \(\mathcal {G}\) and \(\mathcal {H}\), and is thus at least \((1-1/e)|U|\). To overcome this issue (i.e., the fact that elements of \(W_1\) are more “valuable” than other elements), we modify the preceding construction to let the elements from different \(W_i\)’s correspond to subsets of different sizes.
The Second and Last Refinement: Family \(\mathcal {F}\). The family \(\mathcal {F}= \lbrace f_{o_1, \ldots , o_p} \mid o_1 \in W_1, \ldots , o_p\in W_p\rbrace\) is obtained from \(\mathcal {G}\) by selecting subsets of U of non-uniform sizes. Specifically, we carefully select numbers \(1=a_1 \lt a_2 \lt \cdots \lt a_p\) and make the elements of \(W_i\) correspond to subsets of U of size \(a_i\), then we let the total size of U be \(a_1 + a_2 + \cdots + a_p\). We now let \(f_{o_1, \ldots , o_p} \in \mathcal {F}\) be the coverage function where
•
the elements of \(W_1\) form random subsets of U of size \(a_1=1\);
•
for \(i=2, \ldots , p\), every element in \(W_i\) corresponds to a randomly selected subset of U of size \(a_i\) that is disjoint from the subsets corresponding to \(o_1, \ldots , o_{i-1}\).
The family \(\mathcal {F}\) satisfies the indistinguishability property of Lemma 5.1 for the exact same reasons \(\mathcal {G}\) satisfies it. We now explain how the values \(a_1, \ldots , a_p\) are selected so as to obtain the value gap. Consider a set \(S \subseteq W \setminus \lbrace o_1, \ldots , o_p\rbrace\) obeying \(S = \lbrace v_1, \ldots , v_p\rbrace\) for some choice of \(v_i \in W_i\) for every \(i \in [p]\). Abbreviating \(f_{o_1, \ldots , o_p}\) by f, we thus have
\begin{align*} f (S) = \sum _{i=1}^p f(v_i \mid \lbrace v_1, \ldots , v_{i-1}\rbrace). \end{align*}
The numbers \(a_1, \ldots , a_p\) are selected so that each term of this sum equals 1, and hence \(f_{o_1, \ldots , o_p}(S) = p\). Notice that this is in stark contrast to the functions in \(\mathcal {G}\) where the contributions to (1) were highly unequal. The intuitive reason we set the numbers so that these marginal contributions are the same is that we want to prove that one cannot form a subset of \(W \setminus \lbrace o_1, \ldots , o_p\rbrace\) of cardinality at most p of significantly higher value by increasing the number of elements selected from one of the partitions \(W_i\). Formally, this is proved in Section 5.3 by considering the linearization of a concave function at the point corresponding to such a set S that contains a single element from each \(W_i\). This allows us to upper bound the value of any subset of \(W \setminus \lbrace o_1, \ldots , o_p\rbrace\) of cardinality at most p by \(p + (H_p)^2\). The value gap then follows from basic calculations (see Lemma 5.2) showing that \(f_{o_1, \ldots , o_p} (\lbrace o_1, \ldots , o_p\rbrace) = |U| = \sum _{i=1}^p a_i\) is at least \(2p - H_p\) and at most \(2p\).
5.2 Construction of Family of Weighted Coverage Functions
We formally describe the construction of the family \(\mathcal {F}\) of weighted coverage functions on the common ground set W. Recall that the ground set is partitioned into sets \(W_1, \ldots , W_p\). Furthermore, each of these sets has cardinality n, and thus \(N = |W| = n\cdot p\).
In the intuitive description (Section 5.1), we defined the functions in \(\mathcal {F}\) to be coverage functions, where the elements correspond to random subsets of the underlying universe U. Here we will be more precise and avoid this randomness. To this end, we consider a slight generalization of weighted coverage functions that we call weighted fractional coverage functions. This is just done for convenience. In Appendix E.2.1, we show that any such function is indeed a weighted coverage function.
Recall that in a weighted coverage function, every element is a subset of an underlying universe U with non-negative weights \(a:U \rightarrow \mathbb {R}_{\ge 0}\). For fractional weighted coverage functions, apart from the non-negative weights \(a:U \rightarrow \mathbb {R}_{\ge 0}\), we also associate a function \(p_v:U \rightarrow [0,1]\) with each element \(v\in W\) with the intuition that \(p_v(u)\) specifies the “probability” that \(v\in W\) covers \(u\in U\). The value \(f(S)\) of a subset \(S \subseteq W\) of the elements is then defined by
\begin{align*} f(S) = \sum _{u\in U} a(u) \cdot \left(1 - \prod _{v\in S} \left(1 - p_v(u)\right)\right) . \end{align*}
A function \(f:2^W\rightarrow \mathbb {R}_{\ge 0}\) as defined earlier is what we call a weighted fractional coverage function. Note that a weighted coverage function is simply the special case of \(\lbrace p_v: v\in W\rbrace\) taking binary values.
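In code, evaluating a weighted fractional coverage function is straightforward; in the following sketch (ours), `a` maps universe points to their weights and `p_cov[v][u]` plays the role of \(p_v(u)\).

```python
def fractional_coverage(S, a, p_cov):
    # f(S) = sum_u a(u) * (1 - prod_{v in S} (1 - p_v(u))): every point
    # contributes its weight times the "probability" that some v in S covers it.
    value = 0.0
    for u, weight in a.items():
        uncovered = 1.0
        for v in S:
            uncovered *= 1.0 - p_cov[v][u]
        value += weight * (1.0 - uncovered)
    return value
```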
We are now ready to define our family \(\mathcal {F}\) of weighted fractional coverage functions, which, as mentioned previously, are equivalent to weighted coverage functions, and these in turn can be approximated by unweighted coverage functions to any desired accuracy.
The underlying universe U of the coverage functions in \(\mathcal {F}\) consists of p points \(U=\lbrace u_1,\ldots , u_p\rbrace\), where the weight \(a_{u_j} \in \mathbb {R}_{\ge 0}\), for \(j\in [p]\), will be fixed later in Section 5.2.1. For notational convenience, we use the shorthand \(a_j\) for \(a_{u_j}\) and let \(A_{\ge j} = \sum _{i=j}^p a_i\). The family \(\mathcal {F}\) now contains a weighted fractional coverage function \(f_{o_1, \ldots , o_p}\) for every \(o_1 \in W_1, \ldots , o_p \in W_p\) that is defined as follows:
•
Element \(o_j\) covers \(\lbrace u_j\rbrace\) (i.e., \(p_{o_j}(u_j) = 1\) and \(p_{o_j}(u) = 0\) for \(u\in U\setminus \lbrace u_j\rbrace\)).
•
For every other element \(v \in W_j \setminus \lbrace o_j\rbrace\), we set \(p_v(u_i) = a_j/A_{\ge j}\) for every \(i \ge j\), and \(p_v(u_i) = 0\) for every \(i \lt j\).
Note that by interpreting the \(p_v\) functions as probabilities, the preceding definition matches the intuitive description in Section 5.1: the “hidden” optimal elements \(\lbrace o_1, \ldots , o_p\rbrace\) form a disjoint cover of the universe, and every other element in \(W_j\) corresponds to a random subset of the (now weighted) universe disjoint from the subsets corresponding to \(o_1, \ldots , o_{j-1}\). Finally, by definition, for every \(S \subseteq W,\)
\begin{align*} f_{o_1, \ldots , o_p}(S) = \sum _{j=1}^p a_j \cdot \left(1 - \mathbb {1}\lbrace o_j \notin S\rbrace \cdot \prod _{i=1}^{j} \left(1 - \frac{a_i}{A_{\ge i}}\right)^{|S \cap (W_i \setminus \lbrace o_i\rbrace)|}\right), \end{align*}
where \(\mathbb {1}\lbrace E\rbrace\) indicates whether the event E holds.
5.2.1 Selection of the Weights \(a_1, \ldots , a_p\).
To complete the definition of our family \(\mathcal {F}\), it remains to define the weights \(a_1, \ldots , a_p \in \mathbb {R}_{\ge 0}\) of the universe U. Recall from Section 5.1 that we need to set these weights so that if we let \(f_{o_1, \ldots , o_p} \in \mathcal {F}\) and \(v_j \in W_j \setminus \lbrace o_j\rbrace\) for every \(j\in [p]\), then
\begin{align*} f_{o_1, \ldots , o_p}(v_j \mid \lbrace v_1, \ldots , v_{j-1}\rbrace) = a_j \cdot \prod _{i=1}^{j-1} \left(1 - \frac{a_i}{A_{\ge i}}\right) = 1 \qquad \text{for every } j \in [p], \end{align*}
where, here and later, we interpret the empty product as 1.
The weights \(a_1,\ldots , a_p\) satisfying this condition can be obtained as follows. First, let \(\delta _p = 1,\) and for \(i = p-1, p-2, \ldots , 1\), let \(\delta _i\) be the largest solution of
In the next lemma, we formally verify that these weights indeed satisfy condition (4). By basic calculations, we also show the identity (5) and the inequalities (6). We remark that these are the only properties that we use about these weights in subsequent sections.
5.3 Value of Solutions Without Any Optimal Elements
Consider a function \(f_{o_1, \ldots , o_p} \in \mathcal {F}\). From its definition (3), it is clear that \(\lbrace o_1, \ldots , o_p\rbrace\) is an optimal solution of value \(f(\lbrace o_1, \ldots , o_p\rbrace) = f(W) = \sum _{i=1}^p a_i = A_{\ge 1} = A_{\ge 1}/a_1\), which by (6) is at least \(2p - H_p\) and at most \(2p\). The following lemma therefore implies the value gap property of Lemma 5.1.
The rest of this section is devoted to the proof of the preceding lemma. Throughout, we let \(W^{\prime } = W \setminus \lbrace o_1, \ldots , o_p\rbrace\) and denote by f the submodular function obtained by restricting \(f_{o_1, \ldots , o_p}\) to the ground set \(W^{\prime }\). By definition (see (3)), we then have for every \(S \subseteq W^{\prime }\)
where \(s_i = |S \cap W_i|\). The value of a set \(S \subseteq W^{\prime }\) is thus determined by \(s_1 = |S \cap W_1|, \ldots , s_p = |S \cap W_p|\). In the following, we slightly abuse notation and sometimes write \(f(s_1, \ldots , s_p)\) for \(f(S)\) to highlight that the value only depends on the number of elements from each partition and not on the actual elements.
Now assume first that S contains exactly one element from each \(W_i\), say \(S = \lbrace v_1, \ldots , v_p\rbrace\) where \(v_i \in W_i \setminus \lbrace o_i\rbrace\) for every \(i \in [p]\). Then,
\begin{align*} f (S) = \sum _{i=1}^p f(v_i \mid \lbrace v_1, \ldots , v_{i-1}\rbrace). \end{align*}
Recall that we selected the weights \(a_1, \ldots , a_p\) so that each of the terms in this sum equals 1 (see Section 5.2.1). Thus, the value of the set S equals p, which can also be seen from the following basic calculation:
The inequality of Lemma 5.3 thus holds in the case when S contains exactly one element from each \(W_i\). However, it turns out that such a set is only an approximate maximizer of the left-hand side of the lemma, and we need an additional argument to bound the value of any set \(S \subseteq W^{\prime }\) of cardinality at most p. We do so by defining a continuous concave version \(\widehat{F}\) of the submodular function f, which, loosely speaking, can be thought of as a continuous extension of f. By leveraging the concavity of \(\widehat{F}\), we can obtain upper bounds through a well-chosen first-order approximation. Specifically, we consider the linear upper bound on the concave function obtained by taking its gradient at the point \(\mathbf {1} = (1, 1, \ldots , 1)\) corresponding to sets that contain exactly one element of each \(W_i \setminus \lbrace o_i\rbrace\) (see (10)).
To define \(\widehat{F}\), let us first define \(F:\mathbb {R}_{\ge 0}^p \rightarrow \mathbb {R}_{\ge 0}\) to be the following continuous proxy for f:
Note that \(\widehat{F}\) is a sum of concave functions, and it is thus a concave function on its own. If we let \(D = \lbrace (s_1, \ldots , s_p) \in \mathbb {R}_{\ge 0}^p : \sum _{i=1}^p s_i = p\rbrace\) be the “feasible” region, then because of concavity,
for every vector \(\mathbf {\bar{s}} = (\bar{s}_1, \ldots , \bar{s}_p)\). We select \(\mathbf {\bar{s}} = \mathbf {1} = (1, \ldots , 1)\) to be the all-ones vector, which thus gives the upper bound
where the inequality follows from the bounds on the partial derivatives given by Claim 5.4. Assuming that claim, we have thus shown the statement of Lemma 5.3—that is, that any solution of cardinality at most p without any optimal elements has value at most \(p + (H_p)^2\).
The claim follows from basic calculations and the identities of Lemma 5.2. As these calculations are mechanical and not very insightful, they can be found in Appendix E.2.2.
5.4 Players Have No Information About Their Optimum Element
The second key property of our family \(\mathcal {F}\) is indistinguishability: if one can only query the submodular function \(f_{o_1,\ldots , o_p}\) on \(W_1 \cup W_2 \cup \cdots \cup W_{\ell }\), then no information can be obtained about which element in \(W_\ell\) is selected to be \(o_{\ell }\). Although this is intuitively clear from the description of \(\mathcal {F}\) in Section 5.1, the following lemma gives the formal proof of the indistinguishability property of Lemma 5.1.
5.5 Hardness Reduction from the CHAINp(n) Problem
In this section, we present our reduction using the family \(\mathcal {F}\) of (weighted) coverage functions guaranteed by Lemma 5.1. The arguments are quite similar to those of Section 4.2, but instead of reducing from the INDEX problem, we reduce from the multiplayer version CHAINp(n), referred to in Section 3.
Recall that in the CHAINp(n) problem, there are p players \(P_1, \ldots , P_p\). For \(i=1, \ldots , p-1\), player \(P_i\) has as input a bit string \(x^{i} \in \lbrace 0,1\rbrace ^n\) of length \(n,\) and for \(i=2, \ldots , p,\) player \(P_i\) (also) has as input an index \(t^i\in \lbrace 1, \ldots , n\rbrace\), where we use the convention that the superscript of a string/index indicates the player receiving it. The players are promised that either \(x^{i}_{t^{i+1}} = 0\) for all \(i=1, \ldots , p-1\) (the 0-case) or \(x^{i}_{t^{i+1}} = 1\) for all \(i = 1, \ldots , p-1\) (the 1-case). The objective of the players is to decide whether the input instance belongs to the 0-case or the 1-case.
For convenience, we restate the hardness result of CHAINp(n) here (recall that the considered protocols are one-way protocols as defined in Section 3).
Theorem 3.3. For any positive integers n and \(p\ge 2\), any (potentially randomized) protocol for CHAINp(n) with success probability of at least \(2/3\) must have a communication complexity of at least \(n/(36 p^2)\).
Our plan in the rest of this section is to reduce the CHAINp(n) problem to the p-player submodular maximization problem on (weighted) coverage functions on a ground set of cardinality \(N=p\cdot n\). As n is clear from context, we simplify notation and refer to the CHAINp(n) problem as the CHAINp problem.
Let \(PRT\) be a protocol for the p-player submodular maximization problem on weighted coverage functions subject to the cardinality constraint \(k=p\). Further assume that \(PRT\) has an approximation guarantee of \(\frac{p + (H_p)^2}{2p - H_p}(1 +\varepsilon)\) for some \(\varepsilon \gt 0\). We show in the following that this leads to a protocol \(PRT_{\mbox{CHAIN}_p}\) for the CHAINp(n) problem whose message size depends on the message size of \(PRT\). Since Theorem 3.3 lower bounds the message size of any protocol for CHAINp(n), this leads to a lower bound also on the message size of \(PRT\).
In our reduction, we use the weighted coverage functions in family \(\mathcal {F}\) whose existence is guaranteed by Lemma 5.1. These functions are defined over a common ground set W that is partitioned into sets \(W_1, \ldots , W_p\) of cardinality n each. For future reference, we let
Before getting to the protocol \(PRT_{\mbox{CHAIN}_p}\) mentioned earlier, we first present a simpler protocol for the CHAINp problem that is used as a building block for \(PRT_{\mbox{CHAIN}_p}\). The simpler protocol is presented as Protocol 3. In other words, if the CHAINp instance is \(x^{1}, x^{2}, \ldots , x^{p-1}, t^2, t^3 \ldots , t^{p}\), Protocol 3 simulates \(PRT\) on the following p-player Max-Card-k instance with \(k=p\):
•
Player i receives the subset \(V_i = \lbrace v^i_j \in W_i \mid j\in [n] \text{ with }x^i_j = 1\rbrace\) of \(W_i\) corresponding to the 1-bits of \(x^i\).
•
The submodular function the players wish to maximize is \(f_{o_1, \ldots , o_p} \in \mathcal {F}\), where \(o_i = v^i_{t^{i+1}}\) for every \(i \in [p-1]\) and \(o_p \in W_p\) is chosen arbitrarily
(by the indistinguishability property of Lemma 5.1, the choice of \(o_p\) does not matter since these functions are identical).
The last player of Protocol 3 then decides between the 0-case and 1-case depending on the value of the solution \(S \subseteq V_1 \cup V_2 \cup \cdots \cup V_p\) output by the last player of \(PRT\). Note that this value is informative because the elements \(\lbrace o_1, \ldots , o_p\rbrace\) are in \(V_1 \cup V_2 \cup \cdots \cup V_p\) if and only if the given CHAINp instance is in the 1-case.
Some care has to be taken to make sure that the i-th player of Protocol 3 can answer the oracle queries made during the simulation of player \(P_i\) of \(PRT\). This is the reason the message from the \((i-1)\)-th player to the i-th player of Protocol 3 also contains the indices \(t^2, t^3, \ldots , t^{i-1}\). Indeed, as player i also receives index \(t^i\), she can then infer the selection of \(o_1, \ldots , o_{i-1}\), which in turn allows her to calculate the value \(f_{o_1, \ldots , o_p}(S)\) of any set \(S\subseteq W_1\cup \cdots \cup W_i\) due to the indistinguishability property of Lemma 5.1.
Our next step is analyzing the output distribution of Protocol 3.
The assumption that \(PRT\) has an approximation guarantee of \(\frac{p + (H_p)^2}{2p - H_p}(1 +\varepsilon)\) implies the following success probability in the 1-case.
At this point, we are ready to present the promised protocol \(PRT_{\mbox{CHAIN}_p}\), which simply executes \(\lceil 2\varepsilon ^{-1} \rceil\) parallel copies of Protocol 3, and then determines that the input was in the 1-case if and only if at least one of the executions returned this answer.
Using the last corollary, we can now complete the proof of Theorem 1.3.
6 Polynomial Time Submodular Maximization for Two Players
In this section, we discuss our result about efficient protocols in the two-player setting. Hence, throughout this section, \(f:2^W\rightarrow \mathbb {R}_{\ge 0}\) is a monotone submodular function defined on a ground set W, and \(k\in \mathbb {Z}_{\ge 0}\) is an upper bound on the cardinality of subsets of W that we consider. \(V_A\subseteq W\) are the elements that Alice receives and \(V_B\subseteq W\) the ones that Bob receives, and we define \(V=V_A\cup V_B\). We denote by \(\mathcal {O}\subseteq V\) an optimal solution to the (offline) problem \(\max \lbrace f(S): S\subseteq V, |S|\le k\rbrace\). The well-known Greedy algorithm by Nemhauser et al. [42] is a crucial ingredient in our protocol. We remind the reader that Greedy starts with an empty set S and in each step adds an element \(v\in V\) to S with the largest marginal gain (i.e., \(v\in \operatorname{argmax}_{u \in V} f(u \mid S)\)). We recall the following basic performance guarantee of Greedy, which can readily be derived from its definition (see also [42]): when running Greedy for r rounds, a set \(S\subseteq V\) with \(|S|=r\) is obtained that satisfies, for any \(\ell \in \mathbb {Z}_{\gt 0}\),
\begin{align*} f(S) \ge \left(1 - \left(1-\frac{1}{\ell }\right)^{r}\right)\cdot f(O_\ell) \ge \left(1 - e^{-r/\ell }\right)\cdot f(O_\ell), \end{align*}
where \(O_\ell\) denotes the optimum solution of size \(\ell\).
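For concreteness, here is a minimal Python sketch of Greedy run for r rounds; the value-oracle interface and the helper name are ours, and we reuse this helper in a later sketch:

```python
def greedy_order(f, base, pool, r):
    """Run Greedy for r rounds: starting from the set base, repeatedly add
    an element of pool with the largest marginal contribution f(v | S).

    f is a value oracle taking a frozenset; assumes pool contains at least
    r elements outside base. Returns the r added elements in chosen order.
    """
    S, order = set(base), []
    for _ in range(r):
        base_val = f(frozenset(S))
        v = max((u for u in pool if u not in S),
                key=lambda u: f(frozenset(S | {u})) - base_val)
        S.add(v)
        order.append(v)
    return order
```

Running it with an empty base, pool V, and r = k rounds corresponds to the classical \((1-1/e)\) guarantee; the displayed bound above concerns the same procedure stopped after r rounds and compared against the best solution of size \(\ell\).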
6.1 Protocol
We consider the simple deterministic protocol for the two-player setting given as Protocol 4. Without loss of generality, we assume that \(|\mathcal {O} |=k\), \(|V_A|\ge 2k\), and \(|V_B|\ge k\), as otherwise these properties can easily be obtained by adding dummy elements of value zero to \(V_A\) and \(V_B\) (and consequently, also to W), and complementing \(\mathcal {O}\) with such elements to make sure that \(|\mathcal {O} |=k\).
Our main result here is that Protocol 4 has an approximation factor that is strictly better than \(1/2\).
Our focus here is on highlighting some key arguments explaining why one can beat the factor of \(1/2\). Thus, we keep our analysis simple instead of aiming for the best approximation ratio achievable.
We define \(k_A= |\mathcal {O}\cap V_A|\), \(k_B = |\mathcal {O}\cap V_B|\), and let \(\lbrace a_1,a_2,\ldots , a_{2k}\rbrace \subseteq V_A\) be the \(2k\) elements computed by Alice—that is, they are the first \(2k\) elements chosen by Greedy when maximizing f over \(V_A\), numbered according to the order in which they were chosen. For convenience, we define the following notation for prefixes of \(\lbrace a_1, a_2,\ldots , a_{2k}\rbrace\). For any \(i\in \lbrace 0,\ldots , 2k\rbrace\), let
\begin{align*} G_A^i = \lbrace a_1, a_2, \ldots , a_i\rbrace . \end{align*}
In particular, \(G_A^0=\varnothing\). Finally, we denote by \(G_B\subseteq V_B\) the set \(Q_{k_A}\) computed by Bob—that is, this is a Greedy solution of the problem \(\max \lbrace f(S): S\subseteq V_B, |S|\le k_B\rbrace\). To show Theorem 6.1, we prove that one of the two sets \(X_{k_A}\) or \(Y_{k_A}\) leads to the desired guarantee—that is,
Without loss of generality, we assume that \(f(\mathcal {O} \cap V_A) \gt 0\) and \(k_A \gt 0\). Otherwise, \(f(\mathcal {O}) = f(\mathcal {O} \cap V_B)\) and the optimal value can be achieved with a set fully contained in Bob’s elements \(V_B\). Hence, by the approximation guarantee of Greedy, we get \(f(Y_{k_B}) \ge (1-e^{-1})\cdot f(\mathcal {O})\), and (13) clearly holds. A key quantity that we use in our analysis is the following:
In other words, \(\Delta _A\) is a normalized way (normalized by \(f(\mathcal {O} \cap V_A)\)) to measure by how much the greedy solution \(G_A\) would further improve when adding all elements of \(\mathcal {O} \cap V_A\) to it. Notice that the monotonicity and submodularity of f together guarantee \(f(\mathcal {O} \cap V_A \mid G_A) \in [0, f(\mathcal {O} \cap V_A)]\), and thus, after the normalization, we get \(\Delta _A\in [0,1]\). As we show in the following, we can exploit both small and large values of \(\Delta _A\) to improve over the factor \(1/2\). To build up intuition, consider first the following ostensibly natural candidate for a hard instance. Assume that
\begin{align*} \max \lbrace f(S) : S\subseteq V_A,\ |S|\le k_A\rbrace \end{align*}
is a submodular maximization instance with maximizer \(\mathcal {O} \cap V_A\), and that the Greedy solution, which is \(G_A\), has value \(f(G_A)\) very close to \((1-e^{-1})\cdot f(\mathcal {O} \cap V_A)\), which is the worst-case guarantee for Greedy. By looking into the analysis of Greedy, this happens only when \(\Delta _A\) is very close to \(e^{-1}\). However, it turns out that in this ostensibly bad case, our algorithm is in fact about \((1-e^{-1})\)-approximate due to the following. The value of \(\Delta _A \approx e^{-1}\) implies that \(G_A\) and \(\mathcal {O} \cap V_A\) behave similarly in terms of how the submodular value changes when adding elements from \(V_B\). This is important to make sure that Bob can complement \(G_A\) to a strong solution through adding elements from \(V_B\). More precisely, adding \(\mathcal {O} \cap V_B\) to \(G_A\) increases the submodular value by
where the first inequality follows by submodularity and the second one by monotonicity of f. Finally, if we have, as discussed, that (14) is close to a worst-case instance in terms of approximability, then \(f(G_A)\approx (1-e^{-1})\cdot f(\mathcal {O} \cap V_A)\) and \(\Delta _A\approx e^{-1}\), and hence
Thus, when augmenting \(G_A\) with \(k_B\) elements of \(V_B\) through Greedy, the increase \(f(X_{k_A}\mid G_A)\) in submodular value is at least around \((1-e^{-1})\cdot (f(\mathcal {O})-f(\mathcal {O} \cap V_A))\). Together with the fact that \(f(G_A)\approx (1-e^{-1})\cdot f(\mathcal {O} \cap V_A)\), this implies \(f(X_{k_A}) \gtrapprox (1-e^{-1})\cdot f(\mathcal {O})\). Hence, our protocol computed a set \(X_{k_A}\) that is, in this case, even close to a \((1-e^{-1})\)-approximation. The preceding example highlights that small values of \(\Delta _A\)—and this includes the worst-case value of \(\Delta _A=e^{-1}\) for a classical submodular maximization problem with a cardinality constraint—allow Bob to complement \(G_A\) in a strong way. To turn the preceding intuitive reasoning into a full formal proof, we proceed as follows. We first quantify in Lemma 6.2 the performance of Greedy on Alice’s side depending on the parameter \(\Delta _A\). In particular, this will imply that \(\Delta _A \approx e^{-1}\) is indeed the worst case if the task is for Alice to select \(k_A\) elements of highest submodular value. It also quantifies how Greedy, run on Alice’s side, improves for values of \(\Delta _A\) bounded away from \(e^{-1}\). We then generalize and formalize in Lemma 6.4 the preceding discussion, done for \(\Delta _A\approx e^{-1}\), to arbitrary \(\Delta _A \in [0,1]\). This shows that our protocol has a good approximation guarantee whenever \(\Delta _A\) is small, and also covers the case when \(f(\mathcal {O} \cap V_A)\) is small. Finally, Lemma 6.5 covers the case when both \(\Delta _A\) and \(f(\mathcal {O} \cap V_A)\) are large. In this case, the set \(G_A\cup (\mathcal {O} \cap V_A)\)—which may have up to \(2k_A\) elements and is thus not necessarily feasible—has large submodular value. This is useful to show that \(f(Y_{k_A})\) is large due to the following. Recall that \(Y_{k_A}\) is constructed by first applying Greedy to Bob’s elements to select the \(k_B\) elements \(Q_r\) and then complementing \(Q_r\) with elements from \(\lbrace a_1,\ldots , a_{2k}\rbrace\). Knowing that \(G_A \cup (\mathcal {O} \cap V_A)\) has large submodular value implies a significant increase in submodular value when adding to \(Q_r\) the highest-valued elements of \(\lbrace a_1,\ldots , a_{2k}\rbrace\). Notice that this reasoning relies on the fact that Alice considers sets of cardinality larger than k.
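Protocol 4 itself is displayed separately in the article; the following Python sketch is our reconstruction from the notation of this section (reusing the greedy_order helper from the earlier sketch), intended only to make the roles of the sets \(X_i\) and \(Y_i\) concrete. In particular, the exact set of splits Bob tries is our guess; the analysis only uses the sets \(X_{k_A}\) and \(Y_{k_A}\):

```python
def protocol4(f, V_A, V_B, k):
    """A reconstruction of Protocol 4 (assumes |V_A| >= 2k and |V_B| >= k).

    Alice sends her first 2k Greedy picks a_1, ..., a_{2k}. Bob, who does
    not know k_A = |O ∩ V_A|, tries every split i and keeps the best set.
    """
    a = greedy_order(f, set(), V_A, 2 * k)          # Alice's message
    best = set()
    for i in range(k + 1):                          # candidate value of k_A
        GA_i = set(a[:i])                           # prefix G_A^i
        # X_i: extend G_A^i with k - i elements of V_B using Greedy
        X_i = GA_i | set(greedy_order(f, GA_i, V_B, k - i))
        # Y_i: Greedy on V_B first, then add i elements among a_1,...,a_{2i}
        Q_i = set(greedy_order(f, set(), V_B, k - i))
        Y_i = Q_i | set(greedy_order(f, Q_i, a[:2 * i], i))
        for S in (X_i, Y_i):
            if f(frozenset(S)) > f(frozenset(best)):
                best = S
    return best
```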
We now show how the intuitive discussion for the case \(\Delta _A \approx e^{-1}\) can be made formal and be generalized to arbitrary \(\Delta _A\in [0,1]\).
where, as usual, we interpret \(\Delta _A \ln \Delta _A = 0\) for \(\Delta _A=0\).
Proof.
We recall that \(X_{k_A} = G_A \cup S_{k_A}\), where \(G_A\subseteq V_A\) is a set constructed by Alice and \(S_{k_A}\) is a set constructed by Bob as described in Protocol 4. Observe that the following inequality holds:
Indeed, Bob extends the solution \(G_A\) by adding the \(k_B\) elements \(S_{k_A}\) from \(V_B\) using Greedy. Hence, this corresponds to applying Greedy to the submodular function maximization problem
where the first inequality comes from (19), the second one uses the submodularity of f, the third inequality uses the definition of \(\Delta _A = \frac{f(\mathcal {O} \cap V_A \mid G_A)}{f(\mathcal {O} \cap V_A)}\) and the monotonicity of f, and the last inequality is implied by Lemma 6.2.□
We now show that if both \(\Delta _A\) and \(f(\mathcal {O} \cap V_A)\) are large, then there are (possibly infeasible) sets on Alice’s side of very large submodular value, which can be exploited to complement a greedy solution on Bob’s side, leading to a set \(Y_{k_A}\) with high submodular value.
where, for \(\Delta _A=0\), we interpret \(\Delta _A\ln \Delta _A\) as zero.
Proof.
First, observe that the set \(G_A^{2k_A}\) is obtained by adding \(k_A\) elements from \(V_A\setminus G_A\) to \(G_A\) using Greedy. Since Greedy has an approximation guarantee of \(1-e^{-1}\) and the set \(\mathcal {O} \cap V_A\) is a set of cardinality \(k_A\), the \(k_A\) elements added by Greedy to \(G_A\) increase the submodular value of the set by at least \((1-e^{-1})\cdot f(\mathcal {O} \cap V_A \mid G_A)\). Thus,
Indeed, \(Y_{k_A}\) was obtained by starting from \(G_B\) and adding \(k_A\) elements among the \(2k_A\) elements of \(G_A^{2k_A}\) to it using Greedy. Adding all \(2k_A\) elements of \(G_A^{2k_A}\) to \(G_B\) would have increased the submodular value by \(f(G_A^{2k_A} \mid G_B)\). Adding half of the \(2k_A\) elements of \(G_A^{2k_A}\) to \(G_B\) increases the value by at least half that much due to the following. When running Greedy for \(2k_A\) steps, each element added in the first half has a marginal return at least as large as each element added in the second half, because marginal returns of elements added by Greedy are non-increasing. Hence, the first half has a total marginal increase at least as large as the second one, which implies (22). The statement now follows from the following chain of inequalities:
where the first inequality is due to (22), the second one follows from (21) and the monotonicity of f, the third one uses the fact that \(G_B\) was obtained by Greedy, and therefore \(f(G_B) \ge (1-e^{-1})\cdot f(\mathcal {O} \cap V_B)\), and the last one follows from \(f(\mathcal {O} \cap V_A) + f(\mathcal {O} \cap V_B) \ge f(\mathcal {O})\).□
Finally, the approximation factor claimed by Theorem 6.1 is obtained by combining the lower bounds provided by Lemmas 6.4 and 6.5 to bound \(\max \lbrace f(X_{k_A}),f(Y_{k_A})\rbrace /f(\mathcal {O})\). To this end, we compute the worst-case value of the two lower bounds for all possibilities of \(\Delta _A\in [0,1]\) and \(\frac{f(\mathcal {O} \cap V_A)}{f(\mathcal {O})} \in [0,1]\). This can be captured through the following nonlinear optimization problem, where, for brevity, we use x for \(\Delta _A\) and y for \(f(\mathcal {O} \cap V_A)/f(\mathcal {O})\).
\begin{equation} \begin{array}{rrcl} \min & z & &\\ & z &\ge &1 - e^{-1} - \left[ (1-e^{-1})x - e^{-1} - e^{-1} x \ln x \right]\cdot y \\ & z &\ge &\frac{1}{2} (1-e^{-1}) + \frac{1}{2}\left[ e^{-1} + x\ln x + (1-e^{-1}) x \right]\cdot y\\ & x,y &\in &[0,1]\\ & z &\in &\mathbb {R}\end{array} \end{equation}
(23)
Thus, the optimal value of (23) is a lower bound on the approximation ratio of Protocol 4. Together with the following statement, this completes the proof of Theorem 6.1.
Lemma 6.6.
The optimal value \(\alpha\) of Problem (23) satisfies \(\alpha \ge 0.514\).
One easy way to do a quick sanity check of Lemma 6.6 is to solve (23) via standard numerical optimization methods, which is not difficult because the problem has only three variables and only two non-trivial constraints (which are smooth). Nevertheless, we formally prove Lemma 6.6 by providing an analytical description of the unique optimal solution of (23) through the use of the necessary Karush-Kuhn-Tucker optimality conditions. Because this is a standard approach for deriving optima, we defer the proof of Lemma 6.6 to Appendix E.3.
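For instance, a grid-based check in Python (our script, not the formal proof) can confirm the bound: since (23) asks for the smallest z dominating both right-hand sides, its optimal value equals \(\min _{x,y} \max \lbrace f_1(x,y), f_2(x,y)\rbrace\), with \(f_1, f_2\) the two right-hand sides:

```python
import numpy as np

E = np.exp(-1.0)

def xlnx(x):
    # x*ln(x), extended by continuity with value 0 at x = 0
    return np.where(x > 0, x * np.log(np.maximum(x, 1e-300)), 0.0)

def f1(x, y):
    return 1 - E - ((1 - E) * x - E - E * xlnx(x)) * y

def f2(x, y):
    return 0.5 * (1 - E) + 0.5 * (E + xlnx(x) + (1 - E) * x) * y

xs = np.linspace(0.0, 1.0, 2001)
ys = np.linspace(0.0, 1.0, 2001)
X, Y = np.meshgrid(xs, ys)
print(np.maximum(f1(X, Y), f2(X, Y)).min())  # ~0.5145, consistent with 0.514
```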
Footnotes
1
We recall that the hardness result of Norouzi-Fard et al. [43] only applies to the restricted case when the value of the submodular function is queried on sets of cardinality at most k.
2
In the work of Cormode et al. [18], the problem was simply named CHAINp, keeping the parameter n implicit.
3
The multilinear extension of a set function was first introduced by Călinescu et al. [13].
4
In some texts, the term coverage function is used for its unweighted version (i.e., \(a(u)=1\) for \(u\in U\)). Our statements and proofs are described in terms of weighted coverage functions. However, this is merely a matter of convenience because any weighted coverage function can be approximated arbitrarily well through a scaled version of an unweighted one.
5
To see that this is the case, it is sufficient to observe that the intuitive description of the family \(\mathcal {F}\) in Section 5.1 is equivalent to the formal definition in Section 5.2 when the underlying universe U is of infinite size, and from that point of view, it is clear that the algorithm receives no advantage if given the explicit representation of the sets compared to having an oracle access to the coverage function. Furthermore, by standard Chernoff concentration inequalities (e.g., see the proof of Lemma 8 in the work of McGregor and Vu [37]), the family can be approximated up to any desired accuracy for feasible sets of cardinality at most k by selecting \(|U| = \Theta (k \log N)\).
6
We remark that the \(a_i\)’s do not take integral values, and we think of U as a set of total size \(a_1 + a_2 + \cdots + a_p\) consisting of infinitely many infinitesimally small items.
7
It can be verified that \(\delta _{i} = 1 + (\tfrac{1+ \sqrt {1+ 4/\delta _{i+1}}}{2})\cdot \delta _{i+1}\), but the exact value is not important to us.
A Formal Connections to the Applications
In this section, we give the proofs of the theorems from Section 1 that establish the formal connections between Max-Card-k and the applications we present for it.
For any \(\varepsilon \gt 0\), a data stream algorithm for Max-Card-k with an approximation guarantee of \(1/2 + \varepsilon\) must use memory \(\Omega (\varepsilon s / k^3)\), where s denotes the number of elements in the stream.
Proof.
Let \(\mathcal {A}\) be a data stream algorithm for Max-Card-k with an approximation guarantee of \(1/2+\varepsilon\), and consider the following p-player protocol. The first player feeds to \(\mathcal {A}\) the elements of her input \(V_1 \subseteq W\) in any order. She then sends the memory state of \(\mathcal {A}\) to the second player, who feeds her own elements \(V_2 \subseteq W\) to \(\mathcal {A}\) before forwarding the resulting memory state of \(\mathcal {A}\) to the next player, and so on. The last player finally feeds \(V_p \subseteq W\) to \(\mathcal {A}\) and outputs the same set as \(\mathcal {A}\). It is clear that the output of the last player is the same as that of running \(\mathcal {A}\) on a stream consisting of elements \(V_1 \cup V_2 \cup \cdots \cup V_p\). Therefore, for any \(p\ge 2,\) the protocol satisfies that (i) its approximation guarantee is \(1/2+\varepsilon\), and by definition, (ii) the size of any message sent by the players is at most the memory usage of \(\mathcal {A}\). Now, by selecting \(p=k\) large enough as a function of \(\varepsilon\), Theorem 1.3 implies that the memory usage of \(\mathcal {A}\) must be \(\Omega (\varepsilon N/ p^3) = \Omega (\varepsilon s/k^3)\), where we used the equality \(p=k\) and the fact that \(s = |V_1| + |V_2| + \cdots + |V_p|\) is at most \(|W| = N\) since the sets \(V_1, V_2, \ldots , V_p\) are disjoint.□
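The simulation described in this proof is mechanical; the following sketch makes it explicit. The feed/output interface of the streaming algorithm is hypothetical and only used for illustration:

```python
def protocol_from_streaming(stream_alg, player_inputs):
    """Simulate the p-player protocol built from a streaming algorithm.

    player_inputs is the list V_1, ..., V_p. The 'message' passed between
    consecutive players is the memory state of stream_alg, so each message
    has exactly the size of the algorithm's memory.
    """
    for V_i in player_inputs:         # player i feeds her elements
        for v in V_i:
            stream_alg.feed(v)
        # (here the memory state of stream_alg is forwarded to the next player)
    return stream_alg.output()        # the last player outputs the answer
```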
We begin this section with a formal restatement of Theorem 1.5.
Theorem 1.5.
Assume we are given a protocol \(\mathsf {P}\) for Max-Card-k in the two-player model which has the properties that
(i)
Alice does not access in any way elements that do not belong to the set \(V_A\). In particular, she neither queries f on subsets including such elements nor includes such elements in the message to Bob.
(ii)
Let M be the set of elements that explicitly appear in the message of Alice. Then, Bob does not access in any way elements that do not belong to the set \(M \cup V_B\). In particular, he neither queries f on subsets including such elements nor includes such elements in the output set he generates.
Then, there exists an algorithm \(\mathcal {A}\) for robust submodular maximization whose approximation guarantee is at least as good as the approximation guarantee of \(\mathsf {P}\), and the number of elements in the summary of \(\mathcal {A}\) is \(O(d \cdot \lbrace \text{communication complexity of $\mathsf {P}$ in elements}\rbrace)\). Furthermore, if \(\mathsf {P}\) runs in polynomial time, then so does \(\mathcal {A}\).
Observe that the properties required from \(\mathsf {P}\) by Theorem 1.5 imply that \(\mathsf {P}\) completely ignores the sets W, \(W_A\), and \(W_B\) even though these sets are part of the global information in the formal description of the two-player model in Section 2. Thus, the algorithm \(\mathcal {A}\) we design may use \(\mathsf {P}\) without specifying these sets. We give in the following, as Algorithm 1, the algorithm \(\mathcal {A}\). The design of this algorithm is based on a technique of Mirzasoleiman et al. [39]. In a nutshell, the algorithm uses \(d + 1\) independent copies of the protocol \(\mathsf {P}\) and manages to guarantee that one of them gets exactly the elements that have not been deleted.
We begin the analysis of Algorithm 1 with the following technical observation.
Observation A.1.
The sets \(M_1, M_2, \ldots\,, M_{d + 1}\) are pairwise disjoint.
Proof.
For every \(i \in [d + 1]\), the set \(M_i\) includes only elements that appear in the message generated by Alice of \(\mathsf {P}_i\). By the assumptions on \(\mathsf {P}\) in Theorem 1.5, \(M_i\) must be a subset of the set \(V_A\) passed to Alice of \(\mathsf {P}_i\), which implies
At least one of the sets \(M_1, M_2, \ldots\,, M_{d + 1}\) is disjoint from D because \(|D| = d\).
The last corollary implies that Algorithm 1 can indeed find a value \(\ell\) with the promised properties, and therefore it is well defined. One can also observe that the summary produced by Algorithm 1 consists of \(d + 1\) messages of \(\mathsf {P}\), and thus the number of elements in this summary is \(d + 1 = O(d)\) times the communication complexity in elements of \(\mathsf {P}\). Finally, it is clear from the description of Algorithm 1 that this algorithm runs in polynomial time when the algorithms of Alice and Bob in \(\mathsf {P}\) run in polynomial time. Hence, to prove Theorem 1.5, it only remains to argue that Algorithm 1 always outputs a feasible solution and that it inherits the approximation guarantee of \(\mathsf {P}\). From this point on, we denote by \(\widehat{S}\) the output set of Algorithm 1 and by \(V_A\) and \(V_B\) the input sets of Alice and Bob of \(\mathsf {P}_\ell\), respectively. Using this notation, we get the following observation.
Observation A.3.
\(\widehat{S} \subseteq V \setminus D\), and therefore Algorithm 1 is guaranteed to produce a feasible solution.
Proof.
By the assumptions about \(\mathsf {P}\) in Theorem 1.5, the output set of \(\mathsf {P}_\ell\) is a subset of \(M_\ell \cup V_B\). Since this output set becomes the output set \(\widehat{S}\) of Algorithm 1, we get
As mentioned in Section 2, there are multiple natural models that can be used to formulate our problem. In this section, we sketch one such model that appears quite different from the model used throughout the rest of the article. Nevertheless, our results can be extended to this model (up to minor changes in the proved bounds). For brevity, we present the model directly for the p-player case instead of presenting its two-player version first. The global information in an instance of this model consists of a ground set W, a non-negative set function \(f:2^W \rightarrow {\mathbb {R}_{\ge 0}}\), and a partition of W into p disjoint sets \(W_1, W_2, \ldots\,, W_p\), where, as usual, one should think of \(W_i\) as the set of elements player number i might get. In addition, each player \(i \in [p]\) has access to private information consisting of a set \(V_i \subseteq W_i\) of elements that this player actually gets. It is also guaranteed that f is a monotone and submodular function when restricted to the domain \(2^{\bigcup _{i = 1}^p V_i}\). As in our regular model, the objective of the players in this model is to find a set \(S \subseteq \bigcup _{i = 1}^p V_i\) of size k maximizing f among all such sets, and they can do that via a one-way communication protocol in which player 1 sends a message to player 2, player 2 sends a message to player 3, and so on, until player p gets a message from player \(p - 1\), and based on this message produces the output set. There are two main differences between this model and our regular model:
•
In our regular model, the objective function is part of the private information, and each player \(i \in [p]\) has access only to the restriction of this function to \(\bigcup _{j = 1}^i W_j\). In contrast, in the model suggested here, the objective function is global information accessible to all players, which simplifies the model and creates a nice symmetry between the players.
•
Since the objective function is now available to all the players from the very beginning, to avoid leaking information to the early players about the parts of the instance that should be revealed only to the later players, the sets \(W_i\) corresponding to these later players should include many elements that might end up in \(V_i\) in different scenarios. In particular, since \(W_i\) is large, many of the elements in \(W_i\) might never end up in \(V_i\) together, and thus there is no reason to require all these elements to form together one large submodular function. Accordingly, the model does not require f to be monotone and submodular on all of W. Instead, it only requires f to be monotone and submodular over the elements that actually arrive at some player.
C Hardness of the CHAINp Problem
For completeness, we adapt one of the many hardness proofs for the standard INDEX problem to get the mentioned hardness result for the CHAINp problem. See Section 3 for the definition of the CHAINp problem. For convenience, we first recall the theorem statement.
Theorem 3.3. For any positive integers n and \(p\ge 2\), any (potentially randomized) protocol for CHAINp(n) with success probability of at least \(2/3\) must have a communication complexity of at least \(n/(36 p^2)\).
The proof that we present is based both on insightful discussions with Michael Kapralov and on lecture notes from his course “Sublinear Algorithms for Big Data Analysis.” We start with some preliminaries. We then introduce a distribution of instances \(D^p\), and finally, we show that no protocol can have a “good” success probability on instances sampled from \(D^p\) without having a “large” communication complexity. Throughout this section, we simply refer to the CHAINp(n) problem as the CHAINp problem because n will be clear from context.
Basic Notation and Fano’s Inequality. For discrete random variables \(X,Y\), and Z, define
•
the mutual information \(I(X; Y) = H(X)- H(X\mid Y)\), and
•
the conditional mutual information \(I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)\).
We also use the following well-known relations:
•
symmetry of mutual information: \(I(X;Y) = I(Y;X)\), and
•
chain rule for mutual information: \(I(X; Y, Z) = I(X;Z) + I(X;Y \mid Z)\).
The proof relies on Fano’s inequality.
Theorem C.1.
Let X and Y be discrete random variables and g an estimator (based on Y) of X such that \(\Pr [g(Y) \ne X] \le \delta\) and \(g(Y)\) only takes values in the support \(\operatorname{supp}(X)\) of X. Then,
\begin{align*} H(X \mid Y) \le H_2(\delta) + \delta \cdot \log _2\big (|\operatorname{supp}(X)| - 1\big), \end{align*}
where \(H_2(\delta) = \delta \log _2(1/\delta) + (1-\delta) \log _2(1/(1-\delta))\) is the binary entropy at \(\delta\).
Distribution \(D^p\) of CHAINp Instances. Our hardness result uses Yao’s minimax principle. Specifically, we give a distribution over CHAINp instances so that any deterministic protocol with a “good” success probability must have a “large” communication complexity. To define a CHAINp instance, we use the superscript to indicate the input of each player. For example, \(x^i \in \lbrace 0,1\rbrace ^n, t^i \in [n]\) denotes the n-bit string \(x^i\) and index \(t^i\) given to the i-th player. A CHAINp instance is thus defined by \(x^1, t^2, x^2, t^3, \ldots , x^{p-1}, t^p,\) where \(x^i \in \lbrace 0,1\rbrace ^n\) and \(t^{i+1} \in [n]\) for \(i\in [p-1]\). We now define a distribution \(D^p\) over such instances and let \(X^i, T^{i+1}\) be the discrete random variables corresponding to \(x^i\) and \(t^{i+1}\), respectively. For \(z\in \lbrace 0, 1\rbrace\), let \(B_z = \lbrace (x, t) \in \lbrace 0,1\rbrace ^n \times [n] : x_t = z\rbrace\). The joint distribution \(D^p\) over the discrete random variables \(X^1, T^2, X^2, T^3, \ldots , X^{p-1}, T^p\) is now defined by the following sampling procedure:
•
Select \(z \in \lbrace 0,1\rbrace\) uniformly at random.
•
For \(i \in [p-1],\) select \((x^i, t^{i+1}) \in B_z\) uniformly at random.
•
Output \(x^1, t^2, \ldots , x^{p-1}, t^p\).
In other words, \(\Pr [X^1 = x^1, T^2 = t^2, \ldots , X^{p-1}= x^{p-1}, T^p = t^p]\) equals the probability that the preceding procedure outputs \(x^1,t^2, \ldots , x^{p-1}, t^p\). Note that an alternative equivalent procedure for obtaining a sample from \(D^p\) is as follows:
•
Select \(x^1 \in \lbrace 0,1\rbrace ^n\) uniformly at random.
•
Select \(t^2 \in [n]\) uniformly at random.
•
Set \(z = x^1_{t^2}\).
•
For \(i = 2, \ldots , p-1,\) select \((x^i, t^{i+1}) \in B_z\) uniformly at random.
•
Output \(x^1, t^2, \ldots , x^{p-1}, t^p\).
This immediately implies the following observation.
Observation C.2.
The random variables \(X^1, T^2\) are equivalently distributed in \(D^p\) as in \(D^2\) for any \(p\ge 2\)—that is, they are uniformly distributed in \(\lbrace 0,1\rbrace ^n \times [n]\).
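The alternative sampling procedure is straightforward to implement; here is a sketch (0-indexed for convenience; all names are ours). Note that picking t uniformly and then x uniformly subject to \(x_t = z\) is indeed uniform over \(B_z\), since every pair in \(B_z\) receives probability \((1/n)\cdot 2^{-(n-1)}\):

```python
import random

def sample_Dp(n, p):
    """Sample x^1, t^2, x^2, t^3, ..., x^{p-1}, t^p from D^p using the
    alternative procedure: draw (x^1, t^2) uniformly, set z = x^1[t^2],
    then draw each later (x^i, t^{i+1}) uniformly from B_z."""
    x1 = [random.randint(0, 1) for _ in range(n)]
    t2 = random.randrange(n)
    z = x1[t2]
    instance = [x1, t2]
    for _ in range(p - 2):  # players 2, ..., p-1
        t = random.randrange(n)
        x = [random.randint(0, 1) for _ in range(n)]
        x[t] = z            # uniform over B_z
        instance.extend([x, t])
    return instance
```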
We also have the following.
Lemma C.3.
Let M be a function of \(X^1\) and m one of its possible values. Further fix some \(t^2 \in [n]\). Then, the total variation distance between the distribution of \(X^2, T^3, \ldots , X^{p-1}, T^p\) conditioned on \(M = m, T^2 = t^2\) and the unconditional distribution of \(X^2, T^3, \ldots , X^{p-1}, T^p\) equals
\begin{align*} \left|\Pr \left[X^1_{t^2} = 0 \mid M = m\right] - \frac{1}{2}\right|. \end{align*}
Consider the aforementioned alternative sampling procedure for \(D^p\) where we first sample \(x^1, t^2\), and then let \(z = x^1_{t^2}\). As M is a function of \(X^1\), we have that the distribution of \(X^2, T^3, \ldots , X^{p-1}, T^p\) conditioned on \(M = m\) and \(T^2 = t^2\) can be defined by the following sampling procedure:
•
Select \(x^1 \in \lbrace 0,1\rbrace ^n\) at random from the conditional distribution \(M = m\).
•
Set \(z = x^1_{t^2}\).
•
For \(i = 2, \ldots , p-1,\) select \((x^i, t^{i+1}) \in B_z\) uniformly at random.
•
Output the outcome \(x^2, t^3, \ldots , x^{p-1}, t^p\).
In other words, z is selected to be 0 with probability \(\Pr [X^1_{t^2} = 0 \mid M = m]\) and 1 with probability \(\Pr [X^1_{t^2} = 1 \mid M = m]\).
The unconditional distribution is defined in the same way, except that \(x^1\) and \(t^2\) are selected uniformly at random. It follows that the total variation distance between the conditional distribution and unconditional distribution of \(X^2, T^3, \ldots , X^{p-1}, T^p\) equals
\begin{align*} \left|\Pr \left[X^1_{t^2} = 0 \mid M = m\right] - \frac{1}{2}\right|. \end{align*}
Hardness Proof of CHAINp. We are now ready to prove the mentioned hardness result for the CHAINp problem. As already noted, we show that any deterministic protocol with a “good” success probability over instances sampled from \(D^p\) must have a “large” communication complexity. By Yao’s minimax principle, this then implies Theorem 3.3. For a deterministic p-player protocol \(\mathsf {P}\), let \(\mbox{Z}_{\mathsf {P}}(x^1, t^2, \ldots , x^{p-1}, t^{p})\) denote the indicator function that \(\mathsf {P}\) outputs the correct prediction on instance \(x^1, t^2, \ldots , x^{p-1}, t^{p}\). We start by analyzing the two-player case, which we later use as a building block in the more general p-player case.
Lemma C.4.
Let \(\delta \in (0, 1/2)\). Any protocol \(\mathsf {P}\) for the CHAIN2 problem with communication complexity at most \(c = (1-H_2(\delta))\cdot n - 1\) satisfies
\begin{align*} \mathsf {E}\left[ \mbox{Z}_{\mathsf {P}}(X^1, T^2) \right] \le 1-\delta . \end{align*}
We abbreviate \(X^1\) by X to simplify notation. The message M that the first player, Alice, sends is a discrete random variable that is a function of X. Note that
\begin{align*} c + 1 \ge H(M) \ge I(M; X). \end{align*}
The first inequality holds because Alice sends at most c bits, and so the entropy of M is at most \(\log _2(2^c + 2^{c-1} + \cdots + 1) \le c+1\). The second inequality follows from the definition of mutual information and the non-negativity of entropy: \(I(M;X) = H(M) - H(M \mid X) \le H(M)\).
Letting, for any \(i\in [n]\), \(X_{\lt i}\) denote the vector \((X_1, \ldots , X_{i-1})\), we get
\begin{align*} I(M; X) &= I(X;M) && \textrm {(symmetry of mutual information)} \\ & = \sum _{i=1}^n I(X_i; M \mid X_{\lt i}) && \textrm {(chain rule for mutual information)} \\ & = \sum _{i=1}^n \left(H(X_i \mid X_{\lt i}) - H(X_i \mid M, X_{\lt i})\right) && \\ & \ge \sum _{i=1}^n \left(H(X_i) - H(X_i \mid M)\right) && \textrm {(the $X_i$'s are i.i.d., and conditioning does not increase entropy)} \\ & = \sum _{i=1}^n \left(1- H(X_i \mid M)\right). && \textrm {($X_i$ is a uniform binary random variable)} \end{align*}
Now, if we let \(1-\delta _i\) denote the probability that \(\mathsf {P}\) outputs the correct prediction when \(t^2 = i\), then by Fano’s inequality, \(H(X_i \mid M) \le H_2(\delta _i)\), and hence
Further, let \(\delta ^{\prime } = \frac{1}{n}\sum _{i=1}^n \delta _i\). Note that \(\delta ^{\prime }\) equals the probability that the protocol \(\mathsf {P}\) makes the incorrect prediction. Hence, all that remains to be shown is \(\delta ^{\prime } \ge \delta\). To this end, first observe that if \(\delta ^{\prime }\ge 1/2\), then this trivially holds because \(\delta \in (0,1/2)\). Otherwise, observe that the concavity of the binary entropy function implies \(\frac{1}{n} \sum _{i=1}^n H_2(\delta _i) \le H_2(\delta ^{\prime })\), and hence
Thus, \(n(1-H_2(\delta)) - 1 = c \ge I(M; X) - 1 \ge n(1-H_2(\delta ^{\prime })) - 1\). This implies \(\delta \le \delta ^{\prime }\) since \(n(1 - H_2(y))\) is a strictly decreasing function of y within the range \([0, 1/2]\).□
We now use the preceding lemma and induction to prove the hardness result for CHAINp.
Lemma C.5.
Let \(\delta \in (0, 1/2)\) and \(s = 1/2 - \delta\). For any integer \(p\ge 2\), any protocol \(\mathsf {P}\) for the CHAINp problem with communication complexity at most \((1- H_2(\delta)) \cdot n - 1\) satisfies
\begin{align*} \mathsf {E}\left[ \mbox{Z}_{\mathsf {P}}(X^1, T^2, \ldots , X^{p-1}, T^p) \right] \le \frac{1}{2} + s\cdot p . \end{align*}
We prove the statement by induction on p. For \(p=2\), the statement is implied by Lemma C.4. Now assume it holds for \(2, \ldots , p-1\).
Consider a CHAINp protocol \(\mathsf {P}\), and let M denote the random variable corresponding to the message sent by the first player. Note that M is a function of the random variable \(X^1\). Fix a message m sent by the first player, and let \(t^2\) be the index received by the second player. Denote by \(\mathsf {P}(m, t^2)\) the \((p-1)\)-player protocol that proceeds in the same way as \(\mathsf {P}\) proceeds for players \(2, \ldots , p\) after the first player sent message m and the second player received the index \(T^2 = t^2\). Thus, with this notation, we can write \(\mathsf {E}[ Z_{\mathsf {P}}(X^1, T^2, \ldots , X^{p-1}, T^p)]\) as
Recall that the total variation distance between two distributions is the largest difference the two distributions can assign to an event. It follows (by considering the event “\(\mathsf {P}(m, t^2)\) gives the correct prediction”) that
\begin{align*} \mathsf {E}\left[ Z_{\mathsf {P}(m, t^2)}(X^2, T^3, \ldots , X^{p-1}, T^{p}) \mid M = m, T^2 = t^2 \right] \end{align*}
where \(\mbox{TVD}(m, t^2)\) denotes the total variation distance between the unconditional distribution of \(X^2, T^3, \ldots , X^{p-1}, T^p\) and the distribution of these random variables conditioned on \(M= m\) and \(T^2 = t^2\).
Note that the unconditional distribution is equivalent to taking a random \((p-1)\)-player instance from \(D^{p-1}\). By the induction hypothesis, we have that any \((p-1)\)-player protocol succeeds with probability at most \(1/2 + s \cdot (p-1)\) over this distribution of instances, and so
Consider now the protocol \(\mathsf {P}^{\prime }\) for the CHAIN2 problem where the first player is identical to the first player in the p-player protocol \(\mathsf {P}\) and the second player, given the message m and the index \(t^2\), predicts that \(x^1_{t^2}\) equals 0 if and only if \(\Pr [X^1_{t^2} = 0 \mid M = m] \ge 1/2\). Conditioned on \(M=m\) and \(T^2=t^2\), we thus have that \(\mathsf {P}^{\prime }\) succeeds with probability \(1/2+\mbox{TVD}(m, t^2)\). Furthermore, the probability that \(\mathsf {P}^{\prime }\) succeeds on a random instance is
As no player is allowed to send messages consisting of more than \((1-H_2(\delta))n -1\) bits and \(X^1, T^2\) are uniformly distributed (Observation C.2), Lemma C.4 says that the success probability is at most \(1-\delta\) and so
First note that we can assume that \(n\ge 36 p^2\) since any protocol needs communication complexity at least 1 to have a success probability above \(1/2\). Now, let \(s = 1/(6p)\) and \(\delta = 1/2 - s\). Lemma C.5 says that any (potentially randomized) protocol for the CHAINp problem with success probability at least \(1/2 + p\cdot s\) has communication complexity at least \((1- H_2(\delta))n - 1\).
With these parameters, we have \(1/2 + p \cdot s = 2/3\). So any protocol with success probability \(2/3\) must have communication complexity at least \((1- H_2(\delta))n -1\). By the definition of the binary entropy, one can verify that \(1 - H_2(1/2 - s) \ge 2s^2\), and hence
\begin{align*} (1- H_2(\delta))\cdot n - 1 \ge 2s^2\cdot n - 1 = \frac{n}{18 p^2} - 1 \ge \frac{n}{36 p^2}, \end{align*}
where the last inequality holds because \(n \ge 36 p^2\).□
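As a quick numerical aside (our script), one can confirm the entropy estimate \(1 - H_2(1/2 - s) \ge 2s^2\) used in the last step:

```python
import math

def H2(d):
    """Binary entropy in bits."""
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)

for p in (2, 3, 10, 100, 1000):
    s = 1 / (6 * p)
    # holds with room to spare, since 1 - H2(1/2 - s) >= (2/ln 2) * s^2
    assert 1 - H2(0.5 - s) >= 2 * s * s
```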
D Proof of the First Part of Theorem 1.1
Protocol 5 is a variant of Protocol 1, modified to use exponential grouping as described at the end of Section 4.1.
One can observe that, like Protocol 1, Protocol 5 is also guaranteed to output a feasible set. Furthermore, the number of elements Alice sends to Bob under this protocol is upper bounded by
Thus, to complete the proof of the first part of Theorem 1.1, we are only left to show that Protocol 5 produces a \((2/3 - \varepsilon)\)-approximation, which is our objective in the rest of this section. Toward this goal, we use the following well-known lemma.
Lemma D.1 (A Rephrased Version of Lemma 2.2 of Feige et al. [27]).
Let \(g:2^A \rightarrow \mathbb {R}\) be a submodular function. Denote by \(A(p)\) a random subset of A in which each element appears with probability p (not necessarily independently). Then, \(\mathbb {E} \left[ g(A(p)) \right] \ge (1-p) \cdot g(\varnothing) + p \cdot g(A)\).
Recall that by \(\mathcal {O},\) we denote a subset of \(V_A \cup V_B\) of size k maximizing f among all such subsets, and \({\rm\small OPT}\) denotes the value of this set. Additionally, let \(M = V_B \cup \bigcup _{i \in I} S_{i}\) be the set of elements that Bob gets either from Alice or directly, which is also the set of elements in which Bob looks for \(\widehat{S}\). Finally, let \({i^{\prime }}\) be the maximum value in \(\lbrace 0\rbrace \cup \lbrace \lfloor (1 + \varepsilon)^j \rfloor \mid j\in \lbrace 0,\ldots , \lfloor \log _{1+\varepsilon } k \rfloor \rbrace \rbrace\) that is not larger than \(k - |\mathcal {O} \cap M|\). Observe that the definition of \({i^{\prime }}\) guarantees that \({i^{\prime }}\in I\) and \((1 + \varepsilon){i^{\prime }}\ge k - |\mathcal {O} \cap M|\). Furthermore, the definition of I implies \(2{i^{\prime }}\in I\). We are now ready to state the following claims, which correspond to Observation 4.1, Lemma 4.2, and Lemma 4.3 from Section 4.1, respectively.
Let T be a uniformly random subset of \(\mathcal {O} \setminus M\) of size \({i^{\prime }}\) (if \(|\mathcal {O} \setminus M| \lt {i^{\prime }}\), we set T to be deterministically equal to \(\mathcal {O} \setminus M\)). Since every element of \(\mathcal {O} \setminus M\) belongs to T with some probability \(p \ge {i^{\prime }}/ (k - |\mathcal {O} \cap M|) \ge 1 / (1 + \varepsilon)\), we get by Lemma D.1 that \(\mathbb {E} \left[ f(T) \right] \ge f(\mathcal {O} \setminus M) / (1 + \varepsilon) \ge (1 - \varepsilon) \cdot f(\mathcal {O} \setminus M)\). Thus, there exists some realization \(T^{\prime }\) of T obeying \(f(T^{\prime }) \ge (1 - \varepsilon) \cdot f(\mathcal {O} \setminus M)\). The observation now follows from the choice of \(S_{{i^{\prime }}}\) by Protocol 5 since the set \(T^{\prime }\) is a subset of \((V_A \cup V_B) \setminus M \subseteq V_A\) of size at most \({i^{\prime }}\).□
To prove the lemma, we have to show that \(V_A\) includes a set of size at most \(2{i^{\prime }}\) whose value is at least \((1 - \varepsilon) \cdot [{\rm\small OPT} + f(S_{{i^{\prime }}}) - f(\widehat{S})]\). Let T be a uniformly random subset of \(\mathcal {O} \setminus M\) of size \({i^{\prime }}\) (like in the previous proof, if \(|\mathcal {O} \setminus M| \lt {i^{\prime }}\), then we define T to be deterministically equal to \(\mathcal {O} \setminus M\)), and consider the set \(T \cup S_{{i^{\prime }}}\). First, we observe that this set is a subset of \(V_A\) because \(V_B \subseteq M\). Second, the size of this set is at most \(|T| + |S_{{i^{\prime }}}| \le 2{i^{\prime }}\). Thus, to prove the lemma, it remains to show that the value of this set is at least \((1 - \varepsilon) \cdot [{\rm\small OPT} + f(S_{{i^{\prime }}}) - f(\widehat{S})]\) for some realization of T, which we do by showing that the expected value of this set is at least that large. Note that \((\mathcal {O} \cap M) \cup S_{{i^{\prime }}}\) is a subset of M of size
and thus \(f(\widehat{S}) \ge f((\mathcal {O} \cap M) \cup S_{{i^{\prime }}})\) by the definition of \(\widehat{S}\). Using the last inequality, we get
where the second inequality follows from the non-negativity of f and the last two inequalities follow from the submodularity and monotonicity of f, respectively. The first inequality follows from Lemma D.1 by defining \(g(S) = f(S \cup S_{{i^{\prime }}})\) since T includes every element of \(\mathcal {O} \setminus M\) with some probability \(p \ge {i^{\prime }}/ (k - |\mathcal {O} \cap M|) \ge 1 / (1 + \varepsilon)\).□
We omit the proof of the last lemma since it is identical to the proof of Lemma 4.3 up to a replacement of every occurrence of the expression \(k - |\mathcal {O} \cap M|\) with \({i^{\prime }}\). We are now ready to prove the approximation guarantee of Protocol 5 (and thus also complete the proof of the first part of Theorem 1.1).
Corollary D.5.
Protocol 5 is a \((2/3 - \varepsilon)\)-approximation protocol.
where the third inequality follows from the submodularity and non-negativity of f. Because \(\widehat{S}\) is the output of Protocol 5, this concludes the proof.□
One can verify that in all the preceding cases, \(g_i(v \mid S)\) indeed fulfills (E.1) and (E.2). It remains to show that \(g_i\) being non-negative, monotone, and submodular implies that \(f_i\) has these properties as well. Recall that \(f_i(S) = G_i(y^S)\). Thus, the fact that \(g_i\) is non-negative (and therefore so is its multilinear extension \(G_i\)) directly implies non-negativity of \(f_i\). Consider now two sets \(S_1 \subseteq S_2 \subseteq W\). One can observe that the definition of \(y^S\) implies \(y^{S_1} \le y^{S_2}\) component-wise. Hence, by the monotonicity of \(g_i,\) we obtain
which implies that \(f_i\) is also monotone. We now check the submodularity of \(f_i\). For that purpose, let v be an arbitrary element of \(W \setminus S_2\). If \(v = w\), then
where the second and third equalities hold by linearity of expectation, and the inequality follows from the submodularity of \(g_i\) and the inequality \(y^{S_1} \le y^{S_2}\). Similarly, if \(v =u_i^j\) for some \(i\in [n]\) and \(j\in [k-1]\), then
where U is a finite universe with non-negative weights \(a:U\rightarrow \mathbb {R}_{\ge 0}\), each element \(v\in V\) is a subset of U (i.e., \(V\subseteq 2^{U}\)), and \(p_v:U\rightarrow [0,1]\) for \(v\in V\). To show that f is a weighted coverage function, we interpret each element \(v\in V\) as a subset \(\overline{v}\) of a new universe \(\overline{U}\) with non-negative weights \(\overline{a}:\overline{U}\rightarrow \mathbb {R}_{\ge 0}\) such that
We now define a universe \(\overline{U}\) with weights \(\overline{a}:\overline{U}\rightarrow \mathbb {R}_{\ge 0}\) and the mapping from \(v\in V\) to \(\overline{v} \subseteq \overline{U}\) such that (25) holds. The universe \(\overline{U}\) is
One can interpret the values \(p:Z\rightarrow [0,1]\) as probabilities. Let Q be a random subset of Z containing element \(z\in Z\) with probability \(\Pr [z\in Q] = p_z\), independently of the other elements. Then, for any fixed \(X\subseteq Z\),
\begin{align*} \Pr [X \cap Q \ne \varnothing ] = 1 - \prod _{z\in X} (1-p_z). \end{align*}
E.2.2 Bounding the Partial Derivatives of \(\widehat{F}\).
In this section, we perform the rather mechanical calculations that bound the partial derivatives of \(\widehat{F}\) at \(\mathbf {1} = (1,1, \ldots , 1)\). For convenience, we recall that the definition of \(\widehat{F}\) is
The bound on the partial derivatives, which we restate here for convenience, now follows from the preceding identity, the fact that \(\ln (1-x) = -\sum _{i=1}^\infty \frac{x^i}{i}\) for \(|x| \lt 1\), and the bounds of Lemma 5.2.
Note that for \(\ell = p,\) the statement trivially holds because \(\frac{\partial \widehat{F}}{\partial s_p} (\mathbf {1}) = 0\). For \(\ell = 1, \ldots , p-1\), we use the identity of the previous claim together with the fact that \(\ln (1-x) = -\sum _{i=1}^\infty \frac{x^i}{i}\) for \(|x| \lt 1\). For brevity, we let \(x = a_\ell /A_{\ge \ell }\). Then,
By Lemma 5.2, we have \(x = a_{\ell }/A_{\ge \ell } \le (2(p+1-\ell) - H_{p+1-\ell })^{-1}\). Since \(\sum _{i=1}^\infty (\tfrac{x^{i-1}}{i})\ge \frac{x^0}{1} = 1\), this gives the lower bound
\(\eta (x) \le e^{-\frac{x}{k_A}}\) for all \(x\in [0,k_A]\).
Proof.
Notice that the claim clearly holds for \(x\in \lbrace 0,\ldots , k_A\rbrace\) as \((1-1/k_A)^x \le e^{-\frac{x}{k_A}}\) holds because \(1+y \le e^y\) for all \(y\in \mathbb {R}\). To prove the claim also for fractional values of x, we fix an integer \(r\in \lbrace 0,\ldots , k_A-1\rbrace\) and show that \(\eta (x)\le e^{-\frac{x}{k_A}}\) holds for all \(x\in [r,r+1]\). Hence, let \(x=r+\lambda\) with \(\lambda \in [0,1]\). By the construction of \(\eta\), we have
over \(\lambda \in [0,1]\), we consider the border values \(\lambda \in \lbrace 0,1\rbrace\) and points in between for which h has a derivative of zero. Because we already proved the statement for integer x, the border values \(\lambda \in \lbrace 0,1\rbrace\) fulfill \(h(\lambda)\ge 0\). Moreover,
Notice that such a \(\overline{\lambda }\) must satisfy \(\overline{\lambda }\ge 0\) because \(e^{-\frac{r+\lambda }{k_A}}\) is strictly decreasing in \(\lambda\) and \(e^{-\frac{r}{k_A}} \ge (1-\frac{1}{k_A})^r\). We now obtain \(h(\overline{\lambda }) \ge 0\) as desired because \(g(\overline{\lambda }) \le g(0) \le (1-\frac{1}{k_A})^r = e^{-\frac{r + \bar{\lambda }}{k_A}}\), where the first inequality holds because \(g(\lambda)\) is a non-increasing function, and the equality follows from the choice of \(\bar{\lambda }\). This completes the proof of the claim.□
We now would like to prove Lemma 6.6. However, since it is more convenient for our proof technique, we prove a slightly stronger version of this lemma, stated in the following, where the variables y can take values within \([0,10]\) instead of just \([0,1]\).
Lemma E.3.
The optimal value \(\alpha\) of the nonlinear program
\begin{equation} \begin{array}{rrcl} \min & z & &\\ & z &\ge &1 - e^{-1} - \left[ (1-e^{-1})x - e^{-1} - e^{-1} x \ln x \right]\cdot y \\ & z &\ge &\frac{1}{2} (1-e^{-1}) + \frac{1}{2}\left[ e^{-1} + x\ln x + (1-e^{-1}) x \right]\cdot y\\ & x &\in &[0,1] \\ & y &\in &[0,10] \\ & z &\in &\mathbb {R}\end{array} \end{equation}
(28)
satisfies \(\alpha \ge 0.514\).
Proof.
We define the two functions \(f_1,f_2:[0,1]\times [0,10]\rightarrow \mathbb {R}\) as the right-hand sides of the two non-trivial constraints of (28)—that is,
\begin{align*} f_1(x,y) &= 1 - e^{-1} - \left[ (1-e^{-1})x - e^{-1} - e^{-1} x \ln x \right]\cdot y, \text{ and} \\ f_2(x,y) &= \frac{1}{2} (1-e^{-1}) + \frac{1}{2}\left[ e^{-1} + x\ln x + (1-e^{-1}) x \right]\cdot y. \end{align*}
We show in the following that (28) has a unique minimizer \((x^*, y^*)\), defined as follows:
(i)
\(x^*\) is the unique root of \(\ln x + 3 - (e+1) x\) in the interval \([0.5,1]\)—that is, \(x^* \approx 0.7175647\).
By plugging these values into (28), one can easily check that the corresponding optimal z value, denoted by \(z^*\) and satisfying \(z^* = \max \lbrace f_1(x^*, y^*), f_2(x^*, y^*)\rbrace\), fulfills
\begin{align*} z^* \ge 0.514 . \end{align*}
To show that the minimizer of (28) is indeed the tuple \((x^*, y^*)\) described earlier, we show the following:
(1)
Problem (28) does not have a minimizer \(x,y\) at the boundary of the area \([0,1]\times [0,10]\)—that is, any minimizer satisfies \(x\in (0,1)\) and \(y\in (0,10)\).
(2)
We then apply the (necessary) Karush-Kuhn-Tucker conditions to a modified version of Problem (28), where we drop the requirements \(x\in [0,1]\) and \(y\in [0,10]\)—that is, we only consider the remaining two constraints, described by the right-hand sides given by \(f_1\) and \(f_2\), to show that \((x^*, y^*)\) is the unique minimizer.
The minimizer of \(\max \lbrace f_1(1,y), f_2(1,y)\rbrace\) for \(y\in \mathbb {R}\) is achieved for \(\overline{y}\) such that \(f_1(1,\overline{y})=f_2(1,\overline{y})\), which is
However, when setting \(x=1\) and \(y=\overline{y}\), the smallest value that z can take in (28) is \(f_1(1,\overline{y}) = f_2(1,\overline{y}) \ge 0.52\), which is larger than the value we obtain with \((x^*, y^*)\). Finally, there is also no minimizer of (28) with \(y=10\). Indeed, in this case, the objective value of (28) must be at least
where the first inequality follows by observing that \(x\ln x + (1-e^{-1})x\) is a convex function with minimizer at \(e^{e^{-1} -2}\); plugging in this minimizer leads to the inequality. Hence, the minimizer \((x^*, y^*)\) of (28) satisfies \(x^*\in (0,1)\) and \(y^*\in (0,10)\). Consequently, it suffices to write the necessary Karush-Kuhn-Tucker conditions for an optimal solution with respect to the two constraints corresponding to \(f_1\) and \(f_2\). Thus, an optimal solution \((x^*, y^*)\) to (28) must satisfy that there are two multipliers \(\lambda _1, \lambda _2 \in \mathbb {R}_{\ge 0}\) such that
To derive (30), we used the fact that \(y^* \gt 0\), which allowed us to divide both the left-hand side and the right-hand side by \(y^*\). Multiplying (30) by \(-x^*\) and adding it to (31) leads to the equation
Finally, we use (32) to substitute \(\lambda _1\) in (31), and simplify the expression (by multiplying by \(2(x^*-1)/\lambda _2\), expanding, and then dividing by \((1/e - 1)x^*\)) to obtain
as desired. Hence, any optimal solution \((x^*,y^*)\) to (28) must satisfy (33) due to necessity of the Karush-Kuhn-Tucker conditions.
It remains to observe that (33) has two solutions. One has x-value below 0.07, which clearly does not lead to a minimizer because \(f_1(x,y) \ge 1 - e^{-1}\) for any \(x\le 0.07\) and \(y\in \mathbb {R}_{\ge 0}\). Thus, the \(x^*\)-value of the minimizer is unique and corresponds to the second solution of (33), which is \(x^* \approx 0.7175647\)—that is, the value stated at the beginning of the proof. Finally, the minimizing value \(y^*\) for y can be obtained by computing the unique minimizer of \(\max \lbrace f_1(x^*,y), f_2(x^*, y)\rbrace\), which is a maximum of two linear functions, one with strictly positive and one with strictly negative derivative. Hence, the unique minimizer \(y^*\) is the y-value that solves \(f_1(x^*,y) = f_2(x^*, y)\), which leads to the expression for \(y^*\) highlighted at the beginning of the proof.□
Naor Alaluf, Alina Ene, Moran Feldman, Huy L. Nguyen, and Andrew Suh. 2020. Optimal streaming algorithms for submodular maximization with cardinality constraints. CoRR abs/1911.12959 (2020). http://arxiv.org/abs/1911.12959.
Francis R. Bach. 2010. Structured sparsity-inducing norms through submodular functions. In Proceedings of Advances in Neural Information Processing Systems (NIPS’10), Vol. 23. 118–126.
Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2014. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM Conference on Knowledge Discovery and Data Mining (KDD’14). 671–680.
Ramakrishna Bairi, Rishabh K. Iyer, Ganesh Ramakrishnan, and Jeff A. Bilmes. 2015. Summarization of multi-document topic hierarchies using submodular mixtures. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL/IJCNLP’15), Vol. 1. 553–563.
Eric Balkanski, Aviad Rubinstein, and Yaron Singer. 2019. An exponential speedup in parallel running time for submodular maximization without loss in approximation. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’19). 283–302.
Eric Balkanski, Aviad Rubinstein, and Yaron Singer. 2019. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC’19). 66–77.
Eric Balkanski and Yaron Singer. 2018. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC’18). 1138–1151.
Ziv Bar-Yossef, Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. 2002. Information theory methods in communication complexity. In Proceedings of the 17th Annual IEEE Conference on Computational Complexity (CCC’02). 93–102.
Rafael Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. 2016. A new framework for distributed submodular maximization. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS’16). 645–654.
Ilija Bogunovic, Slobodan Mitrovic, Jonathan Scarlett, and Volkan Cevher. 2017. Robust submodular maximization: A non-uniform partitioning approach. In Proceedings of the 34th International Conference on Machine Learning (ICML’17). 508–516. http://proceedings.mlr.press/v70/bogunovic17a.html.
Niv Buchbinder, Moran Feldman, and Roy Schwartz. 2015. Online submodular maximization with preemption. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’15). 1202–1216.
Gruia Călinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. 2011. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing 40, 6 (2011), 1740–1766.
Amit Chakrabarti. 2007. Lower bounds for multi-player pointer jumping. In Proceedings of the 22nd Annual IEEE Conference on Computational Complexity (CCC’07).
Amit Chakrabarti and Sagar Kale. 2014. Submodular maximization meets streaming: Matchings, matroids, and more. In Proceedings of the 17th Conference on Integer Programming and Combinatorial Optimization (IPCO’14). 210–221.
Chandra Chekuri and Kent Quanrud. 2019. Submodular function maximization in parallel via the multilinear relaxation. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’19). 303–322.
Lin Chen, Moran Feldman, and Amin Karbasi. 2019. Unconstrained submodular maximization with constant adaptive complexity. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC’19). 102–113.
Graham Cormode, Jacob Dark, and Christian Konrad. 2019. Independent sets in vertex-arrival streams. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming (ICALP’19). Article 45, 14 pages.
Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. 2015. The power of randomization: Distributed submodular maximization on massive datasets. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15). 1236–1244.
Abhimanyu Das, Anirban Dasgupta, and Ravi Kumar. 2012. Selecting diverse features via spectral regularization. In Proceedings of Advances in Neural Information Processing Systems (NIPS’12), Vol. 25. 1592–1600.
Abhimanyu Das and David Kempe. 2011. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1057–1064.
Alina Ene and Huy L. Nguyen. 2019. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’19). 274–282.
Alina Ene, Huy L. Nguyen, and Adrian Vladu. 2019. Submodular maximization with matroid and packing constraints in parallel. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC’19). 90–101.
Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. 2019. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In Proceedings of the 36th International Conference on Machine Learning (ICML’19). 1833–1842.
Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. 2019. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’19). 255–273.
Daniel Golovin and Andreas Krause. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research 42, 1 (2011), 427–486.
Chien-Chung Huang, Naonori Kakimura, Simon Mauras, and Yuichi Yoshida. 2020. Approximability of monotone submodular function maximization under cardinality and matroid constraints in the streaming model. CoRR abs/2002.05477 (2020). https://arxiv.org/abs/2002.05477.
Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. 2008. The one-way communication complexity of Hamming distance. Theory of Computing 4, 1 (2008), 129–135.
Michael Kapralov. 2013. Better bounds for matchings in the streaming model. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’13). 1679–1697.
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. 2019. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In Proceedings of the 36th International Conference on Machine Learning (ICML’19). 3311–3320. http://proceedings.mlr.press/v97/kazemi19a.html.
Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. 2018. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In Proceedings of the 35th International Conference on Machine Learning (ICML’18). 2544–2553.
Paul Liu and Jan Vondrák. 2019. Submodular optimization in the MapReduce model. In Proceedings of the 2nd Symposium on Simplicity in Algorithms (SOSA’19). Article 18, 10 pages.
Andrew McGregor and Hoa T. Vu. 2019. Better streaming algorithms for the maximum coverage problem. Theory of Computing Systems 63, 7 (2019), 1595–1619.
Vahab S. Mirrokni and Morteza Zadimoghaddam. 2015. Randomized composable core-sets for distributed submodular maximization. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC’15). 153–162.
Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2017. Deletion-robust submodular maximization: Data summarization with “the Right to be Forgotten.” In Proceedings of the 34th International Conference on Machine Learning (ICML’17). 2449–2458.
Slobodan Mitrovic, Ilija Bogunovic, Ashkan Norouzi-Fard, Jakub Tarnawski, and Volkan Cevher. 2017. Streaming robust submodular maximization: A partitioned thresholding approach. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). 4560–4569.
George L. Nemhauser and Laurence A. Wolsey. 1978. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research 3, 3 (1978), 177–188.
George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14, 1 (1978), 265–294.
Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrovic, Amir Zandieh, Aidasadat Mousavifar, and Ola Svensson. 2018. Beyond 1/2-approximation for submodular maximization on massive data streams. In Proceedings of the 35th International Conference on Machine Learning (ICML’18). 3826–3835. http://proceedings.mlr.press/v80/norouzi-fard18a.html.
James B. Orlin, Andreas S. Schulz, and Rajan Udwani. 2016. Robust monotone submodular function maximization. In Proceedings of the 18th International Conference on Integer Programming and Combinatorial Optimization (IPCO’16). 312–324.
Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, and P. Jonathon Phillips. 2014. Submodular attribute selection for action recognition in video. In Proceedings of Advances in Neural Information Processing Systems (NIPS’14), Vol. 27. 1341–1349.