Bounding the price of anarchy, which quantifies the damage to social welfare due to the selfish behavior of participants, has been an important area of research in algorithmic game theory. Classical work on such bounds in repeated games makes the strong assumption that subsequent rounds of the repeated game are independent, beyond any influence of past history on play. This work studies such bounds in environments that themselves change due to the actions of the agents. Concretely, we consider this problem in discrete-time queuing systems, where competing queues try to get their packets served. In this model, each queue sends a packet at each step to one of the servers, which attempts to serve the oldest arriving packet, and unprocessed packets are returned to their queues. We model this as a repeated game in which queues compete for the capacity of the servers, but where the state of the game evolves as the lengths of the queues vary.
We analyze this queuing system from multiple perspectives. As a baseline measure, we first establish precise conditions on the arrival rates and service capacities that ensure all packets clear efficiently under centralized coordination. We then show that if queues strategically choose servers according to independent and stationary distributions, the system remains stable provided it would be stable under coordination with arrival rates scaled up by a factor of just \(\frac{e}{e-1}\). Finally, we extend these results to no-regret learning dynamics: if queues use learning algorithms satisfying the no-regret property to choose servers, then the requisite factor increases to 2; both bounds are tight. These results require new probabilistic techniques compared to the classical price of anarchy literature and show that in such settings, no-regret learning can exhibit efficiency loss due to myopia.
1 Introduction
A fundamental aim at the intersection of economics and computer science is to understand the efficiency of systems when the dynamics are governed by the actions of strategic and competitive agents. In general, the outcomes reached by selfish agents at an equilibrium, or under some other dynamics, need not necessarily align well with the design of the system. Real-world systems whose performance is provably near-optimal even in the presence of selfish agents can be viewed as robust: such systems can be safely decentralized without sacrificing much efficiency. However, systems whose behavior degrades dramatically in strategic settings require extra safeguards to prevent inferior outcomes.
To make this more concrete, consider the setting of routing in networks as considered in the work of Roughgarden and Tardos [34]. Formulating this as a game, each agent chooses a path in the network from their source vertex to their sink vertex with the objective of minimizing their experienced delay along this path. The delay of an edge is some monotonic function of the number of agents traversing this edge in their chosen path. At a Nash equilibrium, selfish agents each select the path with minimum delay given the equilibrium behavior of the others; in other words, each agent selfishly optimizes her own delay function given the actions of the others. These equilibrium outcomes can be quite different from the globally optimal choice of routes that minimizes the total delay of all agents. However, Roughgarden and Tardos show that in the case of atomless selfish routing with affine delay functions, the ratio between the total delay at any Nash equilibrium and that of the global optimum is bounded by \(4/3\). More generally, if the delay functions are polynomials of degree at most d, then the ratio can behave as \(\Theta (d/\ln d)\) [31]. Such a result thus strongly characterizes the gap between selfish and optimal behavior, and identifies the concrete obstruction (here, nonlinearity) to efficiency.
In more general settings and games, given some quantitative notion of social welfare, the worst-case ratio between the social welfare at the global optimum with respect to this metric and that of any Nash equilibrium (defined in an analogous way as in the routing setting) is known as the price of anarchy [24]. More generally, we will refer to any sort of quantitative comparison between socially optimal outcomes and equilibrium outcomes as a price of anarchy analysis. A large body of work has established price of anarchy bounds for various well-studied games like routing, scheduling, and auctions, among others. Establishing price of anarchy results is important for several reasons. First, price of anarchy analyses help us understand the performance of real-world systems that already exist “in the wild” [33]. Just as importantly, the quantitative understanding given by price of anarchy–style analyses yields crucial insights toward the design of systems that are more robust to selfish behavior. For instance, Roughgarden and Tardos [34] further showed that in routing games, the cost of any Nash equilibrium is no more than that of the centralized optimum for twice as much flow. Such an analysis gives clearly actionable prescriptions: to attain good performance, one can simply augment the amount of resources in the system relative to the competition.
In many settings, agents repeatedly play against each other in the same game, not just once. In the most commonly studied models of repeated games, a critical assumption is that in each round, agents play an “independent” copy of the same game. In other words, the past sequence of play does not fundamentally change the nature of the game that is repeated in each round. In some settings, this approximation might be valid. For instance, in the preceding routing setting, consider routing on the scale of the morning rush-hour traffic. In this case, it usually holds that any traffic on Monday morning will have cleared by Tuesday morning, at which point agents again choose paths in a fresh version of the game. However, consider modeling packet routing in computer networks. If a packet gets dropped, then this packet must be re-sent in future rounds and thereby increases the total congestion going forward, thus violating the assumption that the game itself has not changed.
Therefore, developing a deeper theory of the efficiency of strategic agents in repeated games that retain state is of great importance. This motivates the first main question that guides our work in this article.
In this article, we extensively consider this problem of price of anarchy bounds in systems with state that impacts future rounds. Concretely, we study Question 1 in a queuing system with queues sending packets to servers as a simplified model of a network of queues as previously considered by Krishnasamy et al. [25]. In their work, the authors study the performance of a centralized learner in the same queuing system that finds the best server with respect to a more refined notion of “queue-regret,” which measures the expected difference between its queue sizes and those of a genie strategy that knows the optimal server.
In contrast, our work studies a decentralized and strategic multi-queue version of the same system, where queues selfishly compete with one another for service. Our primary focus is on precisely characterizing what conditions ensure that this queuing system remains stable (in a quantitative sense that will be formalized later) even under strategic assumptions, a concern that does not arise in the learning problem with centralized scheduling. In particular, we consider conditions that guarantee the efficiency of stochastic queuing systems when queues choose servers with the aim of getting their packets served in minimal time, and where queues must repeatedly resend their packets until the packet gets served.
To do so, we study the amount of extra resources required for stability in queuing systems under a variety of different behavioral structures. We begin with the completely centralized setting, where a central coordinator can specify which queues send to which servers at each time. We show, using the theory of majorization, that the obvious necessary condition is in fact sufficient: namely, the sum of arrival rates of the top k queues must be at most the sum of the capacities of the top k servers, for each k. This result establishes a baseline measure of the feasibility of queuing systems to compare against the rest of our results in strategic settings.
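To make the baseline condition concrete, the following sketch checks it directly for given rate vectors. The function name and interface are ours, and this is an illustration of the stated condition, not code from the paper.

```python
def centrally_feasible(lam, mu):
    """Check, for every k, that the k largest arrival rates are covered
    by the k largest service capacities (the majorization-style
    condition stated in the text)."""
    lam = sorted(lam, reverse=True)
    mu = sorted(mu, reverse=True)
    mu = mu + [0.0] * max(0, len(lam) - len(mu))  # pad if fewer servers
    total_lam = total_mu = 0.0
    for k in range(len(lam)):
        total_lam += lam[k]
        total_mu += mu[k]
        if total_lam > total_mu:
            return False
    return True

print(centrally_feasible([0.5, 0.3], [0.9, 0.4]))  # True
print(centrally_feasible([0.8, 0.1], [0.6, 0.6]))  # False: top queue outpaces top server
```

Note that the condition must hold for every prefix, not just the totals: two servers of rate 0.6 have enough aggregate capacity for arrival rates (0.8, 0.1), but no single server can keep up with the 0.8 queue.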
Once we establish this baseline, we may then proceed to ask: how many extra resources (server capacity) are needed relative to the coordinated setting to ensure that the system remains stable under some notion of game-theoretic equilibrium? We first study this question under the natural assumption that the behavior is stationary over time for each queue; this situation models the outcomes of systems that have reached a long-term equilibrium. We do so by defining a one-shot version of the queuing dynamics, which we call the patient queuing game, where each queue must choose a fixed distribution over servers. The term patience refers to the fact that the incentives for the queues in this game are their long-run growth rates, measured in terms of the limiting number of uncleared packets normalized by time. Thus, the values obtained in this game correspond to the long-run growth rates obtained in systems when the distribution of actions (choice of server) has stabilized for each queue, and we then ask for the values attained at any Nash equilibrium.
To answer our main question, we establish several delicate probabilistic properties of such systems and show that the long-run behavior of such systems, as a function of the choices of the queues, admits a number of favorable analytic properties. These techniques and results allow us to directly connect these probabilistic properties of the queuing system with the game-theoretic incentives of the agents to show that a factor of just \(\frac{e}{e-1}\) extra server capacity is needed to ensure stability of every Nash equilibrium of this game. Along the way, we also consider, but do not resolve, the separate question of the optimal rates attained by any independent strategies. We show that this question is equivalent to the classical notion of price of stability [1, 2] of these games.
Although our results in this setting are tight, the assumption that agents have reached a stationary equilibrium is quite strong. In fact, this problem applies equally to the “independent game” setting. To address this deficiency, recent work in the independent game setting has shown that price of anarchy bounds often seamlessly generalize from the single-round equilibrium setting to the repeated games setting where agents employ no-regret learning algorithms [7, 26, 32]. In this model, agents repeatedly play the same game against each other and use learning algorithms to adapt to each other’s behavior. Such extensions are crucial in capturing the outcomes that might arise in practice, as there are known obstructions to the predictive power of Nash equilibria; to name just a few, Nash equilibria need not be unique, may be computationally intractable to find [13, 35], and may require stringent assumptions on the knowledge of the agents. Taken together, these deficiencies may prevent real-world agents from reaching equilibrium outcomes. By contrast, no-regret algorithms are simple, computationally efficient, and encode natural and minimal behavioral assumptions of strategic agents, all while enjoying sufficient provable guarantees to enable a price of anarchy analysis. This guarantee can be ensured by running any of a large set of learning algorithms [38]. The study of learning in games has a long history, dating back to the early work of Brown [10] and Robinson [30] (also see the work of Fudenberg and Levine [19]). In traditional repeated games, if all players employ a no-regret learning strategy, then the play converges to a form of correlated equilibrium of the game [20] (players correlating their play by each of them using the history of play to decide their next action), and price of anarchy analyses often extend also to the correlated play.
Therefore, to complement our equilibrium results, we turn to analyzing the outcomes that arise under learning dynamics. A fundamental question in our setting with strong inter-round dependencies is whether price of anarchy bounds similarly extend to natural learning dynamics.
To formally address this question, we consider the performance of queuing systems where each queue uses no-regret learning to determine which servers to send packets to over time. Because we only assume that the queues satisfy the no-regret property in their choices, much of the analytic structure developed for stationary strategies in the patient queuing game is no longer applicable. In contrast to more classical settings with repeated games, where such price of anarchy bounds often naturally extend from equilibrium notions to dynamic learning outcomes, we show that no-regret learning exhibits myopia, as it generally cannot consider the long-run dependencies in the system. We develop new probabilistic machinery, beyond that used in the equilibrium setting, to prove that the corresponding queuing systems require twice the amount of server capacity needed for centralized stability, and this bound is again tight. When taken together with our equilibrium bounds, a key conceptual contribution of our work is thus that standard no-regret algorithms can indeed attain nontrivial performance guarantees in these repeated games with state, but that these guarantees are not entirely lossless. Moreover, the analyses themselves appear to require a fundamentally different set of tools rather than being corollaries of the same generic framework [32].
In the next section, we elaborate on the model, our results, and techniques.
1.1 Overview of Results and Techniques
1.1.1 Strategic Queuing Model.
Before discussing our results in more detail, we first briefly describe the queuing model that will be the focus of this work.2 As mentioned earlier, we consider the discrete-time queuing system studied by Krishnasamy et al. [25], where n queues receive packets at heterogeneous rates \(\mathbf {\lambda }=(\lambda _1,\ldots ,\lambda _n)\in (0,1)^n\) so that queue i receives a new packet at each time with probability \(\lambda _i\). In each round, any queue that has any remaining packets must select exactly one of m servers with heterogeneous success probabilities \(\mathbf {\mu }=(\mu _1,\ldots ,\mu _m)\in [0,1]^{m}\), to attempt to clear a single packet. Each server can only succeed in clearing at most one packet in each round and, most importantly, returns each unprocessed packet to the original queue, assuming for simplicity that servers have no buffer.3 We will also assume that each queue receives only bandit feedback in each round, meaning that it only observes whether it succeeded in clearing the packet it sent in the current round at the server it attempted.
Queue lengths can grow arbitrarily, so the efficiency question we consider is: under what conditions on the service and arrival rates can the system be guaranteed to remain stable? We provide formal definitions in Section 2 of the various stability notions we will consider, but informally, stability corresponds to the number of uncleared packets growing sublinearly over time. To establish a baseline measure of what is possible under a centralized coordination algorithm, we will prove the following theorem.
Because the focus of our work is on outcomes with strategic queues under different behavioral assumptions, an important feature of our model is how conflicts are resolved when multiple queues send to the same server in a time period. In decentralized settings, queues sending a packet at each round can and will often collide at a server, necessitating a choice of which (if any) packet a server attempts to serve. There are at least two natural choices. As a first natural choice, a server may choose a packet to attempt to clear uniformly at random among those that it receives in a round. Although this is a plausible modeling choice, we will actually show (Theorem 2.4) that in this case, the number of uncleared packets in the system can increase linearly over time when queues are strategic unless the success rates of the servers are prohibitively larger than the arrival rates of the queues. Roughly speaking, uniform randomization by the servers is not sufficiently adapted to our system objective of stability; even if there exists a simple coordinated strategy that would ensure queues remain bounded, strategic behavior by the queues can prevent queues with higher arrival rates from exploiting the necessary servers. In particular, such a model precludes the possibility of a constant factor of resource augmentation ensuring stability of selfish queuing systems.
To address this immediate difficulty, we turn to a second natural choice: instead of choosing a packet to serve uniformly at random, we will assume that packets are labeled with timestamps and that servers attempt to serve the received packet with oldest timestamp (breaking ties arbitrarily). This choice immediately induces significant dependencies between rounds, since queues that have not successfully cleared many packets will have priority over queues that have been successful in previous rounds. Although this is a natural choice to facilitate stability in such systems, we will require delicate probabilistic reasoning to study such processes. A simple but key idea that will enable our analysis will be to study a Geometric version of this model that is more tailored to this priority scheme via the principle of deferred decisions, where “Geometric” refers to the type of random variables governing the evolution of the system. Namely, we reduce the analysis of this complicated system to studying a single parameter for each queue, the age of the oldest packet in the queue, that meshes well with the priority structure. After formulating and proving this equivalence in Section 2.4, all of our subsequent results will be in this Geometric system.
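As a concrete illustration of the dynamics just described (Bernoulli arrivals, queues sending their oldest packet to a server, and oldest-timestamp priority at each server), here is a minimal toy simulation. All names and the interface are ours; it simulates the raw system directly and omits the Geometric reformulation used in the analysis.

```python
import random

def simulate(lam, mu, strategies, T, seed=0):
    """Toy run of the discrete-time system: each queue stores packet
    arrival timestamps; at each step, every nonempty queue sends its
    oldest packet to a server drawn from its fixed distribution, and
    each server attempts the oldest packet it received, succeeding
    with probability mu[j]. Unserved packets stay in their queues."""
    rng = random.Random(seed)
    queues = [[] for _ in lam]  # per-queue lists of arrival timestamps
    for t in range(T):
        for i, rate in enumerate(lam):  # Bernoulli arrivals
            if rng.random() < rate:
                queues[i].append(t)
        received = {j: [] for j in range(len(mu))}
        for i, q in enumerate(queues):  # nonempty queues pick servers
            if q:
                j = rng.choices(range(len(mu)), weights=strategies[i])[0]
                received[j].append(i)
        for j, senders in received.items():  # oldest-timestamp priority
            if senders:
                winner = min(senders, key=lambda i: queues[i][0])
                if rng.random() < mu[j]:
                    queues[winner].pop(0)  # packet cleared
    return [len(q) for q in queues]

print(simulate([0.3, 0.5], [0.9, 0.6], [[0.8, 0.2], [0.2, 0.8]], T=2000))
```

Running this with different rate vectors and strategy profiles gives a quick feel for when queue lengths stay bounded versus grow linearly.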
1.1.2 Patient Selfishness.
To understand these systems game theoretically, our first main contribution is to understand the performance of systems that are at long-run equilibrium even though agents nonetheless compete to optimize their own long-run growth rate. We do this by formulating a one-shot version of the queuing dynamics. We define the patient queuing game where queues choose fixed randomized strategies over servers to be played at each round. Each queue i’s choice over servers can be described by a fixed vector \(p_i\in \Delta ^{m-1}\), where \(\Delta ^{m-1}\) is the probability simplex over the m servers. We study this as a traditional game and consider the resulting Nash equilibria when each queue aims to choose their fixed randomization to minimize their long-run aging rate (equivalently, their long-run growth rate, see Section 2.2) conditioned on the others. Our main interest is understanding under what conditions on the service rates and arrival rates the system will remain stable in every Nash equilibrium. To study this, we face significant probabilistic and game-theoretic challenges: probabilistic challenges in determining the asymptotic growth rates for given strategies and proving that a closed form exists, and game-theoretic challenges in showing that Nash equilibria exist and bounding their quality. The techniques we use will prove useful in addressing these conceptually distinct difficulties, thereby unifying the game-theoretic and probabilistic properties of our systems.
Asymptotic Growth Rates. In the preceding discussion, we stated that each queue aims to select a fixed randomization over servers to minimize their long-run aging rate in this system given the randomizations of the others. Our first task, to do any game-theoretic analysis of this system, is to analyze the long-run properties of this random process of queue ages (which typically will not even be recurrent). A major technical component of our work is showing that for any fixed, independent randomizations \(\mathbf {p}\) by the queues over servers, not only do these long-run growth rates exist almost surely, they are deterministic and can be explicitly computed as a function of the strategies.
To prove this result, we provide an alternate, algorithmic description of the long-run rates in Section 3.1, which we use for all of our subsequent game-theoretic results. Working just with this alternative definition, we show that the queues partition into groups such that all queues in a group age asymptotically at the same rate. We will return to the task of establishing that the true, long-run asymptotic aging rates of the queues for any choice of strategies coincide with the output of the algorithm in Section 6.
The key technical difficulty in proving this alternate characterization is that the priority structure via timestamps changes rapidly round to round and depends crucially on past successes by the queues. To overcome this, we use a rather delicate inductive argument that accounts for these changes in a controlled fashion that enables us to keep track of the evolution of the queuing system at a less granular level while still being sharp enough to prove the precise quantitative rates. Once we have done so, only then can we repeatedly appeal to concentration bounds to argue that each subset in the partition grows at the desired rate. To conclude, we carefully apply the Borel-Cantelli lemma to establish the result. For ease of exposition, some of the highly nontrivial and technical details are deferred to Appendix F.
Game-Theoretic Properties: Equilibria and Price of Anarchy. Once we show that these limits almost surely are equal to an explicit, deterministic function of \(\mathbf {p}\), it might still not be the case that a Nash equilibrium exists in the induced game. However, we show that the cost function exhibits significant analytic properties, which lets us reason about the structure of the sets that arise in the partition for any fixed strategy profile. More precisely, we show that each level set of the cost function corresponds to the minimizing subset of the ratio of a submodular and modular set function; this significant structure allows us to show that the subsets that minimize this ratio are closed under union and nonempty intersection (and thus essentially form a Boolean lattice). These considerations will be enough to show continuity as a function of the strategies (Proposition 3.7), which along with other properties will enable us to show that an equilibrium exists using Kakutani’s theorem (Theorem 3.8). Although we show that the cost function of our game has significant structure, the correspondence between actions (randomizations) and costs is quite nonlinear, imposing new technical challenges.
Recall that our goal is to ensure stability in any Nash equilibrium, assuming some relationship on the service rates to the arrival rates. Our main result, proven in Section 4, shows that the correct constant of system slack is \(\frac{e}{e-1}\approx 1.58\).
This result is tight—in a symmetric system where \(n=m\), each queue has the same arrival rate, each server has the same success rate, and each queue chooses to uniformly randomize over servers, a simple balls-in-bins analysis yields this constant as \(m,n\rightarrow \infty\).
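The source of this constant can be checked numerically. In the tight symmetric instance, when all n queues uniformly randomize over n servers, a given server receives at least one packet with probability \(1-(1-1/n)^n\), which tends to \(1-1/e\) as n grows; the capacity lost to collisions is what forces the \(\frac{e}{e-1}\) factor. The snippet below is our illustration of this standard balls-in-bins computation.

```python
import math

# Expected fraction of servers receiving at least one packet when
# n queues uniformly randomize over n servers: 1 - (1 - 1/n)^n.
fractions = {n: 1 - (1 - 1 / n) ** n for n in (2, 10, 100, 10000)}
for n, f in fractions.items():
    print(n, round(f, 4))
print("limit:", 1 - 1 / math.e)  # approximately 0.6321
```

Since only about a \(1-1/e\) fraction of server capacity is usable in a round, stability requires scaling capacity up by the reciprocal, \(\frac{e}{e-1}\approx 1.58\).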
To prove this theorem, we provide a novel argument that establishes the result by continuously deforming any Nash profile toward a carefully constructed strategy profile while only monotonically decreasing the rate at which the top group clears. We then analyze the resulting profile to give a lower bound on the value of the Nash profile. The key difficulty is that the relevant incentives for each queue correspond to possibly many different subsets of queues that have a maximal aging rate. These constraints are difficult to directly compare; different choices of deviations in the strategy by a queue at any Nash equilibrium may violate distinct constraints, making it unclear how to argue about the quality of these equilibria. In particular, there does not seem to be a direct analogue of the Nash indifference principle in finite-action games where utilities are affine in the randomizations of each agent (recall Example 1.4, where the queue moving to the lesser server will still appear to prefer the better server).
To overcome these difficulties, we show that one can significantly reduce the number of incentive constraints one must consider for each queue (Proposition 4.3). This part of the argument crucially uses the lattice structure of the subsets of queues that age fastest. We can then carefully perform our deformation of the collective strategy vector of the queues according to the structure of these sparsified constraints, and show that our deformation only hurts the quality of the Nash solution to provide a valid lower bound.
In contrast, almost every known price of anarchy–style result can be viewed via the very general smoothness framework of Roughgarden [32], which connects an equilibrium with the social optimum via discrete changes in the strategy profile. Our argument instead relies on a careful equilibrium analysis that smoothly interpolates between the equilibrium and a “good” profile that is easy to explicitly bound; however, during these deformations, these intermediate strategy profiles will not be equilibria. To prove the monotonicity of this deformation, we connect the incentives at Nash to the structure of the subset of maximizers of the long-run rate function and show that the Nash constraints still hold along the directions we deform.
1.1.3 No-Regret Learning in Queuing Systems.
As mentioned earlier, no-regret learning is a classical modeling assumption in the context of repeated, independent games that often attains equivalent price of anarchy–style guarantees to that of Nash equilibria. Moreover, the convergence of no-regret learning to a correlated equilibrium gives an intrinsic game-theoretic justification for using it as a behavioral model of the agents. In our setting, because queues only receive bandit feedback on their actions, but otherwise may not know the service rates \(\mathbf {\mu }\) nor any other information about the number or identities of the other queues in the system or their choices of servers, it is natural to assume that they may use an adaptive learning algorithm to select servers in each round. Indeed, in large systems, reasoning about explicit game-theoretic actions on a round-by-round basis may prove difficult or impossible due to these information constraints, but no-regret learning is nonetheless achievable and gives useful performance guarantees.
However, we now provide a simple example showing that no-regret play can be surprisingly myopic in our queuing model. In this example, there is a unique no-regret policy for each agent, given the behavior of the other agent, but both agents would have been better off in the long run had one deviated, even slightly, to an inferior server while the other kept its strategy. In the classical setting with “independent” repeated games, this cannot occur.
Note that this behavior cannot arise in the patient version of the queuing game considered earlier, because the incentives explicitly favor smaller long-run growth rates. This example shows that no-regret learning, which considers less sophisticated measures of the efficiency of the servers without considering the long-run effects, exhibits myopia. The queue sending to the second server in the preceding example does have regret in the classical sense, despite doing better in the long term.
We thus see that no-regret outcomes could be susceptible to performance losses compared to the patient setting. Our second main result precisely captures the performance of generic no-regret dynamics. We show that under no-regret dynamics, the system still remains stable when there is a factor 2 extra slack in server capacity: if the system has enough capacity to serve all packets when they are centrally coordinated even with half the service rates, then no-regret learning of the queues guarantees that queue lengths stay bounded in expectation across time. We will prove the following theorem.
We complement these results by providing a partial converse that this factor of 2 on the required service rate is tight in Theorem 5.4, in that with less than a factor of 2 higher service rate, no-regret outcomes do not necessarily guarantee stability. Taken in tandem with our results on the patient queuing game, we thus observe rather subtle behavior that can arise in games with state. No-regret learning can indeed attain nontrivial performance guarantees, but the myopia induced by the local dynamics may lead to performance losses compared to the setting where agents compete with stationary, patient strategies.
Our key technique in proving Theorem 1.5 is to use a delicate potential function argument. The main idea of our proof is to argue that when some potential function, which we must construct, has a high enough value, then it must have negative drift. To conclude that the queue sizes remain bounded in expectation, we can then employ a powerful theorem of Pemantle and Rosenthal [29] showing that a sufficiently regular stochastic process with negative drift must have moments uniformly bounded over time.
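The negative-drift principle behind the Pemantle and Rosenthal argument can be illustrated on a one-dimensional toy walk: whenever the process is large it drifts downward, which keeps its moments bounded over time. This sketch is entirely ours and only illustrates the phenomenon, not the actual potential function used in the proof.

```python
import random

def drift_walk(T, threshold=10, seed=1):
    """Reflected walk on the nonnegative integers: symmetric steps
    below the threshold, but expected step -0.4 above it. Negative
    drift at large values is what keeps the walk's moments uniformly
    bounded over time, as in Pemantle-Rosenthal-type results."""
    rng = random.Random(seed)
    x, history = 0, []
    for _ in range(T):
        if x >= threshold:
            x += rng.choices([-1, +1], weights=[0.7, 0.3])[0]
        else:
            x += rng.choices([-1, +1], weights=[0.5, 0.5])[0]
        x = max(x, 0)  # reflect at zero
        history.append(x)
    return history

walk = drift_walk(10000)
print(max(walk), sum(walk) / len(walk))
```

Even over long horizons, the walk rarely strays far above the threshold, whereas the same walk without the biased regime would exhibit excursions on the order of \(\sqrt{T}\).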
To carry out this approach, the key difficulty now becomes: what potential function (related to queue lengths/ages) should we choose that provably has negative drift when it is large? The simplest possible choice is the maximum queue age, which is an \(\ell _{\infty }\) potential. Indeed, we will argue that the oldest queues, by virtue of the slack, no-regret condition, and the priority, will tend to decrease in age in aggregate. However, the dependencies from the priority scheme and the learning dynamics make arguing about how this decrease is spread among old queues rather tricky. Moreover, this potential does not alone sufficiently account for the full state of the system; to try to benefit from the performance of all queues, one could instead try an \(\ell _1\)-style potential function. However, this potential has a different problem; the gains by older queues could be washed out by the aging of young queues that do not have priority for this choice of potential.
To that end, it will become most convenient to instead study an \(\ell _2^2\) potential function of queue ages. This potential function can be motivated in multiple ways. First, squaring queue ages naturally biases the potential toward the older queues, as with an \(\ell _{\infty }\) potential, which we will argue are decreasing in aggregate. Moreover, we will want to benefit from progress at all scales, as with an \(\ell _1\) potential: to do so, one is naturally led to summing an \(\ell _1\) potential over just those queues above each age threshold. Upon doing some algebra, one arrives at our \(\ell _2^2\) potential function of queue ages. For technical reasons, we eventually translate back to a suitable \(\ell _2\)-style norm of queue ages. It is our hope that the kinds of qualitative features we establish and the methods of proof for these results will be of interest in the future study of repeated games that similarly relax the independence assumptions of the games played at each round.
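One way to make the hinted algebra concrete: summing an \(\ell _1\)-style potential over the queues above each age threshold collapses to a quadratic in the ages, \(\sum _i a_i(a_i+1)/2\). This identity is our illustration of the motivation; it is not claimed to be the exact potential used in the proof.

```python
def threshold_sum(ages):
    """Sum an ell_1-style potential over the queues above each age
    threshold h = 1, 2, ...; the double sum collapses to
    sum_i a_i (a_i + 1) / 2, an ell_2^2-style potential."""
    return sum(a - h + 1
               for h in range(1, max(ages) + 1)
               for a in ages if a >= h)

ages = [3, 1, 4, 7]
print(threshold_sum(ages))                  # 45
print(sum(a * (a + 1) // 2 for a in ages))  # 45, the same quadratic
```

The thresholded form explains why such a potential credits gains by old queues heavily (they appear in many thresholds) while still registering progress by every queue.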
1.2 Organization
We first formalize the strategic queuing model in Section 2. In doing so, we will formally establish our notions of stability and feasibility that will give the benchmark for our results with strategic agents. We then turn to our first main result on patient queuing systems at equilibrium. In Section 3, we introduce and prove several properties of the relevant queuing game. We provide an alternate, algorithmic description of the cost functions for the queues, and use the resulting description to prove various game-theoretic and structural properties of these systems. Using these results, we prove our tight bound on the price of anarchy in this patient queuing game in Section 4 assuming that this alternative characterization of long-run rates is valid. In Section 5, we prove our second main result, showing that with a factor 2 extra slack, no-regret queues will remain stable. In Section 6, we return to formally showing that the algorithmic description of the cost functions indeed coincides with their original definition as long-run growth rates of the queues.
1.3 Related Work
Our work falls in a long tradition of establishing price of anarchy bounds for various games [24], but is one of the first to study the effect of learning in games with carryover effects between rounds. Compared to classical price of anarchy bounds in repeated games [4, 32, 40], we no longer assume that games at different rounds are independent. Studying this model requires us to combine ideas from the price of anarchy analysis of games with the theory of stochastic processes. Another important repeated game setting with such carryover effects is the repeated ad-auction game with limited budgets. Several works [8, 11, 12] consider such games and offer results on convergence to equilibrium, as well as an understanding of equilibria in first-price auction settings under a particular behavioral model of the agents. Analyzing such systems for the more commonly used second-price auction is an important open problem.
Although our stability objective differs from usual objectives in this literature, our results qualitatively also resemble the bicriteria result of Roughgarden and Tardos [34], which shows that in nonatomic routing, the cost incurred at any Nash flow is at most the optimal cost when twice the flow is routed. Unlike most such bounds that follow the smoothness framework of Roughgarden [32], our second main result is an equilibrium analysis that is more similar to that of Johari and Tsitsiklis [23], who establish equilibrium conditions and modify their problem while maintaining the equilibrium condition to arrive at a version that is easy to analyze. In our argument, we also modify the equilibrium itself toward a more tractable solution, but the intermediate points in this deformation will not be Nash, requiring additional arguments.
Our queuing setting also bears resemblance to stochastic games, a generalization of Markov decision processes where multiple players competitively and jointly control the actions and transitions (e.g., see [16, 28]). However, our work differs from this line in multiple ways: in our second model, queues are unaware of the system state and parameters, and most importantly, we are interested in explicit bounds to derive price of anarchy-style results for stability.
Although the goal of our work is in establishing price of anarchy-style bounds in dependent systems, this necessitates a careful understanding of the analytic properties of our random queuing dynamics. Among the large body of literature studying highly dependent random processes, closest to our work is the adversarial queuing systems of Borodin et al. [9], who also use the Pemantle and Rosenthal [29] theorem to establish bounded queue sizes in expectation.
The classical focus of work on scheduling in queuing systems is to identify policies that achieve optimal throughput (e.g., see the textbook of Shortle et al. [39]). There has also been work both on evaluating efficiency loss due to selfishness in different classical queuing systems, as well as the role of learning in such systems. For work on price of anarchy in queuing systems, see the book of Hassin [21] and the survey of Hassin and Haviv [22], and for a very recent tutorial on the role of learning and information in queuing systems, see the work of Walton and Xu [41]. Closest to our model from this literature is the work of Krishnasamy et al. [25], which characterizes the queue-regret of learning algorithms that only seek to identify the best servers but does not consider competition between selfish learners. Their primary goal is to study this more refined notion in the queuing setting for this classical stochastic bandit problem, which exhibits more complicated behavior than standard no-regret bounds that grow at least logarithmically with time. They characterize queue-regret for the case of a single queue aiming to find the best server, and extend the result to the case of multiple queues scheduled by a single coordinated scheduling algorithm, assuming that there is a perfect matching between queues and optimal servers that can serve them. In contrast, we assume that each queue separately learns to selfishly make sure its own packets are served at the highest possible rate, offering a strategic model of scheduling packets in a queuing system. Furthermore, we do not make the matching assumption on queues and servers.
Subsequent Work. After a subset of these results originally appeared in conference form, a number of works have studied game- and learning-theoretic questions arising from our model and technical results. On the strategic side, Fu et al. [18] show that the probabilistic techniques we introduce for analyzing these queuing systems extend to more general queuing networks once the feasibility conditions under coordinated scheduling, which naturally arise via duality arguments, are suitably adjusted. Baudin et al. [5] propose an alternative, episodic queuing system where agents have incentives to hold jobs in an episode before sending to a central server, but suffer penalties should their jobs not be completed before the end of the episode. Their main conclusion, similar to ours, is that both equilibrium and no-regret outcomes ensure stability as long as these costs are sufficiently large. On the learning-theoretic side, Sentenac et al. [37], as well as Freund et al. [17], consider the problem of decentralized learning dynamics in bipartite queuing systems that attain near-optimal performance, extending the original work on centralized learning by Krishnasamy et al. [25]. These algorithmic advances are incomparable with our results, which are inherently noncooperative and strategic in nature. The former work also shows that the more refined guarantee of policy regret [14] is not sufficient to bridge the quantitative gap between our positive results for no-regret learning and patient equilibria.
2 Preliminaries
2.1 Notation
In general, random variables will be denoted by capital letters (i.e., \(X,Y,Z,\ldots)\), whereas vectors will be bolded (i.e., \(\mathbf {\mu },\mathbf {\lambda }\), etc). If a random variable X has some distribution \(\mathcal {D}\), we write \(X\sim \mathcal {D}\). We use the notation \(\text{Geom}(p)\) to denote a geometric distribution with parameter p, \(\text{Bern}(p)\) for a Bernoulli distribution that is 1 with probability p and 0 otherwise, and \(\text{Bin}(n,p)\) for a binomial distribution with parameters n and p.
We say that an event occurs almost surely if it has probability 1. We use standard \(O(\cdot), o(\cdot)\), and \(\Theta (\cdot)\) notation. We will sometimes write \(f(n)\asymp g(n)\) if \(f(n)=\Theta (g(n))\). We will also consider the following norms: for a positive vector \(\mathbf {\lambda }=(\lambda _1,\ldots ,\lambda _n)\), with \(\lambda _1\ge \ldots \ge \lambda _n\gt 0\), we define the following two weighted \(\ell _p\) norms on \(\mathbb {R}^n\): \(\Vert \mathbf {x}\Vert _{\mathbf {\lambda },1}\triangleq \sum _{i=1}^n \lambda _i \vert x_i\vert\) and \(\Vert \mathbf {x}\Vert _{\mathbf {\lambda },2}\triangleq \sqrt {\sum _{i=1}^n \lambda _i x_i^2}.\) It is easily seen that for any \(\mathbf {x}\), \(\Vert \mathbf {x}\Vert _{\mathbf {\lambda },1}\asymp \Vert \mathbf {x}\Vert _{\mathbf {\lambda },2}\) (where the constants depend on \(\mathbf {\lambda }\)) via Cauchy-Schwarz (see Lemma A.1). We use the following fractional sum operation \(\oplus : \mathbb {R}^2_{\ge 0}\times \mathbb {R}^2_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}\): \((a,b)\oplus (c,d)\triangleq \frac{a+c}{b+d}.\)
We will later repeatedly use the following simple fact: for any \(a,c\ge 0\) and \(b,d\gt 0\), \(\min \left\lbrace \frac{a}{b},\frac{c}{d}\right\rbrace \le (a,b)\oplus (c,d)\le \max \left\lbrace \frac{a}{b},\frac{c}{d}\right\rbrace .\)
Given an \(n\cdot m\)-dimensional vector \(\mathbf {p}=(p_1,\ldots ,p_n)\), where \(p_i\in \mathbb {R}^m\), we will write \(p_{ij}\) for the jth element of \(p_i\). Given a vector \(\mathbf {x}\in \mathbb {R}^n\) and a subset \(I\subseteq [n]\), we write \(\mathbf {x}_I\) to denote the vector restricted to the components in I. Given a set S, we will write \(\mathcal {P}(S)\) to denote the power set of S.
2.2 Bernoulli Queuing Model
We consider the following discrete-time queuing system illustrated by Figure 1, which is a decentralized, competitive version of the model considered by Krishnasamy et al. [25]: there is a system of n queues and m servers. During each discrete timestep \(t=0,1,\ldots\), the following occurs:
Fig. 1.
(1)
Each queue i receives a new packet with a fixed, time-independent probability \(\lambda _i\). We model this via an independent random variable \(B^i_t\sim \text{Bern}(\lambda _i)\). This packet has a timestamp that indicates that it was generated at time t. We label queues such that \(1\gt \lambda _1\ge \ldots \ge \lambda _n\gt 0\), writing \(\mathbf {\lambda }\) for the vector of arrival rates.
(2)
Each queue that currently has an uncompleted packet chooses one server to send their oldest unprocessed packet (in terms of timestamp) to.
(3)
Each server j that receives a packet does the following. First, it only considers the packet it receives with the oldest timestamp (breaking ties arbitrarily). It then processes this packet with a fixed, time-independent probability \(\mu _j\). We again label servers so that \(\mu _1\ge \ldots \ge \mu _{m}\ge 0\), writing \(\mathbf {\mu }\) for the vector of service rates.
(4)
All unprocessed packets, including any selected packet whose server failed to process it, are then sent back to their respective queues still uncompleted. Queues receive bandit feedback on whether their packet cleared at their chosen server.
At each round of this process, the queues with packets aim to maximize the probability that their packet gets served, describing the incentives of the stage game.
We write \(Q^i_t\) for the number of unprocessed packets of queue i at the beginning of time t (before sampling new packets) and \(\mathbf {Q}_t=(Q^1_t,\ldots ,Q^n_t)\) for the vector of queue sizes at time t. Define \(Q_t=\sum _{i=1}^n Q^i_t\) as the total number of unprocessed packets in the system at time t. Formally, if \(X^i_t\) is the indicator event that queue i clears a packet at time t and \(B^i_t\) is again the indicator that queue i received a new packet at time t, then we have the recurrence as random variables with \(Q^i_0=0\) and \(Q^i_{t+1}=Q^i_t+B^i_t-X^i_t,\)
where we note that \(X^i_t\) is necessarily 0 if \(Q^i_{t}+B^i_t=0\) (i.e., queue i had no packets and did not receive a new one in the round, so it does not send nor clear a packet this time period). This ensures that each \(Q^i_t\) is integral and nonnegative. We call the preceding random process the Bernoulli model. We will be interested in the stability of this system in the following sense.
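Steps (1) through (4) can be simulated directly. The following sketch is our own minimal reading of the model, not code from this article: each queue is stored as a list of packet timestamps, and `choose` is a hypothetical placeholder for whatever strategy maps a queue to a server (the full state is passed for convenience, even though queues only receive bandit feedback in the model).

```python
import random

def simulate_bernoulli(lam, mu, choose, T, seed=0):
    """Simulate the Bernoulli queuing model for T steps.

    lam: arrival rates per queue; mu: success rates per server.
    choose(i, queues) -> server index chosen by queue i this round.
    Returns the final queue lengths.
    """
    rng = random.Random(seed)
    n, m = len(lam), len(mu)
    queues = [[] for _ in range(n)]  # lists of packet timestamps, oldest first
    for t in range(T):
        # (1) new packet arrivals, timestamped with the current round
        for i in range(n):
            if rng.random() < lam[i]:
                queues[i].append(t)
        # (2) each nonempty queue sends its oldest packet to a server
        sent = {}  # server j -> list of (timestamp, queue index)
        for i in range(n):
            if queues[i]:
                j = choose(i, queues)
                sent.setdefault(j, []).append((queues[i][0], i))
        # (3) each server considers only the oldest received packet
        #     and processes it with probability mu[j]
        for j, packets in sent.items():
            ts, i = min(packets)
            if rng.random() < mu[j]:
                queues[i].pop(0)
        # (4) all other packets implicitly return to their queues
    return [len(q) for q in queues]
```

With a single queue and server, sending to the unique server recovers the biased random walk discussed next: the queue stays short when \(\lambda \lt \mu\) and grows linearly when \(\lambda \gt \mu\).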
To get a baseline measure for the outcomes that could arise under strategic behavior, we must first understand when a queuing system is stable under centralized coordination: it turns out that an obvious necessary condition on \(\mathbf {\mu }\) and \(\mathbf {\lambda }\) is also sufficient. An instructive example to keep in mind is a single-queue, single-server system. Of course, there is no learning or competition in such a process. If \(0\lt \lambda \lt \mu \le 1\), it is well known that \(Q^1_t\) follows a random walk on the nonnegative integers biased toward 0, and moreover is geometrically ergodic. This in particular implies strong stability. However, if \(0\lt \lambda =\mu \lt 1\), then the corresponding unbiased random walk \(Q^1_t\) satisfies \(\mathbb {E}[Q^1_t]=\Theta (\sqrt {t})\). Therefore, there is a sharp threshold for strong stability at \(\lambda = \mu\). Our first result, which provides a natural baseline to compare with all of the subsequent results in this work, offers an extension of this for the feasibility of any coordinated policy for multiple queues and servers.
We give the proof of sufficiency in this section while deferring the (straightforward) proof of necessity to Appendix C, which requires a submartingale argument in the edge cases where some inequality is tight. To prove sufficiency, we require the well-known notions of majorization of vectors.
With this definition in hand, we can prove sufficiency using standard results on the relation between majorization and doubly stochastic matrices (see Appendix C for these standard facts).
2.3 The Need for Packet Priorities
Recall that we will be interested in proving statements of the following form:
Given a queuing system that is centrally feasible (in the sense of Theorem 2.2) even when \(\mathbf {\mu }\) is scaled down by some constant \(c\ge 1\) independent of the parameters of the system, then a random process where queues are decentralized and strategic under certain conditions remains stable.
To see the necessity of timestamps, consider instead a simpler model where there are no timestamps and priorities, and instead each server uniformly randomly picks which packet to process among those that are sent to it in each step. It is easy to see that if a queuing system is feasible even if \(\mathbf {\mu }\) is scaled down by a factor of n, then it will remain a stable queuing system with reasonably strategic queues. Indeed, by this feasibility assumption, \(\mu _1\gt n\cdot \lambda _1\) so that \(\mu _1\gt \sum _{i=1}^n \lambda _i\). Therefore, if every queue just always sends to the largest server whenever they have a packet, they will succeed in clearing a packet with probability at least \(\mu _1/n\gt \lambda _i\), and it is not too difficult to prove that this results in a strongly stable process by comparing to a random walk biased toward the origin. It is natural to ask if a better factor is attainable in this alternate model, perhaps even a constant. It turns out that in general, a polynomial in n is required, as we formally show in Appendix C.
The basic reason this can occur is that low arrival rate queues can saturate the high success rate servers, making it impossible for high arrival rate queues to clear fast enough to offset their higher arrival rates. Our priority model, although more difficult to analyze, results in older queues gaining an advantage over younger queues, causing the younger queues to prefer lower-quality servers. In other words, our model implicitly helps fast-growing queues get better service, as long as queues are sufficiently adaptive to take advantage of it.
2.4 Geometric Queuing Model via Deferred Decisions
Because of the significant dependencies induced by the timestamp priority scheme in our model, a direct analysis of these systems proves quite unwieldy. In this section, we use the principle of deferred decisions to give an alternate description of the Bernoulli system of Section 2.2 that will prove significantly more amenable to analysis in the rest of this article.
To describe this Geometric model, suppose that each queue chooses which server to send to at time t only depending on past feedback and their history of oldest timestamps, but not on the \(Q^i_t\). In this case, we can equivalently characterize the evolution of this system keeping only the oldest timestamp of a packet at each queue. To do this, instead of randomly generating new packets at each timestep according to a Bernoulli process, each queue only maintains the timestamp of their current oldest unprocessed packet. Once this packet is successfully cleared, the new current oldest unprocessed packet has a timestamp generated by sampling a geometric random variable with parameter \(\lambda _i\) and adding this to the timestamp of the just-completed packet. If this number exceeds the current timestep t, this corresponds to having processed all packets that arrived before the current timestep, and receiving the next packet in the future. We will call this random process the Geometric process. Because the gap between successes in repeated independent \(\text{Bern}(\lambda _i)\) trials is given by a \(\text{Geom}(\lambda _i)\) random variable, the Bernoulli and Geometric processes can be completely coupled, as described in the following.
We write \(\mathbf {T}_t=(T^1_t,\ldots ,T^n_t)\in \mathbb {N}^n\) for the vector of current ages of oldest packets. To see the equivalence, consider any Bernoulli queuing system with Bernoulli random variables \(\lbrace B^i_t\rbrace _{i\in [n],t\ge 0}\) for packet generation. To get a coupled Geometric system for the same system, use an independent sequence \(\lbrace G^i_s\rbrace _{i\in [n],s\ge 0}\) with the interpretation that \(G^i_{s}\sim \text{Geom}(\lambda _i)\) is the size of the sth gap between successes in the \(B^i_t\). When queue i clears her sth packet, her new oldest timestamp increases by \(G^i=G^i_{s}\) as described earlier. As such gaps between timestamps in the Bernoulli model have \(\text{Geom}(\lambda _i)\) distributions, the Geometric system gives the ages of each queue in the Bernoulli system at all times and gives an explicit coupling.
The key feature is that, under the assumption that each queue i chooses servers at time t based only on at most \(T^i_t\) and not on \(Q^i_t\), the choices made by the queues are the same whether one conditions on just the current timestamps and past feedback or on all the past information in the Bernoulli model (which includes arrivals received after the current oldest packet). In other words, if \(\mathcal {G}_t\) denotes the information available to the queues in the Bernoulli model at time t, and \(\mathcal {F}_t\) that in the Geometric model, then all choices by the queues at time \(t+1\) are the same conditioned on either history. The point of doing so is that \(G^i_{s}\) will be independent of \(\mathcal {G}_t\) until the queue clears her sth packet (namely, the timestamp of queue i’s \((s+1)\)-th packet is not known until the time queue i clears her sth packet).
In the Geometric system, we define stability in the same way as before.
Because heuristically \(Q^i_t\approx \lambda _i T^i_t\), it is intuitive that our notions of strong stability are equivalent whenever both systems correspond to the same random process. This is indeed the case, and furthermore, strong stability implies almost sure subpolynomial asymptotic growth. The basic idea is to use Markov’s inequality and the Borel-Cantelli lemma along an appropriately chosen subsequence of times and then interpolate to the rest. We defer this equivalence to Appendix C.3.
3 Patience in Queuing Systems
In this section, we introduce and establish structural results about a patient version of our queuing game. In this game, each queue chooses a fixed distribution over servers that will be used in all rounds to optimize the long-run age of the packets in the queue. To begin, we formulate the game in a manner that is well defined a priori.
Note that the cost function defined previously is clearly well defined as the lim sup of expected values. However, we will actually show that the limit of the random quantity \(T^i_t/t\) (without expectations) is almost surely equal to a deterministic constant depending on \(\mathbf {p},\mathbf {\lambda },\mathbf {\mu }\) (see Theorem 3.3). By deriving an alternate, explicit characterization of these values, we show that Nash equilibria exist in Theorem 3.8. Because the cost functions are explicit functions of the randomizations and the parameters \(\mathbf {\mu }\) and \(\mathbf {\lambda }\), we omit the notation \((c_i)_{i=1}^n\) when instantiating a game \(\mathcal {G}\).
Our main focus in Section 4 will be to give guarantees on the quality of all Nash equilibria in this game. In a slight abuse of the price of anarchy terminology, we make the following definition.
In this section, we extensively study the properties of the cost function \(c(\mathbf {p})\), which is currently written as the lim sup of the expected value of the random linear aging rate of each queue. By taking the lim sup and expected values, the cost function is well defined, albeit quite unwieldy at present. Our first task is thus to provide an alternative, algorithmic description of \(c(\mathbf {p})\), which we initially denote \(r(\mathbf {p})\) (for “rates”) in Section 3.1. We show that r has significant analytic structure that will help establish various game-theoretic properties of this system. In particular, we show that the level subsets (in \([n]\)) of \(r(\mathbf {p})\) enjoy convenient closure properties, which will be enough to establish continuity and other properties that we use to prove the existence of equilibria. We will return to proving that this function is equal to c in Section 6.
The Need for Packet Priorities with Patience. We note that even with this restriction to stationary strategies in a patient queuing game, the priority scheme by servers to attempt to serve the oldest packet is necessary to obtain constant price of anarchy bounds. It is not too difficult to see that if servers choose packets uniformly at random among those they receive in each round, the price of anarchy can be polynomially large in n in the sense of Theorem 2.4 when the costs are defined to be the asymptotic aging rates as in Definition 3.1. More specifically, consider a queuing system with one queue with arrival rate \(C/n^{1/3}\) for some large constant \(C\gt 0\) to be determined, and \(n-1\) queues with arrival rate \(1/n^{2/3}\). Suppose that there is one server with success rate 1 and then n servers with success rate \(1/n^{1/3}\). There exist bad Nash equilibria in stationary strategies of the following form: each small queue evenly mixes between a personal server with success rate \(1/n^{1/3}\) and the top server with success rate 1. It is clear that each small queue will be stable for large enough n (provided no other queue shares her personal server) by this choice of constants, so she has no incentive to deviate because her cost in this game is her long-run aging rate of zero.
By standard Chernoff bounds, in any given round, there are at least \(cn^{1/3}\) small queues that will send to the top server under these stationary strategies with overwhelming probability for some absolute constant \(c\gt 0\); therefore, with very high probability in each round, the large queue can succeed in clearing a packet by sending to the top server with probability at most \(1/(cn^{1/3})\). This is strictly smaller than her arrival rate if \(C\gt 0\) is taken sufficiently large. At any other server, the queue can get rate at best \(1/n^{1/3}\), which clearly does not ensure stability for large enough \(C\gt 0\). Therefore, the system must remain unstable if she best responds to this behavior by the other queues (note further that this best response will never involve sending to a personal server of any of the other queues, as there are n such servers). No queue has any incentive to deviate according to the cost functions as defined in Definition 3.1 under this alternate server selection choice, so these strategies constitute a Nash equilibrium. This system would remain centrally feasible even if server rates were scaled down by \(\Omega (n^{1/3})\), and hence the price of anarchy of such a patient queuing game can be polynomially large in n.
In this example, small queues mix between the top server and their own server equally because their long-run growth rate is zero in any case. It is possible to modify this example so that all queues mix among servers that offer equal long-run probability of success. This can be done by having each small queue send to the top server with probability \(p(n)\) and to a personal server with rate \(1/n^{1/3}\) with probability \(1-p(n)\), while the top queue sends deterministically to the last server with rate \(1/n^{1/3}\) (note that the effectiveness of the top server is at best \(1/n^{1/3}\) if the top queue were to deviate to there, by construction, so this is a best response). The parameter \(p(n)\) can be chosen so that the long-run average number of small queues sending to the top server is precisely \(n^{1/3}\) almost surely using the strong law of large numbers for Markov chains, so that each queue indeed mixes among servers that offer long-run success probability \(1/n^{1/3}\).
3.1 Algorithmic Description of Costs
As stated, we now construct a function \(r:(\Delta ^{m-1})^n \rightarrow [0,1]\) that we will show is equivalent to c. We will show that for any fixed \(\mathbf {p}\), the set \([n]\) of queues partitions into subsets \(S_1,S_2,\ldots\), where each queue in \(S_i\) group has the same aging rate and \(S_1\) ages the fastest, then \(S_2\), and so on, according to r (and so for c as well). To get a sense of the quantities that will arise before considering the general case, consider the simplest setting of a single queue and a single server (where there are no nontrivial strategies nor competition), with rates \(\lambda \gt \mu\). In any round where the queue has an uncleared packet, the age will first increase by 1 deterministically. With probability \(\mu\), the queue will succeed in clearing this packet, and the age will go down in expectation by \(\mathbb {E}[G]=1/\lambda\), where \(G\sim \text{Geom}(\lambda)\) is independent of whether or not the server succeeds. Therefore, the expected change in this queue’s age will be \(1-\mu /\lambda \gt 0\), and we expect that the queue will asymptotically age at this rate.
In general, with multiple queues and servers, the actual values of \(c_i\) are best described via a recursive algorithm that computes the rates, which we give in the following. The intuition is that \(S_1(\mathbf {p})\) will be the subset that minimizes the ratio of expected packets they clear collectively given \(\mathbf {p}\), assuming that they have priority over all other queues, divided by their sum of arrival rates. This quantity arises by viewing this subset as a single large queue as in the preceding single queue example. Conditioned on this set \(S_1\) of queues growing fastest, they will typically have priority, and then we recurse to find the lower groups. The algorithm begins by initializing \(k=1\) and \(I=[n]\):
(1)
Compute the minimum value over all nonempty subsets \(S\subseteq I\) of \(\frac{\sum _{j=1}^m \mu _j\left(1-\prod _{i\in S}(1-p_{ij})\right)}{\sum _{i\in S}\lambda _i}.\)
This gives the expected number of packets cleared by S if all queues in S send in a timestep and they have priority over all other queues, divided by their sum of arrival rates.
(2)
If this value is at least 1, then no subset of queues will have linear aging, so set \(S_k=I\), \(r_i(\mathbf {p})=0\) for all \(i\in S_k\), and terminate. Otherwise, set \(S_k\) to be the minimizer of the previous quantity over all nontrivial subsets of I, chosen to be of largest cardinality in the case of degeneracies. In this case, for each \(i\in S_k\), \(r_i(\mathbf {p})\) gets set to \(1-\frac{\sum _{j=1}^m \mu _j\left(1-\prod _{i\in S_k}(1-p_{ij})\right)}{\sum _{i\in S_k}\lambda _i}.\)
For \(k=1\), we refer to any subset with the minimum ratio as a tight, or minimizing, subset.
(3)
Update the server rates \(\mu _j\) as \(\mu _j\leftarrow \mu _j\prod _{i\in S_k}(1-p_{i,j}).\) In other words, \(\mu _j\) gets discounted by the probability a queue from \(S_k\) sends to server j (assuming that all of these queues are sending). Update \(I\leftarrow I\setminus S_k\), \(k\leftarrow k+1\), and recurse on I with \(\mathbf {\mu }\) and \(\mathbf {p}_I\) if nonempty.
As many of these quantities will appear often, we make the following conventions: for any subsets \(S,S^{\prime }\) such that \(S\subseteq [n]\setminus S^{\prime }\), define \(\lambda (S)\triangleq \sum _{i\in S}\lambda _i\) as the sum of arrival rates of packets to a set of queues S, and \(\alpha (S\vert \mathbf {p},\mathbf {\mu },S^{\prime })\) as the expected number of packets cleared from queues in S with service rates \(\mathbf {\mu }\), if the queues in \(S^{\prime }\) have priority, S has priority over all other queues, and all queues in \(S\cup S^{\prime }\) send packets in the round: \(\alpha (S\vert \mathbf {p},\mathbf {\mu },S^{\prime })\triangleq \sum _{j=1}^m \mu _j\left(\prod _{i\in S^{\prime }}(1-p_{ij})\right)\left(1-\prod _{i\in S}(1-p_{ij})\right),\)
and then let \(f(S\vert S^{\prime })\triangleq \frac{\alpha (S\vert \mathbf {p},\mathbf {\mu },S^{\prime })}{\lambda (S)}\)
denote the ratio of expected number of packets cleared by S when having priority over all members but \(S^{\prime }\), normalized by the expected number of new packets received in each round by S.
Let \(S_{k}(\mathbf {p},\mathbf {\mu },\mathbf {\lambda })\) be the \({k}\)th set output by the preceding algorithm. When \(\mathbf {p},\mathbf {\mu },\mathbf {\lambda }\) are clear from context, we will suppress them. We write \(U_k=\cup _{\ell =1}^kS_{\ell }\) as the set of queues in the top k groups outputted by the algorithm, with \(U_0=\emptyset\). We will write \(f_k = f(S_k\vert U_{k-1})\), and we use \(g_k=\max \lbrace 0,1-f_k\rbrace\) for the rate of the kth outputted set, which is equal to \(r_i(\mathbf {p},\mathbf {\mu },\mathbf {\lambda })\) for any \(i\in S_k(\mathbf {p},\mathbf {\mu },\mathbf {\lambda })\). From the recursive construction, \(f_{k+1}=\min _{\emptyset \ne S\subseteq [n]\setminus U_k}\frac{\alpha (S\vert \mathbf {p},\mathbf {\mu }^{\prime },\emptyset)}{\lambda (S)},\)
where \(\mu ^{\prime }_j = \mu _j\prod _{i\in U_k(\mathbf {p},\mathbf {\mu },\mathbf {\lambda })}(1-p_{ij})\) for all \(j\in [m]\). In other words, having found \(U_k\), \(S_{k+1}\) is the largest minimal set among the remaining elements, but where the \(\mathbf {\mu }\) rates have been reweighed by the probability no element of \(U_k\) sends to each server. These quantities are compiled in a table in Appendix D for easy reference.
Our main probabilistic result about the function r is that this is indeed equivalent to the cost function c of the patient queuing games. In fact, we prove the following, stronger result.
We will prove our game-theoretic results assuming this theorem; however, as the proof is quite nontrivial and rather involved, we defer the proof to Section 6.
3.2 Properties of Rate Function
We first establish basic properties of the output of the algorithm that will be useful in studying the analytic properties, as well as in proving that this algorithm gives the correct asymptotic rates. Throughout, we will view f as the quotient \(\alpha /\lambda\) when invoking Fact 3.1.
Clearly, for fixed S, the function \(f(S\vert T)\) is nonincreasing in T as a set function. We repeatedly use the following fact, which can be seen simply by expanding the definition of f.
Next, we characterize some structure in the minimizing subsets at each step of the algorithm, which will allow us to choose the \(S_k\) canonically as the largest cardinality minimizer. To do this, we first show that the function \(\alpha (\cdot)\) is submodular.
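Submodularity here reflects that each server's contribution \(1-\prod _{i\in S}(1-p_{ij})\) is a coverage-type function with diminishing marginal returns. The brute-force check below is our own illustration (names ours), specialized to the case of no higher-priority set, i.e., \(S^{\prime }=\emptyset\).

```python
import itertools
import random
from math import prod

def alpha(S, p, mu):
    """Expected packets cleared by S when S has priority and all of S send."""
    return sum(mu[j] * (1 - prod(1 - p[i][j] for i in S))
               for j in range(len(mu)))

def is_submodular(p, mu, n, tol=1e-12):
    """Check alpha(S+{x}) - alpha(S) >= alpha(T+{x}) - alpha(T)
    for every chain S <= T and every x outside T."""
    ground = set(range(n))
    subsets = [set(S) for k in range(n + 1)
               for S in itertools.combinations(range(n), k)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for x in ground - T:
                lhs = alpha(S | {x}, p, mu) - alpha(S, p, mu)
                rhs = alpha(T | {x}, p, mu) - alpha(T, p, mu)
                if lhs < rhs - tol:
                    return False
    return True
```

On any randomly drawn instance, the check should succeed, as the marginal gain of adding queue x to S at server j is \(p_{xj}\prod _{i\in S}(1-p_{ij})\), which only shrinks as S grows.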
Now, recall that the relevant function in the construction of the preceding algorithm is the set function \(f=\alpha /\lambda\). Because this function is the ratio of a submodular function and a modular function, we will be able to derive significant closure properties of the tight subsets (as defined earlier), which will prove critical in establishing both the game-theoretic and probabilistic properties of our systems.
From Lemma 3.5, it follows almost immediately that the outputted rates are strictly decreasing across the groups: as mentioned, \([n]=S_1\sqcup S_2\sqcup \ldots\) gives a partition into groups that age together, where \(S_1\) is the fastest aging group, \(S_2\) the next fastest, and so on. As such, the disjoint subsets iteratively output by the algorithm satisfy the intuition that motivated the construction.
With these basic properties, we can obtain an important structural result that will prove fruitful in establishing the existence of equilibria. We defer the somewhat technical proof to Appendix D.1.
With these structural results, we can turn to showing our first game-theoretic property of this game, for now assuming that the costs are given by r, the output of the algorithm of Section 3.1: namely, that equilibria exist. Although the cost functions are not quite convex, by restricting each component to a line that varies only a single queue’s strategy, one can deduce enough structure that allows for an application of Kakutani’s theorem. We record this result here while deferring the proof to Appendix D.1.
3.3 Price of Stability and Independence
In Section 4, we will establish one of our main results, a tight bound of \(\frac{e}{e-1}\) on the price of anarchy in the patient queuing game. Recall that this bound asserts that in any patient queuing system that is centrally feasible, if the arrival rates are decreased by a factor of \(\frac{e}{e-1}\), then every Nash equilibrium of the resulting game will be stable. In this section, we consider the intermediate and complementary notion of the price of stability, which controls the efficiency loss of the best Nash equilibrium. Formally, we have the following definition.
By definition, the price of stability lies between 1 and the price of anarchy.
Another measure is the price of independence. Recall that the coordinated queuing strategy in the proof of Theorem 2.2 is highly centralized, in the sense that queues never collide due to the central scheduling.13 How well can all agents do when they are not necessarily selfish, attempting instead to make the system as stable as possible for everyone, but are restricted to stationary product strategies? Such a measure decouples the contributions to the price of anarchy of selfishness and independence (i.e., the inability to coordinate due to product distributions). Formally, we have the following definition.
In our setting, it is not difficult to see that these two quantities are precisely the same. One direction is trivial; conversely, if \(\alpha \ge 1\) is such that there exists a \(\mathbf {p}^*\) with \(c_i(\mathbf {p}^*)=0\) in \(\mathcal {G}(\alpha)\) for all \(i\in [n]\), then \(\mathbf {p}^*\) is clearly a Nash equilibrium of \(\mathcal {G}(\alpha)\) as well, since no agent can possibly improve their rate. In fact, we observe that there is a more general correspondence, as shown in the following proposition.
We now give a simple argument showing that the price of independence, and therefore also the price of stability, is at most \(\frac{e}{e-1}\). This will later also be a consequence of our bound on the price of anarchy, but the argument is substantially simpler.
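As an illustrative sanity check (not the formal argument, which appears below), the \(\frac{e}{e-1}\) factor can be traced numerically to the following fact: if k queues each send to a unit-rate server independently with probability \(1/k\), the server receives, and hence serves, at least one packet with probability \(1-(1-1/k)^k\ge 1-1/e\). The parameter names here are our own.

```python
import math

# If k queues each send to a unit-rate server independently with
# probability 1/k, the server receives at least one packet -- and hence
# serves one -- with probability 1 - (1 - 1/k)^k, which is always at
# least 1 - 1/e.  Scaling arrivals down by e/(e-1) exactly compensates
# for this loss.
for k in range(1, 200):
    p_serve = 1 - (1 - 1 / k) ** k
    assert p_serve >= 1 - 1 / math.e - 1e-12

# In the limit k -> infinity the bound is tight.
assert abs((1 - (1 - 1 / 10**6) ** 10**6) - (1 - 1 / math.e)) < 1e-5
```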
It is not difficult to see that the price of independence is strictly greater than 1; almost any nontrivial example will certify this. For instance, the following example shows that it is at least \(9/8\).
We suspect that the price of independence is significantly less than \(\frac{e}{e-1}\), but we leave the determination of its value as an interesting analytical question for future work.
4 Price of Anarchy of Patient Queuing
In this section, we turn to the game-theoretic problem of understanding what condition ensures stability at every equilibrium profile, assuming Theorem 3.3 (we return to proving its validity in Section 6). By considering the quality of deviations by a queue at a Nash equilibrium to a single other server, it is possible to show that the price of anarchy is always at most 2. With more careful, continuous deviations, we show that with patience this factor is in fact loose, and the correct bound is \(\frac{e}{e-1}\approx 1.58\).
The following simple example shows that this is the best possible constant factor: fix \(\epsilon \gt 0\) small and suppose that there are n queues and n servers, with \(\mathbf {\lambda }=(1-1/e+\epsilon ,\ldots , 1-1/e+\epsilon)\) and \(\mathbf {\mu }=(1,\ldots ,1)\), and \(\mathbf {p}\) has every queue uniformly mixing among the servers. It is easy to see by symmetry that this system is Nash with \(S_1=[n]\), for if a queue deviates from this uniform distribution, this does not change the worst ratio in the algorithm. Moreover, for any fixed \(\epsilon \gt 0\), as \(n\rightarrow \infty\), this system becomes unstable. One can check that
so that \(r(S_1)=\max _i c_i(\mathbf {p})\gt 0\). Our main result asserts that this is the worst case, where every queue is maximally colliding subject to being Nash. Concretely, we prove the following instance-dependent bound from which the claimed factor immediately follows.
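The instability in the preceding example can also be checked numerically. The following sketch (illustrative only; the threshold value of n depends on \(\epsilon\)) compares the total expected arrivals \(n(1-1/e+\epsilon)\) against the total expected service \(n(1-(1-1/n)^n)\) under uniform mixing.

```python
import math

# Numeric sketch of the tight example: n queues mix uniformly over n
# unit-rate servers, each queue with arrival rate 1 - 1/e + eps.  A
# given server receives at least one packet in a step with probability
# 1 - (1 - 1/n)^n, so total expected service per step is at most
# n * (1 - (1 - 1/n)^n), while total expected arrivals are
# n * (1 - 1/e + eps).
def slack(n, eps=0.01):
    service = n * (1 - (1 - 1 / n) ** n)
    arrivals = n * (1 - 1 / math.e + eps)
    return service - arrivals

# For small n, the slack eps keeps service ahead of arrivals...
assert slack(10) > 0
# ...but since (1 - 1/n)^n increases to 1/e, arrivals eventually win
# for any fixed eps, and the system becomes unstable.
assert slack(10000) < 0
```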
We now prepare for the proof of Theorem 4.1. The idea is to continuously deform the Nash profile toward a highly symmetrized strategy vector while only weakly decreasing \(f(S_1)\). At the end of this process, we obtain a lower bound on this value at Nash. To carry out this deformation and ensure monotonicity of the growth rate, we must at some point use the Nash property. The difficulty lies in the form of the f functions: recall that \(S_1\), the set of all queues growing at the fastest rate, is the union of all tight subsets, so it can have many proper tight subsets, and each queue \(i\in S_1\) thus has to locally optimize all of the functions \(f(S\vert \mathbf {p})\) with \(S\ni i\) simultaneously at Nash (see Figure 2 for an interesting example). In particular, for a queue \(i\in S_1\) at Nash, one possible deviation may weakly decrease \(f(S)\) for some tight subset \(S\ni i\), whereas another deviation may weakly decrease \(f(S^{\prime })\) for some different tight subset \(S^{\prime }\ni i\). In other words, each queue may be constrained by multiple different objective functions at Nash, making it difficult to generically argue about why any given deviation decreases performance. We overcome this barrier via Proposition 4.3 by connecting the incentives of each queue in \(S_1\) with the structure guaranteed by Lemma 3.5.
Fig. 2.
With this result, we may finally return to the proof of Theorem 4.1.
5 No-Regret Learning in Strategic Queuing Systems
Our work in the previous sections provides tight bounds for strategic but stationary behavior in this queuing model. However, it is unclear how agents might reach such a state, let alone as the result of natural dynamics. In this section, we therefore turn to a dynamical setting where agents adaptively update their behavior using no-regret learning algorithms. In the no-regret queuing system we analyze, each queue aims to get her packets served as efficiently as possible. At each timestep, she aims to maximize the probability that her packet gets served, measuring her value over a time period by the number of packets served. The effective rate of service at a server j depends on its rate \(\mu _{j}\), as well as on the competition for service from the other queues.
In this section, we model queues as learners and assume that, for a parameter w, each queue satisfies a no-regret learning guarantee on the number of packets served during each window of length w. To formalize this assumption, we need a few definitions.
Note that all of these random variables are defined with respect to the same sample path; the \(X^{i,j}_t\) depend on all previous randomizations and choices by the queues, as these implicitly determine the priorities of the queues. In other words, \(\text{Reg}_i(w)\) of queue i on some fixed window of length w is defined to be the (random) difference between the number of packets queue i cleared during these w periods and the number she would have cleared had she simply always sent to the best single server, where the comparison is in hindsight to the best single server on the realized sample path. That is, for each time t and server j, we have \(X^{i,j}_t=1\) if at time t server j was successful (regardless of whether a packet was sent there at that time), and the packet that queue i sent had priority over any packet sent there at that time.16
We now make the following assumption on the regret of queuing strategies.
For instance, this assumption holds for EXP3.P.1, with regret scaling like \(\sqrt {wm\ln (m w/\delta)}=o(w)\) [3]. Note that this high-probability guarantee is possible in our setting even in the priority model, where the random variables of success at each server from the perspective of each queue at each timestep depend on all previous actions (via timestamps and priorities), as well as on the actions of the other queues in the current time period (e.g., see the discussion in Section 9 of Auer et al. [3]). This property is standard and necessary in applying learning algorithms to multi-player games. Using EXP3.P.1 ensures that such a guarantee holds simultaneously for each window of this length, not only for a fixed window, so the players need not be aware of which window of size w is relevant for our analysis. This is true because EXP3.P.1 mixes in uniform exploration to guarantee that the probabilities remain high enough throughout the algorithm, allowing us to adapt the classical no-regret analysis starting at any timestep for the window of the next w timesteps.
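To illustrate the mechanics, the following is a simplified EXP3.P-style sketch: exponential weights with a uniform-exploration mixture and importance-weighted reward estimates. This is an assumption-laden toy (the function name, parameter choices, and the toy reward model are ours), not the exact EXP3.P.1 algorithm of Auer et al. [3].

```python
import math
import random

def exp3p_sketch(reward_fn, m, w, gamma=0.1, seed=0):
    """Simplified EXP3.P-style learner over m servers for w steps.

    Mixes a gamma fraction of uniform exploration into an
    exponential-weights distribution and uses importance-weighted
    reward estimates.  Illustrative sketch only, not EXP3.P.1 itself.
    """
    learner_rng = random.Random(seed)
    eta = gamma / m
    weights = [1.0] * m
    total = 0.0
    for t in range(w):
        wsum = sum(weights)
        # exploration floor gamma/m keeps every probability bounded below
        probs = [(1 - gamma) * wi / wsum + gamma / m for wi in weights]
        j = learner_rng.choices(range(m), weights=probs)[0]
        x = reward_fn(t, j)          # 1 if the packet was served, else 0
        total += x
        # importance-weighted estimate; only the played arm is updated
        weights[j] *= math.exp(eta * x / probs[j])
    return total

# Toy check: one server succeeds with prob. 0.9, the rest with 0.2.
env_rng = random.Random(1)
served = exp3p_sketch(lambda t, j: int(env_rng.random() < (0.9 if j == 0 else 0.2)),
                      m=5, w=5000)
assert served > 0.6 * 5000   # far above the 0.2-server baseline of 1000
```

The exploration floor \(\gamma /m\) is what keeps the probabilities "high enough throughout the algorithm," as discussed above, so the window-by-window analysis can start at any timestep.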
5.1 Stability of No-Regret Queuing Systems
Our second main result shows that if the queuing system has enough slack and all queues satisfy an appropriate high-probability no-regret guarantee, then the queuing system is strongly stable. To this end, we make the following feasibility assumption asserting that a queuing system with service rates scaled down by \(1/2\) would remain feasible.
We use \(\eta\) to denote the maximum value for which this inequality holds. The parameter \(\eta\) controls the quality of learning required for our results. With these definitions in order, we may formally state our main result on the stability of no-regret learners.
The technical tool we use to prove Theorem 5.2 is the following result of Pemantle and Rosenthal [29]:
To apply this theorem, we must define an appropriate potential function of queue ages that satisfies the negative drift and bounded moments condition. We define for \(\tau \in \mathbb {N}\) the following potential functions that will feature prominently in the proof:
(8)
(9)
In other words, \(\Phi _{\tau }(\cdot)\) denotes the expected number of total packets in the system aged above \(\tau\), conditioned just on the ages \(\mathbf {T}_t\).
Remark 5.1.
This analysis crucially relies on using the Geometric system as opposed to the Bernoulli system. The reason is that the preconditions in Theorem 5.3 must hold conditioned on any history, even low-probability events. In the Bernoulli system, this would require us to condition on too much. For instance, it is possible for a queue to hold a very old packet and yet have received no other packets until the current timestep. Although unlikely ever to occur, this is a perfectly valid potential history. In this case, clearing this packet would lead to an arbitrarily large pth moment change, as her age would drastically decrease, and therefore the moment condition of Theorem 5.3 would be violated. Although intuitively this should only help the stability of the random process, the conditions in Theorem 5.3 are subtle (see the discussion in the work of Pemantle and Rosenthal [29]).
Even if that obstruction can be managed suitably, the extra conditional information in the Bernoulli system highly complicates the analysis, as then one must reason about the priorities of the packets that have already been received before the present timestep. These could in principle be quite arbitrary. We avoid these complications in the Geometric system, as it allows us to only condition on current ages.
We now provide a simple construction showing that a partial converse holds: \(\frac{1}{2}\) is the best constant that can appear in Assumption 5.2 for a similar no-regret condition to be sufficient for stability as in Theorem 5.2. To set it up, let \(w_k=k^2\) for each \(k\ge 1\). Then we have the following theorem.
Theorem 5.4.
Partition time \(t=0,1,\ldots\) into consecutive windows, where the kth window has length \(w_k=k^2\). Then there exists a family of queuing systems with n queues and servers for each \(n\ge 1\) satisfying Assumption 5.2 with \(\frac{1}{2}+o_n(1)\) in place of \(\frac{1}{2}\) with the following properties: almost surely, each queue has zero regret on all but at most finitely many of the windows, but the system is not weakly stable.
The formal details are slightly technical, and therefore the proof is deferred to Appendix E, but the high-level idea is quite natural: for each \(n\ge 1\), consider the following system on n queues and n servers, where we set \(\mathbf {\lambda }=(\frac{n+1}{n^2},\ldots ,\frac{n+1}{n^2})\) and \(\mathbf {\mu }=(1,\frac{n-1}{n^2},\ldots ,\frac{n-1}{n^2})\). Consider the strategy where every queue always sends to the rate-1 server. The queue lengths are then unbounded in expectation, as the sum of the arrival rates strictly exceeds 1, the capacity of the server they all use. However, it is intuitive that this strategy will “usually” have zero regret; if all the queues are similarly aged at the start of some window, then each should expect to clear roughly a \(1/n\) fraction of the time on this window using this strategy, which strictly exceeds what she would get at any other server. We use standard concentration arguments and the Borel-Cantelli lemma to argue that this situation occurs all but finitely many times almost surely, thereby obtaining the claim.
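The two arithmetic facts driving this construction can be checked directly (an illustrative verification only, not part of the proof):

```python
# Rates in the Theorem 5.4 construction: with lambda_i = (n+1)/n^2 for
# all n queues and mu = (1, (n-1)/n^2, ..., (n-1)/n^2), the total
# arrival rate (n+1)/n exceeds 1, the capacity of the fast server, so
# "everyone sends to the fast server" cannot be stable.  Yet each queue
# clears there at rate roughly 1/n, which strictly exceeds the full
# rate (n-1)/n^2 of any slow server, so deviating is not profitable in
# hindsight on a typical window.
for n in [2, 10, 100, 1000]:
    lam = (n + 1) / n**2
    assert n * lam > 1                     # instability of the profile
    share_of_fast = 1 / n                  # fair share of the rate-1 server
    slow_rate = (n - 1) / n**2
    assert share_of_fast > slow_rate       # no profitable deviation
```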
6 Asymptotic Convergence
In this section, we finally return to proving Theorem 3.3, which asserted the equivalence between the long-run rates of queue ages that form the cost functions for the patient queuing game and the output of the algorithm given in Section 3.1. The high-level idea is to show that this identity holds for all \(i\in S_1\), then \(S_2\), and so on. The first step is showing that the maximum queue age grows by at most the desired rate on each long-enough window with high probability.
Proposition 6.1.
Fix \(\epsilon \gt 0\). For any integer \(a\in \mathbb {N}\), let \(w=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{n-1}\). Suppose that it holds at time t that \(\max _{i\in [n]} T^i_t\ge w\cdot f_1\). Then
with probability at least \(1-C_1\exp (-C_2a)\), where \(C_1,C_2\gt 0\) are constants depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), but not on a. More generally, for each \(s\ge 1\), if \(\max _{i\not\in U_{s-1}} T^i_t\ge w\cdot f_s\), then
with probability at least \(1-C_1\exp (-C_2a)\), where \(C_1,C_2\gt 0\) are again constants depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), but not on a.
We prove Proposition 6.1 in Appendix F using a delicate argument relying on a variety of concentration bounds. The key insight is that if a subset S of much older queues is likely to have priority on a long window of length w, then the quantity \(w\cdot f(S_1)\cdot \lambda (S)\) is a lower bound on the expected number of packets cleared collectively by S on this window, by definition of \(S_1\). The analysis becomes complicated when there are multiple old queues: although we know these queues collectively have priority over all young queues, we must argue about priorities within this subset to bound the growth of the maximum queue age. We handle this by induction, carefully chaining together large windows to obtain a win-win analysis.
For Proposition 6.1 to yield anything useful, we will need a corresponding lower bound asserting roughly that if groups have separated according to what the algorithm asserts, then the average queue in a group ages at the conjectured rate. To that end, we prove the following result in Appendix F, which shows that if we have the conjectured aging separation between groups \(U_{k-1}\) and \(S_k\), then some weighted combination of the queue ages in \(S_k\) (whose significance will become apparent momentarily) must rise quickly.
Proposition 6.2.
For any \(s\ge 1\) and any fixed \(\epsilon \gt 0\), the following holds: suppose that at time t, it holds that
Combined with Proposition 6.1, this will allow us to conclude that because the average queue and the oldest queue in \(S_1\) age at the desired rate almost surely, all queues in \(S_1\) must age at this rate almost surely. To extend this analysis to lower groups \(S_2\) and so on, we use a similar analysis to show that the maximum age over every queue not in \(S_1\) grows at most at rate \(r(S_2)\). Then, because every queue in \(S_1\) grows at rate \(r(S_1)\gt r(S_2)\), almost surely, eventually every queue in \(S_1\) will be much older than every queue not in \(S_1\), giving them priority. We leverage this fact to show again that the average queue in \(S_2\) must grow at rate at least \(r(S_2)\), and therefore every queue in \(S_2\) grows at this rate almost surely. The proof for the lower groups \(S_3,\ldots\) is completely analogous. We now make this argument formal to prove Theorem 3.3.
Proof of Theorem 3.3
By the dominated convergence theorem, it suffices to show the second equality. We will show that the desired statement holds for each \(i\in S_1\), then \(S_2\), and so on. We first treat the case that the last outputted group \(S_k\) satisfies \(g_k=0\), or equivalently that \(f_k\ge 1\). Fix \(\epsilon \gt 0\) and partition time into consecutive windows of size \(w_{\ell }=\ell \cdot \lceil \frac{6}{\epsilon }\rceil ^{n-1}\). Let \(W_{\ell }=\sum _{q=1}^{\ell -1} w_{q}\) be the time period at the beginning of the \(\ell\)th window, and note that \(w_{\ell }=\Theta (W_{\ell }^{1/2})\).
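The claim \(w_{\ell }=\Theta (W_{\ell }^{1/2})\) is elementary arithmetic, which can be spot-checked as follows (the constant c stands in for \(\lceil \frac{6}{\epsilon }\rceil ^{n-1}\); any positive constant works):

```python
import math

# With w_l = c*l for a constant c, W_l = sum_{q<l} w_q = c*l*(l-1)/2,
# so w_l / sqrt(W_l) = sqrt(2c) * sqrt(l/(l-1)) -> sqrt(2c), confirming
# w_l = Theta(W_l^{1/2}).
c = 6 ** 3          # stand-in for ceil(6/eps)^(n-1), e.g. eps = 1, n = 4
W = 0
for l in range(1, 10001):
    w = c * l
    if l > 1:
        ratio = w / math.sqrt(W)
    W += w           # W now equals W_{l+1}, the start of the next window
assert abs(ratio - math.sqrt(2 * c)) < 0.01
```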
Consider the following events for \(\ell =1,2,\ldots\)
Clearly, \(\Pr (C_{\ell })\le \Pr (A_{\ell }\vert B_{\ell })\). But by Proposition 6.1, we know that for some constants \(C_1,C_2\gt 0\) independent of \(\ell\),
The sum over \(\ell\) is thus finite, and so the first Borel-Cantelli lemma (Lemma B.1) implies that almost surely at most finitely many of the \(C_{\ell }\) occur. Equivalently, almost surely, for all but finitely many of the \(\ell\), either
Observe that for each of the intervals where the latter holds, the value during the interval is at most \(w_{\ell }\cdot f_{k} + w_{\ell +1}=O(W_{\ell }^{1/2})\). In particular, it is not difficult to see that almost surely \(\max _{i\in [n]\setminus U_{k-1}} T^i_{W_{\ell }}\) is either \(o(W_{\ell })\), in which case we are done, or grows by at most a rate of \((1-(1-\epsilon)\cdot f_{k})\cdot w_{\ell }\). Either way, as \(\epsilon \gt 0\) was arbitrary, we may take \(\epsilon \rightarrow 0\) to deduce the desired result that almost surely18
By Proposition 6.2, we know that \(\Pr (A_t)\le A\exp (-Bt)\) for some constants \(A,B\gt 0\) independent of t. Therefore, \(\sum _{t=1}^{\infty } \Pr (A_t)\lt \infty\), from which the Borel-Cantelli lemma implies that almost surely, for all but finitely many t,
Again, \(\Pr (C_{\ell })\le \Pr (A_{\ell }\vert B_{\ell })\). By Proposition 6.1, a now routine application of the Borel-Cantelli lemma implies that almost surely, for all but finitely many \(\ell\), either
But the latter event cannot happen infinitely often with positive probability, as this would imply \(\max _{i\in [n]} T^i_{W_{\ell }}=o(W_{\ell })\) infinitely often with nonzero probability, which violates (27). Therefore, it must be the case that almost surely, for all but finitely many \(\ell\), the former event holds. This implies that almost surely,
By Equations (25) and (26), we can also conclude that almost surely the same holds for the minimum \(i\in S_1\). Thus, \(r_i(\mathbf {p})=g_1\) for all \(i\in S_1(\mathbf {p})\) by definition of \(g_1\), proving the theorem for all queues in \(S_1\).
We now show how to extend this inductively to higher values of k with \(g_{k}\gt 0\). Suppose that we have shown for all \(i\in U_{k-1}\) that the desired almost sure limit holds, and now consider \(S_{k}\). A completely analogous argument using the windows \(w_{\ell }\) as earlier with Proposition 6.1 via the Borel-Cantelli lemma implies that almost surely,
Another completely analogous application of Proposition 6.2 and the Borel-Cantelli lemma implies that almost surely, at most finitely many of the \(C_{\ell }\) occur. In other words, almost surely, for all but finitely many \(\ell\), either
But the latter event cannot happen infinitely often with any nonzero probability by virtue of the inductive hypothesis and (28), as \(g_{k-1}\gt g_{k}\) by Lemma 3.6, which implies that these timestamps cannot be so close infinitely often. Therefore, it must be the case that for all but finitely many of the \(\ell\), the former event holds. As usual, this immediately implies that
The extension to all \(i\in S_{k}\) follows in the same manner as before by comparing with the average, completing the proof.□
Observe that Theorem 3.3 rather strongly and explicitly characterizes the almost sure asymptotic linear growth rate of each queue for any choice of randomizations. Our main result in Theorem 4.1 showed that with a small slack in the system capacity, each queue is guaranteed sublinear asymptotic growth almost surely in any equilibrium. Although our objective function emphasizes the physical interpretation of an asymptotic linear growth rate for each queue, under these incentives queues are indifferent among all sublinear growth rates. One could instead define the game using the \(f_k\) quantities directly, rather than taking the max with 0 as is needed to argue about the asymptotic growth rates via r. If queues started out equally backed up, the \(f_k\) quantities measure the linear speed at which their ages descend to zero. In this setting, we provide the following stronger conclusion, whose proof is deferred to Appendix F.
Corollary 6.3.
Fix \(\mathbf {p}\) and suppose that for some group \(S_k\) output by the algorithm, \(f_k\gt 1\), so that \(1-f_k\lt 0\). Then, for each \(i\in S_k\), \(T^i_t\) is strongly stable.
7 Conclusion and Open Questions
In this article, we have studied the outcomes of strategic queuing under multiple behavioral assumptions. When considering the patient queuing game restricted to stationary equilibria, we used careful probabilistic arguments to establish the incentive structure of the game. We showed that the correct bicriteria factor in this setting is \(\frac{e}{e-1}\) via a novel deformation argument. Turning then to no-regret dynamics, we showed that the factor remains a constant but degrades to 2. In total, our work shows that price of anarchy–style bounds are attainable in such repeated games with state, both for equilibrium outcomes and for learning outcomes, but that these results require substantially different techniques from the existing literature.
Our work leaves open several enticing questions. The most immediate technical question is Question 3: determining the price of stability of the patient queuing game. Doing so would give a more fine-grained accounting of the cost of selfishness as opposed to independence. Moreover, although our restriction to time-independent policies in the patient queuing game exhibits quite rich behavior while enabling us to completely characterize the game-theoretic properties, perhaps there is a larger space of strategies for which similar results hold.
A more pressing direction, in our view, stems from the gap we show between equilibrium and no-regret outcomes: no-regret behavior can (perhaps surprisingly) yield reasonable outcomes here, but it may not be the correct notion of agent behavior in repeated games that carry strong interdependencies between rounds. It is an interesting question whether a natural form of non-cooperative learning can arrive at a Nash equilibrium of the patient version, or at least result in stable outcomes without reaching an equilibrium. To that end, it may be necessary to explore the theoretical properties of more powerful learning algorithms that get the best of both worlds in such settings: namely, balancing current rewards while maintaining a long-run perspective. Whether such results are possible is an exciting open direction toward resolving some of the deficiencies of traditional price of anarchy results.
Equally important, we wonder whether it is possible to obtain a more general theory of the price of anarchy in stochastic games. If so, it would be quite interesting to see whether such an analysis extends naturally to a reasonable learning dynamic, as in the classical setting. The techniques in this work appear quite specialized to the queuing setting and differ significantly between the equilibrium and learning settings. We leave a systematic exploration of this question to future work.
Acknowledgments
We thank Christos Papadimitriou for valuable discussions and insights in the early stages of this work.
Footnotes
1
Note that “stability” here refers to game-theoretic stability of behavior, not the stability notion in our queuing system.
2
See Section 2.2 for precise definitions and the full specification.
3
We remark that in other queuing systems, the servers may also maintain a bounded-size queue and only send back (or drop) packets when they no longer fit; our simpler model without server queues makes the tradeoffs we want to study cleaner. A packet sent to a server is either served or returned, offering instantaneous feedback to the learning algorithms of the queues, in contrast to the somewhat more informative but delayed feedback available in real systems.
4
While not immediately obvious, we will show in Appendix C.3 that strong stability implies almost sure subpolynomial growth.
5
Namely, this random process mixes to a stationary distribution on \(\mathbb {N}\) with geometrically decreasing tail probabilities.
6
We remark that the analogous statement for weak stability holds if the inequalities in Equation (3) only weakly hold by the same proof.
7
Majorization is often used for probabilities, and hence defined so that the total sums are equal; we omit this condition in our work.
8
This can also easily be seen directly using Theorem 5.3. Negative drift when exceeding \(Q_t=0\) is obvious, and as queue sizes can change by at most n in total between steps, increments are clearly bounded in \(L^p\) for any \(p\ge 0\).
9
By this, we mean that conditioned on the (randomized) strategies of all other queues in a given timestep, each queue sends to a server with the highest probability of success.
10
Note that although \(T^i_t\ge 0\) by definition, it is possible that \(\widetilde{T}^i_t\gt t\). The interpretation is that the queue has cleared all of her packets at time t and will receive her next one at time \(\widetilde{T}^i_t\), or equivalently, \(\widetilde{T}^i_t-t\) steps in the future from the perspective of time t.
11
In other words, the price of anarchy of a centrally feasible system is the supremum of values of \(\alpha\) such that when all queue arrival rates are scaled down by \(\alpha\), there nonetheless exists a Nash equilibrium and some queue that suffers nonzero linear aging.
12
We show in Lemma 3.5 that this choice is unique and canonical.
13
We remark that because all queues are stable in the coordinated solution and thus no deviation is profitable, this strategy can be interpreted as a correlated equilibrium of the patient queuing game where at each time, each agent is told to play a (random, coordinated) Dirac strategy (and possibly abstaining from sending in that round). Therefore, if we allow for public randomness to induce coordination, the price of stability with respect to the larger class of correlated equilibria is simply 1.
14
For any fixed k, it is not difficult to determine the optimal value of \(\mathbf {x}\) to give the tightest lower bound. It suffices to maximize the numerator, which is concave. By standard Karush-Kuhn-Tucker conditions at optimality, for all \(j,j^{\prime }\in [m]\) such that \(x_j\gt 0\), we must have \(\mu _j (1-x_j/k)^{k-1} = \mu _{j^{\prime }} (1-x_{j^{\prime }}/k)^{k-1}\), and \(x_{\ell }=0\) for all lower indices.
15
In the example given before of a system with multiple tight subsets, the level-1 subset is the subset of two queues that split between the inner and outer servers. The level-2 subsets are the two singleton sets of the outer queues sending each to their own outer server. Notice that these two subsets indeed disjointly mix.
16
Note that this notion of regret does not take into account that had queue i cleared a packet at server j instead of another queue \(i^{\prime }\), at a later time \(i^{\prime }\) would have had an older packet and therefore higher priority.
17
Note that this is possible as \(\varphi _{\gamma }(w)=o(w)\) for any fixed \(\gamma\), as well as the exponential decay in w of the concentration bounds in Equations (29) and (30) in the appendix.
18
For any \(\epsilon \gt 0\), we have directly shown that the statement holds for t of the form \(t=W_{\ell }\) for \(\ell \ge 1\). For any t such that \(W_{\ell }\le t\lt W_{\ell +1}\), \(T^i_t\) cannot be more than \(w_{\ell }\) from its value at \(W_{\ell }\), as ages can increase by at most 1 in each period. This implies that at any such intermediate time, the difference in the numerator from the value at \(t=W_{\ell }\) is \(O(w_{\ell })=O(t^{1/2})=o(t)\) and thus vanishes in the limit when divided by t, so the \(\limsup\) may be taken over all t, not just the sparsified sequence.
Appendices
A Basic Inequalities
Fact A.1.
Suppose that \(a,b,c\ge 0\) and that \(a-b\le c.\) Then
The second follows from assuming without loss of generality that \(a\ge b\) and observing that the claim is implied by \(\sqrt {a}-\sqrt {b}\le \sqrt {a-b},\) which holds by squaring and simple algebra.□
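The inequality \(\sqrt {a}-\sqrt {b}\le \sqrt {a-b}\) invoked in this proof can be spot-checked numerically (an illustrative check, not a proof):

```python
import math
import random

# Spot-check of the inequality used in the proof of Fact A.1: for
# a >= b >= 0, sqrt(a) - sqrt(b) <= sqrt(a - b).  Squaring gives
# a + b - 2*sqrt(a*b) <= a - b, i.e. b <= sqrt(a*b), which holds
# since b <= a.
rng = random.Random(0)
for _ in range(10000):
    b = rng.uniform(0, 100)
    a = b + rng.uniform(0, 100)
    assert math.sqrt(a) - math.sqrt(b) <= math.sqrt(a - b) + 1e-12
```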
Fact A.2.
Suppose that \(a,b,c\ge 0\). Then \(a-b\ge c\) implies that
Recall that we defined the following two weighted \(\ell _p\) norms on \(\mathbb {R}^n\): \(\Vert \mathbf {x}\Vert _{\mathbf {\lambda },1}\triangleq \sum _{i=1}^n \lambda _i \vert x_i\vert\) and \(\Vert \mathbf {x}\Vert _{\mathbf {\lambda },2}\triangleq \sqrt {\sum _{i=1}^n \lambda _i x_i^2}.\)
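Written out directly, the two weighted norms are as follows (the function names are ours; the Cauchy-Schwarz comparison in the check is a standard consequence of the definitions):

```python
import math

# The two weighted norms from the text, with lam the weight vector:
#   ||x||_{lam,1} = sum_i lam_i * |x_i|
#   ||x||_{lam,2} = sqrt(sum_i lam_i * x_i^2)
def norm_lam_1(x, lam):
    return sum(l * abs(xi) for l, xi in zip(lam, x))

def norm_lam_2(x, lam):
    return math.sqrt(sum(l * xi * xi for l, xi in zip(lam, x)))

# By Cauchy-Schwarz, ||x||_{lam,1} <= sqrt(sum_i lam_i) * ||x||_{lam,2}.
lam = [0.2, 0.3, 0.1]
x = [3.0, -1.0, 4.0]
assert norm_lam_1(x, lam) <= math.sqrt(sum(lam)) * norm_lam_2(x, lam) + 1e-12
```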
We will use the following concentration results throughout the article.
Lemma B.1 (First Borel-Cantelli Lemma, Theorem 2.3.1 of Durrett [15]).
Let \(A_1,A_2,\ldots\) be a sequence of events with \(\sum _{i=1}^{\infty }\Pr (A_i)\lt \infty\). Then with probability 1 at most finitely many of the \(A_i\) occur.
Lemma B.2 (Azuma-Hoeffding).
Let \(\lbrace \mathcal {F}_k\rbrace _{k\le n}\) be any filtration and let \(A_k,B_k,\Delta _k\) satisfy the following conditions:
(1)
\(\Delta _k\) is \(\mathcal {F}_k\)-measurable and \(\mathbb {E}[\Delta _k\vert \mathcal {F}_{k-1}]=0\). In other words, the \(\Delta _k\) form a martingale difference sequence.
(2)
\(A_k,B_k\) are \(\mathcal {F}_{k-1}\)-measurable and satisfy \(A_k\le \Delta _k\le B_k\) almost surely.
where \(Z_k\) is the kth partial sum of the \(X_i\) (i.e., \(Z_k=\sum _{i=1}^k X_i\)).
Lemma B.4 (Theorem 1 in the Work of Witt [42]).
Let \(X_1,\ldots ,X_n\) be i.i.d. \(\text{Geom}(\lambda)\) random variables so that \(\mathbb {E}[X_i]=\frac{1}{\lambda }\). Let \(s=\frac{n}{\lambda ^2}\) and \(Z_n=\sum _{i=1}^n X_i\). Then for all \(\delta \gt 0\),
First apply Lemma B.4 for each partial sum \(Z_j\) and \(\delta =\epsilon n/\lambda\). By considering the cases \(j\le \epsilon n\) and \(j\gt \epsilon n\), respectively, it follows for all \(j\le n\) that \(\min \lbrace \delta /s,\lambda \rbrace \ge \epsilon \lambda .\) Lemma B.4 implies that for all \(j\le n\),
Let \(\lbrace G^i_k\rbrace _{i\in [n],k\in [w]}\) be a family of independent geometric random variables such that for all \(i,k\), \(G^i_k\sim \text{Geom}(\lambda _i)\). Let \(Z_{q}^i=\sum _{k=1}^{q} G^i_k\). Then for any \(\epsilon \in [0,1]\),
This follows immediately from Corollary B.5 and a union bound.□
Lemma B.7.
Let \(\lbrace I^j_k\rbrace _{j\in [m], k\in [w]}\) be an independent Bernoulli ensemble such that for all \(j,k\), \(I^j_k\sim \text{Bern}(\mu _{j})\) with \(\mu _1\ge \mu _2\ge \ldots \ge \mu _m\). Then for all \(\delta \in [0,1]\),
The result then follows from a union bound over all \(q\in [m]\).□
The following characterizes the moments of geometric distributions.
Lemma B.8.
Let \(X\sim \text{Geom}(\lambda)\). Then for all \(k\ge 1\), \(\mathbb {E}[X^k]\le \frac{c_k}{\lambda ^k}\), where \(c_k\) is a constant depending on k but not on \(\lambda\).
Lemma B.9.
Let \(X\sim \text{Bin}(n,p)\), where \(p\in (0,1]\) is considered fixed. Then, for any fixed integer \(k\ge 0\), \(\mathbb {E}[X^k]\asymp n^k\), where the implicit constants depend on p and k, but not n.
Proof.
By definition, \(X=\sum _{i=1}^n X_i\), where \(X_i\sim \text{Bern}(p)\) are i.i.d. We clearly have
Note that products of these indicator variables remain indicator random variables, and it is easy to see that for any indices \(1\le i_1,\ldots ,i_k\le n\), \(p^k\le \mathbb {E}\left[\prod _{j=1}^k X_{i_j}\right]\le p.\) Taking expectations and summing, we obtain \(p^k n^k\le \mathbb {E}[X^k]\le pn^k\).□
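The bounds \(p^k n^k\le \mathbb {E}[X^k]\le pn^k\) can be verified exactly from the binomial pmf for small parameters (an illustrative check; the helper function is ours):

```python
import math

# Exact check of the bounds p^k * n^k <= E[X^k] <= p * n^k for
# X ~ Bin(n, p), computing the moment directly from the binomial pmf.
def binom_moment(n, p, k):
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) * x**k
               for x in range(n + 1))

for n in [3, 10, 25]:
    for p in [0.3, 0.5, 0.9]:
        for k in [1, 2, 3]:
            m = binom_moment(n, p, k)
            assert p**k * n**k <= m + 1e-9    # lower bound
            assert m <= p * n**k + 1e-9       # upper bound
```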
Multiply the equations by appropriate scalars in the definition and sum to obtain the inequality.□
Lemma C.2 (Theorem B.2. in the Work of Marshall et al. [27]).
Suppose that \(\mathbf {x},\mathbf {y}\in \mathbb {R}^n_+\) are in sorted order, have equal total sums, and \(\mathbf {x}\) majorizes \(\mathbf {y}\). Then \(\mathbf {y}=P\mathbf {x}\) for some doubly stochastic matrix P.
Corollary C.3.
Suppose that \(\mathbf {x},\mathbf {y}\in \mathbb {R}^n_+\) are in sorted order and \(\mathbf {x}\) strictly majorizes \(\mathbf {y}\). Then there exists a doubly stochastic matrix P such that \(P\mathbf {x}\) is strictly greater than \(\mathbf {y}\) componentwise.
Proof.
By continuity and strict majorization, it is possible to scale all entries of \(\mathbf {x}\) by nonnegative factors strictly less than 1 to obtain a vector \(\mathbf {x^{\prime }}\) that majorizes \(\mathbf {y}\) and so that the sums are equal. Applying the previous result, we have \(P\mathbf {x^{\prime }}=\mathbf {y}\) for some doubly stochastic P. But \(P\mathbf {x}\) strictly exceeds \(P\mathbf {x^{\prime }}\) componentwise, giving the result.□
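For intuition, the converse direction underlying Lemma C.2 (doubly stochastic averaging can only move a vector down in the majorization order) is easy to check numerically. The sketch below (helper names and parameters are ours) builds a doubly stochastic P as a uniform mixture of random permutation matrices, in the spirit of the Birkhoff-von Neumann theorem, and verifies that \(P\mathbf {x}\) is majorized by \(\mathbf {x}\):

```python
import random

def majorizes(x, y):
    """True if the sorted prefix sums of x dominate those of y
    (assuming equal total sums)."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px, py = px + a, py + b
        if px < py - 1e-9:
            return False
    return abs(px - py) < 1e-9

random.seed(0)
n = 5
x = sorted((random.random() for _ in range(n)), reverse=True)

# Build a doubly stochastic P as a uniform mixture of random permutation
# matrices (Birkhoff-von Neumann); applying it to x can only "flatten" x.
perms = []
for _ in range(4):
    sigma = list(range(n))
    random.shuffle(sigma)
    perms.append(sigma)
y = [sum(x[sigma[i]] for sigma in perms) / len(perms) for i in range(n)]

assert abs(sum(x) - sum(y)) < 1e-9  # P preserves the total sum
assert majorizes(x, y)              # and x majorizes y = Px
```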
Theorem C.4 (Theorem 2.2, Restated).
Let \(\mathbf {\lambda }\in (0,1)^n\) and \(\mathbf {\mu }\in [0,1]^m\) be the arrival and service rates, respectively. Then the preceding queuing system is strongly stable for some centralized (coordinated) scheduling policy if and only if for all \(1\le k\le n\),
The proof of sufficiency was given in the main text, so we just show necessity. It suffices to show that if one of the preceding inequalities fails, the first moment of \(Q_t\) is unbounded over time. To that end, first suppose that the majorization condition is strictly violated, namely there is some \(k\le n\) such that \(\sum _{i=1}^k \lambda _i\gt \sum _{i=1}^k \mu _i\). Let \(Q^{\le k}_t=\sum _{i=1}^k Q^i_t\) be the total number of packets at the k queues with highest arrival rate. Under any scheduling policy, the difference between \(Q_{t+1}^{\le k}\) and \(Q_t^{\le k}\) is bounded below in expectation by \(\sum _{i=1}^{k}\lambda _i-\sum _{i=1}^k \mu _i\gt 0\), as \(\sum _{i=1}^k \lambda _i\) new packets arrive for these queues at each step in expectation, and at most \(\sum _{i=1}^k \mu _i\) packets can be cleared in expectation. In particular, as \(Q_t:=\sum _{i=1}^n Q_t^i\ge Q_t^{\le k}\) surely by nonnegativity of queue sizes, telescoping gives
Now suppose that the majorization condition is violated only weakly, namely there is some \(k\le n\) such that \(\sum _{i=1}^k \lambda _i=\sum _{i=1}^k \mu _i\). Again, it is sufficient to show that \(\mathbb {E}[Q^{\le k}_t]\rightarrow \infty\). The previous argument shows that \(Q^{\le k}_t\) is a nonnegative submartingale for any measurable scheduling policy. If \(\lim _{t\rightarrow \infty }\mathbb {E}[Q^{\le k}_t]= \sup _t \mathbb {E}[Q^{\le k}_t]\lt \infty\), then the martingale convergence theorem (Theorem 4.2.11 of Durrett [15]) implies that there exists an almost surely finite random variable \(Q^{\le k}_{\infty }\) such that \(Q_t^{\le k}\rightarrow Q_{\infty }^{\le k}\) almost surely. But \(Q_{t+1}^{\le k}-Q_t^{\le k}\) is integer valued and nonzero with positive probability unless \(\mathbf {\mu }\) and \(\mathbf {\lambda }\) are degenerate, which our assumptions rule out. It follows that the pointwise limit cannot exist unless it is infinite, contradicting the almost sure finiteness of \(Q_{\infty }^{\le k}\).□
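The stability criterion of Theorem C.4 is easy to operationalize; a minimal checker for the strict prefix-sum condition (an illustrative sketch with rates passed as plain lists; the function name is ours) might look like this:

```python
def strongly_stabilizable(lam, mu):
    """Theorem C.4's condition: for every k, the k largest arrival rates
    must sum to strictly less than the k largest service rates."""
    lam = sorted(lam, reverse=True)
    # pad with zero-rate servers if there are fewer servers than queues
    mu = sorted(mu, reverse=True) + [0.0] * max(0, len(lam) - len(mu))
    ps_lam = ps_mu = 0.0
    for k in range(len(lam)):
        ps_lam += lam[k]
        ps_mu += mu[k]
        if ps_lam >= ps_mu:
            return False
    return True

# Two queues, two servers: a feasible configuration...
assert strongly_stabilizable([0.4, 0.2], [0.9, 0.5])
# ...and one violating the k = 1 prefix condition.
assert not strongly_stabilizable([0.95, 0.1], [0.9, 0.5])
```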
C.2 Impossibility for No-Priority Model
Next, we give the promised example that the simpler queuing model is too weak to give any sub-polynomial bicriterion result.
Theorem C.5 (Theorem 2.4, Restated).
In the alternate model, for large enough n, there exists a centrally feasible queuing system with n queues and servers with the following property: the system remains feasible even if \(\mathbf {\mu }\) is scaled down by \(\Omega (n^{1/3})\) and it is possible for all queues to be in a Nash equilibrium of the stage game at each timestep (and in particular, satisfy no-regret properties as in Assumption 5.1), yet the system is not strongly stable.
Proof.
Let \(\lambda _1=2/n^{1/3}\), whereas \(\lambda _2=\ldots =\lambda _n=1/n^{2/3}\); let \(\mu _1=1/2\) and \(\mu _2=\ldots =\mu _n=c/n^{1/3}\), where \(c=c(n)=\Theta (1)\) is such that
We proceed by considering an adversarial, centralized scheduler that suggests actions for each queue in each round while ensuring that each agent achieves no regret. The schedule is as follows: in each round, the scheduler sends the high rate queue to the unique high rate server and, provided that many low rate agents have packets, arbitrarily chooses \(n^{1/3}/2-1\) of the low rate agents to send there as well. All other low rate agents send to distinct low rate servers. If fewer than \(n^{1/3}/2-1\) low rate queues are active, then the scheduler schedules all active queues to the high rate server.
By standard Chernoff bounds, the number of low rate queues that receive a packet in a given round is at least \(n^{1/3}/2-1\) with probability at least \(1-\exp (-\Omega (n^{1/3}))\), so with at least this probability there are enough low rate queues for the first case to hold. The preceding inequalities show that in such a round where there are at least \(n^{1/3}/2-1\) active low agents, the suggested schedule is a Nash equilibrium, and the probability of success for each queue sending to the high server is exactly \(1/n^{1/3}\) in such rounds. When this does not occur, the suggested schedule is still Nash, and the probability of success for any queue sending to the high rate server is at most \(1/2\). Therefore, in any timestep where the high rate queue has a packet, by the Law of Total Probability, her probability of clearing is upper bounded by
where the inequality is for sufficiently large n. As a result, in expectation \(Q^1_{t+1}-Q^1_t\) is lower bounded by a positive constant (depending on n but not on t), and therefore \(Q^1_t\) diverges with t in expectation by telescoping. This shows that this system is not strongly stable, even though every queue plays a Nash strategy at each time. Note that this system would still be centrally feasible if all arrival rates were scaled up by a factor of \(\Theta (n^{1/3})\), giving the result.
To see that this is no regret with high probability on each fixed window, define \(X^{i,j}_t\) to be the indicator variable that queue i would succeed in clearing a packet at server j at time t, and let \(\sigma _i(t)\) be the identity of the server that queue i chooses at time t. Note that if queue i is empty at time t, then \(X^{i,j}_t=0\) for all j and \(\sigma _i(t)\) can be arbitrary. Then define \(\Delta ^{i,j}_t=X^{i,\sigma _i(t)}_t-X^{i,j}_t\). By the preceding Nash discussion, \(\mathbb {E}[\Delta ^{i,j}_t\vert \mathcal {F}_{t-1}]\ge 0\) for all t in both cases as described earlier, where \(\mathcal {F}_{t}\) denotes the past history of this process up to time t. This holds regardless of whether queue i actually sends in that round (in which case the quantity is just 0).
Therefore, as \(\vert \Delta ^{i,j}_t\vert \le 2\) surely, we may apply the Azuma-Hoeffding inequality (Lemma B.2) to see that on any fixed window of length w (and reindexing time so that time progresses \(t=1,\ldots ,w\) on this window for notational ease),
By a union bound, for each queue i, the probability that this deviation event occurs for some server \(j\in [m]\) is at most \(m\cdot \exp \left(-\alpha ^2/w\right)\). Note that if \(\alpha =\sqrt {w\ln (m/\delta)}\), this quantity is at most \(\delta\). As such, by definition of regret, on any fixed period of length w, with probability at least \(1-\delta\), this strategy satisfies \(\text{Reg}_i(w)\le \sqrt {w\ln (m/\delta)}=o(w)\), as needed.□
We now show the desired relations between our notions of stability and almost sure subpolynomial growth. We need the following technical lemma.
Lemma C.6.
Suppose that a nonnegative sequence of random variables \(X_1,X_2,\ldots\) satisfies \(X_t\le X_{t-1}+L\) surely for some fixed \(L\ge 0\) and any t, as well as the moment condition \(\sup _{t}\mathbb {E}[X_t^p]\le C_p\) for some constant \(C_p\ge 0\) for each \(p\ge 1\). Then, for any \(c\gt 0\), almost surely, \(X_t=o(t^c)\).
Proof.
Fix \(\epsilon \gt 0\). It suffices to prove the lemma for \(0\lt c\lt 1\), so take \(0\lt d\lt c\) and set \(p=d^{-1}\). We do this by proving the desired asymptotics on a conveniently chosen subsequence, then interpolate to intermediate values. Indeed, by Markov’s inequality, for each \(k\ge 1\),
\(\mathbb {P}\left[X_{k^{1+\epsilon }}\gt k^{(1+\epsilon)d}\right]\le \frac{\mathbb {E}[X_{k^{1+\epsilon }}^p]}{k^{(1+\epsilon)dp}}\le \frac{C_p}{k^{1+\epsilon }},\) since \(dp=1\).
Summing over k and observing that the right side is summable, we deduce from the first Borel-Cantelli lemma that almost surely, for all sufficiently large k, \(X_{k^{1+\epsilon }}\le k^{(1+\epsilon)d}.\) To extend this to all large enough t, suppose that t is such that \(k^{1+\epsilon }\le t\lt (k+1)^{1+\epsilon }\). By the one-sided boundedness, we know that almost surely, for such t and all large enough k,
where the bound on \(t-k^{1+\epsilon }\) arises from the mean value theorem. Clearly, this last expression is \(O(t^{\epsilon /(1+\epsilon)}+t^d)\). As this holds for arbitrary \(\epsilon \gt 0\), we may take \(\epsilon\) small enough so that this expression is \(o(t^c)\), as claimed.□
Lemma C.7 (Lemma 2.7, Restated).
If the Bernoulli and Geometric models characterize the same queuing dynamics, then strong (weak) stability in the Bernoulli system is equivalent to strong (weak) stability in the Geometric system. Moreover, if this holds, then strong stability in either system implies almost sure subpolynomial growth.
Proof.
Suppose that the dynamics are as stated so that the Bernoulli and Geometric dynamics yield completely equivalent processes. Then the distribution of \(Q^i_t\) conditioned on the value of \(T^i_t\) at time t is \(\text{Bin}(T^i_t,\lambda _i)\). Note that by the Law of Iterated Expectations, \(\mathbb {E}[(Q^i_t)^p]=\mathbb {E}[\mathbb {E}[(Q^i_t)^p\vert T^i_t]]\). But by Lemma B.9, \(\mathbb {E}[(Q^i_t)^p\vert T^i_t]\asymp (T^i_t)^p\) up to absolute constants depending only on p and \(\lambda _i\). Therefore, by taking expectations, the Bernoulli system and the Geometric system have equivalent stability properties. The second claim now follows from either form of strong stability from Lemma C.6, noting that either \(Q_t\) or \(T_t\) can increase by at most \(L=n\) in each timestep.□
Recall that \(f(S\vert S^{\prime })\) denotes the expected number of packets cleared by queues in S if all have packets in a round and have priority over all queues except for those in \(S^{\prime }\), where each such queue also has packets in the round.
In this section, we fill in the deferred proofs showing analytic properties of r, the output of the algorithm described in Section 3.1.
Proposition D.1 (Proposition 3.7, Restated).
The function \(r:(\Delta ^{m-1})^n\rightarrow [0,1]^n\) given by \(r(\mathbf {p})=(r_1(\mathbf {p}),\ldots ,r_n(\mathbf {p}))\) is continuous.
Proof.
Fix \(\mathbf {p}^*\), the point at which we wish to show continuity, and let \(\mathbf {p}^k\rightarrow \mathbf {p}^*\) be a convergent sequence in \((\Delta ^{m-1})^n\). The function \(\mathbf {p}\mapsto \min _{S\subseteq [n]} f(S\vert \mathbf {p})\) is continuous as the minimum of finitely many continuous functions, so the function \(\mathbf {p}\mapsto \max \lbrace 1-f(S_1(\mathbf {p})\vert \mathbf {p}),0\rbrace = g(S_1(\mathbf {p}))\) is continuous as well. Therefore, if \(\max _{i\in [n]} r_i(\mathbf {p}^*)=g_1(\mathbf {p}^*)=0\), then by Lemma 3.6, as \(\max _i r_i(\mathbf {p}^k)\rightarrow 0\), monotonicity yields \(r_i(\mathbf {p}^k)\rightarrow 0\) along the sequence for every \(i\in [n]\), proving continuity. We now turn to the harder case \(g_1(\mathbf {p}^*)\gt 0\).
Before proceeding, define \(\delta \gt 0\) to be the minimal nonzero gap between \(f(S\vert S^{\prime },\mathbf {p}^*)\) and \(f(T\vert S^{\prime },\mathbf {p}^*)\) over all choices of \(S,T, S^{\prime }\) such that \(S,T\subseteq [n]\setminus S^{\prime }\)—that is,
\(\delta := \min \lbrace \vert f(S\vert S^{\prime },\mathbf {p}^*)-f(T\vert S^{\prime },\mathbf {p}^*)\vert : S,T\subseteq [n]\setminus S^{\prime },\ f(S\vert S^{\prime },\mathbf {p}^*)\ne f(T\vert S^{\prime },\mathbf {p}^*)\rbrace .\) (33)
Note that \(\delta\) is strictly positive, as there are only finitely many choices of \(S,T,S^{\prime }\).
Fix \(0\lt \varepsilon \lt \delta\). Clearly, for any fixed \(S,S^{\prime }\) such that \(S\subseteq [n]\setminus S^{\prime }\), the function \(f(S\vert \mathbf {p}, S^{\prime })\) is continuous as a function of \(\mathbf {p}\). For this choice of \(\varepsilon\), we may restrict to a tail of the sequence \(\lbrace \mathbf {p}^k\rbrace\) and reindex so that for all \(k\ge 1\) and all \(S,S^{\prime }\),
We now claim that for every \(k\ge 1\), the following holds for the algorithm’s outputted rates on \(\mathbf {p}^k\): as long as some element \(i\in S_1(\mathbf {p}^*)\) has not yet been outputted, the union of the subsets outputted up to that point must itself be a minimizing subset with respect to \(f(\cdot)\) evaluated at the profile \(\mathbf {p}^*\), and each element outputted so far has f value at \(\mathbf {p}^k\) at most \(f(S_1\vert \mathbf {p}^*)+\varepsilon /3\).
To see this, we proceed inductively: at the beginning of the algorithm, for every tight subset \(S\subseteq S_1(\mathbf {p}^*)\) (by Lemma 3.5), we have by Equation (34) that
In particular, the first outputted subset must be a tight subset for \(\mathbf {p}^*\), and the rate of each element in that subset is at least the desired amount.
Suppose that this holds inductively, and now let \(S\subseteq S_1(\mathbf {p}^*)\) be the union of the initial outputted sets, which we know is tight. If \(S=S_1(\mathbf {p}^*)\), we are done, so suppose that there exists \(i\in S_1(\mathbf {p}^*)\setminus S\). Suppose that T is disjoint from S and such that \(S\cup T\subseteq S_1(\mathbf {p}^*)\) is tight. Such sets exist; for instance, take \(T=S_1(\mathbf {p}^*)\setminus S\). From Fact 3.1,
and by Fact 2.1, the fact that both the left side and the first term on the right are minimal implies that \(f(T\vert S,\mathbf {p}^*)=f(S_1(\mathbf {p}^*)\vert \mathbf {p}^*)\). Then at the next step of the algorithm, we again have by Equation (34) that
By Fact 2.1, minimality of the first term on the right, and the fact that the left term is not minimal, it follows that \(f(T^{\prime }\vert S,\mathbf {p}^*)\ge f(S_1(\mathbf {p}^*)\vert \mathbf {p}^*)+\delta\) by definition. We then have by Equation (34) that
In particular, the next outputted set is such that \(S\cup T\subseteq S_1(\mathbf {p}^*)\) is tight, and the rate condition holds as well. We can iteratively apply this while \(S_1(\mathbf {p}^*)\) is not exhausted, proving that every element of \(S_1(\mathbf {p}^*)\) is outputted before any other element when running the algorithm on \(\mathbf {p}^k\), with the rate within \(\varepsilon /3\) of \(r_i(\mathbf {p}^*)\). As \(\varepsilon\) was arbitrary, this shows continuity on all components of \(S_1(\mathbf {p}^*)\).
Because we have shown that every queue of \(S_1(\mathbf {p}^*)\) is outputted before every queue not in \(S_1\), we can apply the recurrence as discussed in Equation (4) to show continuity for each queue in \(S_2(\mathbf {p}^*)\), just discounting \(\mathbf {\mu }\) as usual. The same argument restricted to \([n]\setminus S_1(\mathbf {p}^*)\) nearly shows continuity; the only difference is that the discounting of \(\mathbf {\mu }\) by the queues in \(S_1(\mathbf {p}^*)\) depends on \(\mathbf {p}^k\), not \(\mathbf {p}^*\), but as each \(f(S\vert S_1(\mathbf {p}^*),\mathbf {p},\mathbf {\mu }^{\prime })\) is jointly continuous in \(\mathbf {p},\mathbf {\mu }^{\prime }\), and the composition of continuous functions is continuous, the same argument holds with minimal modification. This proves continuity for the components of each subsequent group recursively, and thus of each component in \([n]\).□
With these results, we complete the proof of Theorem 3.8. We proceed as follows: fix any queue i, as well as any fixed probability choices \(p_{-i}\in (\Delta ^{m-1})^{n-1}\) by the other players, and any two \(p_i,p^{\prime }_i\in \Delta ^{m-1}\). Define for \(t\in [0,1]\),
Lemma D.2.
For any fixed i, \(p_{-i}\in (\Delta ^{m-1})^{n-1}\), and \(p_i,p^{\prime }_i\in \Delta ^{m-1}\), the function \(h(t)\) is piecewise linear and has no local maxima on the interior.
Proof.
Let \(\mathbf {p}(t)=(tp_i+(1-t)p^{\prime }_i,p_{-i})\). By Proposition 3.7, h is continuous as the restriction of a continuous function, and it is piecewise linear in t by inspection: as the algorithmic description of r takes minima and maxima of finitely many linear functions, it yields a piecewise linear function with no jump discontinuities.
We now prove the last claim. It is sufficient to show that if h is increasing at \(t^{\prime }\), then it is increasing for all \(t^{\prime \prime }\gt t^{\prime }\). Suppose that this is violated for some \(t^{\prime }\lt t^{\prime \prime }\); by piecewise linearity, there must exist some \(t^*\) with \(t^{\prime }\lt t^*\lt t^{\prime \prime }\) at which two linear pieces of the graph meet, such that the slope is positive as \(t\rightarrow t^{*-}\) and nonpositive as \(t\rightarrow t^{*+}\).
Suppose that for all t sufficiently close to \(t^*\) from the left, i is outputted at step k of the algorithm. The only way the slope can go from positive to nonpositive at \(t^*\) is that there is a change in which sets are outputted in the algorithm at some step \(\ell \le k\), which can happen only if some new set S including i gets selected for \(t\ge t^*\). However, as the rates of all sets not including i (fixing any other disjoint set having priority) are constant with respect to t, this can occur only because at \(t^*\), some linear function \(f(S\vert \mathbf {p}(t),S^{\prime })\) went below the \(f(S^{\prime \prime }\vert \mathbf {p}(t),S^{\prime })\) that was previously selected at step \(\ell\), where \(S^{\prime }\) is the union of all sets outputted prior at t and \(S^{\prime \prime }\) is the set that was outputted next for all t close enough to the left of \(t^*\). If \(S^{\prime \prime }\) included i, this could occur only if \(r(S\vert \mathbf {p}(t),S^{\prime }):=\max \lbrace 0,1-f(S\vert \mathbf {p}(t),S^{\prime })\rbrace =1-f(S\vert \mathbf {p}(t),S^{\prime })\) has larger positive slope than \(r(S^{\prime \prime }\vert \mathbf {p}(t),S^{\prime })\), so the slope of h would be strictly larger (and in particular, positive) for all t sufficiently close to \(t^*\) on the right, contradicting our assumption that it is nonpositive there. If \(S^{\prime \prime }\) does not include i, then \(r(S^{\prime \prime }\vert \mathbf {p}(t),S^{\prime })\) is constant with respect to t, so for \(r(S\vert \mathbf {p}(t),S^{\prime })\) to exceed it for t larger than \(t^*\) but be lower for t less than \(t^*\), the slope of \(r(S\vert \mathbf {p}(t),S^{\prime })\) must also be positive, another contradiction. Both cases lead to a contradiction, proving the claim.□
Corollary D.3.
Fix \(p_{-i}\in (\Delta ^{m-1})^{n-1}\). Then the set of global minimizers of \(r_i(\cdot ,p_{-i})\) forms a nonempty, closed, convex set.
Proof.
Note that global minima exist by continuity from Proposition 3.7 and the extreme value theorem. Let \(p_i,p^{\prime }_i\) be global minimizers; form the segment between them in \(\Delta ^{m-1}\) (which is of course convex) and consider the function h defined on this segment. As h has no local maxima in the interior by the previous lemma, its maximum must lie at an endpoint; since both endpoints attain the global minimum value, every point on this segment is also a global minimizer. Closedness of the set of global minimizers follows immediately from the continuity guaranteed in Proposition 3.7.□
Theorem D.4 (Theorem 3.8, Restated).
There exists a pure equilibrium of the game with costs given by \(r:(\Delta ^{m-1})^n\rightarrow [0,1]^n\).
Proof.
We will prove this by appealing to Kakutani’s theorem. Let \(B:(\Delta ^{m-1})^n\rightrightarrows (\Delta ^{m-1})^n\) be the best-response correspondence that maps \(\mathbf {p} \in (\Delta ^{m-1})^n\) to the set \(B(\mathbf {p})=\lbrace \mathbf {p}^{\prime }\in (\Delta ^{m-1})^n: p^{\prime }_i\in \arg \min _{x\in \Delta ^{m-1}} r_i(x,p_{-i})\text{ for all }i\in [n]\rbrace \subseteq (\Delta ^{m-1})^n\).
We must verify the preconditions of Kakutani’s theorem. \((\Delta ^{m-1})^n\) is clearly compact and convex, and we have shown that \(B(\mathbf {p})\) is nonempty and convex by Corollary D.3. The final condition to show is that B has a closed graph, which can be done by a completely standard argument; we must show that if \((\mathbf {p}^k,\mathbf {s}^k)\rightarrow (\mathbf {p},\mathbf {s})\), where \(\mathbf {s}^k\in B(\mathbf {p}^k)\), then \(\mathbf {s}\in B(\mathbf {p})\). Suppose for a contradiction that this does not hold for some such convergent sequence. This implies that for some \(i\in [n]\), there exists some \(s^{\prime }_i\) and \(\epsilon \gt 0\) such that
As \(p_{-i}^k\rightarrow p_{-i}\), the continuity of r from Proposition 3.7 gives for large enough k that \(r_i(s^{\prime }_i,p^k_{-i})\le r_i(s^{\prime }_i,p_{-i})+\epsilon\). Thus,
where the last inequality holds for all large enough k by continuity of r. This contradicts the optimality of \(\mathbf {s}^k\in B(\mathbf {p}^k)\), proving that B has a closed graph. Kakutani’s theorem then immediately implies the existence of a pure equilibrium—that is, \(\mathbf {p}\in (\Delta ^{m-1})^n\) such that \(\mathbf {p}\in B(\mathbf {p})\).□
For each even integer \(p\ge 2\), there exists a constant \(C_{p,n,w}\gt 0\) that depends only on \(n,w,p\) and the parameters of the system such that for all \(\ell \ge 0\),
Proof.
By the triangle inequality, it is easy to see that as random variables, the change in \(T^i_{\ell \cdot w}\) is at most \(G^i \triangleq \sum _{k=1}^w G^i_k.\) Then the change in \(\Phi\) is again at most
Raising this to the pth power, expanding, and taking expectations, this term is at most \(C_{p,n,w}/\lambda _n^{2p}\) for some constant \(C_{p,n,w}\) depending only on \(n,w\), and p by Lemma B.8.
(2)
Suppose that there does exist \(i\in [n]\) such that \(\lambda _i T^i_{\ell \cdot w}\gt 1\). We claim that this implies that for all \(j\in [n]\),
as can be confirmed from basic algebra. As \(\lambda _i\le 1/2\) by feasibility (as \(\mu _1\le 1\)), our assumption implies that \(T^i_{\ell \cdot w}\gt 2\), and so
To prove the claim, we split into more cases: if \(T^j_{\ell \cdot w}\le 1/\sqrt {\lambda _j}\), the claim holds using the last inequality in the denominator. Otherwise, we must have \(T^j_{\ell \cdot w}\ge 2\) by integrality, in which case by Equation (35),
By Fact A.1, this is an upper bound as random variables of the change in \(\sqrt {\Phi }\), so taking pth powers, expanding, and taking expectations, we get an upper bound of \(C_{p,n,w}/\lambda _n^{2p}\) by Lemma B.8 for some constant \(C_{p,n,w}\) depending only on \(n,w,p, \mathbf {\lambda }\).
□
E.2 Tightness of Factor 2
Theorem E.1 (Theorem 5.4, Restated).
Partition time \(t=0,1,\ldots\) into consecutive windows, where the kth window has length \(w_k=k^2\). Then there exists a family of queuing systems with n queues and servers for each \(n\ge 1\) satisfying Assumption 5.2 with \(\frac{1}{2}+o_n(1)\) in place of \(\frac{1}{2}\) with the following properties: almost surely, each queue has zero regret on all but at most finitely many of the windows, but the system is not weakly stable.
Proof.
Define \(W_k=\sum _{i=1}^{k-1} w_i\). Note that \(W_k=\Theta (k^3)=\Theta (w_k^{3/2})\). \(W_k\) is the actual timestep at the end of \(k-1\) of the consecutive windows of length \(w_i\) for \(i=1,\ldots ,k-1\). Note also that \(W_{k+1}-W_k=w_k\).
For each \(n\ge 1\), consider the following system on n queues and n servers: set \(\mathbf {\lambda }=(\frac{n+1}{n^2},\ldots ,\frac{n+1}{n^2})\) and \(\mathbf {\mu }=(1,\frac{n-1}{n^2},\ldots ,\frac{n-1}{n^2})\). This system satisfies Assumption 5.2 with factor \(\frac{1}{2}+o_n(1)\). We consider the simple strategy where every queue always sends to the rate 1 server. Under these oblivious dynamics, in expectation the total number of packets grows by \(\frac{1}{n}\) with every step, and therefore this system is not even weakly stable. What we must show is that almost surely, this fixed strategy is zero regret for every queue for all but finitely many of the windows.
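As a numeric sanity check of the two computations in this construction (with an illustrative choice of \(n=100\), which is ours, not from the text), one can verify the per-step drift of \(1/n\) and that the worst prefix-sum ratio of arrival to service rates is \(\frac{1}{2}+o_n(1)\), attained at \(k=n\):

```python
n = 100
lam = [(n + 1) / n**2] * n               # arrival rates, all equal
mu = [1.0] + [(n - 1) / n**2] * (n - 1)  # one fast server, n - 1 slow ones

# Drift under "everyone sends to the rate-1 server": arrivals exceed the
# fast server's capacity by exactly 1/n per step.
drift = sum(lam) - mu[0]
assert abs(drift - 1 / n) < 1e-12

# The largest ratio of prefix sums (rates already in sorted order) shows
# the system meets the feasibility condition with factor 1/2 + o_n(1).
ratios = []
ps_lam = ps_mu = 0.0
for k in range(n):
    ps_lam += lam[k]
    ps_mu += mu[k]
    ratios.append(ps_lam / ps_mu)
factor = max(ratios)
assert 0.5 < factor < 0.5 + 2 / n
```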
We first show almost sure concentration of the arrivals of new packets. Let \(\lbrace B^i_t\rbrace _{i\in [n],t\ge 1}\) be the independent random variables for arrivals as usual. Now, for each queue \(i\in [n]\) and \(\ell \ge 0\), we have
where we use the additive form of the Chernoff bound. As the same holds for all queues, the probability this event happens for any of the n queues is at most \(2n/\ell ^2\). As this is summable in \(\ell\), we may sum over all \(\ell \ge 1\) to deduce from the Borel-Cantelli lemma that almost surely, for all sufficiently large \(\ell\), all \(i\in [n]\) satisfy
Note that this also implies that almost surely, for all large \(\ell\), \(\sum _{t=1}^\ell \sum _{i=1}^n B^i_t\ge (1+\frac{1}{2n})\cdot \ell\) by the choice of \(\lambda _i\). Moreover, under this fixed strategy where everyone always sends to the rate 1 server, at most \(\ell\) packets can be cleared by time \(\ell\).
Next, we show that almost surely, there is a large backup proportional to the current time period. Let \(t_k\) be the latest arrival timestamp among packets cleared by the rate 1 server up to time \(W_k\). As all queues send there under this fixed strategy, at this point, all queues only have packets that were received after \(t_k\) by priority. On the one hand, it is not difficult to see that deterministically, \(t_k\ge W_k/n\) (equality occurs in the worst case where every queue received a packet in every step up to \(W_k\)). On the other hand, in light of our preceding results, almost surely, for all but finitely many of the k,
This is because at least \(W_k\) packets have been received up to time \(\frac{1}{1+\frac{1}{2n}} W_k\), and because the server can only have cleared at most \(W_k\) packets up to time \(W_k\), the oldest timestamp the server could have cleared by time \(W_k\) can be at most this quantity.
Next, we show almost sure concentration of the nontrivial server success rates. Let \(I^j_t\) be the indicator that server j would succeed at clearing a packet at time t (regardless of whether one is sent there; under this strategy, no queue ever sends to \(j\ne 1\)). A similar application of the Chernoff bound and union bound with the Borel-Cantelli lemma implies that almost surely, for all but finitely many of the k, and for each server \(j\in [n]\), we have
Note that the increasing nature of the \(w_k\) is needed here for this to be valid (in fact, if the interval sizes were kept fixed, this statement would fail with probability 1 by independence and the second Borel-Cantelli lemma). Thus, almost surely, for all large enough \(\ell\) and k, all of these events happen simultaneously.
As \(t_k\ge W_k/n\), almost surely for large enough k, \(t_k\) eventually exceeds the random time \(\ell\) at which Equation (37) holds. Consider any subsequent window of length \(w_k\). Our goal is to use these facts to show that on these windows, all queues have zero regret. First, we show that each queue clears \((\frac{1}{n}-o(1))w_k\) packets on each such window. Let \(c=\frac{1}{n\lambda _i}\lt 1\) (note that this is independent of i). We know from Equation (38) that \(t_k+w_k\lt (1-\Omega (1))W_k+w_k\lt W_{k}\); moreover, using Equation (37) and the fact that \(t_k\ge \ell\),
where the last line uses the relationship between \(W_k\) and \(w_k\). As \(t_k+w_k\lt W_k\), all of these packets were evidently received before the start of the given window, so every queue is backed up throughout the period, and by virtue of the previous equation, each queue owns a \(\frac{1}{n}-o(1)\) fraction of the next \(w_k\) packets that the top server will clear on this window. Thus, each queue clears at least \((\frac{1}{n}-o(1))\cdot w_k\) packets on such windows under this fixed strategy.
Finally, had any queue deviated on such a window to a single fixed low rate server, in light of Equation (39), she would have cleared \((\frac{n-1}{n^2}+o(1))\cdot w_k\) packets, which is linearly smaller than the amount she actually cleared. Therefore, almost surely, on all but finitely many of the windows, every queue actually has zero regret.□
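A toy simulation of the fixed strategy in this construction illustrates the linear backlog growth; the sketch below tracks only the aggregate packet count at the rate 1 server (the parameters \(n=20\), the horizon, and the seed are illustrative choices of ours, not from the text):

```python
import random

random.seed(1)
n = 20
lam = (n + 1) / n**2   # arrival rate of each of the n queues
T = 100_000            # illustrative horizon

arrived = cleared = 0
for t in range(T):
    # independent Bernoulli(lam) arrivals at each of the n queues
    for _ in range(n):
        if random.random() < lam:
            arrived += 1
    # every queue sends to the rate-1 server, which clears exactly one
    # packet per step whenever any packet is waiting
    if arrived > cleared:
        cleared += 1

backlog = arrived - cleared
# expected drift is sum_i lam_i - mu_1 = 1/n per step, so the backlog
# should be around T/n; check it is at least half that
assert backlog > T / (2 * n)
```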
This section is devoted to proving the intermediate claims in the proof of Theorem 3.3. The first main technical result of this section asserts that with high probability, the maximum queue age increases at a rate of at most \((1-(1-\epsilon)\cdot f_1)\) over the next w steps for a large enough w. In fact, more generally, the following holds.
Proposition F.1 (Proposition 6.1, Restated).
Fix \(\epsilon \gt 0\). For any integer \(a\in \mathbb {N}\), let \(w=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{n-1}\). Suppose that it holds at time t that \(\max _{i\in [n]} T^i_t\ge w\cdot f_1\). Then
with probability at least \(1-C_1\exp (-C_2a)\), where \(C_1,C_2\gt 0\) are constants depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), but not on a. More generally, for each \(s\ge 1\), if \(\max _{i\not\in U_{s-1}} T^i_t\ge w\cdot f_s\), then
with probability at least \(1-C_1\exp (-C_2a)\), where \(C_1,C_2\gt 0\) are constants depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), but not on a.
Note that the first part is simply the \(s=1\) case of the more general statement. This proposition will follow from the following lemma, which we will prove inductively.
Lemma F.2.
Fix \(s\ge 1\) and \(\epsilon \gt 0\). Then, for each \(1\le \tau \le n\) and all \(a\in \mathbb {N}\), the following holds: let \(w=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau -1}\), and suppose that at time t, \(M^*:=\max _{i\not\in U_{s-1}} T^i_t\ge w\cdot f_s\), and that the set
where \(C_1,C_2\gt 0\) are constants depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), but not on a.
Proposition 6.1 follows as for any \(s\ge 1\) and \(a\in \mathbb {N}\), \(\vert J\vert \le n\). We now turn to proving Lemma F.2 inductively. The case \(\tau =1\) will turn out to be relatively easy; this case just says that there is a single very old queue among those not in \(U_{s-1}\), so we will be able to lower bound the number of packets she clears by simply assuming that every queue in \(U_{s-1}\) is older than her. Extending this to higher \(\tau\) will be more difficult. To do so, we will chunk together many windows that we know have this property for smaller values of \(\tau\) and then leverage two facts to get a win-win situation. We will be able to easily show that at least one queue in J is always decreasing at the correct rate. If all queues in J are “close,” they all decrease at the correct rate as well. If not, then inductively, with high probability they will clear at the correct rate on the next chunk.
We now carry out this high-level plan. For reference, we will use the following similar notation to that used in the main text, but extended to more general windows:
(1)
\(w:=B\cdot L\) will denote a given window length composed of B consecutive blocks of L steps. As we will be considering the behavior of the process on some fixed window, we may as well reindex \(t=1\) for convenience so that each window we consider will go from \(t=1\) to w. We will reserve the superscript \(t=0\) to denote the value of the ages at the very beginning of the window we consider.
(2)
Recall the shorthand \(f_s:=f(S_s\vert U_{s-1})\) and \(g_s := \max \lbrace 0,1-f_s\rbrace\).
(3)
With this convention, we will often define (and will make clear from context) at the beginning of some considered window of fixed length w, fixed \(s\ge 1\), and fixed \(\epsilon \gt 0\):
We will often refer to \(T^*\) as the target value for this window, which does not change over the course of the window (notice that it is measured at the beginning of the window). Then, define
In other words, J is the set of queues whose age is within \(w\cdot f_s\) of the oldest age, measured at the beginning of the window. Our goal will be to eventually show that if w is sufficiently large, then with high probability, every queue in J has age below \(T^*\) at the end of the next w steps before accounting for w steps of aging, and of course all queues not in J are already strictly below \(T^*\) by definition. This will imply that the maximum age grows by at most \((1-(1-\epsilon)\cdot f_s)\cdot w\) once we account for the w steps of aging over this window.
(4)
Given a window of length \(w=B\cdot L\), \(\mathcal {F}^{(b)}_{\ell }\) is the filtration of \(\sigma\)-algebras generated up to step \(\ell\) in the bth block, for \(b=1,\ldots ,B\). In particular, \(\mathcal {F}_0^{(1)}\subseteq \mathcal {F}_1^{(1)}\subseteq \ldots \subseteq \mathcal {F}_{L}^{(1)}\subseteq \mathcal {F}_{0}^{(2)}\subseteq \ldots \subseteq \mathcal {F}_{\ell }^{(b)}\subseteq \ldots \subseteq \mathcal {F}_{L}^{(B)}.\)
(5)
\(X_{b,\ell }^{i}\) will be the indicator that queue i cleared a packet in timestep \(\ell\) in the bth block. \(X_{b,\ell }^{i}\) is \(\mathcal {F}^{(b)}_{\ell }\)-measurable.
(6)
\(Y_{b,\ell }^{i}\) will be a sequence of random variables, with the same interpretation of the indices, defined as follows: for \(b,\ell\) such that every queue in J is still above \(M^*-w\cdot f_s\) at the start of the \(\ell\)th step of the bth block, set \(Y_{b,\ell }^{i}=X_{b,\ell }^{i}\). If this does not hold for some \(b,\ell\), then let the \(Y_{b,\ell }^{i}\) be arbitrary indicator random variables satisfying
Note that \(Y_{b,\ell }^{i}\) is \(\mathcal {F}^{(b)}_{\ell }\)-measurable. These random variables are purely for technical convenience because they have an a priori lower bound on the conditional expectation, which is not always true of X (e.g., if queues have already cleared a lot and some queues in J have thus lost priority over those not in J).
(7)
\(G_{b,\ell }^{i}\) will be i.i.d. Geom(\(\lambda _i\)) random variables, indexed in the same way for \(\ell \in [L],b\in [B]\). We define the partial sums on each block \(Z_{b,k}^{i}:= \sum _{\ell =1}^k G_{b,\ell }^{i}\). For each \(b=1,\ldots ,B\), we make the convention that the \(G_{b,\ell }^{i}\) are sampled for all \(i\in [n]\) and \(\ell \in [L]\) at the beginning of the bth block so that they are all \(\mathcal {F}_0^{(b)}\)-measurable; there is no corresponding queuing step. When queue i clears her kth packet in the bth block, her timestamp decreases by \(G_{b,k}^{i}\); equivalently, when this happens, her timestamp will have decreased on the bth block by \(Z_{b,k}^{i}\) so far.
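As a sanity check on this bookkeeping (not part of the proof), one can verify numerically that clearing k packets decreases a queue's timestamp by about \(k/\lambda\) in expectation, since each \(G_{b,\ell }^{i}\) is Geom(\(\lambda _i\)) with mean \(1/\lambda _i\). A minimal sketch in Python; all parameter values are illustrative:

```python
import random

def geom(rng, lam):
    """Sample Geom(lam) on {1, 2, ...}: number of trials until first success."""
    k = 1
    while rng.random() >= lam:
        k += 1
    return k

def mean_timestamp_decrease(lam, k, trials=2000, seed=0):
    """Average of Z_k = G_1 + ... + G_k over many trials; clearing k packets
    decreases the timestamp by Z_k, with E[Z_k] = k / lam."""
    rng = random.Random(seed)
    total = sum(sum(geom(rng, lam) for _ in range(k)) for _ in range(trials))
    return total / trials

# With lam = 0.5 and k = 10, the expected decrease is k / lam = 20.
```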
Proof of Lemma F.2
Fix any group \(s\ge 1\) and \(\epsilon \gt 0\). Note that if \(f_s=0\) (i.e., queues never clear packets), then the lemma holds trivially with probability 1 as every queue in this subset increases deterministically by w in age on any window of length w, so we suppose otherwise. The proof is by induction on \(\tau =\vert J\vert\).
Base Case: \(\tau =1\) (One Old Queue). We first consider \(\tau =1\). Let \(a\in \mathbb {N}\) be arbitrary, and then set \(w=a\). If \(i^*\in J\) is the unique queue in the subset, then \(i^*=\arg \max _{i\in [n]\setminus U_{s-1}} T_0^i\). We must show that at the end of w steps, this queue has decreased age (before accounting for aging) by at least \((1-\epsilon)\cdot w\cdot f_s\) with high probability; the desired conclusion then follows from adding back in the w steps of aging. For simplicity, write \(\lambda =\lambda _{i^*}\) as the arrival rate of this unique queue. We prove the claim in the following two steps.
First, we show that with high probability, the number of packets this queue must clear on this window to get below the target is not too large. To model this random process on the next w steps, let \(G_1,\ldots ,G_w\) be i.i.d. Geom(\(\lambda\)) random variables (write \(w=1\cdot w\) to indicate that our window is just one block of w steps, so we omit the block superscripts). As usual, write \(Z_{k}=\sum _{\ell =1}^k G_{\ell }\). Recall that when this queue clears her kth packet on this window, her timestamp decreases by \(G_k\); equivalently, clearing k packets decreases her timestamp by \(Z_k\) collectively. Sample all of these random variables beforehand, and let \(\mathcal {F}_0\) be the \(\sigma\)-algebra generated by the previous history, as well as these random variables. \(\mathcal {F}_{\ell }\) will be the filtration generated by all prior events up to the \(\ell\)th step in this window.
Next, define the random variable \(K^*=\min \lbrace k\in [w]: Z_k\ge (1-\epsilon)\cdot w\cdot f_s\rbrace ,\) with the convention that \(K^*=w+1\) if this set is empty. Observe that \(K^*\) is exactly the number of packets that this queue must clear to get below the target. We claim that with high probability, \(K^*\le K:=\lambda \cdot (1-\epsilon /2)\cdot w\cdot f_s\). To see this, apply Lemma B.4 with \(K=(1-\epsilon /2)\cdot w\cdot f_s\) and \(\delta =\frac{wf_s\epsilon }{2}\) to see that for this choice of K,
As \(f_s\gt 0\), with high probability, the queue will only need to clear at most K packets to get below the desired target.
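This claim lends itself to a quick Monte Carlo check (illustrative only; the parameter values \(\lambda =0.5\), \(\epsilon =0.2\), \(w=1000\), \(f_s=0.5\) below are arbitrary choices, not from the paper):

```python
import random

def geom(rng, lam):
    """Sample Geom(lam) on {1, 2, ...}: number of trials until first success."""
    k = 1
    while rng.random() >= lam:
        k += 1
    return k

def kstar_within_bound(lam=0.5, eps=0.2, w=1000, f_s=0.5, trials=1000, seed=1):
    """Fraction of trials on which K* = min{k : Z_k >= (1-eps)*w*f_s} is at
    most K = lam*(1-eps/2)*w*f_s, as claimed in the proof."""
    target = (1 - eps) * w * f_s        # timestamp decrease needed: 400 here
    K = lam * (1 - eps / 2) * w * f_s   # claimed bound on K*: 225 here
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        z, k = 0, 0
        while z < target:               # clear packets until below the target
            z += geom(rng, lam)
            k += 1
        good += (k <= K)
    return good / trials
```

Since \(\mathbb {E}[Z_k]=k/\lambda\), the typical \(K^*\) is around \(\lambda (1-\epsilon)wf_s\), comfortably below K, and the returned fraction should be close to 1.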
Next, we show that with high probability, this queue will clear at least as many packets as required by the previous claim. Let \(X_1,\ldots ,X_w\) be the indicator variables for whether the queue cleared a packet on each step of the window. Note that these random variables are path dependent, but \(X_{\ell }\) is \(\mathcal {F}_{\ell }\)-measurable. Then the queue’s timestamp decreases by at least \((1-\epsilon)\cdot w\cdot f_s\) if and only if \(\sum _{\ell =1}^w X_{\ell }\ge K^*\) by definition. Therefore, the probability the queue’s timestamp decreases by at least \((1-\epsilon)\cdot w\cdot f_s\) on the next w steps is
Consider the family \(Y_1,\ldots ,Y_w\) of indicator random variables that we couple with \(X_1,\ldots ,X_w\) as follows: while \(\sum _{q=1}^{\ell -1} X_q \lt K^*\), set \(Y_{\ell }=X_{\ell }\). Once \(\sum _{q=1}^{\ell -1} X_q \ge K^*\) on a sample path, let \(Y_{\ell }\) be an arbitrary indicator random variable satisfying \(\mathbb {E}[Y_{\ell }\vert \mathcal {F}_{\ell -1}]\ge \lambda \cdot f_s\). Notice that by construction, we always have \(\mathbb {E}[Y_{\ell }\vert \mathcal {F}_{\ell -1}]\ge \lambda \cdot f_s\): if \(\sum _{q=1}^{\ell -1} X_{q}\lt K^*\), then the queue is still above the target, and therefore by assumption has priority. As this queue is the oldest not in \(U_{s-1}\), she has priority over all other queues not in \(U_{s-1}\). If \(V\subseteq U_{s-1}\) is some subset of queues with priority over her before she reaches her target, we know that in this case,
where we use set monotonicity and the fact that \(f_s\) is the minimal value of \(f(\cdot \vert U_{s-1})\) over all subsets contained in the complement.
Of course, if \(\sum _{q=1}^{\ell -1} X_q\ge K^*\), \(\mathbb {E}[Y_{\ell }\vert \mathcal {F}_{\ell -1}]\ge \lambda f_s\) simply by construction. However, because \(X_{\ell }\) and \(Y_{\ell }\) are equal while the queue is above the target, or equivalently before having cleared \(K^*\) packets, it follows that the events \(\sum _{\ell =1}^w X_{\ell }\ge K^*\) and \(\sum _{\ell =1}^w Y_{\ell }\ge K^*\) have the same probability. Recall again that \(K:=\lambda \cdot (1-\epsilon /2)\cdot w\cdot f_s\). We obtain that the probability the queue’s timestamp gets below the target is at least
We use Azuma-Hoeffding in the fourth line, applied to \(\Delta _{\ell }:=Y_{\ell }-\mathbb {E}[Y_{\ell }\vert \mathcal {F}_{\ell -1}]\), which surely lies between \(-1\) and 1. As \(w=a=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{1-1}\), the probability this occurs is of the claimed form. To be safe, one should take \(\lambda =\min _i \lambda _i\) so that the bound holds uniformly, independent of the identity of this queue.
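The coupling trick in this base case can be illustrated with a toy simulation (all numeric parameters below are illustrative assumptions, not from the paper): a path-dependent process X whose success probability may drop after the K-th success, coupled with a process Y that keeps success rate at least \(\lambda f_s\) afterwards. Because the two processes agree until the K-th success, the hitting events coincide exactly:

```python
import random

def coupled_hitting(lam_f=0.15, p_after=0.05, K=12, w=60, trials=5000, seed=3):
    """Coupling sketch: X follows the path-dependent process (success
    probability drops to p_after once K successes occur); Y equals X before
    the K-th success and is Bernoulli(lam_f) afterwards.  The events
    {sum X >= K} and {sum Y >= K} coincide, since X and Y agree until the
    K-th success."""
    rng = random.Random(seed)
    hit_x = hit_y = 0
    for _ in range(trials):
        sx = sy = 0
        for _ in range(w):
            if sx < K:
                b = rng.random() < lam_f          # before hitting: X = Y
                sx += b
                sy += b
            else:
                sx += rng.random() < p_after      # X may slow down
                sy += rng.random() < lam_f        # Y keeps rate >= lam_f
        hit_x += (sx >= K)
        hit_y += (sy >= K)
    return hit_x / trials, hit_y / trials
```

The two returned hitting frequencies are identical on every sample path, which is exactly the property the proof exploits: Y has an a priori conditional mean lower bound, so Azuma-Hoeffding applies to Y, yet the hitting probability is that of the true process X.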
Inductive Step for \(\tau \gt 1\). Suppose that the proposition holds up to \(\tau\); we now show that it holds for \(\tau +1\). Let \(a\in \mathbb {N}\), and then \(w=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau }=\lceil \frac{6}{\epsilon }\rceil \cdot (a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau -1})\). In our notation, we have \(w=B\cdot L\), where \(B=\lceil \frac{6}{\epsilon }\rceil\) and \(L=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau -1}\).
and suppose now that \(\vert J\vert \le \tau +1\) and \(M^*\ge w\cdot f_s\). For all \(i\in J\), \(\ell =1,\ldots ,L\), and \(b=1,\ldots ,B\), define the random variables \(X_{b,\ell }^{i}\) and \(Y_{b,\ell }^{i}\) as described earlier, as well as the \(\sigma\)-algebras \(\mathcal {F}_{\ell }^{(b)}\).
First, note that for any \(b,\ell\), if no queue in J has timestamp below \(M^*-w\cdot f_s\) at the \(\ell\)th step of the bth block (without accounting for aging), then every queue in J has priority over queues not in J. Then for the same reason as in the base case, we have
Then by construction, we always have
and this holds regardless of the conditioning. In particular, it follows that for every \(b=1,\ldots ,B\), we have
Now, we define the following events for \(b=1,\ldots ,B\):
We note that \(A_B\) implies \(D_B\) by construction so that if \(A_B\) holds, then at the end of this window of w steps (and again without accounting for w steps of aging), for all \(i\in J\),
Recall that our goal is that \(T^i_w\le M^*-B\cdot L\cdot f_s(1-\epsilon)\) (before accounting for w steps of aging); note that our choice of \(B=\lceil \frac{6}{\epsilon }\rceil\) satisfies
where we note that \(\sum _{i\in J} Y_{b,\ell }^{i}-\mathbb {E}[\sum _{i\in J} Y_{b,\ell }^{i}]\) surely lies between \(-n\) and n, which accounts for the extra factor. Therefore, a union bound implies that
We now show that \(\Pr (D_{b+1}\vert A_b)\) is large using a case analysis.
Case 1: No Gap. First suppose that after the bth block, there is no large gap between the maximum and minimum timestamp in J—that is (without accounting for aging, which affects queues equally),
We show that this along with the other assumptions in \(A_b\) already imply the event \(D_{b+1}\), so there is no need to analyze what happens on the \(b+1\)th block. Note that because there is no large gap, \(D_{b+1}\) will be implied by
We now show that Equation (44) is indeed implied by \(A_b\), which we recall implies \(E_b\) and \(F_b\) for all \(q\le b\). This means that for each \(q\le b, i\in J, \ell =1,\ldots , L\),
\[\begin{gather} Z_{q,\ell }^{i}\ge \frac{\ell }{\lambda }-\frac{\epsilon L f_s}{4} \end{gather}\]
If \(\min _{i\in J} T_{b\cdot L}^i\le M^* - B\cdot L\cdot f_s\), we are done by the preceding discussion, so suppose not; this means that no queue in J has a timestamp below \(M^*-wf_s\) (before accounting for aging). By our coupling, we have \(X_{q,\ell }^{i}=Y_{q,\ell }^{i}\) up until now, as no queue in J has lost priority to those outside of J, so this last inequality is equivalent to
(before accounting for aging), which combined with the no-gap assumption implies \(D_{b+1}\). Thus, the no-gap condition implies that the conditional probability of \(D_{b+1}\) is 1.
Case 2: Large Gap. Suppose after the bth block that there is a large gap between the maximum and minimum timestamp in J; specifically, suppose that (again, without aging)
As we have conditioned on \(A_b\), \(\max _{i\in J}T_{b\cdot L}^i\le M^*-(b-3)\cdot L f_s(1-\epsilon /2)\); if this held instead with \(b-2\), we would already be done. If not, then as we have set \(L=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau -1}\) and the set of queues with timestamp within \(L\cdot f_s\) of the maximum in J evidently has size at most \(\tau\), we may apply the inductive hypothesis: with probability at least \(1-C_1\exp (-C_2a)\), for some absolute constants \(C_1\) and \(C_2\) depending only on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\), every such queue decreases by at least \(L\cdot f_s\cdot (1-\epsilon /2)\) on this block by our choice of L. Therefore, conditioned on \(A_b\), with probability at least \(1-C_1\exp (-C_2a)\), these queues decrease by enough to satisfy \(D_{b+1}\).
As \(L=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{\tau -1}\) and \(B=\lceil \frac{6}{\epsilon }\rceil\), this evidently has the claimed form of \(1-C_1^{\prime }\exp (-C_2^{\prime }a)\) for absolute constants \(C_1^{\prime },C_2^{\prime }\gt 0\) depending only on system parameters and not a, completing the inductive step. With this, the lemma is proved.□
Proposition F.3 (Proposition 6.2, Restated).
For any \(s\ge 1\) and any fixed \(\epsilon \gt 0\), the following holds: suppose that at time t, it holds that
Proof
We prove the second statement first, as it is slightly simpler and the main idea will reappear. As usual, let \(X_{t}^{i}\) denote the indicator variable that queue i cleared a packet at time t. Similar to the previous proof, we have for every \(t\ge 1\),
This is an upper bound, as other queues may have priority at time t and some queues in \(S_1\) may be empty at time t.
Recall that \(G_{\ell }^{i}\) are i.i.d. geometric random variables with parameter \(\lambda _i\) for \(\ell =1,\ldots ,w\). Again, the interpretation is that when queue i clears her \(\ell\)th packet, her age decreases by \(G_{\ell }^{i}\); in particular, the cumulative decrease from clearing k packets is then \(Z_{k}^{i}:=\sum _{\ell =1}^{k} G_{\ell }^i\). Another familiar application of Corollary B.6 implies that with probability at least \(1-A\exp (-Bw)\) (where \(A,B\) are absolute constants depending only on \(n,\epsilon , \mathbf {\lambda }\), not w),
Because the expected number of packets cleared by the queues in \(S_1\) is at most \(\lambda (S_1)\cdot f_1\) by definition, another familiar application of the Azuma-Hoeffding inequality also implies with probability at least \(1-A^{\prime }\exp (-B^{\prime }w)\) (where \(A^{\prime },B^{\prime }\) do not depend on w) that
By a union bound, we thus find that this occurs with probability \(1-A\exp (-Bw)\) for some possibly different constants \(A,B\gt 0\) that do not depend on w, thus concluding the proof of the second statement.
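The Azuma-Hoeffding step here can be sanity checked numerically. For a sum of w i.i.d. Bernoulli(\(\lambda f_1\)) indicators (a simplified stand-in for the path-dependent clearing indicators), the empirical frequency of a deviation t below the mean should be dominated by the bound \(\exp (-t^2/(2w))\) for martingale increments in \([-1,1]\). A sketch with illustrative parameter values:

```python
import math
import random

def azuma_vs_empirical(lam=0.5, f_1=0.5, eps=0.2, w=400, trials=2000, seed=2):
    """Compare the Azuma-Hoeffding tail bound exp(-t^2 / (2w)) against the
    empirical frequency that a sum of w Bernoulli(lam*f_1) steps falls
    t below its mean."""
    p = lam * f_1
    t = p * w * eps / 2                 # deviation below the mean p*w
    bound = math.exp(-t * t / (2 * w))  # Azuma-Hoeffding, increments in [-1,1]
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        s = sum(rng.random() < p for _ in range(w))
        bad += (s < p * w - t)
    return bad / trials, bound
```

The empirical tail frequency sits well below the (loose but dimension-free) Azuma bound, which is all the proof requires: the bound decays exponentially in w.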
The proof for \(s\ge 1\) is quite similar, just with one extra condition. Suppose that at time t we have
Consider now the next w steps, and let \(G_{\ell }^{i}\) be as earlier, with \(\ell =1\) to w being the same geometric random variables, with \(Z_{k}^{i}\) being the kth partial sum. Another application of Corollary B.6 implies that with probability at least \(1-A_1\exp (-A_2w)\),
Moreover, this event occurring implies that no queue in \(U_s\) can possibly become younger than any queue in \(S_{s+1}\) on the next w steps, even if they clear a packet in every step, by our assumption. Therefore, on this event, we can apply the same analysis with the variables \(X_{t}^{i}\) as earlier, just noting that conditioned on this occurring,
as every queue in \(U_s\) will have priority over every queue in \(S_{s+1}\) on this window. An extremely similar argument via Azuma-Hoeffding and Corollary B.6, together with a union bound over the concentration event for the queues in \(U_s\) above, implies that the desired result holds with probability at least \(1-A\exp (-Bw)\) for some constants \(A,B\gt 0\) not depending on w (but, again, on \(n,\epsilon ,\mathbf {\lambda }, \mathbf {\mu },\mathbf {p}\)).□
Corollary F.4 (Corollary 6.3, Restated).
Fix \(\mathbf {p}\) and suppose that for some group \(S_k\) output by the algorithm, \(f_k\gt 1\), so that \(1-f_k\lt 0\). Then, for each \(i\in S_k\), \(T^i_t\) is strongly stable.
Proof Sketch
It suffices to show this for the random variable \(\max _{i\notin U_{k-1}}T^i_t\). Let \(\epsilon \gt 0\) be small enough that \((1-(1-\epsilon)\cdot f_{k})\lt \eta \lt 0\) for some \(\eta \lt 0\). Then let \(w=a\cdot \lceil \frac{6}{\epsilon }\rceil ^{n-1}\) be large enough that, on the event \(\max _{i\notin U_{k-1}}T^i_{\ell \cdot w}\ge f_{k}\cdot w\), we have \(\mathbb {E}[\max _{i\notin U_{k-1}}T^i_{(\ell +1)\cdot w}-\max _{i\notin U_{k-1}}T^i_{\ell \cdot w} \bigg \vert \mathcal {F}_{\ell \cdot w}]\lt \beta \lt 0\) for some \(\beta \lt 0\), where \(\mathcal {F}_{\ell \cdot w}\) is the filtration of the Geometric system. This can be done by Proposition 6.1: on the event where the proposition fails, the queue age can increase by at most w, and this contribution is drowned out in the expectation by the exponential decay of the probability bound once w is taken large enough. This yields negative drift for the random process \(Y_{\ell }:=\max _{i\notin U_{k-1}}T^i_{\ell \cdot w}\) with threshold value \(f_{k}\cdot w\).
Then, for any even \(p\ge 0\), \(\mathbb {E}[\vert \max _{i\notin U_{k-1}}T^i_{(\ell +1)\cdot w}-\max _{i\notin U_{k-1}}T^i_{\ell \cdot w}\vert ^p \vert \mathcal {F}_{\ell \cdot w}]\) is bounded by some constant \(C_p\gt 0\) for each p, depending only on \(n,w, \mathbf {\lambda }\). This is because the difference is crudely upper bounded, as a random variable, by a sum of at most \(n\cdot w\) geometric random variables in the case that queues somehow clear a packet every round, and such sums are easily seen to have bounded moments. By Theorem 5.3, this implies stochastic stability, as the pth moment condition holds for arbitrarily large p.□
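The negative-drift argument can be illustrated by a toy process (not the actual queuing process; all parameters are illustrative): bounded increments, with a strictly negative conditional drift whenever the process sits above a threshold, keep the process stochastically bounded in the spirit of Theorem 5.3 (a Pemantle-Rosenthal-style drift criterion).

```python
import random

def drift_walk(threshold=50.0, steps=20000, seed=3):
    """A toy process with bounded increments: mean-zero steps below the
    threshold, conditional drift -2 above it.  Under a drift criterion of
    this kind, the process stays stochastically bounded."""
    rng = random.Random(seed)
    y, peak = 0.0, 0.0
    for _ in range(steps):
        if y >= threshold:
            y += rng.uniform(-5.0, 1.0)   # mean increment -2 above threshold
        else:
            y += rng.uniform(-3.0, 3.0)   # mean-zero increment below
        y = max(y, 0.0)                   # ages are nonnegative
        peak = max(peak, y)
    return peak
```

Excursions above the threshold die out quickly because of the negative drift, so even the running maximum stays within a modest margin of the threshold over long horizons.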