Online Linear Quadratic Tracking with
Regret Guarantees

Aren Karapetyan, Diego Bolliger, Anastasios Tsiamis, Efe C. Balta, and John Lygeros

This work has been supported by the Swiss National Science Foundation under NCCR Automation (grant agreement 51NF40_180545), and by the European Research Council under the ERC Advanced grant agreement 787845 (OCAL). Aren Karapetyan and Diego Bolliger contributed equally to this work. Aren Karapetyan, Anastasios Tsiamis, and John Lygeros are with the Automatic Control Laboratory, Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland (e-mail: {akarapetyan, atsiamis, jlygeros}@control.ee.ethz.ch). Diego Bolliger is with the School of Engineering, ZHAW Zurich University of Applied Sciences, 8400 Winterthur, Switzerland (e-mail: [email protected]). Efe C. Balta is with the Control and Automation Group, Inspire AG, 8005 Zürich, Switzerland (e-mail: [email protected]).
Abstract

Online learning algorithms for dynamical systems provide finite time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online optimization where the time-varying reference state is unknown a priori and is revealed after the applied control input. We show the equivalence of this problem to the control of linear systems subject to adversarial disturbances and propose a novel online gradient descent-based algorithm to achieve efficient tracking in finite time. We provide a dynamic regret upper bound scaling linearly with the path length of the reference trajectory and a numerical example to corroborate the theoretical guarantees.

©2023 IEEE. This version has been accepted for publication at the IEEE Control Systems Letters, DOI: 10.1109/LCSYS.2023.3345809. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Index Terms

Optimal Tracking, Online Control.

1 Introduction

Linear quadratic tracking (LQT) is the natural generalization of the optimal linear quadratic regulator (LQR) for the setting where the goal is not to drive the state to the origin but to a certain reference. The reference trajectory need not be time-invariant and, in the classic formulation of the problem, is known in advance. This is a reasonable assumption in many practical applications, such as aircraft tracking of a predetermined trajectory or precision control in industrial process engineering. However, in other scenarios, for example, in tracking the output of a secondary agent whose dynamics are unknown and/or whose measurements are imperfect, the prediction of the next reference point is non-trivial. In these cases the reference trajectory is only revealed sequentially, after the action has been taken, suggesting the need for an online or adaptive algorithm that will learn or adapt to the dynamics of the reference-generating agent.

In this letter, we study the LQT problem with an unknown reference trajectory. We pose the problem in the framework of online convex optimization (OCO) subject to the dynamics constraint of the system [1, 2]. In particular, the tracking problem is recast into an equivalent regulation problem with a redefined state that evolves with linear dynamics subject to additive adversarial disturbances. In the spirit of online decision-making under computational and memory constraints, our goal is to develop a gradient-based algorithm that is fast, simple to implement, and light on memory. To this end, we show how classical online gradient descent (OGD) may fail to achieve optimal tracking and propose a modified algorithm, called SS-OGD (steady state OGD), that is guaranteed to achieve the goal under mild conditions. Given the online nature of the algorithm, its performance is quantified through dynamic regret, which compares the accumulated finite-time cost of a given algorithm to that of an optimal benchmark that solves the LQT problem with a priori knowledge of the reference trajectory. We provide a dynamic regret bound that scales linearly with the path length of the reference trajectory.

The LQT problem for sequentially revealed adversarial reference states is studied mostly with policy regret guarantees, with one of the first works [3] suggesting a relatively computationally heavy algorithm. In a more recent line of work, the authors of [4] introduce a memory-based gradient descent algorithm and in [5] tackle the constrained tracking problem. Several works also provide dynamic regret guarantees for tracking of unknown targets; however, their settings differ from ours. In [6], the authors analyze an output tracking scheme but assume an iterative setting, while in [7] a window of predictions is available. Without predictions, their regret order is determined by that of a fixed oracle controller. In [7], the authors also provide a lower bound for the dynamic regret in terms of the reference path length, matching the order achieved by our proposed scheme. Gradient-based algorithms, such as the ones we study, have also been developed in the context of online feedback optimization [8]. There, in contrast to our setting, the dynamics are generally assumed to be unknown, but, crucially, the cost functions are fixed over the horizon, and the regret is not analyzed. Several recent works, e.g. [9], consider a similar setting with time-varying costs. These costs, however, are allowed to be estimated offline by training, which is incompatible with our setting, and no regret guarantees are provided.

Notation: The set of positive real numbers is denoted by $\mathbb{R}_{+}$ and that of non-negative integers by $\mathbb{N}$. For a matrix $W$ the spectral radius and the spectral norm are denoted by $\rho(W)$ and $\|W\|$, respectively, and $\lambda_{\min}(W)$ denotes its minimum eigenvalue. We define $\lambda_{W} := \frac{1+\rho(W)}{2}$; one can show that if $\rho(W) < 1$, there exists a $c_{W} \in \mathbb{R}_{+}$ such that $\|W^{k}\| \leq c_{W} \lambda_{W}^{k}$ for all $k \geq 1$. For a given vector $x$, its Euclidean norm is denoted by $\|x\|$, and the one weighted by some matrix $Q$ by $\|x\|_{Q} = \sqrt{x^{\top} Q x}$.

2 Problem Statement

Consider the discrete-time linear time-invariant (LTI) dynamical system, given by

$$x_{t+1} = A x_{t} + B u_{t}, \quad \forall t \in \mathbb{N}, \tag{1}$$

where $x_{t} \in \mathbb{R}^{n}$ and $u_{t} \in \mathbb{R}^{m}$ are the state and input vectors, respectively, and $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$ are known system matrices. The goal of the optimal LQT problem is the tracking of a time-varying signal $r_{t} \in \mathbb{R}^{n}$, such that the cost

$$\|x_{T} - r_{T}\|_{P}^{2} + \sum_{t=0}^{T-1} \|x_{t} - r_{t}\|_{Q}^{2} + \|u_{t}\|_{R}^{2}$$

is minimized for some weighting matrices $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$, and where $P \in \mathbb{R}^{n \times n}$ is the solution of the discrete algebraic Riccati equation (DARE)

$$P = Q + A^{\top} P A - A^{\top} P B (R + B^{\top} P B)^{-1} B^{\top} P A. \tag{2}$$

Footnote 1: The final cost matrix is taken to be $P$ for simplicity. For other values of the terminal cost matrix the results still hold up to an additional constant [2].

The LQT problem can be recast into an equivalent LQR formulation [10] by considering instead the dynamics

$$e_{t+1} = A e_{t} + B u_{t} + w_{t}, \quad \forall t \in \mathbb{N}, \tag{3}$$

with $e_{t} := x_{t} - r_{t}$ and $w_{t} := A r_{t} - r_{t+1}$ for all $t \in \mathbb{N}$, and the corresponding cost function

$$J(e_{0}, u) := \|e_{T}\|_{P}^{2} + \sum_{t=0}^{T-1} \|e_{t}\|_{Q}^{2} + \|u_{t}\|_{R}^{2}. \tag{4}$$
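The change of variables behind (3) can be checked numerically. The following is a minimal sketch for a scalar system ($n = m = 1$); the values of `a`, `b`, the reference `r`, and the inputs `u` are made up for illustration and do not come from the letter. It simulates (1), then simulates (3) with $w_t = A r_t - r_{t+1}$, and compares $e_t$ with $x_t - r_t$.

```python
# Scalar illustration (n = m = 1); all numbers are made up for the sketch.
a, b = 0.9, 0.5
r = [0.0, 1.0, 0.5, 2.0, 1.5]   # hypothetical reference r_0, ..., r_4
u = [0.3, -0.2, 0.1, 0.4]       # arbitrary inputs u_0, ..., u_3

# State trajectory from the original dynamics (1).
x = [1.0]
for t in range(len(u)):
    x.append(a * x[t] + b * u[t])

# Error trajectory from the recast dynamics (3) with w_t = A r_t - r_{t+1}.
w = [a * r[t] - r[t + 1] for t in range(len(u))]
e = [x[0] - r[0]]
for t in range(len(u)):
    e.append(a * e[t] + b * u[t] + w[t])

# Largest mismatch between e_t and x_t - r_t over the horizon.
max_gap = max(abs(e[t] - (x[t] - r[t])) for t in range(len(r)))
print(max_gap)
```

The mismatch is at the level of floating-point rounding, which reflects that the reference enters the error dynamics only through the additive term $w_t$.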

When the reference trajectory $r_{t}$, $t \in \mathbb{N}$, is known at the initial time, a closed-form solution can be obtained for the optimal controller that solves the optimization problem

$$\begin{aligned} u^{\star} = \operatorname*{arg\,min}_{u} \quad & J(e_{0}, u) && \text{(5a)} \\ \text{subject to} \quad & \text{(3)} \quad \forall\, 0 \leq t < T. && \text{(5b)} \end{aligned}$$

This controller, often referred to as the optimal offline noncausal controller, can be represented as a linear feedback on the current state and the future reference [11, 12].

Departing from the classical formulation of tracking control, we assume that the reference signal is unknown and is only revealed sequentially after the control input has been applied, similar to the adversarial tracking framework in [3]. In particular, for each time step $0 \leq t < T$:

  1. The state $x_{t}$ and the reference state $r_{t}$ are observed.

  2. The agent decides on an input $u_{t}$.

  3. The environment decides on the next reference $r_{t+1}$, which, in turn, determines $w_{t}$. The error state then evolves according to (3), incurring the following cost for the agent

     $$c_{t}(e_{t}, u_{t}) := \|A e_{t} + B u_{t} + w_{t}\|_{Q}^{2} + \|u_{t}\|_{R}^{2}. \tag{6}$$

Note that the online cost (6) depends on the current input $u_{t}$ and the unknown disturbance $w_{t}$, and is therefore unknown to the decision maker at timestep $t$; it is revealed only at time $t+1$, after the input $u_{t}$ has been applied to the system. Our problem formulation fits the online learning framework, with the extra challenge of inherent dynamics. The goal of the controller is then to minimize the online cumulative cost

$$\sum_{t=0}^{T-1} c_{t}(e_{t}, u_{t}) = J(e_{0}, u) - \|e_{0}\|_{Q}^{2}.$$

This is the same as the LQR cost without the initial state, implying that the minimizers of both problems coincide.

Footnote 2: For consistency we require $c_{T-1}(e_{T-1}, u_{T-1}) := \|A e_{T-1} + B u_{T-1} + w_{T-1}\|_{P}^{2} + \|u_{T-1}\|_{R}^{2}$. To avoid cluttering the notation, the separate treatment of the last timestep is left implicit.
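The identity between the cumulative online cost and $J(e_0, u) - \|e_0\|_Q^2$ can be verified on a small instance. The sketch below assumes a scalar system with made-up weights and signals; the helper names `rollout`, `lqr_cost`, and `cumulative_online_cost` are hypothetical. The last online cost is weighted by `p` rather than `q`, following the convention for the final timestep, and the identity holds for any value of `p`, not only the DARE solution.

```python
def rollout(a, b, e0, u, w):
    """Simulate the scalar error dynamics (3)."""
    e = [e0]
    for t in range(len(u)):
        e.append(a * e[t] + b * u[t] + w[t])
    return e

def lqr_cost(e, u, q, r_in, p):
    """Cost (4): terminal weight p, stage weights q (state) and r_in (input)."""
    T = len(u)
    return p * e[T] ** 2 + sum(q * e[t] ** 2 + r_in * u[t] ** 2 for t in range(T))

def cumulative_online_cost(e, u, q, r_in, p):
    """Sum of the online costs (6); the last step uses p instead of q."""
    T = len(u)
    total = 0.0
    for t in range(T):
        weight = p if t == T - 1 else q
        # A e_t + B u_t + w_t is exactly e_{t+1}.
        total += weight * e[t + 1] ** 2 + r_in * u[t] ** 2
    return total

a, b, q, r_in, p = 0.9, 0.5, 1.0, 0.1, 2.0   # made-up values
u = [0.3, -0.2, 0.1, 0.4]
w = [0.5, -1.0, 0.2, 0.0]
e = rollout(a, b, 1.0, u, w)
gap = cumulative_online_cost(e, u, q, r_in, p) - (lqr_cost(e, u, q, r_in, p) - q * e[0] ** 2)
print(gap)
```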

We quantify the finite-time performance of the algorithm through dynamic regret. Consider a policy $\pi: \mathcal{I} \rightarrow \mathbb{R}^{m}$, mapping from the available information set, $\mathcal{I}$, to the control input space. Its dynamic regret, given a disturbance signal $w$, is defined as

$$\mathcal{R}^{\pi}(w, e_{0}) = J(e_{0}, u^{\pi}) - J(e_{0}, u^{\star}), \tag{7}$$

where $u^{\pi}$ is the input generated by $\pi$ and $u^{\star}$ is given by (5).

We allow the trajectory $r_{t}$, $t \in \mathbb{N}$, to be arbitrary, as long as it remains bounded.

Assumption 1 (Bounded trajectory)

There exists an $\bar{R} \in \mathbb{R}_{+}$ such that $\|r_{t}\| \leq \bar{R}$ for all $t \in \mathbb{N}$.

The more abruptly a trajectory changes, the harder it is to achieve good tracking performance, especially if the trajectory is unknown beforehand. To capture this inherent complexity of the problem with dynamic regret, we use the well-established notion of path length [7], [13].

Definition 2.1 (Path Length)

The path length of a reference trajectory $r_{0:T} \in \mathbb{R}^{n(T+1)}$ is $L(T) = \sum_{t=0}^{T-1} \|\Delta r_{t}\|$, where $\Delta r_{t} = r_{t+1} - r_{t}$.
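As an illustration of the definition, the path length can be computed directly from a list of reference points. This is a minimal sketch; the function name `path_length` and the two example trajectories are made up.

```python
import math

def path_length(refs):
    """Return L(T) = sum_{t=0}^{T-1} ||r_{t+1} - r_t|| for a sequence of points."""
    return sum(math.dist(refs[t + 1], refs[t]) for t in range(len(refs) - 1))

constant = [(1.0, 2.0)] * 5                                  # reference that never moves
zigzag = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0), (3.0, 4.0)]    # jumps of length 5 each step
print(path_length(constant), path_length(zigzag))
```

A constant reference has zero path length, while the zigzag accumulates the length of every jump, so its path length grows linearly with the horizon.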

For more random and abrupt changes in the trajectory, the path length is higher, and one expects the performance of an online algorithm to deteriorate. Likewise, an efficient algorithm should improve as path length decreases. This is captured quantitatively by showing at least a linear dependence of the algorithm’s regret on the path length. One can instead choose the complexity term to be the path length of the artificial disturbances $w_{0:T-1} \in \mathbb{R}^{nT}$. The resulting path length will then decrease the closer the reference dynamics are to the given system in a certain operator norm, but will scale linearly with time in the case of a constant mismatch between the two. Since we only assume bounded references, we allow for potentially random references without any underlying dynamics. Hence, we choose $L(T)$ as our complexity term. Under the following standard assumptions on stabilisability and detectability [14], the LQR problem is well-posed.

Assumption 2 (LQR is well-posed)

The system $(A, B)$ is stabilisable, the pair $(Q^{\frac{1}{2}}, A)$ is detectable and $R \succ 0$.

3 The SS-OGD Algorithm

We consider a control law of the following form

$$u_{t} = -K e_{t} + v_{t}, \quad \forall\, 0 \leq t < T, \tag{8}$$

where $K = (R + B^{\top} P B)^{-1} B^{\top} P A$ is fixed to the optimal LQR gain, and $v_{t}$ is a correction term that should account for the unknown disturbances; we will employ online learning techniques to update the latter term.
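For intuition, the gain $K$ can be computed in the scalar case by iterating the Riccati recursion associated with (2) until it settles. The sketch below is an assumption-laden illustration: the plant and weights are made up, the helper names are hypothetical, and a plain fixed-point iteration stands in for a dedicated DARE solver.

```python
def solve_scalar_dare(a, b, q, r_in, iters=500):
    """Fixed-point iteration on (2) for n = m = 1, started from p = q."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r_in + b * b * p)
    return p

def lqr_gain(a, b, r_in, p):
    """K = (R + B'PB)^{-1} B'PA in the scalar case."""
    return b * p * a / (r_in + b * b * p)

a, b, q, r_in = 1.2, 1.0, 1.0, 1.0    # made-up unstable plant and weights
p = solve_scalar_dare(a, b, q, r_in)
k = lqr_gain(a, b, r_in, p)
# Residual of (2) at the computed p, and the closed-loop pole a - b k.
residual = p - (q + a * a * p - (a * b * p) ** 2 / (r_in + b * b * p))
print(p, k, abs(a - b * k))
```

For these values the open-loop system is unstable ($a > 1$), while the closed-loop pole $a - bk$ lies inside the unit circle, as expected from the LQR gain.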

We investigate the performance of online gradient descent based algorithms. Consider the following “naive” update

$$v_{t} = v_{t-1} - \alpha \nabla_{v} c_{t-1}(e_{t-1}, u_{t-1}), \tag{9}$$

where $v_{t}$ is updated in the opposite direction of the gradient of the most recent cost. Here $\alpha \in \mathbb{R}_{+}$ is the step size and the recursion starts from some $v_{0} \in \mathbb{R}^{m}$. As the online objective is quadratic, the gradient is available in closed form and the update can be written as $v_{t} = v_{t-1} - 2\alpha(R u_{t-1} + B^{\top} Q e_{t})$. For the case of a constant reference signal and an underactuated system, the algorithm can converge to a point that is not necessarily the optimal one with respect to infinite horizon cost minimization. This is due to the greedy behavior of the update that does not take into account future dynamics. In this section, we propose a simple modification to this myopic OGD update (9), called SS-OGD, that addresses this shortcoming.
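The closed-form gradient used in the naive update can be sanity-checked against a finite difference. The sketch below treats the scalar case with made-up values and hypothetical helper names; `e_next` plays the role of $e_t$ in the update, since $e_t = A e_{t-1} + B u_{t-1} + w_{t-1}$.

```python
def online_cost(v, e, w, a, b, k, q, r_in):
    """Cost (6) as a function of the correction v, with u = -k e + v."""
    u = -k * e + v
    e_next = a * e + b * u + w
    return q * e_next ** 2 + r_in * u ** 2

def closed_form_gradient(v, e, w, a, b, k, q, r_in):
    """2 (R u + B' Q e_next), the expression used in the naive update."""
    u = -k * e + v
    e_next = a * e + b * u + w
    return 2.0 * (r_in * u + b * q * e_next)

args = (0.7, -0.3, 1.2, 1.0, 0.8, 1.0, 1.0)   # e, w, a, b, k, q, r_in (made up)
v, h = 0.4, 1e-4
# Central difference; exact up to rounding because the cost is quadratic in v.
numeric = (online_cost(v + h, *args) - online_cost(v - h, *args)) / (2 * h)
analytic = closed_form_gradient(v, *args)
print(numeric, analytic)
```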

To motivate the SS-OGD update, we consider the steady state solution of (3) in closed-loop with the affine control law (8) when we fix $v_{i} = \bar{v}$ and $r_{i} = \bar{r}$ for all subsequent timesteps $i \geq t$. Defining $S := (I - A + BK)^{-1} B$, a closed form solution for the steady state and input is given by

$$\bar{x} = S\bar{v} + SK\bar{r}, \qquad \bar{u} = (I - KS)(\bar{v} + K\bar{r}). \tag{10}$$

Footnote 3: Note that $\bar{x}$ and $\bar{u}$ are both defined for a given $\bar{v}$ and $\bar{r}$. The dependence is omitted for simplicity.
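The steady-state expressions in (10) can be compared with a direct simulation of the closed loop. This is a minimal scalar sketch under assumed values: `k` is an arbitrary stabilizing gain (not necessarily the LQR gain), and `v_bar`, `r_bar`, and the initial state are made up.

```python
a, b, k = 1.2, 1.0, 0.8          # made-up plant and a stabilizing gain (|a - b k| < 1)
v_bar, r_bar = 0.3, 1.0          # fixed correction term and constant reference

s = b / (1.0 - a + b * k)        # S = (I - A + BK)^{-1} B
x_bar = s * v_bar + s * k * r_bar
u_bar = (1.0 - k * s) * (v_bar + k * r_bar)

# Simulate x_{t+1} = A x_t + B u_t with u_t = -K (x_t - r_bar) + v_bar.
x = 5.0
for _ in range(200):
    u = -k * (x - r_bar) + v_bar
    x = a * x + b * u
u = -k * (x - r_bar) + v_bar     # input at the converged state
print(x - x_bar, u - u_bar)
```

Because the closed-loop pole $a - bk$ is inside the unit circle, the simulated state and input settle at the values predicted by (10) regardless of the initial state.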

One can then find the $\bar{v}$ which will recover the optimal steady state solution by minimizing the time-averaged infinite horizon steady state cost. For $\bar{x}$ and $\bar{u}$ defined as in (10), this amounts to solving

$$\operatorname*{arg\,min}_{\bar{v}} \left\{ c(\bar{x} - \bar{r}, \bar{u}) := \|\bar{x} - \bar{r}\|_{Q}^{2} + \|\bar{u}\|_{R}^{2} \right\}, \tag{11}$$

whose gradient is given by

$$\nabla_{\bar{v}} c(\bar{x} - \bar{r}, \bar{u}) = 2\left((I - KS)^{\top} R \bar{u} + S^{\top} Q (\bar{x} - \bar{r})\right). \tag{12}$$

Since $r$ is, in general, not constant and the steady state condition is not satisfied, we suggest a new OGD-like update rule on the bias term $v_{t}$ that is a modified version of the gradient in (12). Specifically, the feedback on the steady state error, $\bar{x} - \bar{r}$, is replaced with the measured error, $x_{t} - r_{t}$, and the steady state input, $\bar{u}$, with the latest applied input, $u_{t-1}$. This results in the following update, named SS-OGD

$$v_{t} = v_{t-1} - 2\alpha\left((I - KS)^{\top} R u_{t-1} + S^{\top} Q e_{t}\right). \tag{13}$$

The cost $c$ in (12) is defined for the steady state $\bar{x}$ and input $\bar{u}$ and is thus decoupled from the true online cost $c_{t}$ in (6) that reflects the current $e_{t}$ and $u_{t}$. These are, in general, not at a steady state and $r_{t}$ is not constant. Thus, $c$ is only an auxiliary, hallucinated cost used to construct the update (13).
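The interplay of the control law (8) and the update (13) can be sketched for a scalar system tracking a constant reference. Everything numeric here is an assumption made for illustration: the plant, the weights, the step size `alpha = 0.05`, and the initial error. The function name `ss_ogd_constant_reference` is hypothetical. The sketch compares the limit of $v_t$ with the closed-form minimizer of the steady-state cost (11).

```python
def ss_ogd_constant_reference(a, b, q, r_in, k, alpha, r_bar, e0, steps):
    """Run the control law (8) with the SS-OGD update (13) for n = m = 1."""
    s = b / (1.0 - a + b * k)          # S
    g = 1.0 - k * s                    # I - K S
    w = a * r_bar - r_bar              # w_t for a constant reference
    e, v = e0, 0.0
    for _ in range(steps):
        u = -k * e + v                 # (8)
        e = a * e + b * u + w          # (3): e now holds the next error
        v = v - 2.0 * alpha * (g * r_in * u + s * q * e)   # (13)
    return v, e, s, g

a, b, q, r_in = 1.2, 1.0, 1.0, 1.0     # made-up plant and weights
p = q
for _ in range(500):                   # scalar Riccati iteration for (2)
    p = q + a * a * p - (a * b * p) ** 2 / (r_in + b * b * p)
k = b * p * a / (r_in + b * b * p)     # LQR gain

r_bar = 1.0
v_final, e_final, s, g = ss_ogd_constant_reference(a, b, q, r_in, k, 0.05, r_bar, 2.0, 2000)

# Minimizer of the steady-state cost (11) in closed form (scalar case),
# obtained by setting the gradient (12) to zero with y = v_bar + k r_bar.
y_star = q * s * r_bar / (q * s * s + r_in * g * g)
v_star = y_star - k * r_bar
print(v_final, v_star)
```

For this particular step size the combined recursion is stable, so $v_t$ settles at the minimizer of (11); a larger step size may destabilize it, which is why the step size is part of the stated assumptions.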

Lemma 3.1

Under Assumption 2, (11) is strictly convex in $\bar{v}$ for any $K \in \mathbb{R}^{m \times n}$ for which $\rho(A - BK) < 1$.

Proof 3.2.

If the matrix $I - KS$ is singular, there exists a nonzero $v \in \mathbb{R}^{m}$ such that $v = KSv$. Then, for $x = Sv$, at steady state $x = Ax + B(KSv - Kx) = Ax$. Given the detectability condition of the pair $(Q^{\frac{1}{2}}, A)$, for any unstable or marginally stable mode of $A$, the matrix $Q \succ 0$. This ensures that the matrix $S^{\top} Q S + (I - KS)^{\top} R (I - KS)$ is positive definite, which is equivalent to the strong convexity of (11).

The modifications from the standard OGD can be interpreted as incorporating the dynamics information in the update rule. As we show in the following, this ensures that in the limit, if the algorithm is stable and the reference signal is constant, the SS-OGD converges to the same point as the solution of the LQR problem minimizing (4). Moreover, through the feedback on the state $e_t$ and input $u_{t-1}$, the update rule (13) incorporates a proportional-integral (PI) control on the measured state. This is demonstrated on a quadrotor control example in Section 5, where, with the inherent integrator dynamics of the quadrotor, the SS-OGD achieves a zero steady state error in tracking a position reference signal with a constant rate of change.

To study the SS-OGD update rule, we introduce the following evolution of the combined system-optimizer dynamics

$$z_{t+1}=\tilde{A}z_{t}+\tilde{B}w_{t}, \tag{14}$$

where $z_{t}:=[v_{t}^{\top}~e_{t}^{\top}]^{\top}$, the matrices $\tilde{A}\in\mathbb{R}^{p\times p}$ and $\tilde{B}\in\mathbb{R}^{p\times n}$ are defined in Appendix 7 and $p:=m+n$.

Assumption 3

The step size $\alpha>0$ is such that $\rho(\tilde{A})<1$.

Since all the variables in $\tilde{A}$ are known a priori, we show that there always exists an $\alpha$ satisfying this assumption and provide a sufficient condition in Appendix 7.
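To illustrate what Assumption 3 buys, the following sketch iterates a system of the form (14) under a constant disturbance and compares the result with the fixed point $(I-\tilde{A})^{-1}\tilde{B}\bar{w}$. The $2\times 2$ matrix and the input vector are hypothetical stand-ins chosen only to have spectral radius below one; they are not the matrices defined in the appendix.

```python
# Hypothetical 2x2 stand-in for the matrices of (14); the true A-tilde and
# B-tilde are defined in the appendix and depend on the problem data and alpha.
A_TILDE = [[0.5, 0.1],
           [0.2, 0.3]]     # eigenvalues approx. 0.57 and 0.23, so rho < 1
B_TILDE = [1.0, 0.5]
W_BAR = 2.0                # constant disturbance


def step(z, w):
    """One step of z_{t+1} = A_tilde z_t + B_tilde w."""
    return [A_TILDE[0][0] * z[0] + A_TILDE[0][1] * z[1] + B_TILDE[0] * w,
            A_TILDE[1][0] * z[0] + A_TILDE[1][1] * z[1] + B_TILDE[1] * w]


def fixed_point(w):
    """Solve (I - A_tilde) z = B_tilde w by Cramer's rule."""
    m00, m01 = 1.0 - A_TILDE[0][0], -A_TILDE[0][1]
    m10, m11 = -A_TILDE[1][0], 1.0 - A_TILDE[1][1]
    det = m00 * m11 - m01 * m10
    r0, r1 = B_TILDE[0] * w, B_TILDE[1] * w
    return [(m11 * r0 - m01 * r1) / det, (m00 * r1 - m10 * r0) / det]


z = [10.0, -10.0]          # arbitrary initial condition
for _ in range(200):
    z = step(z, W_BAR)
z_hat = fixed_point(W_BAR)
```

With a spectral radius below one the iterate forgets its initial condition geometrically, so `z` and `z_hat` agree to numerical precision after a few hundred steps.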

The following theorem shows that, for a constant $w_{t}=\bar{w}$ for all $0\leq t<T$, the SS-OGD update (13) converges to the solution of

$$\begin{split}(\hat{e}_{t},\hat{v}_{t})=\operatorname*{arg\,min}_{(e,v)}\quad&\|e\|_{Q}^{2}+\left\|-Ke+v\right\|_{R}^{2}\\ \text{subject to}\quad&e=(A-BK)e+Bv+\underbrace{Ar_{t}-r_{t+1}}_{w_{t}},\end{split} \tag{15}$$

with $r_{T+1}:=r_{T}$. The solution of (15) can be interpreted as the steady state and steady state input that minimize the infinite horizon time-averaged cost (4).

Theorem 3.3.

Under Assumptions 2 and 3, if $w_{t}=\bar{w}$ for all $t\in\mathbb{N}$, the steady state of (14) coincides with the solution of (15).

The proof of the theorem is provided in Appendix 8. As a corollary, for a constant signal $r_{t}=\bar{r}$, the update converges to the solution of (11). Note that this is not always true for the naive OGD update (13), as its fixed point for a fixed disturbance is not necessarily the same as (15).
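As a rough numerical companion to problem (15), the sketch below solves it in closed form for a scalar system and checks two things: that the closed loop driven by the constant offset $\hat{v}$ settles at $\hat{e}$, and that $\hat{v}$ matches the feedforward series $-\sum_{i\geq t}K_{w}^{i,t}\bar{w}$ used in Section 4.1. It assumes $n=m=1$, $Q>0$, and that $K$ and $P$ are the LQR gain and Riccati solution; the function names and numbers are illustrative only.

```python
def solve_steady_state(a, b, k, q, r, w):
    """Closed-form minimizer (e_hat, v_hat) of (15) for n = m = 1 and q > 0."""
    # Stationarity of the Lagrangian of  min q e^2 + r u^2  s.t. (1 - a) e - b u = w,
    # followed by the change of variables v = u + k e.
    mu = w / ((1.0 - a) ** 2 / q + b ** 2 / r)
    e_hat = mu * (1.0 - a) / q
    u_hat = -mu * b / r
    return e_hat, u_hat + k * e_hat


def scalar_lqr(a, b, q, r, iters=2000):
    """Riccati fixed-point iteration; returns (P, K) of the scalar LQR problem."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p, a * b * p / (r + b * b * p)


a, b, q, r, w_bar = 0.5, 1.0, 1.0, 1.0, 1.0
p, k = scalar_lqr(a, b, q, r)
f = a - b * k                                  # closed-loop pole
e_hat, v_hat = solve_steady_state(a, b, k, q, r, w_bar)

# Feedforward term -sum_{i>=t} K_w^{i,t} w_bar, summed as a geometric series.
v_feedforward = -(b * p / (r + b * b * p)) / (1.0 - f) * w_bar

# Closed loop e_{t+1} = f e_t + b v_hat + w_bar started away from e_hat.
e = 5.0
for _ in range(200):
    e = f * e + b * v_hat + w_bar
```

For these values the optimal steady state is $\hat{e}=0.4$ with steady-state input $-0.8$, which can be confirmed by hand from $e=0.5e+u+1$.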

4 Regret Analysis

To characterize the effectiveness of the algorithm for time-varying signals and to provide finite time guarantees, we analyze its dynamic regret and show that it scales with the path length. The next theorem summarizes this main result.

Theorem 4.1.

Under Assumptions 1, 2 and 3, the dynamic regret of the SS-OGD algorithm scales with the path length

$$\mathcal{R}^{\mathrm{SS\text{-}OGD}}(w,e_{0})\leq\mathcal{O}\left(1+L(T)\right).$$

The proof of the theorem is provided in Section 4.1 after some auxiliary results.

Lemma 4.2.

(Cost Difference Lemma [15]) For any two policies $\pi_{1},\pi_{2}$

$$J(e_{0},u^{\pi_{2}})-J(e_{0},u^{\pi_{1}})=\sum_{t=0}^{T-1}\mathcal{Q}_{t}^{\pi_{1}}(e_{t}^{\pi_{2}},u_{t}^{\pi_{2}})-J_{t}(e_{t}^{\pi_{2}},u^{\pi_{1}}),$$

where $e_{t}^{\pi_{2}}$ is the state at time $t$ achieved by applying the policy $\pi_{2}$, $u_{t}^{\pi_{2}}$ is the input generated by the policy $\pi_{2}$ at time $t$, $\mathcal{Q}_{t}^{\pi_{1}}(e,u)=\|e\|_{Q}^{2}+\|u\|_{R}^{2}+J_{t+1}(Ae+Bu+w_{t},u^{\pi_{1}})$ is the Q-function for policy $\pi_{1}$, and $J_{i}(e_{i},u)$ is the cost-to-go at time step $i$, with initial state $e_{i}$ and control signal $u$.
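The identity is a telescoping sum and can be checked numerically. The following sketch does so for a scalar system with two arbitrary feedback policies; it assumes a finite-horizon cost with no terminal term (so the cost-to-go at time $T$ is zero), and the system parameters, disturbance sequence, and policies `pi_1`, `pi_2` are hypothetical.

```python
A, B, Q, R, T = 0.9, 1.0, 1.0, 0.5, 8
W = [0.3, -0.1, 0.4, 0.0, -0.2, 0.1, 0.5, -0.3]  # disturbance w_0, ..., w_{T-1}


def stage_cost(e, u):
    return Q * e * e + R * u * u


def cost_to_go(t, e, policy):
    """J_t(e, u^pi): cost accumulated from time t to T - 1 when following `policy`."""
    total = 0.0
    for i in range(t, T):
        u = policy(i, e)
        total += stage_cost(e, u)
        e = A * e + B * u + W[i]
    return total


def q_function(t, e, u, policy):
    """Q_t^pi(e, u): apply u at time t, then follow `policy` from t + 1 on."""
    return stage_cost(e, u) + cost_to_go(t + 1, A * e + B * u + W[t], policy)


def pi_1(t, e):
    return -0.6 * e


def pi_2(t, e):
    return -0.2 * e + 0.1 * t


e0 = 1.5
lhs = cost_to_go(0, e0, pi_2) - cost_to_go(0, e0, pi_1)

rhs, e = 0.0, e0
for t in range(T):                     # roll out pi_2, score each step against pi_1
    u = pi_2(t, e)
    rhs += q_function(t, e, u, pi_1) - cost_to_go(t, e, pi_1)
    e = A * e + B * u + W[t]
```

Each summand measures how much worse (or better) the input of $\pi_{2}$ is than that of $\pi_{1}$ at the state visited by $\pi_{2}$, which is why `lhs` and `rhs` coincide up to rounding.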

The proof is omitted, as it is identical to the one for Markov decision processes [15]. The following result for a general policy $\pi$, akin to the result in [16], follows.

Lemma 4.3.

Under Assumption 2, given the system dynamics (3) and cost function (4), the dynamic regret of any policy $\pi$ is given by

$$\mathcal{R}^{\pi}(w,e_{0})=\sum_{t=0}^{T-1}\left(u_{t}^{\pi}-u_{t}^{\star}\right)^{\top}\left(R+B^{\top}PB\right)\left(u_{t}^{\pi}-u_{t}^{\star}\right),$$

where $u_{t}^{\pi}$ and $u_{t}^{\star}$ denote the inputs generated by $\pi$ and the optimal policy, both evaluated at the policy state $e_{t}^{\pi}$ at time $t$.

Proof 4.4.

Let $\mathcal{Q}^{\star}_{t}(e,u)$ be the optimal Q-function, associated with the optimal control law $u_{t}^{\star}$. Then, using Lemma 4.2, the dynamic regret of the policy $\pi$ is given by

$$\mathcal{R}^{\pi}(w,e_{0})=\sum_{t=0}^{T-1}\mathcal{Q}_{t}^{\star}(e_{t}^{\pi},u_{t}^{\pi})-\min_{u}\mathcal{Q}_{t}^{\star}(e_{t}^{\pi},u), \tag{16}$$

i.e., a sum of differences of $\mathcal{Q}^{\star}_{t}$, evaluated at $u_{t}^{\pi}$ and $u^{\star}$, its minimizer. For an input $u\in\mathbb{R}^{m}$ and some $f\in\mathbb{R}^{m}$, $g\in\mathbb{R}$,

$$\begin{aligned}\mathcal{Q}^{\star}_{t}(e,u)&=\|e\|_{Q}^{2}+\|u\|_{R}^{2}+J_{t+1}(Ae+Bu+w_{t},u^{\star})\\&=u^{\top}(R+B^{\top}PB)u+f^{\top}u+g,\end{aligned}$$

where the last equality follows from the closed form of $J_{t+1}(x,u^{\star})$ as an extended quadratic function of $x$ [12, 17]. Thus, since $u_{t}^{\star}$ minimizes an extended quadratic function, $\mathcal{Q}_{t}^{\star}(e_{t}^{\pi},u_{t}^{\pi})-\mathcal{Q}_{t}^{\star}(e_{t}^{\pi},u_{t}^{\star})=\|u_{t}^{\pi}-u_{t}^{\star}\|^{2}_{(R+B^{\top}PB)}$.
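The final step of the proof is the standard completion of the square for a quadratic with a positive definite Hessian. Spelled out, with the shorthand $H:=R+B^{\top}PB$ and $q(u):=\mathcal{Q}^{\star}_{t}(e,u)$ (both introduced here only for brevity, and assuming $H\succ 0$):

```latex
\begin{aligned}
q(u) &= u^{\top} H u + f^{\top} u + g, \qquad H = H^{\top} \succ 0,\\
\nabla q(u^{\star}) &= 2Hu^{\star} + f = 0 \;\Longrightarrow\; f = -2Hu^{\star},\\
q(u) - q(u^{\star}) &= u^{\top} H u - u^{\star\top} H u^{\star} - 2u^{\star\top} H (u - u^{\star})\\
&= (u-u^{\star})^{\top} H (u-u^{\star}).
\end{aligned}
```

The terms $f$ and $g$, which carry the dependence on $e$ and on the disturbance, cancel, so the per-step regret depends only on the input mismatch.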

For future reference, we also recall the Cauchy product inequality for two finite sequences $\{a_{i}\}_{i=0}^{T}$ and $\{b_{i}\}_{i=0}^{T}$:

$$\sum_{i=0}^{T}\left|\sum_{j=0}^{i}a_{j}b_{i-j}\right|\leq\left(\sum_{i=0}^{T}|a_{i}|\right)\left(\sum_{j=0}^{T}|b_{j}|\right). \tag{17}$$
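A quick numerical sanity check of (17) on random data is sketched below. The helper name `cauchy_product_sides` and the choice of sequence length are arbitrary.

```python
import random


def cauchy_product_sides(a, b):
    """Return (lhs, rhs) of (17) for equal-length sequences a_0..a_T and b_0..b_T."""
    n = len(a)
    # Left side: absolute values of the truncated discrete convolution.
    lhs = sum(abs(sum(a[j] * b[i - j] for j in range(i + 1))) for i in range(n))
    # Right side: product of the two l1 norms.
    rhs = sum(abs(x) for x in a) * sum(abs(y) for y in b)
    return lhs, rhs


random.seed(0)
a = [random.uniform(-1, 1) for _ in range(20)]
b = [random.uniform(-1, 1) for _ in range(20)]
lhs, rhs = cauchy_product_sides(a, b)
```

The inequality holds because every product $|a_{j}||b_{k}|$ on the right appears at most once on the left after applying the triangle inequality.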

4.1 Proof of Theorem 4.1

As Lemma 4.3 suggests, the dynamic regret depends on the stepwise control input difference,

$$\begin{aligned}\left\|u^{\mathrm{ogd}}_{t}-u_{t}^{\star}\right\|&=\left\|-Ke_{t}+v_{t}+Ke_{t}+\sum_{i=t}^{T-1}K_{w}^{i,t}w_{i}\right\|\\&\leq\underbrace{\left\|v_{t}+\sum_{i=t}^{\infty}K_{w}^{i,t}w_{t}\right\|}_{s_{1,t}}+\underbrace{\left\|\sum_{i=t}^{T-1}K_{w}^{i,t}\Delta w_{i,t}\right\|}_{s_{2,t}}+\underbrace{\left\|\sum_{i=T}^{\infty}K_{w}^{i,t}w_{t}\right\|}_{s_{3,t}},\end{aligned}$$

where $\Delta w_{i,t}=w_{i}-w_{t}$ and for all $0\leq t\leq i<T$,

$$K_{w}^{i,t}=(R+B^{\top}PB)^{-1}B^{\top}\left((A-BK)^{\top}\right)^{i-t}P.$$

We proceed by bounding each of the above terms separately.
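The three terms come from adding and subtracting the infinite series with the disturbance frozen at $w_{t}$. The sketch below verifies this splitting for a scalar stand-in $K_{w}^{i,t}=c\,\lambda^{i-t}$; the constants `c` and `lam`, the value of `v_t`, and the disturbance sequence are hypothetical.

```python
def decomposition(v_t, w, t, c, lam):
    """Split v_t + sum_{i=t}^{T-1} K_w^{i,t} w_i into the pieces behind s1, s2, s3.

    Scalar stand-in: K_w^{i,t} = c * lam**(i - t) with |lam| < 1.
    """
    T = len(w)

    def gain(i):
        return c * lam ** (i - t)

    full = v_t + sum(gain(i) * w[i] for i in range(t, T))
    part1 = v_t + c / (1.0 - lam) * w[t]                        # infinite sum, w frozen at w_t
    part2 = sum(gain(i) * (w[i] - w[t]) for i in range(t, T))   # drift of w after time t
    part3 = c * lam ** (T - t) / (1.0 - lam) * w[t]             # tail beyond the horizon
    return full, part1, part2, part3


w = [0.3, -0.1, 0.4, 0.0, -0.2, 0.1, 0.5, -0.3]
full, p1, p2, p3 = decomposition(v_t=0.7, w=w, t=2, c=0.5, lam=0.6)
```

Here `full` equals `p1 + p2 - p3` exactly, and the triangle inequality then gives the bound by $s_{1,t}+s_{2,t}+s_{3,t}$.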

Term $\boldsymbol{s_{2,t}}$: This captures the deviation of the artificial disturbance term from the one fixed at timestep $t$. By noting that $\Delta w_{i,t}$ can be represented as a telescopic sum,

$$\begin{aligned}s_{2,t}&\leq c_{F}d\sum_{i=t}^{T-1}\lambda_{F}^{i-t}\sum_{j=t}^{i-1}\|w_{j+1}-w_{j}\|\\&\leq\frac{c_{F}d}{1-\lambda_{F}}\sum_{j=t}^{T-2}\|w_{j+1}-w_{j}\|\lambda_{F}^{j-t},\end{aligned}$$

where $F:=A-BK$ and $d=\|(R+B^{\top}PB)^{-1}B^{\top}\|\cdot\|P\|$. Using (17),

$$\begin{aligned}\sum_{t=0}^{T-1}s_{2,t}&\leq\frac{c_{F}d}{1-\lambda_{F}}\sum_{t=1}^{T-1}\sum_{j=t}^{T-1}\|w_{j}-w_{j-1}\|\lambda_{F}^{j-t}\\&\leq\frac{c_{F}d}{1-\lambda_{F}}\sum_{j=1}^{T-1}\sum_{t=1}^{j}\|w_{j}-w_{j-1}\|\lambda_{F}^{j-t}\\&\leq\frac{c_{F}d}{(1-\lambda_{F})^{2}}\sum_{j=1}^{T-1}\|w_{j}-w_{j-1}\|\leq\frac{c_{F}d\left(\|A\|+1\right)}{(1-\lambda_{F})^{2}}\cdot L(T).\end{aligned}$$
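The chain above combines an exchange of the order of summation, a geometric series bound, and the relation between increments of $w_{t}=Ar_{t}-r_{t+1}$ and increments of the reference. A scalar sketch of these three steps follows; it assumes the path length is $L(T)=\sum_{t=1}^{T}\|r_{t}-r_{t-1}\|$ (the definition is not restated in this section), omits the common factor $c_{F}d/(1-\lambda_{F})$, and uses made-up values for `a`, `lam`, and the reference `r`.

```python
def path_length(r):
    """Assumed definition: L(T) = sum_{t=1}^{T} |r_t - r_{t-1}| for r_0, ..., r_T."""
    return sum(abs(r[t] - r[t - 1]) for t in range(1, len(r)))


a, lam = 1.2, 0.6
r = [0.0, 0.5, 0.4, 1.0, 1.5, 1.5, 0.9, 1.1, 2.0]        # r_0, ..., r_T with T = 8
T = len(r) - 1
w = [a * r[t] - r[t + 1] for t in range(T)]               # w_t = A r_t - r_{t+1}
d = [0.0] + [abs(w[j] - w[j - 1]) for j in range(1, T)]   # d_j = |w_j - w_{j-1}|, d_0 unused

# Sum over t first, then over j, and the same terms in swapped order.
double_sum = sum(d[j] * lam ** (j - t) for t in range(1, T) for j in range(t, T))
swapped = sum(d[j] * lam ** (j - t) for j in range(1, T) for t in range(1, j + 1))

# Inner geometric series is at most 1 / (1 - lam).
geometric_bound = sum(d) / (1.0 - lam)

# |w_j - w_{j-1}| <= |a| |r_j - r_{j-1}| + |r_{j+1} - r_j|, summed over j.
path_bound = (abs(a) + 1.0) * path_length(r) / (1.0 - lam)
```

The quantities satisfy `double_sum == swapped <= geometric_bound <= path_bound`, mirroring the last three lines of the derivation.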

Term $\boldsymbol{s_{3,t}}$: This captures the effect of truncating the infinite horizon problem to a finite one:

$$\begin{aligned}s_{3,t}&\leq c_{F}d\left(\|A\|+1\right)\bar{R}\sum_{i=T}^{\infty}\lambda_{F}^{i-t}\leq\frac{c_{F}d\left(\|A\|+1\right)\bar{R}\lambda_{F}^{T-t}}{1-\lambda_{F}},\\\sum_{t=0}^{T-1}s_{3,t}&\leq\frac{c_{F}d\left(\|A\|+1\right)\bar{R}\left(1-\lambda_{F}^{T}\right)}{(1-\lambda_{F})^{2}},\end{aligned}$$

where $\bar{R}$ is defined in Assumption 1.
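The two bounds on $s_{3,t}$ rest on elementary geometric sums. Written out, and assuming that Assumption 1 bounds the reference as $\|r_{t}\|\leq\bar{R}$ (so that $\|w_{t}\|=\|Ar_{t}-r_{t+1}\|\leq(\|A\|+1)\bar{R}$), the steps are:

```latex
\begin{aligned}
\sum_{i=T}^{\infty}\lambda_{F}^{\,i-t}
  &= \lambda_{F}^{\,T-t}\sum_{k=0}^{\infty}\lambda_{F}^{\,k}
   = \frac{\lambda_{F}^{\,T-t}}{1-\lambda_{F}},\\
\sum_{t=0}^{T-1}\lambda_{F}^{\,T-t}
  &= \sum_{k=1}^{T}\lambda_{F}^{\,k}
   = \frac{\lambda_{F}\left(1-\lambda_{F}^{\,T}\right)}{1-\lambda_{F}}
   \leq \frac{1-\lambda_{F}^{\,T}}{1-\lambda_{F}}.
\end{aligned}
```

Since $0\leq\lambda_{F}<1$, the resulting bound on $\sum_{t}s_{3,t}$ is at most a constant independent of $T$, so the truncation contributes only to the $\mathcal{O}(1)$ part of the regret.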

Term s𝟏,tsubscript𝑠1𝑡\boldsymbol{s_{1,t}}bold_italic_s start_POSTSUBSCRIPT bold_1 bold_, bold_italic_t end_POSTSUBSCRIPT: This captures the cost of performing a gradient step in the direction of the steady state solution instead of the full solution, for a fixed wtsubscript𝑤𝑡w_{t}italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Note that i=tKwi,twtsuperscriptsubscript𝑖𝑡superscriptsubscript𝐾𝑤𝑖𝑡subscript𝑤𝑡-\sum_{i=t}^{\infty}K_{w}^{i,t}w_{t}- ∑ start_POSTSUBSCRIPT italic_i = italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_t end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the solution of the following infinite horizon optimization problem and is independent of the initial state [12, 18]

\[
\begin{split}
\hat{v}_t &= \operatorname*{arg\,min}_{v}\lim_{T\rightarrow\infty}\frac{1}{T}\sum_{i=0}^{T}\|e\|_Q^2+\|-Ke+v\|_R^2\\
&\text{subject to}\quad e=(A-BK)e+Bv+w_t,
\end{split}
\]

which is equivalent to (15). Hence, by Theorem 3.3

\[
\sum_{i=t}^{\infty}K_w^{i,t}w_t=-\left[I\;0\right](I-\tilde{A})^{-1}\tilde{B}w_t=-\left[I\;0\right]\hat{z}_t,
\]

where $\hat{z}_t=[\hat{v}_t^\top~\hat{e}_t^\top]^\top:=(I-\tilde{A})^{-1}\tilde{B}w_t$ is the steady state of the SS-OGD dynamics (14) for a given $w_t$. This term captures the difference between the SS-OGD update term $v_t$ and the steady state value $\hat{v}_t$ for that timestep. We look at the evolution of the augmented state difference; for all $0<t\leq T$

\begin{align}
\varepsilon_t &:= z_t-\hat{z}_t=\tilde{A}z_{t-1}+\tilde{B}w_{t-1}-\hat{z}_t \tag{18}\\
&=\tilde{A}z_{t-1}+(I-\tilde{A})\hat{z}_{t-1}-\hat{z}_t=\tilde{A}\varepsilon_{t-1}+\hat{z}_{t-1}-\hat{z}_t. \notag
\end{align}

Then $\varepsilon_t=\tilde{A}^t\varepsilon_0+\sum_{i=0}^{t-1}\tilde{A}^i\left(\hat{z}_{t-i-1}-\hat{z}_{t-i}\right)$ for a given time step $0\leq t\leq T$. Under Assumption 3

\[
\begin{split}
\|\varepsilon_t\|\leq{}& c_{\tilde{A}}\lambda_{\tilde{A}}^{t}\|\varepsilon_0\|\\
&+\left\|\left(I-\tilde{A}\right)^{-1}\tilde{B}\right\|\cdot\sum_{i=0}^{t-1}\lambda_{\tilde{A}}^{i}\left(\|A\|\|\Delta r_{t-i-1}\|+\|\Delta r_{t-i}\|\right).
\end{split}
\]

Defining $h=\left\|\left(I-\tilde{A}\right)^{-1}\tilde{B}\right\|\left(\|A\|+1\right)$, $b=(h+1)\bar{R}+\|x_0\|+\|v_0\|$, and $\bar{\varepsilon}=c_{\tilde{A}}\left(b+\frac{2\bar{R}h}{1-\lambda_{\tilde{A}}}\right)$

\begin{align}
\|\varepsilon_t\| &\leq c_{\tilde{A}}b\lambda_{\tilde{A}}^{t}+c_{\tilde{A}}h\sum_{i=0}^{t-1}\lambda_{\tilde{A}}^{i}\left(\|\Delta r_{t-i}\|+\|\Delta r_{t-1-i}\|\right), \tag{19}\\
s_{1,t} &\leq \|\varepsilon_t\|\leq c_{\tilde{A}}b\lambda_{\tilde{A}}^{t}+c_{\tilde{A}}h\sum_{i=0}^{t}\lambda_{\tilde{A}}^{i}\|\Delta r_{t-i}\|, \notag\\
\sum_{t=0}^{T-1}s_{1,t} &\leq \sum_{t=0}^{T-1}\|\varepsilon_t\|\leq\frac{c_{\tilde{A}}}{1-\lambda_{\tilde{A}}}\left(b+hL(T)\right). \notag
\end{align}
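To make the structure of this argument concrete, the following Python sketch replays it for a scalar stand-in of the augmented system. It unrolls the recursion (18), compares it with the closed form, and checks that the summed error is bounded linearly by the total variation of the steady states, which plays the role of $L(T)$. The scalar `a_tilde`, the sequence `z_hat`, and the initial error `eps0` are hypothetical values chosen for illustration; in the scalar case one may take $c_{\tilde{A}}=1$ and $\lambda_{\tilde{A}}=|\tilde{a}|$.

```python
import random

random.seed(0)
a_tilde = 0.8          # assumed scalar stand-in for A_tilde (c = 1, lam = |a_tilde|)
lam = abs(a_tilde)
T = 200

# Hypothetical steady-state sequence z_hat_t with shrinking increments
z_hat = [0.0]
for t in range(1, T):
    z_hat.append(z_hat[-1] + random.uniform(-1.0, 1.0) * 0.97 ** t)

eps0 = 1.5             # assumed initial error eps_0 = z_0 - z_hat_0
eps = [eps0]
for t in range(1, T):
    # recursion (18): eps_t = A_tilde * eps_{t-1} + z_hat_{t-1} - z_hat_t
    eps.append(a_tilde * eps[-1] + z_hat[t - 1] - z_hat[t])

def unrolled(t):
    """Closed form: A_tilde^t eps_0 + sum_i A_tilde^i (z_hat_{t-i-1} - z_hat_{t-i})."""
    return a_tilde ** t * eps0 + sum(
        a_tilde ** i * (z_hat[t - i - 1] - z_hat[t - i]) for i in range(t)
    )

max_gap = max(abs(eps[t] - unrolled(t)) for t in range(T))

# Summing |eps_t| over t and swapping the order of summation gives a bound
# that is linear in the total variation of z_hat (the analogue of L(T)).
variation = sum(abs(z_hat[t] - z_hat[t - 1]) for t in range(1, T))
lhs = sum(abs(e) for e in eps)
rhs = (abs(eps0) + variation) / (1.0 - lam)

print(max_gap, lhs, rhs)
```

The factor $1/(1-\lambda_{\tilde{A}})$ arises because each increment is counted once per later time step, discounted geometrically.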

There exist $s_2,s_3\in\mathbb{R}_+$, such that $s_{2,t}\leq s_2,\;s_{3,t}\leq s_3$ and from (19) $s_{1,t}\leq\bar{\varepsilon}$ for all $t\in\mathbb{N}$. Using Lemma 4.3 and denoting $\bar{P}=4\|R+B^\top PB\|$

\[
\mathcal{R}(w,e_0)<\bar{P}\sum_{t=0}^{T-1}\left(s_{1,t}^{2}+s_{2,t}^{2}+s_{3,t}^{2}\right)\leq\mathcal{O}\left(1+L(T)\right).
\]

Note that, unlike the regret bound of the FOSS algorithm in [7], the constant multiplying $L(T)$ above does not depend on $\bar{R}$, but only on system parameters. This implies that the complexity term captures only the relative distance of the references and is not amplified by their upper bounds.

4.2 Steady State Benchmark

Given Theorem 3.3, one can also compare SS-OGD to the steady state optimal solution for each timestep. Consider

\[
\hat{u}_t=-K\hat{e}_t+\hat{v}_t \tag{20}
\]

for all $0\leq t<T$, where $\hat{e}_t$ and $\hat{v}_t$ solve (15). This steady state controller can be interpreted as an optimal benchmark that is decoupled from the system dynamics, has access to the current cost $c_t$, and hence to the one step ahead reference, $r_{t+1}$, and solves for its optimal, steady state solution. The following Lemma provides a side result on the regret of the SS-OGD algorithm with respect to the steady state controller (20), $\mathcal{R}_{\mathrm{SS}}^{\text{SS-OGD}}(w,e_0):=J(e_0,u^{\text{SS-OGD}})-J(e_0,\hat{u})$.
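As an illustration of the benchmark (20), the NumPy sketch below computes $\hat{z}_t=(I-\tilde{A})^{-1}\tilde{B}w_t$ for a fixed disturbance and compares its $\hat{v}_t$ component with the direct minimiser of the steady state cost. The two-state system, the weights, the step size `alpha`, and the vector `w` are assumptions made for this example only; `K` is taken to be the standard infinite-horizon LQR gain, which is also an assumption since its definition lies outside this section. The matrices `A_t` and `B_t` follow the definitions of $\tilde{A}$ and $\tilde{B}$ in Appendix 7.

```python
import numpy as np

# Toy 2-state, 1-input system (assumed for illustration; not the paper's model)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.1]])
n, m = B.shape

# LQR gain from a fixed-point iteration of the discrete Riccati recursion
P = Q.copy()
for _ in range(5000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

Acl = A - B @ K
S = np.linalg.solve(np.eye(n) - Acl, B)   # steady-state map from v to e
S_hat = np.linalg.inv(np.eye(n) - Acl)    # steady-state map from w to e
G = np.eye(m) - K @ S                     # steady-state map from v to u

alpha = 0.01                              # assumed step size
M = 2 * (S.T @ Q @ B + G.T @ R)
H = 2 * (S.T @ Q @ Acl - G.T @ R @ K)
A_t = np.block([[np.eye(m) - alpha * M, -alpha * H], [B, Acl]])
B_t = np.vstack([-2 * alpha * S.T @ Q, np.eye(n)])

w = np.array([0.3, -0.2])                 # arbitrary fixed disturbance
z_hat = np.linalg.solve(np.eye(m + n) - A_t, B_t @ w)
v_hat, e_hat = z_hat[:m], z_hat[m:]
u_hat = -K @ e_hat + v_hat                # benchmark input (20)

# Direct minimiser of ||e||_Q^2 + ||u||_R^2 over v, with e = S v + S_hat w
lhs = S.T @ Q @ S + G.T @ R @ G
rhs = (G.T @ R @ K @ S_hat - S.T @ Q @ S_hat) @ w
v_star = np.linalg.solve(lhs, rhs)

print(v_hat, v_star, u_hat)
```

The agreement of `v_hat` and `v_star` does not depend on the particular `alpha`, since the step size cancels in the fixed-point equation.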

Lemma 4.5.

Under Assumptions 1, 2, and 3, the regret of the SS-OGD algorithm with respect to the steady state benchmark (20) scales with the reference path length

\[
\mathcal{R}_{\mathrm{SS}}^{\text{SS-OGD}}(w,e_0)\leq\mathcal{O}\left(1+L(T)\right).
\]
Proof 4.6.

The regret can be expressed as a function of the combined error state $\varepsilon$ evolving according to (18). Defining $\tilde{Q}_i:=\tilde{Q}$ for all $0\leq i<T$ and $\tilde{Q}_T$ as in (21) in Appendix 7

\[
\mathcal{R}_{\mathrm{SS}}^{\text{SS-OGD}}(w,e_0)\leq\|\tilde{Q}\|\left(2h\bar{R}+\bar{\varepsilon}\right)\sum_{t=0}^{T}\|\varepsilon_t\|,
\]

using (19) and $\|\hat{z}_t\|\leq h\bar{R},\;\forall t$. Then

\[
\mathcal{R}_{\mathrm{SS}}^{\text{SS-OGD}}(w,e_0)\leq\frac{c_{\tilde{A}}\|\tilde{Q}\|\left(2h\bar{R}+\bar{\varepsilon}\right)}{1-\lambda_{\tilde{A}}}\left(b+hL(T)\right),
\]

using (19), and the Cauchy Product inequality (17).

5 Numerical Example

The SS-OGD algorithm is implemented on a linearized quadrotor model [19] in closed-loop with a PI velocity controller [20], to track a reference trajectory in two dimensions. In particular, we consider the following model

\begin{align*}
A &= \begin{bmatrix}
1.000 & 0 & 0.096 & 0 & 0 & 0.040\\
0 & 1.000 & 0 & 0.096 & -0.040 & 0\\
0 & 0 & 0.894 & 0 & 0 & 0.703\\
0 & 0 & 0 & 0.894 & -0.703 & 0\\
0 & 0 & 0 & 0.193 & 0.452 & 0\\
0 & 0 & -0.193 & 0 & 0 & 0.452
\end{bmatrix},\\
B &= \begin{bmatrix}
0.004 & 0 & 0.106 & 0 & 0 & 0.193\\
0 & 0.004 & 0 & 0.106 & -0.193 & 0
\end{bmatrix}^{\top},
\end{align*}

where the state $x:=\begin{bmatrix}p_x&p_y&v_x&v_y&\beta&\rho\end{bmatrix}^{\top}$ contains the horizontal position, velocity, the roll and pitch angles of the quadrotor, and the input $u:=\begin{bmatrix}v_x^t&v_y^t\end{bmatrix}^{\top}$ sets the target horizontal velocities. We take $Q=\mathrm{diag}\left(100,100,1,1,0,0\right)$ and $R=0.1\cdot I$.
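For readers who wish to reproduce the setup, the model above can be entered directly in NumPy. The sketch below also computes a stabilizing feedback gain by iterating the discrete Riccati recursion to a fixed point. The helper `lqr_gain`, its tolerance, and the choice of the standard infinite-horizon LQR gain for `K` are assumptions of this example; the paper's own implementation is not shown in this section.

```python
import numpy as np

# Linearized quadrotor model and cost weights as given in the text
A = np.array([
    [1.000, 0.0,   0.096, 0.0,    0.0,   0.040],
    [0.0,   1.000, 0.0,   0.096, -0.040, 0.0],
    [0.0,   0.0,   0.894, 0.0,    0.0,   0.703],
    [0.0,   0.0,   0.0,   0.894, -0.703, 0.0],
    [0.0,   0.0,   0.0,   0.193,  0.452, 0.0],
    [0.0,   0.0,  -0.193, 0.0,    0.0,   0.452],
])
B = np.array([
    [0.004, 0.0,   0.106, 0.0,    0.0,   0.193],
    [0.0,   0.004, 0.0,   0.106, -0.193, 0.0],
]).T
Q = np.diag([100.0, 100.0, 1.0, 1.0, 0.0, 0.0])
R = 0.1 * np.eye(2)

def lqr_gain(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Hypothetical helper: iterate the Riccati recursion; return (K, P)."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        P_next = 0.5 * (P_next + P_next.T)   # keep P symmetric against round-off
        converged = np.linalg.norm(P_next - P) <= tol * np.linalg.norm(P_next)
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

K, P = lqr_gain(A, B, Q, R)
rho_cl = max(abs(np.linalg.eigvals(A - B @ K)))
print("closed-loop spectral radius:", rho_cl)
```

The open-loop matrix has two eigenvalues at one (the position integrators), both penalized by $Q$, so the closed loop $A-BK$ is expected to be stable even though the angles carry zero weight.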

In the first experiment, the drone tracks the shape of the letters IFA for an a priori unknown reference with a fixed $\Delta r_t$ for all timesteps. SS-OGD's performance is compared to that of the causal CE controller that solves for the time-averaged infinite horizon steady state cost by fixing all future references to $r_t$, i.e. $r_i=r_t,\;t<i<T$. This is equivalent to solving (15) and fixing $r_{t+1}=r_t$. The CE controller does not have access to $r_{t+1}$, as opposed to the steady state benchmark in (20). The results in Figure 1 show that even though the CE controller appears to track the reference better in the $(p_x,p_y)$ plot, the time plot reveals that it lags behind the reference trajectory, resulting in around $3$ times higher regret compared to SS-OGD. As the reference signal has a constant rate of change, the double integrator dynamics of the open loop transfer function from the error to the state allow SS-OGD to achieve perfect position tracking. When this is not the case SS-OGD again outperforms the CE controller.

Figure 1: Tracking a 2-D shape with a quadrotor model. The horizontal position plot (left panel) shows the apparent better tracking of the CE controller. However, the time plot (top right panel) shows its visible time lag; by contrast SS-OGD quickly converges to the reference. This leads to a lower rate of regret for SS-OGD (bottom right panel).
Figure 2: Empirical regret of SS-OGD with a finite reference path length converges to a finite value, as expected from the theoretical bound.

In a second experiment the empirical worst-case regret as a function of $T$ is calculated. For each $T$, $60$ random reference signals are simulated and the highest value of regret is noted. The references are generated such that $\|\Delta r_t\|$ decreases with a constant factor of $0.99$. This ensures a finite path length and therefore a finite regret in the limit, as shown in Figure 2, and in agreement with the upper bound in Theorem 4.1.
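The following Python sketch shows one way such references could be generated and why their path length stays finite. It assumes a two-dimensional position reference, random step directions, an initial increment `step0`, and the path length taken as the sum of $\|r_t-r_{t-1}\|$; none of these details are specified in this section, so they should be read as illustrative choices.

```python
import math
import random

random.seed(1)
T = 2000
decay = 0.99      # decay factor from the text
step0 = 0.5       # assumed length of the first increment

r = [(0.0, 0.0)]  # assumed 2-D position reference starting at the origin
path_length = 0.0
for t in range(1, T):
    step = step0 * decay ** (t - 1)              # increment length shrinks geometrically
    angle = random.uniform(0.0, 2.0 * math.pi)   # random direction, fixed length
    prev = r[-1]
    r.append((prev[0] + step * math.cos(angle), prev[1] + step * math.sin(angle)))
    path_length += math.hypot(r[-1][0] - prev[0], r[-1][1] - prev[1])

limit = step0 / (1.0 - decay)   # geometric-series limit of the path length
print(path_length, limit)
```

Since the increments form a geometric series, the path length is bounded by `step0 / (1 - decay)` for every horizon, independently of the directions drawn.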

6 Conclusion

In this letter, we reformulate the online LQT problem as an online control problem subject to adversarial disturbances. Within this framework, we propose a novel online gradient descent-based algorithm, called SS-OGD, and show that its dynamic regret scales with the path length of the reference signal. We validate the results on numerical examples with a quadrotor model. The improvement of the regret coefficients, as well as the case where the references are generated by some unknown dynamics, are left for future work.


7 System-Optimizer Dynamics

The combined system-optimizer dynamics matrices are

\[
\tilde{A}=\begin{bmatrix}I-\alpha M&-\alpha H\\B&A-BK\end{bmatrix},\qquad\tilde{B}=\begin{bmatrix}-2\alpha S^{\top}Q\\I\end{bmatrix},
\]

where $M:=2\left(S^{\top}QB+(I-KS)^{\top}R\right)$, and $H:=2\left(S^{\top}Q(A-BK)-(I-KS)^{\top}RK\right)$. The objective function in (15) can be equivalently written as $z_t^{\top}\tilde{Q}z_t$ for $0<t<T$ and as $z_T^{\top}\tilde{Q}_Tz_T$ for $t=T$, where

\[
\tilde{Q}=\begin{bmatrix}R&-RK\\-K^{\top}R&Q+K^{\top}RK\end{bmatrix},\quad\tilde{Q}_T=\begin{bmatrix}\boldsymbol{0}_{m\times m}&\boldsymbol{0}_{m\times n}\\\boldsymbol{0}_{n\times m}&P\end{bmatrix}. \tag{21}
\]

Consider the coordinate transformation $\tilde{A}_V:=V\tilde{A}V^{-1}$

\[
V=\begin{bmatrix}I&\boldsymbol{0}_{m\times n}\\-S&I\end{bmatrix},\quad\tilde{A}_V=\begin{bmatrix}I-\alpha\overline{M}&-\alpha H\\\alpha S\overline{M}&\alpha SH+(A-BK)\end{bmatrix},
\]

with $\overline{M}:=M+HS=2\left(S^{\top}QS+(I-KS)^{\top}R(I-KS)\right)$ positive definite, as shown in Lemma 3.1. Using the small gain theorem for interconnected systems [14], the following, along with $\alpha<2/\rho(\overline{M})$, is a sufficient condition for the stability of $\tilde{A}_V$ and therefore of the dynamics (14)

\[
\alpha\cdot\frac{\|S\overline{M}\|\|H\|}{\lambda_{\min}(\overline{M})}\cdot\max_{w\in\mathbb{R}}\left\|\left[e^{jw}I-\alpha SH-(A-BK)\right]^{-1}\right\|<1.
\]

Since $A-BK$ is stable, the maximum above remains bounded as $\alpha\to 0$ while the prefactor scales linearly with $\alpha$, so the condition is fulfilled for every sufficiently small $\alpha>0$.
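
The condition above can be checked numerically. The following is a minimal sketch rather than the authors' implementation: the scalar system data are hypothetical, the maximum over $w$ is approximated on a finite frequency grid, and the expression used for $H$ is an assumption, inferred by matching terms in the steady-state equations of the proof of Theorem 3.3 and not taken from a definition given in this section.

```python
import numpy as np

def build_matrices(A_cl, B, K, Q, R):
    """Return S, M_bar and H for a stable closed loop A_cl = A - BK."""
    n, m = B.shape
    S_hat = np.linalg.inv(np.eye(n) - A_cl)      # (I - A + BK)^{-1}
    S = S_hat @ B
    G = np.eye(m) - K @ S                        # I - KS
    M_bar = 2 * (S.T @ Q @ S + G.T @ R @ G)
    # ASSUMPTION: form of H inferred from the steady-state equations.
    H = 2 * (S.T @ Q @ A_cl - G.T @ R @ K)
    return S, M_bar, H

def small_gain_lhs(alpha, A_cl, S, M_bar, H, n_grid=2000):
    """Left-hand side of the small-gain condition (max over w taken on a grid)."""
    n = A_cl.shape[0]
    F = alpha * S @ H + A_cl
    peak = max(
        np.linalg.norm(np.linalg.inv(np.exp(1j * w) * np.eye(n) - F), 2)
        for w in np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    )
    gain = (np.linalg.norm(S @ M_bar, 2) * np.linalg.norm(H, 2)
            / np.linalg.eigvalsh(M_bar).min())
    return alpha * gain * peak

def spectral_radius_AV(alpha, A_cl, S, M_bar, H):
    """Spectral radius of the block matrix A_V for a given step size."""
    m = M_bar.shape[0]
    A_V = np.block([[np.eye(m) - alpha * M_bar, -alpha * H],
                    [alpha * S @ M_bar, alpha * S @ H + A_cl]])
    return max(abs(np.linalg.eigvals(A_V)))

# Hypothetical scalar example: A = 0.5, B = 1, K = 0, Q = R = 1.
A_cl = np.array([[0.5]])
B = np.array([[1.0]])
K = np.array([[0.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
S, M_bar, H = build_matrices(A_cl, B, K, Q, R)

for alpha in (0.1, 0.05, 0.01):
    lhs = small_gain_lhs(alpha, A_cl, S, M_bar, H)
    rho = spectral_radius_AV(alpha, A_cl, S, M_bar, H)
    step_ok = alpha < 2 / max(abs(np.linalg.eigvals(M_bar)))
    print(f"alpha={alpha}: lhs={lhs:.3f}, alpha<2/rho(M_bar)={step_ok}, rho(A_V)={rho:.3f}")
```

In this toy instance the condition fails for the largest step size although the spectral radius is still below one, which is consistent with the condition being sufficient but not necessary.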

8 Proof of Theorem 3.3

Given a disturbance vector $w_t$ and a bias input $v$, the steady state of the dynamics (3) with the control law (8) is given by $e=Sv+\hat{S}w_t$, where $\hat{S}:=(I-A+BK)^{-1}$. Substituting this in the objective function of (15), one can confirm that the $v$ that minimizes that cost is the unique (as shown in Lemma 3.1) solution of $\left(S^{\top}QS+(I-KS)^{\top}R(I-KS)\right)v=\left((I-KS)^{\top}RK\hat{S}-S^{\top}Q\hat{S}\right)w_t$.
Using the definition of $\tilde{A}$, the steady state $\hat{v}$ of the SS-OGD update (13) for a constant $w_t$ solves $0=M\hat{v}+H(S\hat{v}+\hat{S}w_t)+2S^{\top}Qw_t$. Then

$$\begin{aligned}0&=S^{\top}Q(I+(A-BK)\hat{S})B\hat{v}+(I-KS)^{\top}R(I-KS)\hat{v}\\&\quad+S^{\top}Q(I+(A-BK)\hat{S})w_{t}-(I-KS)^{\top}RK\hat{S}w_{t}.\end{aligned}$$

Since $I+(A-BK)\hat{S}=\hat{S}$ and $S=\hat{S}B$, the two equations coincide, leading to the unique steady state solution $\hat{v}$.
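
The argument can be sanity-checked numerically. The sketch below is illustrative only and rests on assumptions not stated in this section: the dynamics (3) are taken to be $e_{t+1}=Ae_t+Bu_t+w_t$, the control law (8) to be $u_t=-Ke_t+v$, and the steady-state cost in (15) to be $e^{\top}Qe+u^{\top}Ru$. The matrices `A`, `B`, `K` and the disturbance `w` are hypothetical values chosen so that $A-BK$ is stable.

```python
import numpy as np

# Hypothetical two-state, single-input system with a stabilizing gain K.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.1, 0.2]])
Q, R = np.eye(2), np.eye(1)
w = np.array([0.3, -0.5])                     # constant disturbance

A_cl = A - B @ K
S_hat = np.linalg.inv(np.eye(2) - A_cl)       # (I - A + BK)^{-1}
S = S_hat @ B
G = np.eye(1) - K @ S                         # I - KS

def steady_state_cost(v):
    """Assumed steady-state cost e'Qe + u'Ru for a constant bias input v."""
    e = S @ v + S_hat @ w
    u = -K @ e + v
    return e @ Q @ e + u @ R @ u

# Optimal bias: solution of the first linear system in the proof.
v_opt = np.linalg.solve(S.T @ Q @ S + G.T @ R @ G,
                        (G.T @ R @ K @ S_hat - S.T @ Q @ S_hat) @ w)

# SS-OGD fixed point: solution of the expanded equation, before simplification.
T = np.eye(2) + A_cl @ S_hat                  # equals S_hat by the identity
v_hat = np.linalg.solve(S.T @ Q @ T @ B + G.T @ R @ G,
                        (G.T @ R @ K @ S_hat - S.T @ Q @ T) @ w)

# Simulate the closed loop to confirm the steady state e = S v + S_hat w.
e = np.zeros(2)
for _ in range(500):
    e = A_cl @ e + B @ v_opt + w

print("identity holds:      ", np.allclose(T, S_hat))
print("fixed points coincide:", np.allclose(v_opt, v_hat))
print("steady state matches: ", np.allclose(e, S @ v_opt + S_hat @ w))
```

Perturbing `v_opt` in either direction and re-evaluating `steady_state_cost` gives a quick additional check that the computed bias is a minimizer under the assumed cost.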

References

  • [1] E. Hazan, S. Kakade, and K. Singh, “The nonstochastic control problem,” in Algorithmic Learning Theory, pp. 408–421, PMLR, 2020.
  • [2] D. Foster and M. Simchowitz, “Logarithmic regret for adversarial online control,” in International Conference on Machine Learning, pp. 3211–3221, PMLR, 2020.
  • [3] Y. Abbasi-Yadkori, P. Bartlett, and V. Kanade, “Tracking adversarial targets,” in International Conference on Machine Learning, pp. 369–377, PMLR, 2014.
  • [4] M. Nonhoff and M. A. Müller, “Online gradient descent for linear dynamical systems,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 945–952, 2020.
  • [5] M. Nonhoff, J. Köhler, and M. A. Müller, “Online convex optimization for constrained control of linear systems using a reference governor,” arXiv preprint arXiv:2211.09088, 2022.
  • [6] E. C. Balta, A. Iannelli, R. S. Smith, and J. Lygeros, “Regret analysis of online gradient descent-based iterative learning control with model mismatch,” in 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 1479–1484, IEEE, 2022.
  • [7] Y. Li, X. Chen, and N. Li, “Online optimal control with linear dynamics and predictions: Algorithms and regret analysis,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [8] A. Hauswirth, S. Bolognani, G. Hug, and F. Dörfler, “Optimization algorithms as robust feedback controllers,” arXiv preprint arXiv:2103.11329, 2021.
  • [9] L. Cothren, A. M. Ospina, G. Bianchin, and E. Dall’Anese, “Online optimization of linear-time invariant dynamical systems with cost perception,” in 2022 56th Asilomar Conference on Signals, Systems, and Computers, pp. 1357–1361, IEEE, 2022.
  • [10] A. Karapetyan, A. Tsiamis, E. C. Balta, A. Iannelli, and J. Lygeros, “Implications of regret on stability of linear dynamical systems,” IFAC-PapersOnLine, vol. 56, no. 2, pp. 2583–2588, 2023.
  • [11] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal control. John Wiley & Sons, 2012.
  • [12] G. Goel and B. Hassibi, “The power of linear controllers in LQR control,” in 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 6652–6657, IEEE, 2022.
  • [13] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936, 2003.
  • [14] M. Green and D. J. Limebeer, Linear robust control. Courier Corporation, 2012.
  • [15] S. Kakade and J. Langford, “Approximately optimal approximate reinforcement learning,” in Proceedings of the Nineteenth International Conference on Machine Learning, pp. 267–274, 2002.
  • [16] R. Zhang, Y. Li, and N. Li, “On the regret analysis of online LQR control with predictions,” in 2021 American Control Conference (ACC), pp. 697–703, IEEE, 2021.
  • [17] A. Karapetyan, A. Iannelli, and J. Lygeros, “On the regret of $\mathcal{H}_{\infty}$ control,” in 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 6181–6186, IEEE, 2022.
  • [18] C. Yu, G. Shi, S.-J. Chung, Y. Yue, and A. Wierman, “The power of predictions in online control,” Advances in Neural Information Processing Systems, vol. 33, pp. 1994–2004, 2020.
  • [19] P. N. Beuchat, “N-rotor vehicles: modelling, control, and estimation,” ETH Zurich Research Collection, 2019.
  • [20] A. Karapetyan, “Distributed Control of Flying Quadrotors.” https://github.com/akarapet/admm_collision_avoidance, June 2020.