Computational-Statistical Gaps in Reinforcement Learning

Kane, Daniel; Liu, Sihan; Lovett, Shachar; Mahajan, Gaurav

arXiv:2202.05444 (cs)

[Submitted on 11 Feb 2022 (v1), last revised 3 Jul 2022 (this version, v2)]

Abstract: Reinforcement learning with function approximation has recently achieved tremendous results in applications with large state spaces. This empirical success has motivated a growing body of theoretical work proposing necessary and sufficient conditions under which efficient reinforcement learning is possible. From this line of work, a remarkably simple minimal sufficient condition has emerged for sample-efficient reinforcement learning: MDPs whose optimal value functions $V^*$ and $Q^*$ are linear in some known low-dimensional features. In this setting, recent works have designed sample-efficient algorithms that require a number of samples polynomial in the feature dimension and independent of the size of the state space. However, they leave finding computationally efficient algorithms as future work, and this is considered a major open problem in the community.
In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP=RP, no randomized polynomial-time algorithm exists for deterministic-transition MDPs with a constant number of actions and linear optimal value functions. To prove this, we show a reduction from Unique-Sat, in which we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions, and low-dimensional linear optimal value functions. This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation: the underlying statistical problem is information-theoretically solvable with a polynomial number of queries, but no computationally efficient algorithm exists unless NP=RP. Finally, we also prove a quasi-polynomial time lower bound under the Randomized Exponential Time Hypothesis.
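
To make the ingredients of such a reduction concrete, here is a minimal Python sketch of one way a CNF formula can induce an MDP with deterministic transitions and a constant number of actions. It is an illustrative assumption, not the construction from the paper: states are full truth assignments, and each action flips the variable of one literal in the first unsatisfied clause. All names (`step`, `steps_to_goal`, `hamming`, the example formula) are hypothetical. When the formula has a unique satisfying assignment, the fewest steps needed to reach it equals the Hamming distance to it, because some literal of any unsatisfied clause must disagree with the satisfying assignment. The sketch does not attempt to build the low-dimensional features under which $V^*$ and $Q^*$ are linear.

```python
from collections import deque
from itertools import product

# A clause is a tuple of non-zero ints: +i is variable i, -i is its negation
# (variables are 1-indexed). An assignment is a tuple of booleans.
# NOTE: this encoding and the dynamics below are assumptions for illustration.


def satisfies(assignment, clause):
    """True if at least one literal of the clause holds under the assignment."""
    return any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)


def first_unsatisfied(assignment, formula):
    """Return the first clause the assignment violates, or None if none is violated."""
    for clause in formula:
        if not satisfies(assignment, clause):
            return clause
    return None


def step(assignment, action, formula):
    """Deterministic transition: flip the variable of the action-th literal
    in the first unsatisfied clause. Satisfying assignments are absorbing."""
    clause = first_unsatisfied(assignment, formula)
    if clause is None:
        return assignment
    var = abs(clause[action]) - 1
    return assignment[:var] + (not assignment[var],) + assignment[var + 1:]


def satisfying_assignments(formula, n):
    """Brute-force enumeration (exponential in n; only for tiny examples)."""
    return [a for a in product((False, True), repeat=n)
            if first_unsatisfied(a, formula) is None]


def hamming(a, b):
    """Number of positions where two assignments differ."""
    return sum(x != y for x, y in zip(a, b))


def steps_to_goal(start, formula):
    """Breadth-first search for the fewest actions to reach a satisfying assignment."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        clause = first_unsatisfied(state, formula)
        if clause is None:
            return depth
        for action in range(len(clause)):
            nxt = step(state, action, formula)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # no satisfying assignment is reachable


if __name__ == "__main__":
    # x1 AND (NOT x1 OR x2) AND (NOT x2 OR x3): satisfied only by all-True.
    formula = [(1,), (-1, 2), (-2, 3)]
    goal = satisfying_assignments(formula, 3)[0]
    for state in product((False, True), repeat=3):
        print(state, steps_to_goal(state, formula), hamming(state, goal))
```

Running the script prints, for each of the eight assignments, the breadth-first step count next to the Hamming distance to the unique satisfying assignment; in this toy instance the two columns agree. The number of actions is bounded by the clause width, which is what keeps the action set constant-sized.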

Comments:	Updated references. Added discussion on linear Q* and V* only over reachable states
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2202.05444 [cs.LG]
	(or arXiv:2202.05444v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.05444

Submission history

From: Gaurav Mahajan
[v1] Fri, 11 Feb 2022 04:48:35 UTC (38 KB)
[v2] Sun, 3 Jul 2022 00:53:33 UTC (38 KB)
