A new Q(λ) with interim forward view and Monte Carlo equivalence

Q-learning, the most popular of reinforcement learning algorithms, has always included an extension to eligibility traces, but its existing Q(λ) variants do not achieve exact equivalence to Monte Carlo methods. In this paper, we introduce a new version of Q(λ) that does exactly that, without significantly increased algorithmic complexity.
En route to our new Q(λ), we introduce a new derivation technique based on the forward-view/backward-view analysis familiar from TD(λ), but extended to apply at every time step rather than only at the end of episodes. We apply this technique to derive first a new off-policy version of TD(λ), called PTD(λ), and then our new Q(λ), called PQ(λ). A generic sketch of the backward view in question appears below.
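To make the backward view concrete, the following is a minimal sketch of linear TD(λ) with accumulating eligibility traces, plus an optional per-decision importance-sampling ratio rho of the kind off-policy derivations introduce. The function name, signature, and exact placement of rho are illustrative assumptions, not the paper's PTD(λ).

import numpy as np

def td_lambda_step(w, e, phi, phi_next, reward, gamma, lam, alpha, rho=1.0):
    # One backward-view TD(lambda) update with linear function approximation.
    # w: weight vector (the value estimate of a state is w . phi).
    # e: accumulating eligibility trace vector, same shape as w.
    # rho: importance-sampling ratio; 1.0 recovers on-policy TD(lambda).
    #      (Illustrative placement only; PTD(lambda) differs in detail.)
    delta = reward + gamma * w.dot(phi_next) - w.dot(phi)  # TD error
    e = rho * (gamma * lam * e + phi)                      # decay, then accumulate the trace
    w = w + alpha * delta * e                              # credit all recently visited features
    return w, e

Called once per transition, carrying w and e forward, the sum of these incremental updates matches the corresponding forward-view (λ-return) target under offline updating; that is the kind of equivalence the forward-view/backward-view analysis establishes.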
Finally, we introduce an interim forward view for action values and use it to derive and prove the equivalence of our new Q(λ). Like the original equivalences, ours is exact only in the offline case; in the conventional online case, the new algorithm approaches exactness as the step-size parameter approaches zero.
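For reference, the equivalence at issue relates a backward-view algorithm to a forward view defined by the λ-return. A minimal sketch of the standard state-value definitions, assuming \hat{v} is the current value estimate and h is the horizon up to which data is available (the paper's interim forward view for action values is the analogous construction for action-value estimates, with off-policy corrections not shown here):

\begin{align}
G_t^{(n)} &= \sum_{k=1}^{n} \gamma^{k-1} R_{t+k} + \gamma^{n} \hat{v}(S_{t+n}), \\
G_t^{\lambda} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}, \\
G_t^{\lambda|h} &= (1-\lambda) \sum_{n=1}^{h-t-1} \lambda^{n-1} G_t^{(n)} + \lambda^{h-t-1} G_t^{(h-t)}.
\end{align}

An interim forward view updates toward the truncated target G_t^{\lambda|h} at every horizon h, rather than waiting for the end of the episode, which is what lets the equivalence analysis apply at every time step.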
Sutton, R. S., Mahmood, A. R., Precup, D., and van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, pp. 568-576.