The key to fast iteration in research experiments is a well-written baseline implementation. Unfortunately, most major research groups release their code in TensorFlow (openai/baselines, openai/spinningup, deepmind/trfl, google/dopamine), so PyTorch implementations are less well known. To help PyTorch deep RL researchers, we compare and recommend open-source implementations of policy gradient algorithms in PyTorch.

Note that because of the substantial API differences between PyTorch 0.3 and 0.4, we only include repositories that use PyTorch 0.4 or above.

Policy Gradient Methods

A3C

Asynchronous Advantage Actor Critic

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| pytorch-a3c | ikostrikov | 0.4.1 | | 519 |
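
The defining feature of A3C is that many workers compute gradients in parallel and apply them to a shared model without locking. Below is a minimal sketch of that update pattern with a toy network; it is not ikostrikov's code, which additionally shares optimizer state across workers (its SharedAdam) and uses n-step returns with an entropy bonus.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(shared_model):
    local_model = nn.Linear(4, 2)                            # stand-in actor-critic net
    local_model.load_state_dict(shared_model.state_dict())   # pull the latest weights
    optimizer = torch.optim.Adam(shared_model.parameters(), lr=1e-3)
    loss = local_model(torch.randn(8, 4)).pow(2).mean()      # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # Copy local gradients onto the shared parameters, then step the shared weights
    for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
        sp._grad = lp.grad
    optimizer.step()

if __name__ == "__main__":
    shared = nn.Linear(4, 2)
    shared.share_memory()                                    # weights live in shared memory
    workers = [mp.Process(target=worker, args=(shared,)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```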

A2C

Advantage Actor Critic

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| pytorch-a2c-ppo-acktr | ikostrikov | 0.4 | | 1077 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
| vel | MillionIntegrals | 0.4.1 | | 194 |
| DeepRL | ShangtongZhang | 0.4.0 | | 1034 |
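
All four repositories implement the same core loss: a policy gradient term weighted by the advantage, a value regression term, and an entropy bonus. A minimal sketch (names and coefficients are illustrative, not any repository's API):

```python
import torch

def a2c_loss(log_prob, value, ret, entropy,
             value_coef=0.5, entropy_coef=0.01):
    # log_prob: log pi(a|s), value: V(s), ret: n-step return, entropy: H(pi(.|s))
    advantage = ret - value
    policy_loss = -(log_prob * advantage.detach()).mean()   # actor term
    value_loss = advantage.pow(2).mean()                    # critic term
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```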

ACER

Actor Critic with Experience Replay

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| ACER | Kaixhin | 0.4 | | 138 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
| vel | MillionIntegrals | 0.4.1 | | 194 |

ACKTR

Actor Critic using Kronecker-Factored Trust Region

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| pytorch-a2c-ppo-acktr | ikostrikov | 0.4 | | 1077 |

TRPO

Trust Region Policy Optimization

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| pytorch-trpo | ikostrikov | 0.4 | | 170 |
| vel | MillionIntegrals | 0.4.1 | | 194 |

PPO

Proximal Policy Optimization

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| pytorch-a2c-ppo-acktr | ikostrikov | 0.4 | ✓ | 1077 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
| vel | MillionIntegrals | 0.4.1 | | 194 |
| DeepRL | ShangtongZhang | 0.4.0 | | 1034 |
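
PPO replaces A2C's plain policy gradient term with a clipped surrogate objective, which keeps each update close to the data-collecting policy. A minimal sketch (the 0.2 clip range is the paper's default; the function name is ours):

```python
import torch

def ppo_policy_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    ratio = (new_log_prob - old_log_prob).exp()                    # pi_new / pi_old
    surrogate = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(surrogate, clipped).mean()                   # maximize => negate
```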

SAC

Soft Actor-Critic

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| rlkit | vitchyr | 0.4 | | 491 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
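
SAC trains a stochastic actor to maximize expected Q-value plus policy entropy. A minimal sketch of the actor loss, where `policy` and `q_fn` are placeholder callables, not rlkit's API:

```python
import torch

def sac_actor_loss(policy, q_fn, obs, alpha=0.2):
    # policy returns a reparameterized action sample and its log-density
    action, log_prob = policy(obs)
    q = q_fn(obs, action)
    # Minimizing (alpha * log_prob - Q) maximizes Q plus entropy
    return (alpha * log_prob - q).mean()
```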

Twin SAC

Combination of SAC and TD3

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| rlkit | vitchyr | 0.4 | | 491 |

Recommendation

Although vitchyr/rlkit has SAC and Twin SAC, which are state-of-the-art methods in robotic control, it unfortunately does not include PPO, the standard baseline policy gradient algorithm. We found ikostrikov/pytorch-a2c-ppo-acktr and ShangtongZhang/DeepRL to be the best implementations of PPO: both let us run experiments almost immediately after cloning the repository. We gave bonus points to ikostrikov/pytorch-a2c-ppo-acktr because it also includes some pretrained models.

Verdict: ikostrikov/pytorch-a2c-ppo-acktr

Deterministic Policy Gradient Methods

DDPG

Deep Deterministic Policy Gradient

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| rlkit | vitchyr | 0.4 | | 491 |
| pytorch-ddpg-naf | ikostrikov | 0.4 | | 136 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
| vel | MillionIntegrals | 0.4.1 | | 194 |
| DeepRL | ShangtongZhang | 0.4.0 | | 1034 |
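
DDPG alternates two updates: regress the critic toward a bootstrapped target computed with target networks, then move the deterministic actor along the critic's gradient. A minimal sketch of one update step (network and optimizer arguments are placeholders; real code adds soft target updates and exploration noise):

```python
import torch

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, obs, action, reward, next_obs,
                gamma=0.99):
    # Critic: regress Q(s, a) toward the bootstrapped target
    with torch.no_grad():
        target_q = reward + gamma * target_critic(next_obs, target_actor(next_obs))
    critic_loss = (critic(obs, action) - target_q).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, actor(s))
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```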

TD3

Twin-Delayed Deep Deterministic Policy Gradient

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| rlkit | vitchyr | 0.4 | | 491 |
| RL-Adventure-2 | higgsfield | 0.4 | | 1521 |
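
TD3 modifies DDPG's target computation with two tricks: it takes the minimum over two target critics (clipped double-Q) and smooths the target action with clipped noise. A minimal sketch using the paper's default noise parameters (argument names are ours):

```python
import torch

def td3_target(reward, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        next_action = target_actor(next_obs)
        # Target policy smoothing: add clipped Gaussian noise to the target action
        noise = torch.clamp(noise_std * torch.randn_like(next_action),
                            -noise_clip, noise_clip)
        next_action = next_action + noise   # real code also clips to action bounds
        # Clipped double-Q: the smaller of two critics reduces overestimation
        q = torch.min(target_q1(next_obs, next_action),
                      target_q2(next_obs, next_action))
        return reward + gamma * q
```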

HER+TD3

Hindsight Experience Replay + Twin-Delayed Deep Deterministic Policy Gradient

[arXiv Paper]

| Repository | Author | Version | Pretrained Models | Stars |
| --- | --- | --- | --- | --- |
| rlkit | vitchyr | 0.4 | | 491 |
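
HER augments the replay buffer by relabeling past transitions with goals that were actually achieved, turning failed episodes into useful training data. A minimal sketch of the "final" relabeling strategy, assuming a hypothetical dict-based transition format (rlkit's buffer is structured differently):

```python
def her_relabel(episode, reward_fn):
    # Pretend the goal of the episode was the state we actually reached
    achieved = episode[-1]["achieved_goal"]
    return [
        {**t, "goal": achieved,
         "reward": reward_fn(t["achieved_goal"], achieved)}
        for t in episode
    ]
```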

Recommendation

For Deterministic Policy Gradient methods, vitchyr/rlkit and higgsfield/RL-Adventure-2 were the only repositories with both DDPG and TD3 implemented. We found higgsfield/RL-Adventure-2 more suitable for understanding the algorithms than for running them, so we recommend vitchyr/rlkit as your baseline.

Verdict: vitchyr/rlkit

Checked Repositories

- ikostrikov/pytorch-a3c
- ikostrikov/pytorch-a2c-ppo-acktr
- ikostrikov/pytorch-trpo
- ikostrikov/pytorch-ddpg-naf
- Kaixhin/ACER
- higgsfield/RL-Adventure-2
- MillionIntegrals/vel
- ShangtongZhang/DeepRL
- vitchyr/rlkit

If you believe we missed a great PyTorch RL repository, please tell us in the comment section!