The key to fast iteration on research experiments is a set of well-written baseline implementations. Unfortunately, most major research labs write their code in TensorFlow (openai/baselines, openai/spinningup, deepmind/trfl, google/dopamine), so PyTorch implementations are less well known. To help deep RL researchers working in PyTorch, we compare and recommend open-source implementations of policy gradient algorithms in PyTorch.
Note that because of the breaking changes between PyTorch 0.3 and 0.4, we only include repositories that support PyTorch 0.4 or above.
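To make that cutoff concrete, here is a minimal sketch (our own, not taken from any listed repository) of the idiom change that breaks most 0.3-era code on 0.4: `Variable` was merged into `Tensor`, and scalars are now extracted with `.item()` instead of `.data[0]`.

```python
import torch

# PyTorch >= 0.4: plain tensors can track gradients; no Variable wrapper is needed.
x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()
loss.backward()

print(loss.item())     # 0.4+ idiom for extracting a Python scalar
# print(loss.data[0])  # 0.3-era idiom; indexing a 0-dim tensor raises an error on 0.4+
```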
Policy Gradient Methods
A3C
Asynchronous Advantage Actor Critic
| | pytorch-a3c |
|---|---|
| Author | ikostrikov |
| Version | 0.4.1 |
| Pretrained Models | ✘ |
| Stars | 519 |
A2C
Advantage Actor Critic
| | pytorch-a2c-ppo-acktr | RL-Adventure-2 | vel | DeepRL |
|---|---|---|---|---|
| Author | ikostrikov | higgsfield | MillionIntegrals | ShangtongZhang |
| Version | 0.4 | 0.4 | 0.4.1 | 0.4.0 |
| Pretrained Models | ✔ | ✘ | ✘ | ✘ |
| Stars | 1077 | 1521 | 194 | 1034 |
ACER
Actor Critic with Experience Replay
| | ACER | RL-Adventure-2 | vel |
|---|---|---|---|
| Author | Kaixhin | higgsfield | MillionIntegrals |
| Version | 0.4 | 0.4 | 0.4.1 |
| Pretrained Models | ✘ | ✘ | ✘ |
| Stars | 138 | 1521 | 194 |
ACKTR
Actor Critic using Kronecker-Factored Trust Region
| | pytorch-a2c-ppo-acktr |
|---|---|
| Author | ikostrikov |
| Version | 0.4 |
| Pretrained Models | ✔ |
| Stars | 1077 |
TRPO
Trust Region Policy Optimization
| | pytorch-trpo | vel |
|---|---|---|
| Author | ikostrikov | MillionIntegrals |
| Version | 0.4 | 0.4.1 |
| Pretrained Models | ✘ | ✘ |
| Stars | 170 | 194 |
PPO
Proximal Policy Optimization
| | pytorch-a2c-ppo-acktr | RL-Adventure-2 | vel | DeepRL |
|---|---|---|---|---|
| Author | ikostrikov | higgsfield | MillionIntegrals | ShangtongZhang |
| Version | 0.4 | 0.4 | 0.4.1 | 0.4.0 |
| Pretrained Models | ✔ | ✘ | ✘ | ✘ |
| Stars | 1077 | 1521 | 194 | 1034 |
SAC
Soft Actor-Critic
| | rlkit | RL-Adventure-2 |
|---|---|---|
| Author | vitchyr | higgsfield |
| Version | 0.4 | 0.4 |
| Pretrained Models | ✘ | ✘ |
| Stars | 491 | 1521 |
Twin SAC
Combination of SAC and TD3
| | rlkit |
|---|---|
| Author | vitchyr |
| Version | 0.4 |
| Pretrained Models | ✘ |
| Stars | 491 |
Recommendation
Although vitchyr/rlkit has SAC and Twin SAC, which are state-of-the-art methods in robotic control, it unfortunately does not include PPO, the standard baseline policy gradient algorithm. We found ikostrikov/pytorch-a2c-ppo-acktr and ShangtongZhang/DeepRL to be the best implementations of PPO: both let us run experiments almost immediately after cloning the repository. We give the edge to ikostrikov/pytorch-a2c-ppo-acktr because it also includes some pretrained models.
Verdict: ikostrikov/pytorch-a2c-ppo-acktr
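For reference, the core of what these PPO baselines implement is the clipped surrogate objective. The sketch below is ours, not code from either repository; the tensor names and the default clip range are illustrative assumptions.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss (illustrative sketch, not repository code).

    log_probs:     log pi(a|s) under the current policy, shape (batch,)
    old_log_probs: log pi(a|s) recorded during the rollout, shape (batch,)
    advantages:    advantage estimates, e.g. from GAE, shape (batch,)
    """
    ratio = torch.exp(log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the surrogate objective, so return its negation as a loss.
    return -torch.min(unclipped, clipped).mean()
```

A full PPO update typically combines this term with a value-function loss and an entropy bonus before the optimizer step.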
Deterministic Policy Gradient Methods
DDPG
Deep Deterministic Policy Gradient
| | rlkit | pytorch-ddpg-naf | RL-Adventure-2 | vel | DeepRL |
|---|---|---|---|---|---|
| Author | vitchyr | ikostrikov | higgsfield | MillionIntegrals | ShangtongZhang |
| Version | 0.4 | 0.4 | 0.4 | 0.4.1 | 0.4.0 |
| Pretrained Models | ✘ | ✘ | ✘ | ✘ | ✘ |
| Stars | 491 | 136 | 1521 | 194 | 1034 |
TD3
Twin-Delayed Deep Deterministic Policy Gradient
| | rlkit | RL-Adventure-2 |
|---|---|---|
| Author | vitchyr | higgsfield |
| Version | 0.4 | 0.4 |
| Pretrained Models | ✘ | ✘ |
| Stars | 491 | 1521 |
HER+TD3
Hindsight Experience Replay + Twin-Delayed Deep Deterministic Policy Gradient
| | rlkit |
|---|---|
| Author | vitchyr |
| Version | 0.4 |
| Pretrained Models | ✘ |
| Stars | 491 |
Recommendation
For Deterministic Policy Gradient methods, vitchyr/rlkit and higgsfield/RL-Adventure-2 were the only repositories with both DDPG and TD3 implemented. We found higgsfield/RL-Adventure-2 to be better suited to understanding the algorithms than to running them at scale, so we recommend using vitchyr/rlkit as your baseline.
Verdict: vitchyr/rlkit
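As a reminder of what separates TD3 from plain DDPG, the sketch below shows TD3's critic target with clipped double-Q and target-policy smoothing. It is our own illustration under assumed names, not rlkit's API; the target networks are passed in as callables taking (state, action).

```python
import torch

def td3_critic_target(reward, next_state, done,
                      actor_target, critic1_target, critic2_target,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """TD3 critic target (illustrative sketch, not rlkit code)."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        # Target-policy smoothing: perturb the target action with clipped Gaussian noise.
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the two target critics to curb overestimation.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * target_q
```

DDPG's target is the same expression with a single critic and no smoothing noise; TD3 additionally delays the actor and target-network updates.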
Checked Repositories
- vitchyr/rlkit
- ikostrikov/pytorch-a3c
- ikostrikov/pytorch-trpo
- ikostrikov/pytorch-a2c-ppo-acktr
- ikostrikov/pytorch-ddpg-naf
- Kaixhin/ACER
- higgsfield/RL-Adventure-2
- MillionIntegrals/vel
- ShangtongZhang/DeepRL (Added 2018/12/29)
If you believe we missed a great PyTorch RL repository, please tell us in the comment section!