endtoendAI

AI for Prosthetics

This is a guide to the NIPS 2018 AI for Prosthetics challenge, consisting of a helper package and a series of blog posts.

Competition

The AI for Prosthetics challenge is one of the NIPS 2018 Competition tracks. In this challenge, participants seek to build an agent that can make a 3D human model with prosthetics run.


osim-rl-helper

This package contains basic learning agents built with popular reinforcement learning libraries such as keras-rl and tensorforce. It provides a good starting point for those unfamiliar with practical reinforcement learning. Currently, the package contains a Deep Deterministic Policy Gradient (DDPG) agent and a Proximal Policy Optimization (PPO) agent. Other popular methods and libraries, such as Deep Q-Networks (DQN) and rllib, should be added soon.

Blog Posts

I wrote posts about the competition, the status of the leaderboard, the properties of the environment, applicable techniques, and updates to the osim-rl-helper package. I have discussed general approaches to the problem, possible methods of reward shaping, and other techniques from recent literature that could be worth experimenting with.

AI for Prosthetics Week 9 - 10: Unorthodox Approaches

reinforcement-learning ai-for-prosthetics competition

We end the series by exploring possible unorthodox approaches for the competition. These are approaches that deviate from the popular policy gradient methods such as DDPG or PPO.

AI for Prosthetics Week 6: General Techniques of RL

reinforcement-learning ai-for-prosthetics competition

This week, we take a step back from the competition and study common techniques used in Reinforcement Learning.

AI for Prosthetics Week 5: Understanding the Reward

reinforcement-learning ai-for-prosthetics competition

The goal of reinforcement learning is defined by the reward signal: to maximize the cumulative reward throughout an episode. In some ways, the reward is the most important aspect of the environment for the agent. Even if the agent does not know about the values of states or actions (as with Evolutionary Strategies), if it can consistently get a high return (cumulative reward), it is a great agent.
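To make "return" concrete, here is a minimal sketch of how the cumulative reward of one episode could be computed from a list of per-step rewards. The function name `episode_return` and the optional `discount` parameter are my own illustrative choices and are not part of the competition environment; with the default `discount=1.0` the result is simply the sum of the rewards.

```python
from typing import Sequence


def episode_return(rewards: Sequence[float], discount: float = 1.0) -> float:
    """Return the (optionally discounted) cumulative reward of one episode.

    Hypothetical helper for illustration only; not taken from osim-rl.
    """
    total = 0.0
    # Walk backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


if __name__ == "__main__":
    # Undiscounted return is just the sum of the step rewards.
    print(episode_return([1.0, 2.0, 3.0]))
    # With discounting, later rewards count for less.
    print(episode_return([1.0, 1.0, 1.0], discount=0.5))
```

An agent that never estimates state or action values can still be compared to any other agent by this single number, averaged over several episodes.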

AI for Prosthetics Week 3-4: Understanding the Observation Space

reinforcement-learning ai-for-prosthetics competition

The observation can be roughly divided into five components: the body parts, the joints, the muscles, the forces, and the center of mass. For each body part component, the agent observes its position, velocity, acceleration, rotation, rotational velocity, and rotational acceleration.
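Most learning libraries expect a flat vector rather than a grouped observation like this one. As a rough sketch, assuming the observation arrives as nested dictionaries and lists of numbers, it could be flattened as below. The function `flatten_observation` and the keys in the toy example are hypothetical and only mirror the grouping described above; the real environment's keys and layout may differ.

```python
from typing import Any, List


def flatten_observation(obs: Any) -> List[float]:
    """Recursively flatten nested dicts/lists of numbers into one flat list.

    Dictionary keys are visited in sorted order so that the layout of the
    resulting vector stays the same from step to step.
    """
    if isinstance(obs, dict):
        values: List[float] = []
        for key in sorted(obs):
            values.extend(flatten_observation(obs[key]))
        return values
    if isinstance(obs, (list, tuple)):
        values = []
        for item in obs:
            values.extend(flatten_observation(item))
        return values
    # Anything else is treated as a single number.
    return [float(obs)]


if __name__ == "__main__":
    # Toy observation with made-up keys, for illustration only.
    toy_observation = {
        "body_pos": {"pelvis": [0.0, 0.9, 0.0]},
        "center_of_mass": [1.0],
    }
    print(flatten_observation(toy_observation))
```

Sorting the keys is a deliberate choice: a fixed ordering matters more than any particular ordering, since the policy network learns to associate each input index with one quantity.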

AI for Prosthetics Week 2: Understanding the Action Space

reinforcement-learning ai-for-prosthetics competition

Last week, we saw that a valid action consists of 19 numbers, each between 0 and 1. The 19 numbers represent the amount of force to apply to each muscle. I know barely anything about muscles, so I decided to manually go through all the muscles to understand the effects of each muscle...
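As a small illustration of that constraint, the sketch below builds a random valid action and clips an arbitrary vector into the valid range using only the standard library. The names `NUM_MUSCLES`, `random_action`, and `clip_action` are hypothetical helpers of my own, not functions from osim-rl or osim-rl-helper.

```python
import random
from typing import List, Sequence

NUM_MUSCLES = 19  # number of values in a valid action, as described above


def random_action(rng: random.Random) -> List[float]:
    """Sample one value in [0, 1) for each muscle."""
    return [rng.random() for _ in range(NUM_MUSCLES)]


def clip_action(raw: Sequence[float]) -> List[float]:
    """Force every entry of a candidate action into the range [0, 1]."""
    if len(raw) != NUM_MUSCLES:
        raise ValueError(f"expected {NUM_MUSCLES} values, got {len(raw)}")
    return [min(1.0, max(0.0, float(value))) for value in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    action = random_action(rng)
    print(len(action), min(action) >= 0.0, max(action) <= 1.0)
    # A policy network's raw output may fall outside the valid range.
    raw_output = [-0.5] + [0.5] * 17 + [1.5]
    print(clip_action(raw_output))
```

Clipping is only one option; squashing the network output with a sigmoid is another common way to keep actions in range.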

AI for Prosthetics Week 1: Understanding the Challenge

reinforcement-learning ai-for-prosthetics competition

The AI for Prosthetics challenge is one of the NIPS 2018 Competition tracks. In this challenge, participants seek to build an agent that can make a 3D model of a human with prosthetics run. This challenge is a continuation of the Learning to Run challenge (shown below) that was part of the NIPS 2017 Competition Track. The challenge was enhanced in three ways...