A modified version of the cart-pole environment in which the agent must balance the pole on either the left or the right side of the screen, depending on a state returned by the environment.
The code is provided as a notebook that is ready to run on Google Colab.
[RLlib] is used to train the agent with the PPO and DQN algorithms.
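
To make the modified task more concrete, here is a minimal, dependency-free sketch of one way the side requirement could be expressed. It is not the notebook's actual implementation: the `LEFT`/`RIGHT` encoding, the helper names `augment_observation`, `side_reward`, and `sample_target_side`, and the reward rule (reward only while the cart is on the requested side) are all assumptions made for illustration. The only fact taken from standard cart-pole is that the observation holds four values: cart position, cart velocity, pole angle, and pole angular velocity.

```python
import random

# Hypothetical encoding of the side the agent is asked to balance on.
LEFT, RIGHT = -1.0, 1.0


def augment_observation(obs, target_side):
    """Append the target-side flag to the 4-value cart-pole observation."""
    return list(obs) + [target_side]


def side_reward(cart_x, target_side, base_reward=1.0):
    """Give the base reward only while the cart is on the requested side.

    Assumes x = 0 is the screen center, negative x is left, positive x is right.
    """
    on_target_side = cart_x * target_side > 0
    return base_reward if on_target_side else 0.0


def sample_target_side(rng=random):
    """Pick the target side for a new episode uniformly at random."""
    return rng.choice([LEFT, RIGHT])


# Example: cart slightly right of center, agent asked to stay on the right.
obs = [0.3, 0.0, 0.01, 0.0]
print(augment_observation(obs, RIGHT))
print(side_reward(obs[0], RIGHT), side_reward(obs[0], LEFT))
```

In a setup like this, the extra flag is what lets a single policy learn both behaviors, since the rest of the observation looks the same regardless of which side is requested.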
- Add more algorithms
- Experiment with more advanced exploration methods