This repository contains implementations of various deep reinforcement learning algorithms, focusing on fundamental concepts and practical applications.
It is recommended to follow the material in the given order.
Implementation of Monte Carlo (MC) algorithms using the Blackjack environment as an example:
**MC Prediction**
- First-visit MC prediction for estimating the action-value function (see the sketch below)
- Policy evaluation with a stochastic policy
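The first-visit update can be sketched as follows (a minimal illustration with assumed names such as `first_visit_mc_update`; the repository's code may differ):

```python
from collections import defaultdict

def first_visit_mc_update(episode, Q, returns_count, gamma=1.0):
    """Update Q from one episode using first-visit Monte Carlo."""
    # Index of the first occurrence of each (state, action) pair.
    first_seen = {}
    for t, (s, a, _) in enumerate(episode):
        first_seen.setdefault((s, a), t)
    G = 0.0
    # Walk backward through the episode, accumulating the discounted return.
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        G = gamma * G + r
        if first_seen[(s, a)] == t:  # count only the first visit
            returns_count[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / returns_count[(s, a)]

Q = defaultdict(float)
counts = defaultdict(int)
first_visit_mc_update([("s0", 1, 0.0), ("s1", 0, 1.0)], Q, counts)
print(Q[("s0", 1)])  # 1.0 with gamma=1.0
```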
**MC Control with Incremental Mean**
- GLIE (Greedy in the Limit with Infinite Exploration)
- Epsilon-greedy policy implementation
- Incremental mean updates (see the update rule below)
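In the incremental-mean form, each return refines the running average without storing past returns; with $N(s,a)$ the visit count and $G_t$ the sampled return:

$$
Q(s,a) \leftarrow Q(s,a) + \frac{1}{N(s,a)}\bigl(G_t - Q(s,a)\bigr)
$$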
**MC Control with Constant-alpha**
- Fixed learning rate approach (see the update rule below)
- Finer control over how quickly old estimates are overwritten
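Replacing $\frac{1}{N(s,a)}$ with a fixed step size $\alpha$ gives the constant-alpha rule, which weights recent returns more heavily and so adapts faster while the policy is still changing:

$$
Q(s,a) \leftarrow Q(s,a) + \alpha\bigl(G_t - Q(s,a)\bigr)
$$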
Implementation of temporal-difference (TD) algorithms on both the Blackjack and CliffWalking environments:
**SARSA (On-Policy TD Control)**
- State-Action-Reward-State-Action update sequence
- On-policy learning with epsilon-greedy exploration
- Episode-based updates with TD(0) (see the sketch below)
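A minimal sketch of the one-step SARSA update (assumed function and variable names, not the repository's exact API):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """On-policy TD(0): bootstrap from the action the policy actually takes next."""
    td_target = r + gamma * Q[(s_next, a_next)]   # sampled next action, not the max
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = defaultdict(float)
sarsa_update(Q, s=0, a=1, r=-1.0, s_next=1, a_next=0)
```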
**Q-Learning (Off-Policy TD Control)**
- Also known as SARSA-Max
- Off-policy learning using maximum action values
- Approximates the optimal action-value function (see the sketch below)
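The corresponding Q-Learning update replaces the sampled next action with the greedy one (again a sketch with assumed names):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0):
    """Off-policy TD(0): bootstrap from the maximum next action value."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, r=-1.0, s_next=1, actions=range(4))
```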
**Expected SARSA**
- Extension of SARSA using expected values
- More stable learning through action-probability weighting
- Combines benefits of SARSA and Q-Learning (see the sketch below)
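Expected SARSA averages over the next-action distribution instead of sampling it or maximizing over it; a sketch assuming an epsilon-greedy policy:

```python
from collections import defaultdict

def expected_sarsa_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0, eps=0.1):
    """Bootstrap from the expected next value under the epsilon-greedy policy."""
    q_next = [Q[(s_next, a2)] for a2 in actions]
    probs = [eps / len(actions)] * len(actions)
    probs[q_next.index(max(q_next))] += 1.0 - eps  # greedy action gets the extra mass
    expected_q = sum(p * q for p, q in zip(probs, q_next))
    Q[(s, a)] += alpha * (r + gamma * expected_q - Q[(s, a)])

Q = defaultdict(float)
expected_sarsa_update(Q, s=0, a=1, r=-1.0, s_next=1, actions=list(range(4)))
```

Because the expectation has lower variance than SARSA's sampled next action, updates are smoother, while the greedy mass keeps the target close to Q-Learning's.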
**Q-Learning with Discretization**
- Q-Learning applied to the MountainCar environment using discretized state spaces
- State-space discretization through a uniform grid representation of continuous variables
- Exploration of the impact of discretization granularity on learning performance (see the sketch below)
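A uniform grid can be built and applied like this (illustrative helper names; the bin counts are assumptions):

```python
import numpy as np

def create_uniform_grid(low, high, bins=(10, 10)):
    """Equally spaced internal bin edges for each continuous dimension."""
    return [np.linspace(low[d], high[d], bins[d] + 1)[1:-1] for d in range(len(bins))]

def discretize(sample, grid):
    """Map a continuous observation to a tuple of integer bin indices."""
    return tuple(int(np.digitize(s, g)) for s, g in zip(sample, grid))

# MountainCar's observation is (position, velocity):
grid = create_uniform_grid(low=[-1.2, -0.07], high=[0.6, 0.07], bins=(10, 10))
print(discretize((-0.5, 0.01), grid))  # -> (3, 5)
```

Coarser grids learn faster but lose precision; finer grids are more precise but need many more episodes to cover the larger Q-table.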
**Q-Learning with Tile Coding**
- Q-Learning applied to the Acrobot environment using tile coding for state-space representation
- Tile coding efficiently represents continuous state spaces through overlapping feature grids (see the sketch below)
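A toy version of tile coding with a few offset grids (a sketch; the offsets and bin counts are assumptions):

```python
import numpy as np

def create_tiling(low, high, bins, offset):
    """One grid of internal bin edges, shifted by a fraction of the bin width."""
    edges = []
    for d in range(len(bins)):
        width = (high[d] - low[d]) / bins[d]
        edges.append(np.linspace(low[d], high[d], bins[d] + 1)[1:-1] + offset[d] * width)
    return edges

def tile_encode(sample, tilings):
    """One coarse (bin, bin, ...) coordinate per overlapping tiling."""
    return [tuple(int(np.digitize(s, g)) for s, g in zip(sample, grid))
            for grid in tilings]

tilings = [create_tiling([-1.0, -1.0], [1.0, 1.0], (10, 10), off)
           for off in ((0.0, 0.0), (0.33, 0.33), (0.66, 0.66))]
print(tile_encode((0.25, -0.4), tilings))
```

Because each tiling is offset differently, nearby states share some but not all tiles, so value estimates generalize smoothly without requiring a single very fine grid.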
**Deep Q-Network (DQN) with Experience Replay**
- A neural network approximates the action-value function $Q(s, a)$
- Random sampling from a replay buffer breaks the temporal correlation of consecutive samples
- A periodically updated target network reduces instability in target-value estimation (see the sketch below)
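A compact sketch of those two stabilizers in PyTorch (hypothetical layer sizes of 8 state dimensions and 4 actions; not the repository's architecture):

```python
import random
from collections import deque
import torch
import torch.nn as nn

buffer = deque(maxlen=100_000)  # experience replay buffer

def make_net():
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())  # hard update, repeated every N steps

# Fill with dummy transitions so the example runs end to end.
for _ in range(200):
    buffer.append((torch.randn(8), 0, 0.0, torch.randn(8), 0.0))

# Uniform random sampling breaks the temporal correlation of consecutive steps.
batch = random.sample(buffer, 64)
rewards = torch.tensor([b[2] for b in batch])
next_states = torch.stack([b[3] for b in batch])
dones = torch.tensor([b[4] for b in batch])

# Targets come from the frozen network, decoupling them from the online weights.
with torch.no_grad():
    targets = rewards + 0.99 * (1.0 - dones) * target_net(next_states).max(dim=1).values
print(targets.shape)  # torch.Size([64])
```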
- Blackjack: Classic card game environment for policy learning
- CliffWalking: Grid-world navigation task with negative rewards and cliff hazards
- Taxi-v3: Grid-world transportation task where an agent learns to efficiently navigate, pick up and deliver passengers to designated locations while optimizing rewards.
- MountainCar: Continuous control task where an underpowered car must learn to build momentum by moving back and forth to overcome a steep hill and reach the goal position.
- Acrobot: A two-link robotic arm environment where the goal is to swing the end of the second link above a target height by applying torque at the actuated joint. It challenges agents to solve nonlinear dynamics and coordinate the motion of linked components efficiently.
- LunarLander: A physics-based environment where an agent controls a lunar lander to safely land on a designated pad. The task involves managing fuel consumption, balancing thrust, and handling the dynamics of gravity and inertia.
Create (and activate) a new environment with Python 3.10 and PyTorch 2.5.1.

- Linux or Mac:

```bash
conda create -n DRL python=3.10
conda activate DRL
```

- Clone the repository:

```bash
git clone https://github.com/deepbiolab/drl.git
cd drl
```

- Install dependencies:

```bash
pip install -r requirements.txt
```
Run the Monte Carlo implementation:

```bash
cd monte-carlo-methods
python monte_carlo.py
```
Or explore the detailed notebook.
- Comprehensive implementations of fundamental RL algorithms:
  - Q-Learning
  - SARSA
  - Monte Carlo Control
  - Deep Q-Network
  - Hill Climbing
  - REINFORCE
  - A2C, A3C
  - Proximal Policy Optimization
  - Deep Deterministic Policy Gradients
  - MCTS, AlphaZero