Autonomous-lunar-landing is a reinforcement learning project that leverages the Deep Q-Network (DQN) algorithm to train an autonomous agent to land a lunar lander safely. The agent must learn to land in the designated spot, marked by the two yellow flags, without crashing. It learns to do this from the rewards (numeric feedback) it receives for its actions.
This project is a reimplementation of the original DQN algorithm from scratch, based on the original paper.
Note: In the context of this project and Reinforcement Learning, "episode" refers to a training iteration.
- Deep Q-Network (DQN) Implementation: Customizable DQN target network for stable learning. In this project, the DQN algorithm has been implemented from scratch, without relying on existing RL libraries, following the original paper (see Algorithm 1 in the paper for the pseudocode).
- Advanced Replay Buffer Management: Configurable replay-buffer usage and sampling frequency to control how often the agent learns from stored experience. The replay buffer stores the agent's experiences (transitions) collected during training; a minimal sketch of the replay buffer and target network appears after this list.
- Comprehensive Documentation: Detailed README with environment setup, feature descriptions, and usage guidelines.
- Performance Logging and Visualization: Track training progress and visualize results for better insights.
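As a rough illustration of the first two features, here is a minimal sketch of a replay buffer and a target network in PyTorch. Class names, layer sizes, and the buffer capacity are illustrative assumptions, not the exact implementation in DQN.py.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.as_tensor(np.array(states), dtype=torch.float32),
                torch.as_tensor(actions, dtype=torch.int64),
                torch.as_tensor(rewards, dtype=torch.float32),
                torch.as_tensor(np.array(next_states), dtype=torch.float32),
                torch.as_tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)


class QNetwork(nn.Module):
    """Small MLP mapping the 8-dimensional state to 4 action values."""

    def __init__(self, state_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


# The target network is a periodically synchronized copy of the online network,
# used to compute stable TD targets (Algorithm 1 in the DQN paper).
online_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(online_net.state_dict())
```

Synchronizing the target network only every fixed number of steps keeps the TD targets from chasing a constantly moving estimate, which is the main source of the added stability.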
The environment is LunarLander-v3 from Gymnasium.
This table describes the features of the observation (state) vector, which the agent uses to observe the state of the lander during its descent and landing.
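For reference, the environment can be created and inspected as follows (assuming Gymnasium with the Box2D extras installed):

```python
import gymnasium as gym

env = gym.make("LunarLander-v3", render_mode="rgb_array")
obs, info = env.reset(seed=0)
print(env.observation_space)  # Box with 8 features (see the observation table below)
print(env.action_space)       # Discrete(4) (see the action table below)
```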
| Index | Feature | Description |
|---|---|---|
| 0 | Horizontal coordinate (x) | Horizontal position of the lander relative to the landing pad (x-axis) |
| 1 | Vertical coordinate (y) | Vertical position of the lander relative to the landing pad (y-axis) |
| 2 | Horizontal speed (x) | Speed of the lander along the x-axis |
| 3 | Vertical speed (y) | Speed of the lander along the y-axis |
| 4 | Angle | Rotation angle of the lander |
| 5 | Angular speed | Rotational speed of the lander |
| 6 | Left leg contact (boolean) | Whether the left leg contact point has touched the ground (True/False) |
| 7 | Right leg contact (boolean) | Whether the right leg contact point has touched the ground (True/False) |
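To make the indices concrete, the 8-dimensional observation can be unpacked by position. The variable names below are illustrative, not taken from DQN.py:

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")
obs, info = env.reset(seed=0)

# Positions in the observation vector follow the table above.
x, y, vx, vy, angle, angular_speed, left_contact, right_contact = obs
print(f"position=({x:.2f}, {y:.2f})  velocity=({vx:.2f}, {vy:.2f})  "
      f"angle={angle:.2f}  contacts=({bool(left_contact)}, {bool(right_contact)})")
```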
This table outlines the possible actions the agent can take during the descent and landing of the lander.
| Index | Action | Description |
|---|---|---|
| 0 | Do nothing | The lander remains idle |
| 1 | Fire left orientation engine | Rotates the lander to the right |
| 2 | Fire the main engine | Fires the main engine to propel upwards |
| 3 | Fire right orientation engine | Rotates the lander to the left |
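As a quick illustration of how these actions and the reward signal fit together, here is a short rollout with a random policy (not the trained agent):

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")
obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # one of the 4 actions in the table above
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the numeric feedback the agent learns from
    done = terminated or truncated
print(f"Return of a random policy: {total_reward:.1f}")
env.close()
```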
The recordings directory contains videos of the agent's performance during training. A recording is captured every 100 episodes.
To observe the effects of the replay buffer, its usage was modified in a second training run that depended heavily on the buffer. This hindered the learning process, and the agent was unable to learn a stable policy. If you are interested in observing this, check out the recordings directory and compare the "regular" training with the "buffer" training.
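A recording schedule like this could be produced with Gymnasium's RecordVideo wrapper; this is a sketch of the idea, and the exact setup in DQN.py may differ:

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

env = gym.make("LunarLander-v3", render_mode="rgb_array")
# Record an episode whenever the episode index is a multiple of 100.
env = RecordVideo(env, video_folder="recordings",
                  episode_trigger=lambda ep: ep % 100 == 0)
```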
To run the DQN.py file, create a virtual environment and install the dependencies. Here are the commands:

```bash
python -m venv venv
source venv/bin/activate
pip install "gymnasium[all]" torch numpy
```

If you would like to run the PPO model instead of the DQN model, you should also install the stable-baselines3 library:

```bash
python -m venv venv
source venv/bin/activate
pip install "gymnasium[all]" torch numpy stable-baselines3
```
