
Autonomous-lunar-landing 🚀

Autonomous-lunar-landing is a reinforcement learning project that uses the Deep Q-Network (DQN) algorithm to train an autonomous agent to land a ship safely. The agent must learn to land the ship in the designated spot between the two yellow flags without crashing. It learns to do this from the rewards (numeric feedback) it receives for its actions.

This project is a from-scratch reimplementation of the original DQN algorithm, based on the paper.


Results

Before Training (Episode 0): The ship simply crashes, since the agent has not yet interacted with the environment and has learned nothing.

After Training (Episode 500): The agent has learned to slowly land the ship between the yellow flags.

Note: In the context of this project and reinforcement learning, an episode refers to one training iteration.

Features

  • Deep Q-Network (DQN) Implementation: Customizable DQN with a target network for stable learning. The DQN algorithm has been implemented from scratch, following the original paper (see Algorithm 1 for the pseudocode), without relying on external RL libraries.
  • Advanced Replay Buffer Management: The replay buffer stores the agent's experiences during training, and the frequency of training against the buffer is tunable to balance learning speed and stability.
  • Comprehensive Documentation: Detailed README with environment setup, feature descriptions, and usage guidelines.
  • Performance Logging and Visualization: Track training progress and visualize results for better insights.
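The replay buffer named above can be sketched in a few lines. This is an illustrative, self-contained version (not the repository's actual code; class and method names are assumptions): experiences are stored as tuples and sampled uniformly at random, which breaks the temporal correlation between consecutive transitions and stabilizes DQN training.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        # deque with maxlen evicts the oldest experiences automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Illustrative usage with dummy transitions
buffer = ReplayBuffer(capacity=1000)
for t in range(50):
    buffer.push(state=t, action=t % 4, reward=-1.0, next_state=t + 1, done=False)
batch = buffer.sample(8)
```

In a full DQN loop, each sampled batch would feed one gradient update of the online network, with the target network refreshed periodically.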

Environment

The environment is a LunarLander-v3 from Gymnasium.

What the learning agent can see

This table describes the observation features of the lander environment, which are critical for controlling and observing the state of the lander during its descent and landing.

| Index | Feature | Description |
|-------|---------|-------------|
| 0 | Horizontal coordinate (x) | Horizontal position of the lander along the x-axis |
| 1 | Vertical coordinate (y) | Vertical position of the lander along the y-axis |
| 2 | Horizontal speed | Speed of the lander along the x-axis |
| 3 | Vertical speed | Speed of the lander along the y-axis |
| 4 | Angle | Rotation angle of the lander |
| 5 | Angular speed | Rotational speed of the lander |
| 6 | Left leg contact (boolean) | Whether the left leg has touched the ground (True/False) |
| 7 | Right leg contact (boolean) | Whether the right leg has touched the ground (True/False) |

What the agent can do

This table outlines the actions the learning agent can take during the descent and landing of the ship.

| Index | Action | Description |
|-------|--------|-------------|
| 0 | Do nothing | The lander remains idle |
| 1 | Fire left orientation engine | Rotates the lander to the right |
| 2 | Fire main engine | Propels the lander upwards |
| 3 | Fire right orientation engine | Rotates the lander to the left |

Experimentation

The recordings directory contains videos of the agent's performance during training, recorded every 100 episodes.

To observe the effect of replay buffer usage, a second training run was performed that sampled from the buffer much more heavily. This hindered the learning process, and the agent was unable to learn a stable policy. If you are interested in observing this, check out the recordings directory and compare the "regular" training with the "buffer" training.
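The difference between the two runs comes down to how many gradient updates are drawn from the buffer per environment step. A small piece of illustrative arithmetic (the function and the specific numbers are assumptions, not the repository's actual hyperparameters) shows how quickly the update count diverges:

```python
def gradient_updates(total_env_steps, train_every, updates_per_train):
    """Number of gradient updates performed over a given env-step budget."""
    return (total_env_steps // train_every) * updates_per_train

# "Regular" run: train once every 4 env steps
regular = gradient_updates(10_000, train_every=4, updates_per_train=1)

# "Buffer"-heavy run: train 4 times every env step
heavy = gradient_updates(10_000, train_every=1, updates_per_train=4)
```

With the same 10,000 environment steps, the heavy run performs 16x more updates from largely stale experiences, which is consistent with the instability described above.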

Installation

To run the DQN.py file, create a virtual environment and install the dependencies. Here is the command:

```shell
python -m venv venv
source venv/bin/activate
pip install "gymnasium[all]" torch numpy
```

If you would like to run the PPO model instead of the DQN model, you should also install the stable-baselines3 library:

```shell
python -m venv venv
source venv/bin/activate
pip install "gymnasium[all]" torch numpy stable-baselines3
```
