This week I thought it might be quite nice to have a look at one of the machine learning (ML) frameworks I'm using for an AI project I'm currently working on as part of my university master's degree.
I am by no means an expert in the field, but I'd like to show how you can quickly get set up with ML in Python and train your own agents.
First, download and install Python 3.9 64-bit (the version I used at the time of writing). Make sure install to PATH is checked during installation, as this will allow us to run Python directly on the command line (cmd or PowerShell).
Next, create a new project in PyCharm. Under Python Interpreter, make sure new environment is selected using Virtualenv, and ensure you select the 64-bit version of Python as the Base Interpreter (it will be the only version if that's all you installed). Then go ahead and hit Create.
Now we need to install PyTorch. On the PyTorch website's install selector, choose the latest stable build (1.10.2 at the time of writing), select your Operating System (OS), choose the pip package (though you can use conda if installed), set the language to Python, and finally select the Compute Platform.
If you have an Nvidia GPU, the Compute Platform should match your installed CUDA version; to find it, open cmd or PowerShell and enter nvidia-smi (it will be printed in the top-right of the output). Now copy the Run command and paste it into the terminal in PyCharm (at the bottom). It will be pip3 install torch torchvision torchaudio for non-Nvidia users, or pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html for users with the latest version of CUDA installed.
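Before moving on, it's worth quickly checking that PyTorch installed correctly and, if you picked a CUDA build, that it can actually see your GPU. A minimal check (run it as a scratch script or in the Python console) could look like this:

import torch

print(torch.__version__)          # e.g. 1.10.2+cu113 for the CUDA build
print(torch.cuda.is_available())  # True if PyTorch can use an Nvidia GPU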
Next, type pip install gym into the terminal of PyCharm, followed by pip install stable-baselines3 to install Stable Baselines3. Once that has completed, we are done with configuring our Python environment for OpenAI Gym.
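If you want to be sure everything landed in the new virtual environment, a quick sanity check like the one below should run without errors (the exact version numbers printed will depend on when you install):

import gym
import stable_baselines3

print("gym:", gym.__version__)
print("stable-baselines3:", stable_baselines3.__version__)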
So what is reinforcement learning (mentioned above)? Basically, it's a bit like training a pet: if it does something good you give it a reward (or treat), and if it does something undesired you punish it (don't worry, this is AI, it doesn't have feelings :DDD ). This process of reward/punishment happens every time the environment updates (or ticks), which in turn tunes a bunch of parameters based on the current observations of the environment and the actions that can be performed.
On the other hand, there are supervised and unsupervised learning. These learning methods are generally used to classify data into groups, for example classifying images of cats and dogs. The main difference between the two is that during the supervised learning training process you tell it (or tag it) "this is a dog", "this is a cat", and so on, while unsupervised learning will attempt to learn patterns from untagged datasets, which is useful for more ambiguous data. If you would like to explore supervised/unsupervised classification more, I recommend taking a look at the scikit-learn Python library.
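As a small taster of supervised classification (separate from the Gym project, and you would need pip install scikit-learn first), here is a minimal sketch using scikit-learn's built-in, pre-tagged iris flower dataset and a k-nearest-neighbours classifier; the dataset, model and split size are just arbitrary choices for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small dataset where every sample is already tagged with its class.
features, labels = load_iris(return_X_y=True)

# Hold back a quarter of the data to test on samples the model has never seen.
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.25, random_state=42)

# Train on the tagged data, then score accuracy (0 to 1) on the held-back data.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(x_train, y_train)
print(classifier.score(x_test, y_test))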
There are other types of ML, however these are the three that I see most commonly. OpenAI Gym comes with a whole range of ready-made environments, such as MountainCar, Acrobot and some Atari games such as Space Invaders. However, for our Hello World project we'll have a look at the classic Cart Pole problem. The aim of Cart Pole is to try and balance the pole above the cart for as long as possible by moving the cart left and right.
So let’s walk through the implementation step by step and I’ll explain what we are doing as we go along. First of all in PyCharm create a new python script (
right click in project (left panel) -> New -> python file
) or delete the contents of the default Python script created by PyCharm.
Now we want to include the packages required for OpenAI Gym and the ML algorithms.
import gym
from stable_baselines3 import DQN
Adding the above code to the top of the Python script will import the OpenAI Gym module (import gym) and the Deep Q Network (DQN) ML algorithm (from stable_baselines3 import DQN) into the project.
Now we can go ahead and create our Gym environment by adding the following line
environment = gym.make("CartPole-v0")
Now we are able to run the script for the first time. In PyCharm you can press the play button in the top right of the main window (assuming you used the default script that PyCharm creates; otherwise click on the dropdown labelled "main" -> "Edit configuration" and change the "script path" to match the script you are working on).
You should notice that nothing happens; however, you should also notice that the application does not exit, which means something is happening. The reason we don't see anything is because we have not told the environment to render its output to the screen yet. Before we can do that, though, we need to train our agent, so let's have a look at how we can do that next.
model = DQN("MlpPolicy", environment, verbose=1)
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")
So what does this do? The first line, model = DQN("MlpPolicy", environment, verbose=1), creates the DQN model that we will train in the CartPole environment. As we can see, it takes three parameters, of which the first two are required: MlpPolicy is the learning policy, and environment is the OpenAI Gym environment that we want to train our agent in. The third parameter, verbose=1, is optional; when it is set to 1 (ie =1) it prints the learning statistics to the development console, while setting it to 0 will print nothing. We'll set it to 1 so we know something is happening.
Next, model.learn( total_timesteps=100_000 ) is what actually trains the model. total_timesteps is the number of timesteps (or updates/ticks) that the ML model will train for. We'll set this to 100,000 for now (the _ is just a way to space out numbers in Python, which I think is a nice feature of the language), and you can play around with this parameter in your own time. What happens if you increase or decrease total_timesteps?
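If you want to answer that with numbers rather than just by watching, Stable Baselines3 includes an evaluation helper you could drop in after training; the sketch below assumes the model and environment defined above, and the choice of 10 evaluation episodes is arbitrary:

from stable_baselines3.common.evaluation import evaluate_policy

# Average the reward over a few episodes to compare different training lengths.
mean_reward, std_reward = evaluate_policy(model, environment, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")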
Lastly we have model.save("models/dqn_cartpole"), which saves the trained model inside the project folder, for example C:/user/username/python Projects/My First ML/models/dqn_cartpole. You can now run the script again to train the agent; this may take a little while, and if you need to stop it early you can press ctrl+c. Again though, we still don't see our environment (or game) yet. This is because rendering is usually the slow part of running a game, and we want to train our agents as fast as possible. The only time you will need to render the game during training is if you wanted to use machine vision to play the game, but that's another post for another day. Next, add the following two variables to the script; we'll use them to control the rendering loop.
episodes = 100
update_steps = 1000
The first one, episodes, is the number of times the agent can reset the environment, upon death or completion of the scene; in this case we have set it to 100. The second one, update_steps, is the maximum number of updates (or ticks) before the scene is automatically reset. Now it's time to render the environment. This is probably the trickiest part, but don't worry, we'll go through it line by line.
observation = environment.reset()  # get an initial observation before the first prediction

for episode in range(episodes):
    for step in range(update_steps):
        action, _state = model.predict( observation, deterministic=True )
        observation, _reward, done, info = environment.step( action )
        environment.render()

        if done:
            print( f"Ended environment after {step} steps" )
            print( f"last info: {info}" )
            observation = environment.reset()
            break
Starting from the top, observation = environment.reset() gives us an initial observation to feed into the model before the loops start. Then for episode in range(episodes): loops over the number of episodes, and for step in range(update_steps): ticks the environment up to update_steps times per episode. This basically means that we are going to tick the environment up to episodes * update_steps (or 100,000 with our configuration) times in total (assuming it does not reset early).
Next, action, _state = model.predict( observation, deterministic=True ) asks the trained model which action to take given the current observation. We put an underscore (_) in front of the state to show that we are discarding this value. Alternatively we could do action = model.predict( observation, deterministic=True )[0], which means we only want the first value.
Then observation, _reward, done, info = environment.step( action ) ticks the environment with the chosen action and returns four values:
observation is the state of the environment at the end of the tick.
_reward is the reward that the agent would have received if we were training.
done is whether the agent has finished (ie dead or completed the scene).
info is any debug info generated during the tick.
After that, environment.render() draws the environment to the screen so we can finally watch our agent. Lastly, if done: checks whether the scene has ended, and if it has, observation = environment.reset() resets the environment ready for the next episode before breaking out of the inner loop.
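If you're curious what these values actually look like for Cart Pole, you could run a small standalone snippet like the one below; it assumes the same gym version we installed earlier (the older API where reset() returns just the observation and step() returns four values), takes a single random action and prints everything out:

import gym

environment = gym.make("CartPole-v0")
print(environment.action_space)       # Discrete(2): push the cart left or right
print(environment.observation_space)  # four values: cart position, cart velocity, pole angle, pole angular velocity

observation = environment.reset()
action = environment.action_space.sample()                    # pick a random action
observation, reward, done, info = environment.step( action )  # one tick of the environment
print(observation, reward, done, info)
environment.close()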
model = DQN("MlpPolicy", environment, verbose=1 )
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")
with
model = DQN.load(f"models/dqn_cartpole")
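If you'd rather keep a single script that does the right thing either way, one option (just a sketch, relying on the fact that Stable Baselines3 saves the model as a .zip file) is to load the model when it already exists and train it otherwise:

import os

if os.path.exists("models/dqn_cartpole.zip"):
    # A trained model already exists on disk, so just load it.
    model = DQN.load("models/dqn_cartpole")
else:
    # No saved model yet: train one and save it for next time.
    model = DQN("MlpPolicy", environment, verbose=1)
    model.learn( total_timesteps=100_000 )
    model.save("models/dqn_cartpole")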
In this example we have used the Deep Q Network as our ML model, however there are several others available in stable_baselines3. Another model you might want to give a go is Proximal Policy Optimization (PPO). You can include it in the project by adding PPO to the from stable_baselines3 import DQN line, so it becomes

from stable_baselines3 import DQN, PPO

Then replace the references to DQN with PPO; it's really that simple.
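For reference, the training part of the script would then look something like the sketch below (the same settings as before, just saved under a new name so it doesn't overwrite the DQN model; whether PPO needs more or fewer timesteps than DQN is something to experiment with):

# Same training steps as before, but using PPO instead of DQN.
model = PPO("MlpPolicy", environment, verbose=1)
model.learn( total_timesteps=100_000 )
model.save("models/ppo_cartpole")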
In this post we have had a look at how you can configure your system and run a simple example ML environment using OpenAI Gym. We have gone through the process of creating a simple hello world application step by step, going from zero to hero in under 30 lines of code! However, there is a hell of a lot more that can be done with the ML models and OpenAI Gym. This post was only designed to whet your appetite in the ML space, and I hope you found it useful and experiment more with ML in the future! Maybe next time we'll look at how you can implement your own game into OpenAI Gym using PyGame.