
Studio Shenanigans #3 – A Beginner's Guide to Machine Learning using Python and OpenAI Gym

This week I thought it might be quite nice to have a look at one of the machine learning (ML) frameworks that I'm using for an AI project I'm currently working on as part of my university master's degree.

I am by no means an expert in the field, but I'd like to take a look at how you can quickly get set up with ML in Python to train your own agents.

Prerequisites
Before we can get started there are a number of prerequisites that we need to take care of.

First off you're going to have to get yourself a copy of Python. I recommend getting the version below the latest 64-bit version of Python (which is Python 3.9 64-bit at the time of writing).

This is because it can take a little bit of time for changes to filter through to some of the libraries that we require, and staying one version behind can save you time trying to figure out what's going on. Furthermore, when installing Python it generally makes life a bit easier in the long run to ensure that "Add Python to PATH" is checked during installation; this will allow us to run Python directly on the command line (cmd or PowerShell).
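If you want to double-check that you've ended up with the right build, here's a quick sanity check you can run from the Python interpreter (just a sketch; the exact version number will obviously depend on what you installed):

import platform
import struct

print( platform.python_version() )       # e.g. 3.9.x
print( struct.calcsize("P") * 8, "bit" )  # should print "64 bit" for a 64-bit install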

Now you have Python you're gonna want to get yourself an integrated development environment (IDE). Don't worry, it's basically just a fancy name for a development text editor (a bit like a very clever Notepad). I recommend using JetBrains' PyCharm Community Edition (CE), which is what we will be using throughout this post, although you can use Visual Studio Code, Atom or IDLE (IDLE is installed with Python by default, but is very basic).

Once PyCharm (or the IDE of your choice) is installed we can start looking at getting our environment configured for ML using OpenAI Gym and Stable Baselines3 (we'll discuss these in more detail later).

Upon opening PyCharm for the first time you will be prompted to create a new project. Select the location for the project and, under Python Interpreter, make sure New environment is selected using Virtualenv, ensure you select the 64-bit version of Python as the Base Interpreter (it will be the only version if that's all you installed), and go ahead and hit Create.

Now we can go and install the required packages for OpenAI Gym. We'll start with PyTorch as this needs a little bit of configuration itself. First off head to the PyTorch website and scroll down to Install PyTorch. For this we want to select the latest stable build (1.10.2 at time of writing), select your Operating System (OS), select the pip package (though you can use conda if installed), language Python, and finally we can select the Compute Platform.

Unfortunately, PyTorch only supports Nvidia graphics cards for GPU acceleration (to the best of my knowledge, at the time of writing). Therefore, if you don't have an Nvidia graphics card you'll need to select CPU for the compute platform; otherwise, select the CUDA version that best matches the version installed on your machine. (To find out which version of CUDA you have installed, open cmd or PowerShell and enter nvidia-smi; the CUDA version will be printed in the top-right of the output.) Now copy the generated install command and paste it into the terminal in PyCharm (at the bottom).

(This will be pip3 install torch torchvision torchaudio for non-Nvidia users, or pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html for users with the latest version of CUDA installed.)

It may take a few minutes for PyTorch to install, but once it is complete we can install the other two packages. To install OpenAI Gym simply enter pip install gym into the terminal of PyCharm, followed by pip install stable-baselines3 to install Stable Baselines3. Once that is completed we are done with configuring our Python environment for OpenAI Gym.
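If you want to confirm everything installed cleanly before moving on, a few lines in the Python console will do it. This is just a quick sanity check (your version numbers will differ, and torch.cuda.is_available() will be False if you installed the CPU build):

import torch
import gym
import stable_baselines3

print( torch.__version__ )          # the PyTorch version you installed, e.g. 1.10.2
print( torch.cuda.is_available() )  # True if PyTorch can see an Nvidia GPU
print( gym.__version__ )
print( stable_baselines3.__version__ )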
What are OpenAI Gym and Stable Baselines3?
Well I'm glad you asked. OpenAI Gym is "A toolkit for developing and comparing reinforcement learning algorithms" -OpenAI Gym. Basically it is a framework that implements the basic methods required to train different ML models (algorithms), with further methods to tick and render the ML environment. Furthermore, OpenAI Gym contains a wealth of example environments for learning with, including many classic Atari games, which can be used to train agents. On the other hand, Stable Baselines3 is a set of reliable reinforcement learning algorithms which can be trained and, once training is completed, make predictions based on the current state of the environment.
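To make that a little more concrete, here is a minimal sketch of the basic loop every Gym environment exposes: reset the environment, pick an action, step (tick) the environment, and render it. There's no learning here at all; the agent just takes random actions, which is a handy baseline to compare a trained agent against later.

import gym

environment = gym.make( "CartPole-v0" )
observation = environment.reset()

for step in range( 200 ):
  action = environment.action_space.sample()  # pick a random action
  observation, reward, done, info = environment.step( action )
  environment.render()
  if done:
    observation = environment.reset()

environment.close()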

We don't really need to care about PyTorch directly; just know that it is a dependency of the Stable Baselines3 library.
Different types of learning
While I don't want to go into too much detail regarding the different learning types, I'll quickly cover the basics. You may be asking: what the hell is reinforcement learning (mentioned above)? Basically it's a bit like training a pet: if it does something good you give it a reward (or treat), and if it does something undesired you punish it (don't worry, this is AI, it doesn't have feelings :DDD ). This process of reward/punishment happens every time the environment updates (or ticks), which in turn tunes a bunch of parameters based on the current observations of the environment and the actions that can be performed.

On the other hand there is supervised and unsupervised learning. These learning methods are generally used to classify data into groups, for example classifying images of cats and dogs. The main difference between the two is that during the supervised learning training process you tell it (or tag it): this is a dog, this is a cat… while unsupervised learning will attempt to learn patterns from untagged datasets, which is useful for more ambiguous data. If you would like to explore supervised/unsupervised classification more, I recommend taking a look at the scikit-learn Python library. There are other types of ML, however these are the three that I see most commonly.
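If you fancy a quick taste of supervised learning, here's a minimal sketch using scikit-learn's built-in iris dataset and a k-nearest-neighbours classifier (you'd need to pip install scikit-learn first, and the exact accuracy you get will vary from run to run):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
# split the tagged data into a training set and a held-back test set
train_x, test_x, train_y, test_y = train_test_split( data.data, data.target, test_size=0.25 )

classifier = KNeighborsClassifier( n_neighbors=3 )
classifier.fit( train_x, train_y )           # learn from the tagged examples
print( classifier.score( test_x, test_y ) )  # accuracy on data it has never seen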
Hello World in OpenAI Gym
OpenAI Gym comes with many example environments for learning how to use reinforcement learning algorithms, including MountainCar, Acrobot and some Atari games such as Space Invaders. However, for our Hello World project we'll have a look at the classic Cart Pole problem. The aim of Cart Pole is to try and balance the pole above the cart for as long as possible by moving the cart left and right.
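Before we train anything, it can help to peek at what the environment actually gives us. The snippet below is just a quick exploration (not part of the final script): CartPole's observation is four continuous values describing the cart and pole, and there are two discrete actions (push left or push right).

import gym

environment = gym.make( "CartPole-v0" )
print( environment.observation_space )  # Box(4,) - cart position/velocity, pole angle/velocity
print( environment.action_space )       # Discrete(2) - push the cart left or right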

So let's walk through the implementation step by step and I'll explain what we are doing as we go along. First of all, in PyCharm create a new Python script (right click in the project panel (left) -> New -> Python File) or delete the contents of the default Python script created by PyCharm. Now we want to include the packages required for OpenAI Gym and the ML algorithms.

import gym
from stable_baselines3 import DQN

Adding the above code to the top of the Python script will import the OpenAI Gym module (import gym) and the Deep Q Network (DQN) ML algorithm (from stable_baselines3 import DQN) into the project. Now we can go ahead and create our Gym environment by adding the following line.

environment = gym.make("CartPole-v0")

Now we are able to run the script for the first time. In PyCharm you can press the play button in the top right of the main window (assuming you used the default script that PyCharm creates; otherwise click on the dropdown labelled "main" -> "Edit configuration" and change the "script path" to match the script you are working on). You should notice that nothing happens, however you should also notice that the application does not exit, so something is happening. The reason we don't see anything is because we have not told the environment to render its output to the screen yet. But before we can do that we could do with training our agent, so let's have a look at how we can do that next.

model = DQN("MlpPolicy", environment, verbose=1)
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")
So what does this do?
  • model = DQN("MlpPolicy", environment, verbose=1)
  • This creates a new Deep Q Network ML model, which can be trained to play the CartPole environment. As we can see, it takes three parameters, of which the first two are required. MlpPolicy is the learning policy, and environment is the OpenAI Gym environment that we want to train our agent in. The third parameter verbose=1 is optional; when it is set to 1 it prints the learning statistics to the development console, while setting it to 0 will print nothing. We'll set it to 1 so we know something is happening.
  • model.learn( total_timesteps=100_000 )
  • This tells our ML model to start learning our environment. It achieves this by exploring the environment through a series of trial and error actions based on the current observations of the environment. You'll notice that we have included one optional parameter, total_timesteps; this is the number of timesteps (or updates/ticks) that the ML model will train for. We'll set this to 100,000 for now (the _ is just a way to space out numbers in Python, which I think is a nice feature of the language) and you can play around with this parameter in your own time. What happens if you increase or decrease total_timesteps? Lastly we have,
  • model.save("models/dqn_cartpole")
  • This simply saves the ML model so we can load it back in at a later time. It takes a single parameter: the file that the data should be saved in, relative to the running script, e.g. C:/user/username/python Projects/My First ML/models/dqn_cartpole
Now if we run the script again we should notice that it starts printing output to the development console of PyCharm (at the bottom). The output is just the statistics of the ML model; it lets us know that it is trying to learn something. If we leave this running for a few minutes you'll notice that the application will just exit (and create a new save file). If you don't want to wait you can either press the stop button or click in the development console and press Ctrl+C. Again though, we still don't see our environment (or game) yet; this is because rendering is usually the slow part of running a game, and we want to train our agents as fast as possible. The only time you would need to render the game during training is if you wanted to use machine vision to play the game, but that's another post for another day.
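If you'd like a quick number for how well training went before actually watching the agent play, Stable Baselines3 ships an evaluation helper. This is an optional extra rather than part of the walkthrough script, but it gives you the mean reward over a handful of episodes (higher is better; CartPole-v0 caps an episode at 200 ticks), which is a nice way to see what changing total_timesteps does:

from stable_baselines3.common.evaluation import evaluate_policy

# run the trained model for 10 episodes without learning and report the average reward
mean_reward, std_reward = evaluate_policy( model, environment, n_eval_episodes=10 )
print( f"mean reward: {mean_reward} +/- {std_reward}" )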

Once the agent has finished training, we are able to let it play the game by itself (at least to some degree). This is where we can actually render the environment and see how well the agent has done. First of all we need to define a couple of variables.

episodes = 100
update_steps = 1000

The first one, episodes, is the number of times the agent can reset the environment, upon death or completion of the scene. In this case we have set it to 100. The second one, update_steps, is the maximum number of updates (or ticks) before the scene is automatically reset. Now it's time to render the environment. This is probably the trickiest part, but don't worry, we'll go through it line by line.
observation = environment.reset()  # grab an initial observation to feed into the model

for episode in range(episodes):

  for step in range(update_steps):
    action, _state = model.predict( observation, deterministic=True )
    observation, _reward, done, info = environment.step( action )
    environment.render()

    if done:
      print( f"Ended environment after {step} steps" )
      print( f"last info: {info}" )
      observation = environment.reset()
      break

Starting from the top (note that the very first observation = environment.reset() outside the loops just gives us an initial observation to feed into the model):
  • for episode in range(episodes):
  • is a loop, and we are basically saying: run the following indented code episodes times. (Note that the indentation is very important in Python.)
  • for step in range(update_steps):
  • we are doing the same as above, but for update_steps iterations. This basically means that we are going to tick the environment up to episodes * update_steps (or 100,000 with our configuration) times (assuming it does not reset early).
  • action, _state = model.predict( observation, deterministic=True )
  • Here we are asking the ML model to predict the next action (and state) based on the current observation of the environment. However, we only need the action, so we put an underscore (_) in front of the state to show that we are discarding this value. Alternatively we could do action = model.predict( observation, deterministic=True )[0], which means we only want the first value.
  • observation, _reward, done, info = environment.step( action )
  • Next we update/tick the environment, which returns four values.
    • observation is the state of the environment at the end of the tick.
    • _reward is the reward that the agent would have received if we were training.
    • done is whether the agent has finished (i.e. died or completed the scene).
    • info is any debug info generated during the tick.
  • environment.render()
  • this renders the frame to the output window (see, I told you we'd get there soon :D)
  • if done:
  • here we are asking whether done is True. If it is, run the following indented code.
  • the next two lines just print a message to the output console: the first prints the number of ticks that the agent survived, while the second prints the last frame's debug info.
  • observation = environment.reset()
  • simply resets the scene back to its initial state, ready for the agent to make another attempt.
And that's pretty much it! Now if you hit the play button in PyCharm, it will start the learning process, and once it completes the environment window should appear with the agent attempting to balance the pole for as long as it can!
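One small tidy-up you might want to add once the loops have finished (not strictly required, but it disposes of the render window cleanly):

environment.close()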
The only thing I haven't told you yet is how to load in your saved ML model.
To do that replace

model = DQN("MlpPolicy", environment, verbose=1 )
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")

with
model = DQN.load("models/dqn_cartpole")
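Putting that together, a play-only version of the script might look something like this sketch (assuming you've already run the training script above, so the save file exists at models/dqn_cartpole):

import gym
from stable_baselines3 import DQN

environment = gym.make( "CartPole-v0" )
model = DQN.load( "models/dqn_cartpole" )  # load the previously trained model

episodes = 100
update_steps = 1000

for episode in range( episodes ):
  observation = environment.reset()
  for step in range( update_steps ):
    action, _state = model.predict( observation, deterministic=True )
    observation, _reward, done, info = environment.step( action )
    environment.render()
    if done:
      print( f"Ended environment after {step} steps" )
      break

environment.close()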

Using Different ML Models
In this example we have used the Deep Q Network as our ML model, however there are several others available in stable_baselines3. Another model you might want to give a go is Proximal Policy Optimization (PPO). You can include this in the project by adding PPO to the from stable_baselines3 import DQN line, so it becomes from stable_baselines3 import DQN, PPO, and then replacing the references to DQN with PPO. It's really that simple.
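So the training lines from earlier would become something like this (same idea, different algorithm; I'm leaving PPO's own parameters at their defaults here, and saving to a different file name so you don't overwrite the DQN model):

from stable_baselines3 import DQN, PPO

model = PPO( "MlpPolicy", environment, verbose=1 )
model.learn( total_timesteps=100_000 )
model.save( "models/ppo_cartpole" )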
Further Reading and Resources
If you would like to read more about Stable Baselines3 and the models it implements, I highly recommend reading some of their documentation. Furthermore, there are a whole bunch of parameters that can be modified to affect how the agent learns which we have not had time to cover in this post (there's a small sketch after the links below to give you a taste).

Stable Baselines3 DQN
Stable Baselines3 PPO
OpenAI Gym
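As a taster, here is a sketch of a DQN created with a few of its optional parameters spelled out. These values are just illustrative, not recommendations; check the DQN documentation linked above for what each one actually does and its default.

from stable_baselines3 import DQN

model = DQN(
  "MlpPolicy",
  environment,
  learning_rate=1e-4,          # how large each learning update is
  buffer_size=100_000,         # how many past experiences to keep in the replay buffer
  exploration_fraction=0.2,    # fraction of training spent decaying random exploration
  exploration_final_eps=0.05,  # chance of a random action once exploration has decayed
  verbose=1,
)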
Conclusion

In this post we have had a look at how you can configure your system and run a simple example ML environment using OpenAI Gym. We have gone through the process of creating a simple hello world application in OpenAI Gym step by step, going from zero to hero in under 30 lines of code! However, there is a hell of a lot more that can be done with the ML models and OpenAI Gym. This post was only designed to whet your appetite in the ML space, and I hope you found it useful and will experiment more with ML in the future! Maybe next time we'll look at how you can implement your own game into OpenAI Gym using PyGame.
