
Studio Shenanigans #3 – A Beginner's Guide to Machine Learning using Python and OpenAI Gym

This week I thought it might be quite nice to have a look at one of the machine learning (ML) frameworks I'm using for an AI project I'm currently working on as part of my university master's degree.

I am by no means an expert in the field, but I'd like to take a look at how you can quickly get set up with ML in Python to train your own agents.

Prerequisites
Before we can get started there are a number of prerequisites that we need to take care of.

First off, you're going to have to get yourself a copy of Python. I recommend getting the version below the latest 64-bit release of Python (which would be Python 3.9 64-bit at the time of writing).

This is because it can take a little while for changes in the newest Python release to filter through to the libraries we require, so staying one version behind can save you time trying to figure out why something won't install. Furthermore, when installing Python it generally makes life a bit easier in the long run to ensure that "Add Python to PATH" is checked during installation; this will allow us to run Python directly from the command line (cmd or PowerShell).
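If you want to double-check which interpreter ends up on your PATH later on, here's a quick sanity-check sketch (entirely optional) that you can run from the command line or drop into any script:

import platform
import sys

print(sys.version)                 # the Python version being run
print(platform.architecture()[0])  # should print "64bit" if you grabbed the 64-bit installer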

Now you have Python, you're gonna want to get yourself an integrated development environment (IDE). Don't worry, it's basically just a fancy name for a development text editor (a bit like a very clever Notepad). I recommend using JetBrains' PyCharm Community Edition (CE), which is what we will be using throughout this post, although you can use Visual Studio Code, Atom or IDLE (IDLE is installed with Python by default, but is very basic).

Once PyCharm (or the IDE of your choice) is installed, we can start getting our environment configured for ML using OpenAI Gym and Stable Baselines3 (we'll discuss these in more detail later).

Upon opening PyCharm for the first time you will be prompted to create a new project. Select the location for the project, and under Python Interpreter make sure "New environment using Virtualenv" is selected and that the 64-bit version of Python is chosen as the Base Interpreter (it will be the only option if that's all you installed). Then go ahead and hit Create.

Now we can go and install the required packages for OpenAI Gym. We'll start with PyTorch as this has a little bit of configuration itself. First off, head to the PyTorch website and scroll down to Install PyTorch. Here we want to select the latest stable build (1.10.2 at time of writing), your Operating System (OS), the pip package (though you can use conda if installed), the Python language, and finally the Compute Platform.

Unfortunately, PyTorch only supports Nvidia graphics cards for GPU acceleration (to the best of my knowledge, at the time of writing). Therefore, if you don't have an Nvidia graphics card you'll need to select CPU for the compute platform; otherwise, select the CUDA version that best matches the version installed on your machine. (To find out which version of CUDA you have installed, open cmd or PowerShell and enter nvidia-smi; it will be printed in the top-right of the output.) Now copy the Run command and paste it into the terminal in PyCharm (at the bottom).

(This will be pip3 install torch torchvision torchaudio for non-Nvidia users, or pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html for users with the latest version of CUDA installed.)

It may take a few minutes for PyTorch to install, but once it is complete we can install the other two packages. To install OpenAI Gym simply enter pip install gym into the terminal of PyCharm, followed by pip install stable-baselines3 to install Stable Baselines3. Once that is completed we are done with configuring our Python environment for OpenAI Gym.
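Before moving on, here's a quick sanity-check sketch you can paste into the PyCharm Python console (or a throwaway script) to confirm everything installed and whether PyTorch can see your GPU:

import gym
import stable_baselines3
import torch

print(torch.__version__)               # installed PyTorch version
print(torch.cuda.is_available())       # True if the CUDA build can see an Nvidia GPU
print(gym.__version__)                 # installed OpenAI Gym version
print(stable_baselines3.__version__)   # installed Stable Baselines3 version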
What are OpenAI Gym and Stable Baselines3?
Well, I'm glad you asked. OpenAI Gym is "A toolkit for developing and comparing reinforcement learning algorithms" – OpenAI Gym. Basically it is a framework that implements the basic methods required to train different ML models (algorithms), with further methods to tick and render the ML environment. Furthermore, OpenAI Gym contains a wealth of example environments for learning with, including many classic Atari games, which can be used to train agents. On the other hand, Stable Baselines3 is a set of reliable reinforcement learning algorithm implementations which can be trained and, once training is completed, make predictions based on the current state of the environment.

We don't really need to care about PyTorch, but just know that it is a dependency of the Stable Baselines3 library.
Different types of learning
While I don't want to get into too much detail regarding the different learning types, I'll quickly cover the basics. You may be asking, what the hell is reinforcement learning (mentioned above)? Basically it's a bit like training a pet: if it does something good you give it a reward (or treat), and if it does something undesired you punish it (don't worry, this is AI, it doesn't have feelings :DDD ). This process of reward/punishment happens every time the environment updates (or ticks), which in turn tunes a bunch of parameters based on the current observations of the environment and the actions that can be performed.

On the other hand, there is supervised and unsupervised learning. These learning methods are generally used to classify data into groups, for example classifying images of cats and dogs. The main difference between the two is that during the supervised learning training process you tell it (or tag it) "this is a dog, this is a cat"…, while unsupervised learning will attempt to learn patterns from untagged datasets, which is useful for more ambiguous data. If you would like to explore supervised/unsupervised classification more, I recommend taking a look at the scikit-learn Python library (see the quick sketch below). There are other types of ML, however these are the three that I see most commonly.
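If you want a feel for what that looks like in code, here is a minimal supervised-learning sketch using scikit-learn (pip install scikit-learn). It's completely separate from the reinforcement learning example below and just shows the "tagged data in, classifier out" idea:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# a small built-in dataset of tagged (labelled) flower measurements
features, labels = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.25)

classifier = KNeighborsClassifier()
classifier.fit(x_train, y_train)         # learn from the tagged examples
print(classifier.score(x_test, y_test))  # accuracy on examples it has never seen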
Hello World in OpenAI Gym
OpenAI Gym comes with many example environments for learning how to use reinforcement learning algorithms, including MountainCar, Acrobot and some Atari games such as Space Invaders. However, for our Hello World project we'll have a look at the classic Cart Pole problem. The aim of Cart Pole is to try and balance the pole above the cart for as long as possible by moving the cart left and right.
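If you're curious what this environment actually exposes, here's a quick optional peek (not needed for the rest of the post; the comments describe what I'd expect it to print):

import gym

environment = gym.make("CartPole-v0")
print(environment.action_space)       # Discrete(2) - push the cart left or right
print(environment.observation_space)  # Box(4,) - cart position/velocity, pole angle/velocity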

So let's walk through the implementation step by step and I'll explain what we are doing as we go along. First of all, in PyCharm create a new Python script (right click in the project panel (left) -> New -> Python File) or delete the contents of the default Python script created by PyCharm. Now we want to import the packages required for OpenAI Gym and the ML algorithms.

import gym
from stable_baselines3 import DQN

Adding the above code to the top of the Python script will import the OpenAI Gym module (import gym) and the Deep Q Network (DQN) ML algorithm (from stable_baselines3 import DQN) into the project. Now we can go ahead and create our Gym environment by adding the following line.

environment = gym.make("CartPole-v0")

Now we are able to run the script for the first time. In PyCharm you can press the play button in the top right of the main window (assuming you used the default script that PyCharm creates; otherwise click on the dropdown labelled “main“ -> “Edit configuration“ and change the “script path“ to match the script you are working on). You should notice that nothing happens, however you should also notice that the application does not exit, so something is happening. The reason we don't see anything is because we have not told the environment to render the output to screen yet. Before we can do that, we need to train our agent, so let's have a look at how we can do that next.

model = DQN("MlpPolicy", environment, verbose=1)
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")
So what does this do?
  • model = DQN("MlpPolicy", environment, verbose=1)
  • This creates a new Deep Q Network ML model, which can be trained to play the CartPole environment. As we can see, it takes three parameters, of which the first two are required. MlpPolicy is the learning policy and environment is the OpenAI Gym environment that we want to train our agent in. The third parameter verbose=1 is optional; when it is set to 1 it prints the learning statistics to the development console, while setting it to 0 will print nothing. We'll set it to 1 so we know something is happening.
  • model.learn( total_timesteps=100_000 )
  • This tells our ML model to start learning our environment. It achieves this by exploring the environment through a series of trial and error actions based on the current observations of the environment. You'll notice that we have included one optional parameter, total_timesteps; this is the number of timesteps (or updates/ticks) that the ML model will train for. We'll set this to 100,000 for now (the _ is just a way to space out numbers in Python, which I think is a nice feature of the language) and you can play around with this parameter in your own time. What happens if you increase or decrease total_timesteps? Lastly we have,
  • model.save("models/dqn_cartpole")
  • This simply saves the ML model so we can load it back in at a later time. It takes a single parameter: the file that the data should be saved to, relative to the running script, e.g. C:/user/username/python Projects/My First ML/models/dqn_cartpole
Now if we run the script again we should notice that it starts printing an output to the development console of PyCharm (at the bottom). The output is just the statistics of the ML model; it just lets us know that it is trying to learn something. If we leave this running for a few minutes you'll notice that the application will just exit (and create a new save file). If you don't want to wait you can either press the stop button or click in the development console and press Ctrl+C. Again though, we still don't see our environment (or game) yet; this is because rendering is usually the slow part of running a game, and we want to train our agents as fast as possible. The only time you would need to render the game during training is if you wanted to use machine vision to play the game, but that's another post for another day.

Once the agent has finished training, we are able to let it play the game by itself (at least to some degree). This is where we can actually render the environment and see how well the agent has done. First of all we need to define a couple of variables.

episodes = 100
update_steps = 1000

The first one, episodes, is the number of times the agent can reset the environment, upon death or completion of the scene. In this case we have set it to 100. The second one, update_steps, is the maximum number of updates (or ticks) before the scene is automatically reset. Now it's time to render the environment. This is probably the trickiest part, but don't worry, we'll go through it line by line.
observation = environment.reset()

for episode in range(episodes):

  for step in range(update_steps):
    action, _state = model.predict( observation, deterministic=True )
    observation, _reward, done, info = environment.step( action )
    environment.render()

    if done:
      print( f"Ended environment after {step} steps" )
      print( f"last info: {info}" )
      observation = environment.reset()
      break

Starting from the top,
  • observation = environment.reset()
  • before the loops we reset the environment once; this gives us the initial observation to feed into the model's first prediction (without it, observation would be undefined on the first pass).
  • for episode in range(episodes):
  • is a loop; we are basically saying run the following indented code episodes times. (Note that the indentation is very important in Python.)
  • for step in range(update_steps):
  • we are doing the same as above but for update_steps iterations. This basically means that we are going to tick the environment up to episodes * update_steps times (or 100,000 with our configuration), assuming it does not reset early.
  • action, _state = model.predict( observation, deterministic=True )
  • Here we are asking the ML model to predict the next action and state of the environment. However, we only need the action, so we put an underscore (_) in front of the state to show that we are discarding this value. Alternatively we could do action = model.predict( observation, deterministic=True )[0], which means we only want the first value.
  • observation, _reward, done, info = environment.step( action )
  • Next we update/tick the environment, which returns four values.
    • observation is the state of the environment at the end of the tick.
    • _reward is the reward that the agent would have received if we were training.
    • done is whether the agent has finished (i.e. died or completed the scene).
    • info is any debug info generated during the tick.
  • environment.render()
  • this renders the frame to the output window (see, I told you we'd get there soon :D)
  • if done:
  • here we are asking if done is True. If it is, run the following indented code.
  • the next two lines just print a message to the output console: the first prints the number of ticks the agent survived, while the second prints the last frame's debug info.
  • observation = environment.reset()
  • simply resets the scene back to its initial state, ready for the agent to make another attempt, while the break drops us out of the inner loop so the next episode can start.
And that's pretty much it! Now if you hit the play button in PyCharm, it will start the learning process; once it completes, the environment window should appear with the agent attempting to balance the pole for as long as it can!
The only thing I haven't told you yet is how to load in your saved ML model.
To do that replace

model = DQN("MlpPolicy", environment, verbose=1 )
model.learn( total_timesteps=100_000 )
model.save("models/dqn_cartpole")

with
model = DQN.load("models/dqn_cartpole")
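One pattern I find handy (just a sketch, using nothing beyond what we've already covered plus Python's built-in os module; note that stable-baselines3 writes the save out as a .zip file, at least in the versions I've used) is to train only when no save file exists yet, and load it otherwise:

import os

model_path = "models/dqn_cartpole"

if os.path.exists(model_path + ".zip"):
  # a previous run already trained and saved a model, so just load it
  model = DQN.load(model_path)
else:
  # no save file yet, so train from scratch and save for next time
  model = DQN("MlpPolicy", environment, verbose=1)
  model.learn( total_timesteps=100_000 )
  model.save(model_path)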

Using Different ML Models
In this example we have used the Deep Q Network as our ML model, however there are several others available in stable_baselines3. Another model you might want to give a go is Proximal Policy Optimization (PPO). You can include this in the project by adding PPO to the from stable_baselines3 import DQN line, so from stable_baselines3 import DQN, PPO, and then replace the references to DQN with PPO. It's really that simple (see the sketch below).
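For example, the training lines from earlier would become something like this (the ppo_cartpole save name is just my choice, so the new model doesn't overwrite the DQN one):

from stable_baselines3 import DQN, PPO

model = PPO("MlpPolicy", environment, verbose=1)
model.learn( total_timesteps=100_000 )
model.save("models/ppo_cartpole")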
Further Reading and Resources
If you would like to read more about stable_baselines3 and the models it implements, I highly recommend reading some of their documentation. Furthermore, there are a whole bunch of parameters that can be modified to affect how the agent learns, which we have not had time to cover in this post.

Stable Baselines3 DQN
Stable Baselines3 PPO
OpenAI Gym
Conclusion

In this post we have had a look at how you can configure your system and run a simple example ML environment using OpenAI Gym. We have gone through the process of creating a simple Hello World application in OpenAI Gym step by step, going from zero to hero in under 30 lines of code! However, there is a hell of a lot more that can be done with the ML models and OpenAI Gym. This post was only designed to whet your appetite in the ML space, and I hope you found it useful and that you experiment more with ML in the future! Maybe next time we'll look at how you can implement your own game into OpenAI Gym using PyGame.
