
Problem Overview

RL Overview Image

In this tutorial you will learn about reinforcement learning (RL) and how to use RL to solve a simple simulated problem with Inkling and the Bonsai Platform. This includes learning the essential components of the Inkling language needed to connect a simulation to the platform. You will familiarize yourself with the simulation, test it, then fill in the necessary Inkling schemas (states and actions) needed to connect it to the platform. By the end of this tutorial, you can expect to be comfortable with the relationship between a BRAIN and a simulation, to know how to train a BRAIN on the Bonsai Platform using custom Inkling code, and to have a basic working knowledge of Bonsai's Command Line Interface (CLI).

Before following this tutorial, we expect you to have read the Quick Start tutorial overview, so you have a general understanding of how the platform works, and to have installed and configured the Bonsai CLI. Please do both before continuing with this tutorial.

The problem we are trying to solve using this simulation is to teach an agent (the Bonsai BRAIN) to move to a target point in a simple planar world. The agent is told where it is relative to the goal, and has to decide which way to move. The agent sends its chosen action to the simulation, which simulates the step, moving the agent to a new location, and this repeats until the agent makes it to the goal, or time runs out.

First, we’ll cover a bit about reinforcement learning for those who are unfamiliar with this style of machine learning.

How Reinforcement Learning Works

Visualization of one iteration on the Bonsai Platform

Reinforcement learning (RL), illustrated here, is a machine learning technique for controlling or optimizing a system or process. In RL, an agent takes actions in an environment, getting feedback in the form of reward. The agent explores different action strategies or policies and learns to maximize cumulative reward. One cycle from the environment to the agent and back is called an iteration.

Terminology

This table describes the key terms used in RL:

state: The state of the environment at each iteration (ex: current agent position relative to the target). The agent uses this to decide what action to select.
action: Actions define what the agent can do at each iteration (ex: which direction to move in). The goal of RL is to learn to select the right series of actions to achieve an objective.
reward: The reward at each iteration gives the agent feedback that helps it learn (ex: higher reward when moving toward the target). An important note is that the agent's goal is to maximize cumulative future reward, not the instantaneous reward at each iteration.
iteration: An iteration is one state → action → reward → new-state transition in the environment. The agent uses the state to select an action, which causes the environment to transition to a new state, and results in a reward.
episode: An episode is a series of iterations, starting in some initial state and ending when the environment hits a termination condition and resets for the next episode (ex: the episode terminates either when the agent reaches the target (success) or when time runs out (failure)). Episodes can vary in length, with termination conditions typically defined based on succeeding at the task, failing at the task, getting too far from success, or running out of time.
cumulative reward: Cumulative reward is the sum of the per-iteration rewards over an episode. The agent's goal is to select actions that maximize cumulative future reward through the end of the episode.
terminal condition: Terminal conditions specify when to end an episode.
policy: A policy determines what action the agent selects for every possible state. The goal of RL is to learn a policy that maximizes cumulative future reward.
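
To see how these terms fit together, here is a toy, self-contained sketch. It is not the tutorial's simulator; every name and value in it is made up for illustration.

Toy episode example

import random

# A toy "environment": the state is a distance the agent must drive to zero.
def reset():
    return {"distance": 10}                      # initial state of an episode

def step(state, action):
    new_state = {"distance": state["distance"] - action}
    reward = -1                                   # per-iteration reward (a cost per step)
    is_terminal = new_state["distance"] <= 0      # terminal condition
    return new_state, reward, is_terminal

def policy(state):
    return random.choice([1, 2])                  # pick an action (this one ignores the state)

state = reset()
cumulative_reward = 0
is_terminal = False
while not is_terminal:                            # each pass through the loop is one iteration
    action = policy(state)
    state, reward, is_terminal = step(state, action)
    cumulative_reward += reward                   # the agent learns to maximize this sum

Over many episodes, an RL agent adjusts its policy to make the cumulative reward as large as possible; in this toy case, that means finishing the episode in as few steps as possible.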

To learn more about reinforcement learning, watch our training video about types of machine learning.

Create a BRAIN

Now that you’ve gotten a taste of Reinforcement Learning, let us begin by downloading the code for this tutorial and creating a new BRAIN on the Bonsai Platform for training.

As mentioned in the introduction, you’ll need to have the Bonsai CLI installed before you can run these commands. If you haven’t already done so, please Install the CLI.

git clone https://github.com/BonsaiAI/bonsai-tutorials.git
cd bonsai-tutorials/tutorial1

If you don’t have git installed on your computer you can download a .zip file of the materials instead.

# The name of your BRAIN will be move-a-point
bonsai create move-a-point

Once you have either git cloned or downloaded the files for this tutorial, navigate your command prompt into the tutorial1 folder and run bonsai create move-a-point, which will create a new BRAIN for your account called move-a-point. This command also silently uploads your project files to the server, so it's important to run it from within the tutorial1 folder, where your Inkling and simulation files are.

Simulation Overview

The simulation we will be running in this tutorial is called Move a Point. Start by skimming over move_a_point_sim.py, found within the tutorial1 folder of bonsai-tutorials. The simulation, in the PointSimulation class, models the problem described in the introduction: in each episode (started by a call to reset()), an agent starts at a location (current) and has to get within PRECISION of a target point (target). At each step of the simulation, the agent moves a distance STEP_SIZE in a specified direction.

The simulation ends (i.e., game_over() returns True) when the agent is close enough to the target.
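
Its core might look roughly like the sketch below. This is a simplified approximation based on the names above, with illustrative values for STEP_SIZE and PRECISION and made-up random start and target positions; see move_a_point_sim.py for the actual implementation.

Sketch of the PointSimulation class

import math
import random

STEP_SIZE = 0.1   # distance moved per step (illustrative value)
PRECISION = 0.15  # how close counts as reaching the target (illustrative value)

class PointSimulation(object):
    """Simplified sketch of the move-a-point simulation."""

    def reset(self):
        # Start each episode with a fresh agent location and target point.
        self.current = [random.random(), random.random()]
        self.target = [random.random(), random.random()]

    def advance(self, direction_radians):
        # Move the agent STEP_SIZE in the chosen direction.
        self.current[0] += STEP_SIZE * math.cos(direction_radians)
        self.current[1] += STEP_SIZE * math.sin(direction_radians)

    def game_over(self):
        # True when the agent is within PRECISION of the target.
        dx = self.target[0] - self.current[0]
        dy = self.target[1] - self.current[1]
        return math.sqrt(dx * dx + dy * dy) < PRECISION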

The PointSimulation class is a simple example of a common integration pattern: some (often pre-existing) code simulates a process, and we write additional code to control the simulation, read out its state, and reset it periodically. Next, let's review how to connect a simulation to a BRAIN using the Bonsai SDK (Software Development Kit).

Using the Bonsai SDK to connect to the BRAIN

import bonsai_ai

class PointBonsaiBridge(bonsai_ai.Simulator):

The integration of the simulation with the Bonsai SDK includes three pieces. First, take a look at how we import the bonsai_ai module and derive the simulator class from bonsai_ai.Simulator.

Implementing the Simulator interface requires coding two functions, episode_start() and simulate().

Implementing the Simulator interface

def episode_start(self, parameters=None):
    """Set up the simulation for a new episode. Returns the initial state."""
    ...

def simulate(self, action):
    """Given an action, run one simulation iteration and return a tuple:
           (state, reward, is_terminal)""" 
    ...

The action parameter to simulate is a dictionary, with a key for each action variable defined in your Inkling code. In our case, this is action["direction_radians"]. The state returned is a dictionary with one key for each state variable defined in your Inkling. In our case, this is implemented by the code in _get_state(self).

In addition to the “game over” condition specified by the simulation, our SDK connection bridge adds a maximum number of steps (MAX_STEPS) per episode in _is_terminal(). Having a time limit like this helps ensure that episodes end even if the agent keeps moving in the wrong direction.
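
Putting these pieces together, the two required methods and the helpers mentioned above might look roughly like the sketch below. This is a hedged approximation, not the exact contents of the tutorial file: it assumes the bridge keeps a PointSimulation instance in self.sim, relies on the import bonsai_ai shown earlier, uses an illustrative value for MAX_STEPS, and uses a placeholder _reward() where the real code computes the reward_shaped objective.

Sketch of the SDK bridge

MAX_STEPS = 40  # illustrative value for the per-episode step limit

class PointBonsaiBridge(bonsai_ai.Simulator):

    def episode_start(self, parameters=None):
        # Reset the underlying simulation and return the initial state.
        self.steps = 0
        self.sim.reset()                 # self.sim is assumed to be a PointSimulation
        return self._get_state()

    def simulate(self, action):
        # Apply the chosen action, then report (state, reward, is_terminal).
        self.steps += 1
        self.sim.advance(action["direction_radians"])
        return (self._get_state(), self._reward(), self._is_terminal())

    def _get_state(self):
        # Position of the target relative to the agent, matching the Inkling state schema.
        return {"dx": self.sim.target[0] - self.sim.current[0],
                "dy": self.sim.target[1] - self.sim.current[1]}

    def _is_terminal(self):
        # End the episode on success, or once MAX_STEPS iterations have elapsed.
        return self.sim.game_over() or self.steps >= MAX_STEPS

    def _reward(self):
        # Placeholder: stands in for the tutorial's "reward_shaped" objective.
        return 1.0 if self.sim.game_over() else 0.0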

The episode_start and simulate functions together define the action loop, which can be written as shown in the code panel.

Action loop

state = sim.episode_start()
is_terminal = False
while not is_terminal:
    action = policy(state)   # decide what to do next. When connected to a BRAIN, the BRAIN chooses the action.
    (state, reward, is_terminal) = sim.simulate(action)
    # Learning from the new state and the obtained reward happens here

Finally, we need to instantiate the simulator class and run it.

Instantiate the Simulator class

if __name__ == "__main__":
    config = bonsai_ai.Config(sys.argv)
    brain = bonsai_ai.Brain(config)
    sim = PointBonsaiBridge(brain, "move_a_point_sim")
    while sim.run():
        continue

The configuration includes the BRAIN name and URL, your Bonsai API key, and whether to train the BRAIN or run it in prediction mode (sometimes called test mode). The name passed to the PointBonsaiBridge constructor, "move_a_point_sim" here, identifies the simulator to the BRAIN, and must match the simulator name in your Inkling code (described in a later section).

Run Your Simulator

To test the simulator, we can create a PointBonsaiBridge object and directly call episode_start() and simulate(), as described above. You can see an example of how to test the simulator before connecting it to the Bonsai Platform in test_simulator.ipynb (requires Jupyter Notebook), or in the test_sim.py Python script, both of which can be found within the tutorial1 folder.

Episode loop

def run_sim_episode(sim, policy):
    """
    Given a sim and a policy, step through some iterations 
    of the simulator, returning the history of states.

    Args:
        sim: a PointBonsaiBridge
        policy: a function (SimState -> action dictionary)
    """
    state_history = []
    reward_history = []
    state = sim.episode_start()
    state_history.append(state)

    is_terminal = False
    while not is_terminal:
        action = policy(state)
        (state, reward, is_terminal) = sim.simulate(action)
        state_history.append(state)
        reward_history.append(reward)

    return state_history, reward_history

The key code is the episode loop, shown in the code panel.

Then we can define some policies, each specifying what action to take for a given state. If you'd like to test with other silly policies or define your own, change line 56 in test_sim.py: states, rewards = run_sim_episode(point_sim, random_policy).

After that the code will run some episodes, plotting the results.

Define policies

# Some silly policies
def random_policy(state):
    """
    Ignore the state, move randomly.
    """
    return {'direction_radians': random.random() * 2 * math.pi}

def go_up_policy(state):
    return {'direction_radians': math.pi / 2.0}

Run episode and plot results

for i in range(3):
    states, rewards = run_sim_episode(point_sim, random_policy)
    plot_state_history(states)

When making or integrating simulations, it is always a good idea to run some sanity checks and verifications before starting BRAIN training.

Exercises

  • Run the simulator with the above code, either through test_sim.py or, if you have Jupyter Notebook, through test_simulator.ipynb, both in the tutorial1 folder.
  • Write two different policy functions for moving in different directions, and plot their behavior (a starting example is sketched below).
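
For example, a policy that always heads in the 0-radian direction could serve as a starting point. This go_right_policy is a hypothetical example following the pattern of the silly policies above, and it reuses run_sim_episode, point_sim, and plot_state_history from the test script:

Example exercise policy

def go_right_policy(state):
    # Always move in the 0-radian direction (to the right).
    return {'direction_radians': 0.0}

states, rewards = run_sim_episode(point_sim, go_right_policy)
plot_state_history(states)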

Connecting Your BRAIN

Adaptation of the previous RL image to show the translation onto the Bonsai Platform

To connect our simulation to a BRAIN in the Bonsai platform, we will need an Inkling program that describes the problem and how to teach the AI to solve it. Inkling is a programming language specifically designed for artificial intelligence (AI). It abstracts away the vast world of dynamic AI algorithms that require expertise in machine learning and enables more developers and subject matter experts to create AI.

The problem description includes schemas for the states the BRAIN will receive and the actions it will need to send. The description of how to teach the BRAIN includes the reward, as well as any decomposition of the problem into concepts and the sequencing of learning using lessons. This first tutorial will use a single concept and a single lesson.

Schemas

A schema in Inkling describes a named record and its contained fields. Each field in a schema has a name and a type. A field may also have a constraint on the values that the data described by the field can take.

Exercises

  • Fill in the state schema
  • Fill in the action schema

Fill in the state schema

schema GameState
    # EXERCISE: Add a name (and type) for each variable of x and y position.
    # These have to match the dictionary returned by _get_state() in our simulator.
    # <Your code goes here>
end

schema GameState
    # X and Y direction of the point. These names (and types) have to match the
    # dictionary returned by _get_state() in our simulator.
    Float32 dx,
    Float32 dy
end

The state schema describes the state of the environment, and must match the state values returned from the simulator. In this case, the state consists of two floating point numbers, named dx and dy. Now, fill in the state schema (refer to the Inkling reference if you’re not sure how).

Fill in the action schema

schema PlayerMove
    # EXERCISE: This name (and type) has to match the parameter to advance() in
    # our simulator. You need to specify the range and step size for the action.
    # <Your code goes here>
end

schema PlayerMove
    # This name (and type) has to match the parameter to advance() in our
    # simulator. We specify the range and step size for the action.
    Float32{0:1.575:6.283} direction_radians  # a constraint {0, 1.575, 3.142, 4.712} would also work
end

Next, let’s fill in the action schema. The action in our problem corresponds to picking a direction, specified as a floating point number in radians, and named direction_radians. In this case, we want the system to pick from the four cardinal directions: 0, pi/2, pi, 3*pi/2. Use the Inkling reference to look up the syntax for schema constraints, and fill in the action schema (bonus: there are at least two ways to do it).

Concepts

concept find_the_target
    is classifier      # We're picking one of a few options
    predicts (PlayerMove)
    follows input(GameState)
    feeds output
end

A concept in Inkling defines what you are going to teach the AI. By declaring a concept, you are instructing the BRAIN server that this is a part of the BRAIN's mental model that must be learned. Consequently, concept nodes must have corresponding curricula to teach them.

In this simulation we are asking the agent to learn the concept find_the_target by choosing a direction (predicts (PlayerMove)) after seeing the current state (follows input(GameState)) of the simulation.

For more information about using the concept keyword, refer to the Concept Reference.

Curriculum and Lessons

curriculum learn_curriculum
    # this is the name of our concept from above
    train find_the_target
    # this is our simulator name
    with simulator move_a_point_sim
    # This is the name of our objective function in the simulator
    objective reward_shaped
        lesson get_close 
            configure
                constrain dummy with Int8{-1}
            until
                # This is again the name of the objective function
                maximize reward_shaped
end

A curriculum in Inkling is used to define what and how to teach a concept. Each concept needs a corresponding curriculum to teach it. A lesson is part of a curriculum; it teaches a specific piece of the concept by specifying training and testing parameters and a stopping point (until). Lessons enable the BRAIN to learn the concept bit-by-bit instead of all at once (using multiple lessons will be covered in a future tutorial). Lessons are contained within curriculum statements. Each curriculum must contain at least one lesson.

A curriculum contains the information the Bonsai AI Engine uses to train your BRAIN on the concept you've specified. It also specifies the reward function (objective) for teaching that concept. The reward function is how the system concretely measures the AI's performance as it learns the concept. For this tutorial we will use only one concept, and therefore only one curriculum.

Every lesson must have a configuration. In this case, our simple example doesn't need one, so the configuration uses a single field named dummy, constrained to the value -1, which does nothing to configure the lesson.

For more information about using these keywords, refer to the Curriculums Reference and Lessons Reference.

The Inkling/Simulation Relationship

Your Inkling code and your simulation are tightly coupled – through the SDK bridge class. The Inkling describes what to expect from the simulation as state, and what actions and configurations to send to the simulation. The bridge class does any conversion needed to make this match the simulation. This section describes how the different parts of your Inkling program and your simulation relate to each other through the bridge. The colors in the table below indicate which parts are connected.

Inkling/Simulator Graphic

Color Description
Purple (dark/light): The Inkling state schema field names and types must match the state dictionaries returned from episode_start and simulate in the simulator.
Blue (dark/light): The Inkling action schema field names will match the keys in the action dictionary passed to simulate in the simulator; the values will have the types specified in Inkling and will obey the specified constraints ({0:1.575:6.283} in the example).
Orange (dark/light): The simulator's configuration is passed as the parameters argument to episode_start, and takes its values from the constrain clause in Inkling (if used).
Red: The name of the concept must match the train clause in the curriculum for that concept.
Green: The simulator name must match between the simulator clause and the with simulator clause in the curriculum. The simulator must pass the same name to the constructor of the Simulator class, so the AI engine knows which simulator is connected.
Turquoise: The name of the optimization objective or reward function appears twice in the Inkling, and is available as self.objective_name in the simulator.

Train Your BRAIN

Now that you’ve written a curriculum of machine teaching (through your Inkling code) to connect your simulation to the Bonsai AI Engine, it’s time to prepare a new version for training. Since we previously created a new BRAIN for training, the --brain argument in the below commands is optional, but we’ve left it in to remind you which BRAIN the CLI is targeting.

# Push the edited files to the server
bonsai push

# Start a new BRAIN version
bonsai train start

Use bonsai push to upload your edited Inkling file to the server whenever you make changes (make sure you filled in the action and state schemas first or you will get an error!).

Python 2

python move_a_point_sim.py --brain=move-a-point

Python 3

python3 move_a_point_sim.py --brain=move-a-point

Once you have started training mode with bonsai train start, it's time to start your simulation. Training will begin automatically after you connect your simulator.

View your BRAIN training status

Training BRAIN

View your BRAIN's training status as it trains on the simulator by going to the BRAIN's Dashboard page on beta.bons.ai. Training move-a-point for about a minute is enough for the BRAIN to learn to find the goal quickly.

Training does not end automatically; you could train this BRAIN for hours, but because this problem is so simple to solve, there are diminishing returns after a few minutes. You should wait until the reward approaches and stabilizes around 18 (the maximum is 20 if the AI is perfect every episode). This should only take about a minute.

Stop Training

bonsai train stop

Once the BRAIN has reached this level of performance (or sooner if you prefer), press CTRL-C to disconnect the simulator, then run bonsai train stop to end training, and proceed to prediction.

Predict with Your BRAIN

Python 2

python move_a_point_sim.py --predict=latest

Python 3

python3 move_a_point_sim.py --predict=latest

After your BRAIN has finished training, you can use it to move to a point as quickly as it can. How well it does depends on how long you let it train! Using your BRAIN involves starting your simulation again, but now in prediction mode with --predict=latest, which will use the version from the latest training session you just ran.

Predicting BRAIN

Now you can see how quickly the agent moves to a point, depending on how far from the point it started. The greater the distance, the more steps the agent will likely need to reach the point.

And that’s it! You have now successfully learned how to test out a simulation, write your own schemas to connect a BRAIN, train, and predict from that BRAIN!

Next Steps

Next tutorial coming soon! In the meantime you can check out our Programming Machine Teaching guide for more detailed information on the topics we have covered in this tutorial.

And we have these other resources that will enable you to maximize your AI development experience: