
Problem Overview

RL Overview Image

In this tutorial you will learn about reinforcement learning (RL) and how to use it to solve a simple simulated problem with Inkling and the Bonsai Platform. You will familiarize yourself with the simulation, test it, and then fill in the Inkling types (states and actions) needed to connect the simulation to the platform. By the end of this tutorial, you should be comfortable with the relationship between a BRAIN and a simulation, know how to train a BRAIN inside the Bonsai Platform using custom Inkling code, and have a basic knowledge of Bonsai’s Command Line Interface (CLI).

Before following this tutorial, you should have read the Quick Start tutorial overview and installed and configured the Bonsai CLI, so that you have a general understanding of how the platform works. Please do so before continuing.

The problem we are trying to solve using this simulation is to teach an agent (the Bonsai BRAIN) to move to a target point in a simple planar world. The agent is told where it is relative to the goal, and has to decide which way to move. The agent sends its chosen action to the simulation, which simulates the step, moving the agent to a new location, and this repeats until the agent makes it to the goal, or time runs out.

First, we’ll cover a bit about reinforcement learning for those who are unfamiliar with this style of machine learning.

How Reinforcement Learning Works

Visualization of one iteration on the Bonsai Platform

Reinforcement learning (RL), illustrated here, is a machine learning technique for controlling or optimizing a system or process. In RL, an agent takes actions in an environment, getting feedback in the form of reward. The agent explores different action strategies or policies and learns to maximize cumulative reward. One cycle from the environment to the agent and back is called an iteration.


This table describes the key terms used in RL:

state: The state of the environment at each iteration (e.g., the current agent position relative to the target). The agent uses this to decide what action to select.
action: Actions define what the agent can do at each iteration (e.g., which direction to move in). The goal of RL is to learn to select the right series of actions to achieve an objective.
reward: The reward at each iteration gives the agent feedback that helps it learn (e.g., higher reward when moving toward the target). Note that the agent’s goal is to maximize cumulative future reward, not the instantaneous reward at each iteration.
iteration: An iteration is one state → action → reward → new-state transition in the environment. The agent uses the state to select an action, which causes the environment to transition to a new state and results in a reward.
episode: An episode is a series of iterations, starting in some initial state and ending when the environment hits a terminal condition and resets for the next episode (e.g., the episode ends either when the agent reaches the target (success) or when time runs out (failure)). Episodes can vary in length, with terminal conditions typically defined based on succeeding at the task, failing at the task, getting too far from success, or running out of time.
cumulative reward: Cumulative reward is the sum of the per-iteration rewards over an episode. The agent’s goal is to select actions that maximize cumulative future reward through the end of the episode.
terminal condition: Terminal conditions specify when to end an episode.
policy: A policy determines what action the agent selects for every possible state. The goal of RL is to learn a policy that maximizes cumulative future reward.
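To make these terms concrete, here is a minimal sketch on a toy one-dimensional problem. It is not part of the tutorial code: the environment (a position on a number line with a target at 0), the reward values, and the names step, toward_zero_policy, and run_episode are all assumptions chosen for illustration.

```python
def step(state, action):
    """One iteration: apply the action and return (new_state, reward, is_terminal).

    Toy environment (an assumption for illustration): the state is an integer
    position on a number line and the target sits at 0.
    """
    new_state = state + action
    # Reward: +1 when the move gets closer to the target, -1 otherwise.
    reward = 1.0 if abs(new_state) < abs(state) else -1.0
    # Terminal condition: the agent has reached the target.
    is_terminal = new_state == 0
    return new_state, reward, is_terminal


def toward_zero_policy(state):
    """A policy maps every state to an action (here: -1 or +1)."""
    return -1 if state > 0 else 1


def run_episode(policy, initial_state, max_iterations=20):
    """Run one episode and return (cumulative_reward, iterations_used)."""
    state = initial_state
    cumulative_reward = 0.0
    for iteration in range(1, max_iterations + 1):
        action = policy(state)
        state, reward, is_terminal = step(state, action)
        cumulative_reward += reward  # sum of the per-iteration rewards
        if is_terminal:
            break
    return cumulative_reward, iteration
```

Starting three steps from the target, run_episode(toward_zero_policy, 3) takes three iterations, each earning a reward of 1, so the cumulative reward for the episode is 3.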

To learn more about reinforcement learning, watch our training video about types of machine learning.

Create a BRAIN

Now that you’ve gotten a taste of Reinforcement Learning, let us begin by downloading the code for this tutorial and creating a new BRAIN on the Bonsai Platform for training.

As mentioned in the introduction, you’ll need to have the Bonsai CLI installed before you can run these commands. If you haven’t already done so, please Install the CLI.

git clone
cd bonsai-tutorials/tutorial1

If you don’t have git installed on your computer you can download a .zip file of the materials instead.

# The name of your BRAIN will be move-a-point
bonsai create move-a-point

Once you have either cloned the repository with git or downloaded the files for this tutorial, navigate to the tutorial1 folder in your command prompt and run bonsai create move-a-point. This creates a new BRAIN called move-a-point for your account. The command also silently uploads your project files to the server, so it’s important to run it from within the tutorial1 folder, where your Inkling and simulation files are.

Simulation Overview

The simulation we will be running in this tutorial is called Move a Point. Start by skimming over the simulation source file, which is found within the tutorial1 folder of bonsai-tutorials. The simulation, in the PointSimulation class, models the problem described in the introduction: in each episode (started by a call to reset()), an agent starts at a location (current) and has to get within PRECISION of a target point (target). At each step of the simulation, the agent moves a distance STEP_SIZE in a specified direction.

The simulation ends (i.e., game_over() returns True) when the agent is close enough to the target.
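If you want a mental model before opening the file, the behavior described above can be sketched roughly as follows. This is a hypothetical re-creation, not the tutorial's actual source: the class name PointSimulationSketch, the values of STEP_SIZE and PRECISION, the random starting positions, and the distance_to_target() helper are assumptions.

```python
import math
import random

STEP_SIZE = 0.1   # assumed value; the real constant may differ
PRECISION = 0.15  # assumed value; the real constant may differ


class PointSimulationSketch:
    """Hypothetical stand-in for PointSimulation, for illustration only."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Start a new episode with random current and target points."""
        self.current = (random.random(), random.random())
        self.target = (random.random(), random.random())

    def step(self, direction_radians):
        """Move the agent a distance STEP_SIZE in the given direction."""
        x, y = self.current
        self.current = (x + STEP_SIZE * math.cos(direction_radians),
                        y + STEP_SIZE * math.sin(direction_radians))

    def distance_to_target(self):
        return math.hypot(self.target[0] - self.current[0],
                          self.target[1] - self.current[1])

    def game_over(self):
        """True once the agent is within PRECISION of the target."""
        return self.distance_to_target() < PRECISION
```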

The PointSimulation class is a simple example of a common integration pattern: some often pre-existing code simulates a process, and we write additional code to control the simulation, read out the state, and reset periodically. Next, let’s review how to connect a simulation to a BRAIN using the Bonsai SDK (Software Development Kit).

Using the Bonsai SDK to connect to the BRAIN

import bonsai_ai

class PointBonsaiBridge(bonsai_ai.Simulator):

The integration of the simulation with the Bonsai SDK includes three pieces. First, take a look at how we import the bonsai_ai module and derive the simulator class from bonsai_ai.Simulator.

Implementing the Simulator interface requires coding two functions, episode_start() and simulate().

Implementing the Simulator interface

def episode_start(self, parameters=None):
    """Set up the simulation for a new episode. Returns the initial state."""

def simulate(self, action):
    """Given an action, run one simulation iteration and return a tuple:
           (state, reward, is_terminal)""" 

The action parameter to simulate is a dictionary, with a key for each action variable defined in your Inkling code. In our case, this is action["direction_radians"]. The state returned is a dictionary with one key for each state variable defined in your Inkling. In our case, this is implemented by the code in _get_state(self).

In addition to the “game over” condition specified by the simulation, our SDK connection bridge adds a maximum number of steps (MAX_STEPS) per episode in _is_terminal(). Having a time limit like this helps ensure that episodes end even if the agent keeps moving in the wrong direction.
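As a rough sketch of how these pieces fit together, the class below mimics the bridge's interface without depending on the Bonsai SDK, so it can be run on its own. It is not the tutorial's implementation: the class name BridgeSketch, the constant values, the sign convention for dx and dy (target minus current), and the reward (how much closer the step brought the agent) are all assumptions.

```python
import math

STEP_SIZE = 0.1   # assumed value
PRECISION = 0.15  # assumed value
MAX_STEPS = 20    # assumed value


class BridgeSketch:
    """Stand-in for PointBonsaiBridge that runs without the Bonsai SDK."""

    def __init__(self, start=(0.0, 0.0), target=(0.5, 0.0)):
        self.start = start
        self.target = target
        self.current = start
        self.steps = 0

    def _get_state(self):
        # One key per state variable in the Inkling state type.
        return {"dx": self.target[0] - self.current[0],
                "dy": self.target[1] - self.current[1]}

    def _distance(self):
        state = self._get_state()
        return math.hypot(state["dx"], state["dy"])

    def _is_terminal(self):
        # "Game over" (close enough to the target) or out of time.
        return self._distance() < PRECISION or self.steps >= MAX_STEPS

    def episode_start(self, parameters=None):
        self.current = self.start
        self.steps = 0
        return self._get_state()

    def simulate(self, action):
        before = self._distance()
        angle = action["direction_radians"]
        self.current = (self.current[0] + STEP_SIZE * math.cos(angle),
                        self.current[1] + STEP_SIZE * math.sin(angle))
        self.steps += 1
        reward = before - self._distance()  # positive when moving closer
        return self._get_state(), reward, self._is_terminal()
```

With these assumed values, an agent that always moves away from the target never triggers the "game over" condition, so the episode only ends because of the MAX_STEPS limit.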

The episode_start and simulate functions together define the action loop, which can be written as shown in the code panel.

Action loop

state = sim.episode_start()
is_terminal = False
while not is_terminal:
    action = policy(state)   # decide what to do next. When connected to a BRAIN, the BRAIN chooses the action.
    (state, reward, is_terminal) = sim.simulate(action)
    # Learning from the new state and the obtained reward happens here

Finally, we need to instantiate the simulator class and run it.

Instantiate the Simulator class

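import sys

if __name__ == "__main__":
    config = bonsai_ai.Config(sys.argv)
    brain = bonsai_ai.Brain(config)
    sim = PointBonsaiBridge(brain, "move_a_point_sim")
    # Keep running the simulator until the session with the BRAIN ends
    while sim.run():
        continue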

The configuration includes the BRAIN name and url, your Bonsai API key, and whether to train the BRAIN or run it in prediction mode (sometimes called test mode). The name passed to the PointBonsaiBridge constructor, "move_a_point_sim" here, identifies the simulation file to the BRAIN, and must match your Inkling code (described in a later section).

Run Your Simulator

To test the simulator, we can create a PointBonsaiBridge object and directly call episode_start() and simulate(), as described above. You can see an example of how to test the simulator before connecting it to the Bonsai Platform in test_simulator.ipynb (requires Jupyter Notebook) or in the accompanying Python script, both of which can be found within the tutorial1 folder.

Episode loop

def run_sim_episode(sim, policy):
    """
    Given a sim and a policy, step through some iterations
    of the simulator, returning the history of states and rewards.

    Args:
        sim: a PointBonsaiBridge
        policy: a function (SimState -> action dictionary)
    """
    state_history = []
    reward_history = []
    state = sim.episode_start()
    state_history.append(state)

    is_terminal = False
    while not is_terminal:
        action = policy(state)
        (state, reward, is_terminal) = sim.simulate(action)
        # Record each new state and the reward that came with it
        state_history.append(state)
        reward_history.append(reward)

    return state_history, reward_history

The key code is the episode loop, shown in the code panel.

Then we can define some policies, which specify what action to take for a given state. If you’d like to test with other silly policies or define your own, change the policy passed on line 56: states, rewards = run_sim_episode(point_sim, random_policy).

After that the code will run some episodes, plotting the results.

Define policies

import math
import random

# Some silly policies
def random_policy(state):
    """Ignore the state, move randomly."""
    return {'direction_radians': random.random() * 2 * math.pi}

def go_up_policy(state):
    """Ignore the state, always move up."""
    return {'direction_radians': math.pi / 2.0}

Run episode and plot results

for i in range(3):
    states, rewards = run_sim_episode(point_sim, random_policy)

When making or integrating simulations, it is always a good idea to run some sanity checks and verifications before starting BRAIN training.


  • Run the simulator via the above code, either with the Python script or, if you have Jupyter Notebook, with test_simulator.ipynb, both in the tutorial1 folder.
  • Write two different policy functions for moving in different directions, and plot their behavior.
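As a starting point for the second exercise, here are two possible policies, sketched under assumptions: go_left_policy ignores the state, while toward_target_policy reads it. The latter assumes that state["dx"] and state["dy"] hold the target position minus the current position. If your simulator uses the opposite sign convention, flip the comparisons. Both function names are made up for this example.

```python
import math


def go_left_policy(state):
    """Ignore the state, always move left (pi radians)."""
    return {"direction_radians": math.pi}


def toward_target_policy(state):
    """Pick the cardinal direction that best closes the gap to the target.

    Assumes dx and dy are (target - current); this is an assumption,
    so check it against _get_state() in the simulator.
    """
    dx, dy = state["dx"], state["dy"]
    if abs(dx) >= abs(dy):
        # The horizontal gap is larger: move right or left.
        angle = 0.0 if dx >= 0 else math.pi
    else:
        # The vertical gap is larger: move up or down.
        angle = math.pi / 2 if dy > 0 else 3 * math.pi / 2
    return {"direction_radians": angle}
```

Either function can be passed to run_sim_episode in place of random_policy, since both take a state dictionary and return an action dictionary with the direction_radians key.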

Connecting Your BRAIN

Adaptation of the previous RL image to show the translation onto the Bonsai Platform

To connect our simulation to a BRAIN in the Bonsai platform, we will need an Inkling program that describes the problem and how to teach the AI to solve it. Inkling is a programming language specifically designed for artificial intelligence (AI). It abstracts away the vast world of dynamic AI algorithms that require expertise in machine learning and enables more developers and subject matter experts to create AI.

The problem description includes type definitions for the states the BRAIN will receive and the actions it will need to send. The description of how to teach the BRAIN includes the reward, as well as any decomposition of the problem into concepts and the sequencing of learning using lessons. This first tutorial will use a single concept and a single lesson.


A type in Inkling describes the format and allowed ranges for a data value.


  • Fill in the state type
  • Fill in the action type
Fill in the state type
type GameState {
    # EXERCISE: Add a field name and type for each variable of x and y position.
    # These have to match the dictionary returned by _get_state() in our simulator.
    # <Your code goes here>
}

# SOLUTION:
type GameState {
    # X and Y direction of the point. These names (and types) have to match the
    # dictionary returned by _get_state() in our simulator.
    dx: number,
    dy: number
}

The state type describes the state of the environment, and must match the state values returned from the simulator. In this case, the state consists of two numbers named dx and dy. Now, fill in the state type definition. (Refer to the Inkling reference if you’re not sure how.)

Fill in the action type
type PlayerMove {
    # EXERCISE: The field names and types must match the parameter to step() in
    # our simulator. You need to specify the range and step size for the action.
    # <Your code goes here>
}

# SOLUTION:
type PlayerMove {
    # These field names and types must match the parameter to step() in our
    # simulator. We specify the range and step size for the action.
    # A constraint {0, 1.575, 3.142, 4.712} would also work
    direction_radians: number<0 .. 6.283 step 1.575>
}

Next, let’s define the action type. The action in our problem corresponds to picking a direction, specified as a number of radians and named direction_radians. In this case, we want the system to pick from the four cardinal directions: 0, pi/2, pi, 3*pi/2. Use the Inkling reference to look up the syntax for number type constraints and fill in the action type. (Bonus: there are at least two ways to do it.)
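To see what each of the four actions does to the agent, the short Python sketch below converts each cardinal direction into the displacement it produces. It is for illustration only: the displacement() helper and the STEP_SIZE value are assumptions, not part of the tutorial code.

```python
import math

STEP_SIZE = 0.1  # assumed value; use the constant from your simulator


def displacement(direction_radians, step_size=STEP_SIZE):
    """Return the (x, y) offset produced by one step in the given direction."""
    return (round(step_size * math.cos(direction_radians), 6),
            round(step_size * math.sin(direction_radians), 6))


# The four cardinal directions: 0, pi/2, pi, 3*pi/2
for name, angle in [("right", 0.0),
                    ("up", math.pi / 2),
                    ("left", math.pi),
                    ("down", 3 * math.pi / 2)]:
    print(name, round(angle, 3), displacement(angle))
```

Each action moves the agent by STEP_SIZE along exactly one axis, which is why four discrete values are enough for this problem.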


graph (input: GameState) {
    concept FindTheTarget(input): PlayerMove {
        # Curriculum omitted
    }

    output FindTheTarget
}

A concept in Inkling defines what you are going to teach the AI. By declaring a concept, you are instructing the AI Engine to learn a new function that maps a set of inputs to outputs. Each concept must include a curriculum.

In this simulation, we are asking the agent to learn the concept FindTheTarget by choosing a direction (of type PlayerMove) after seeing the current state (of type GameState).

For more information about using the concept keyword, refer to the Concept Reference.

Curriculum and Lessons

curriculum {
    # Specify a simulator as the data source
    source MoveAPointSim
}

A curriculum in Inkling defines what to teach a concept and how to teach it. The curriculum refers to a simulator that acts as the data source for training the concept.

Train Your BRAIN

Now that you’ve written a curriculum of machine teaching (through your Inkling code) to connect your simulation to the Bonsai AI Engine, it’s time to prepare a new version for training. Since we previously created a new BRAIN for training, the --brain argument in the below commands is optional, but we’ve left it in to remind you which BRAIN the CLI is targeting.

# Push the edited files to the server
bonsai push

# Start a new BRAIN version
bonsai train start

Use bonsai push to upload your edited Inkling file to the server whenever you make changes (make sure you filled in the action and state types first or you will get an error!).

Python 2

python --brain=move-a-point

Python 3

python3 --brain=move-a-point

Once you have started training mode with bonsai train start it’s time to start your simulation. Training will begin automatically after you connect your simulator.

View your BRAIN training status

Training BRAIN

View your BRAIN’s training status as it trains on the simulator by going to the BRAIN’s Dashboard page. Training move-a-point takes about a minute to reach the point where the BRAIN finds the goal quickly.

Training does not end automatically. You can train this BRAIN for hours, but because the problem is so simple, there will be diminishing returns after a few minutes. Wait until the reward approaches and stabilizes around 18 (the maximum is 20, if the AI is perfect every episode). This should only take about a minute.

Stop Training

bonsai train stop

Once the BRAIN has reached this level of performance (or sooner if you prefer), press CTRL-C to disconnect the simulator, then run bonsai train stop to end the training and proceed to prediction.

Predict with Your BRAIN

Python 2

python --predict=latest

Python 3

python3 --predict=latest

After your BRAIN is finished training you can use it to move to a point as quickly as it can. How well it does depends on how long you let it train! Using your BRAIN involves starting your simulation, but now in prediction mode with --predict=latest which will use the version of the latest training session that you just ran.

Predicting BRAIN

Now you can see how fast the agent can move to the point depending on how far away it started. The greater the distance, the more steps the agent will likely take to reach the point.

And that’s it! You have now successfully learned how to test out a simulation, write your own types to connect a BRAIN, train, and predict from that BRAIN!

Next Steps

Next tutorial coming soon! In the meantime you can check out our Programming Machine Teaching guide for more detailed information on the topics we have covered in this tutorial.

And we have these other resources that will enable you to maximize your AI development experience: