Commit 0e7f700b authored by Benjamin
parents fbabffe3 92a6126f
@@ -2,23 +2,23 @@
# Animal-AI Environment tutorial
This tutorial is a step-by-step presentation of the new version of the Animal-AI library. The new Animal-AI environment is quite similar to the version used for the competition; however, the `animalai` and `animalai_train` APIs have been dramatically improved, reflecting the great improvements made by [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents).
In this notebook, **we present the environment and how to design both your training and testing setups**. In the second notebook (training) we'll show you how to train an agent to solve a task it has never seen before.
%% Cell type:markdown id: tags:
## Introducing animal cognition to the AI world
Our goal is to provide a tool for researchers to go beyond classical RL environments, allowing you to develop agents that possess cognitive skills similar to those of animals. The main idea is to be able to test and/or train your agents on **experiments taken from or inspired by real-life animal experiments**. This repository holds 900 such experiments, which cover a dozen cognitive skills. You can find more details on the test-bed on [our website](http://animalaiolympics.com/AAI/testbed).
The environment is a simple arena with an agent that can only move left, right, forward and backward, aiming to collect positive rewards and avoid negative ones. It can also hold [several objects](https://github.com/beyretb/AnimalAI-Olympics/blob/master/documentation/definitionsOfObjects.md) which can be used to set up experiments. You can really put yourself in the shoes of an animal cognition scientist, building experiments with whatever you can find in a lab.
From the agent's perspective, a classical experiment called a Y-maze looks like this (the agent must explore a simple Y-shaped maze to find a reward, often food):
<img src="notebook_data/y_maze.png" width="40%">
The agent is on an elevated platform (blue); it needs to move towards the reward (green ball) and avoid going to the right, in which case it would be stuck (the platform is too high for the agent to climb back on).
From an RL perspective this might seem like a trivial problem to solve! In a classical RL setup where you train and test on the same problem it is indeed simple. However, when tested on a similar task, an animal would encounter this problem for the first time. And this is what we encourage you to do as well: **create your own training curriculum, and use our experiments as a test set your agent has never seen before**. We believe this is needed to truly test an agent's capacity to acquire cognitive skills.
@@ -50,18 +50,18 @@
<IPython.core.display.HTML object>
%% Cell type:markdown id: tags:
In the following sections we'll design a training curriculum which does not include the exact "reward in a transparent cylinder" task, but which we can use to train an agent that can solve this same task. In the training tutorial, we'll train such an agent using this curriculum.
%% Cell type:markdown id: tags:
## Let's get started: experiment design
First things first, as rigorous researchers, we want to design a good training environment. To do so, we provide a [list of items](https://github.com/beyretb/AnimalAI-Olympics/blob/master/documentation/definitionsOfObjects.md) you can include in your arena. You can have a look at the details later; this section highlights the basics.
To begin with, let's train an agent to collect food right in front of it, as simple as that! To do so, you'll need to design a `yaml` file which describes the experiment setup. It contains:
- experiment parameters (maximum steps, steps at which the light is turned on/off)
- a list of objects
@@ -91,11 +91,11 @@
2: !Arena
......
```
But if you want all the arenas in the environment to have the same configuration, then do as we did above: define one configuration only, with key `-1`.
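For reference, here is a minimal sketch of what such a single-arena configuration could look like, and how to load it from Python with `ArenaConfig`. The field names (`t`, `!Item`, `!Vector3`, `GoodGoal`, ...) follow the objects documentation linked above, but treat this as an illustrative sketch rather than a ready-made config:

``` python
# A minimal, illustrative arena configuration: one agent and one piece of food
# right in front of it. Field names follow the objects documentation linked
# above; check it for the authoritative schema.
from animalai.envs.arena_config import ArenaConfig

basic_config = """
!ArenaConfig
arenas:
  -1: !Arena            # key -1 applies this configuration to every arena
    t: 250              # maximum number of steps per episode
    items:
    - !Item
      name: Agent
      positions:
      - !Vector3 {x: 20, y: 0, z: 5}
      rotations: [0]    # a value of -1 (or omitting the line) randomizes it
    - !Item
      name: GoodGoal    # green sphere giving a positive reward
      positions:
      - !Vector3 {x: 20, y: 0, z: 10}
"""

with open("basic_food.yml", "w") as f:
    f.write(basic_config)

arena_config = ArenaConfig("basic_food.yml")
```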
You can now use this to load an environment and play yourself (`load_config_and_play.py` does that for you). Make sure you have followed the [installation guide](https://github.com/beyretb/AnimalAI-Olympics#requirements) and then create an `AnimalAIEnvironment` in play mode:
%% Cell type:code id: tags:
``` python
from animalai.envs.arena_config import ArenaConfig
@@ -138,11 +138,11 @@
2. food in front of us, further away
3. food at the same distance as 2, but randomize the agent's rotation (might be behind the agent)
4. agent and food randomly placed and rotated, each on a fixed z-axis, with a small transparent wall in between the two
5. same as 4 with bigger and bigger walls
To design a curriculum, we need to place all the yaml files in a folder along with a json configuration file which contains the details of when to switch from one level to the next. The above curriculum can be found in `configurations/curriculum`.
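As a rough, hypothetical sketch of what that json file could contain (the key names and values below are assumptions modelled on the ml-agents curriculum format, not the documented schema; use the json file shipped in `configurations/curriculum` as the authoritative reference):

``` python
# Hypothetical curriculum switching configuration: key names and thresholds are
# assumptions, not the library's documented schema. Compare with the json file
# in configurations/curriculum before using.
import json

curriculum = {
    "measure": "reward",                  # metric used to decide when to advance
    "thresholds": [1.5, 1.5, 1.5, 1.5],   # advance once the metric passes each value
    "min_lesson_length": 100,             # minimum number of episodes before switching
    "signal_smoothing": True,
    "configuration_files": [              # one yaml per curriculum step, in order
        "0.yml", "1.yml", "2.yml", "3.yml", "4.yml"
    ],
}

with open("my_curriculum.json", "w") as f:  # place it next to the yaml files
    json.dump(curriculum, f, indent=4)
```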
The second configuration is just like the first but with, say, `z: 35` for `GoodGoal`. The third one only requires randomizing the rotation; to do so we can replace `rotations: [0]` with `rotations: [-1]`, as any parameter with a value of `-1` is randomized. Otherwise you can just remove the `rotations` line altogether and the rotation will be randomized automatically (this also works with positions, sizes and colors).
Putting all of the above together, we can have a look at step 4:
@@ -374,11 +374,11 @@
That's pretty much it! In practice we provide a very efficient environment manager (derived from ml-agents). It allows the training loop to run over several instances of the environment in parallel, for better performance. You can also use the Gym implementation of the environment; see the examples on training with OpenAI baselines for that.
If you want to train using ml-agents, it is a very modular framework which requires some work to plug your own model in (depending on how far it is from PPO or SAC), but it's worth it!
Have a look at the second notebook on training an agent using animalai-train.
%% Cell type:code id: tags:
``` python
......
@@ -7,11 +7,11 @@
In this notebook we show you how to run the `animal-ai` trainers, which are optimized to train on the AnimalAI environment. It's a powerful, modular library you can tinker with in order to implement your own algorithm. We strongly recommend you have a look at its various parts, described at the end of this tutorial, should you wish to make some modifications.
## Can your agent self control? - Part II
If you haven't done so already, go through the environment tutorial where we describe the problem of self-control in animals. We created a curriculum which includes increasingly difficult levels for the agent to retrieve food, while introducing items similar to those in the final experiment, without ever including the exact experiment in the training curriculum.
We created the curriculum in the previous notebook; now we need to configure the training environment. The `animalai-train` library provides all the tools you need to train PPO or SAC. We'll use the former here.
First, we need to set all the hyperparameters of our model. This is done by creating a yaml file as follows:
@@ -120,11 +120,11 @@
%% Cell type:markdown id: tags:
## Using ML-Agents and AnimalAI for your algorithms
As mentioned earlier, AnimalAI is built on top of ML-Agents, and we strongly recommend you have a look at the various bits and pieces you can tinker with in order to implement your own agents. This part is a brief overview of where to find these parts, which are at the heart of most RL algorithms. We'll start from the higher-level controllers and work down to the basic building blocks of RL algorithms. Should you wish to modify them, you'll need to clone the [ml-agents repository](https://github.com/Unity-Technologies/ml-agents).
- `animalai_train.run_training`: contains the highest level of control for training an agent. You can find all the subroutines you need in order to do so. The most important ones are:
  - `animalai_train.subprocess_env_manager_aai.SubprocessEnvManagerAAI`: an environment manager which derives from `mlagents.trainers.subprocess_env_manager.SubprocessEnvManager` and manages several environments running in parallel. In practice you shouldn't need to change this part.
  - `mlagents.trainers.trainer_util.TrainerFactory`: a factory method which is in charge of creating trainers to manage the agents in the environment. In practice we only have a single type of agent in all the environments, so there will only be one trainer to manage all the agents. **You might need to change this code** if you add a new RL algorithm, as it is designed to handle PPO and SAC only.
  - `animalai_train.trainer_controller_aai.TrainerControllerAAI`: derives from `mlagents.trainers.trainer_controller.TrainerController` and is where the training loop is.
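As a quick orientation, these components can be imported directly if you want to inspect or step through them (a minimal sketch; the module paths are the ones listed above, so double-check them against your installed `animalai_train` and `mlagents` versions):

``` python
# Minimal import sketch for the components listed above; paths are taken from
# that list, so verify them against your installed animalai_train / mlagents.
import inspect

from animalai_train.subprocess_env_manager_aai import SubprocessEnvManagerAAI
from animalai_train.trainer_controller_aai import TrainerControllerAAI
from mlagents.trainers.trainer_util import TrainerFactory

# For example, print where the training loop lives before modifying it.
print(inspect.getsourcefile(TrainerControllerAAI))
```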
......