Commit 32d57486 authored by afspies

Change some wording / fix typos in ipynbs for environment and training

parent 21aa99ff
%% Cell type:markdown id: tags:
# Training
The API provides both a [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents) and an [OpenAI Gym](https://github.com/openai/gym) interface. We include training examples for both in [the examples folder](./examples); the former uses ml-agents' own training library, which is optimised for the environment, while the latter uses [OpenAI baselines](https://github.com/openai/baselines).
In this notebook we show you how to run the `animal-ai` trainers, which are optimized for training on the AnimalAI environment. It's a powerful, modular library you can tinker with in order to implement your own algorithms. We strongly recommend that you have a look at its various parts (described at the end of this tutorial) should you wish to make modifications.
## Can your agent self control? - Part II
If you haven't done so already, go through the environment tutorial, where we describe the problem of self-control in animals. We created a curriculum which includes increasingly difficult levels in which the agent must retrieve food, while being introduced to objects similar to those in the final experiment, without ever encountering the exact testing configuration(s).
Having created a curriculum in the previous notebook, we now need to configure the training environment. The `animalai-train` library provides all the tools you'll need to train using PPO or SAC - we'll be using the former here.
First, we need to set all the hyperparameters of our model, which is done by creating a yaml file as follows:
%% Cell type:code id: tags:
``` python
with open("configurations/training_configurations/train_ml_agents_config_ppo.yaml") as f:
    print(f.read())
```
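%% Cell type:markdown id: tags:
To give a feel for what that file contains before you run the cell, here is a minimal sketch of the kind of hyperparameter block it defines. The keys and values below are illustrative assumptions in the ml-agents trainer-config style, not a copy of the shipped file - the cell above prints the real one.
%% Cell type:code id: tags:
``` python
# Illustrative only: these keys/values are assumptions in the ml-agents
# trainer-config style, not the contents of the shipped yaml file.
import yaml

example_config = yaml.safe_load("""
AnimalAI:
  trainer: ppo
  batch_size: 64
  buffer_size: 2048
  learning_rate: 3.0e-4
  num_layers: 2
  hidden_units: 256
  summary_freq: 10000
""")
print(example_config["AnimalAI"]["trainer"])  # prints: ppo
```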
%% Cell type:markdown id: tags:
If you're already familiar with RL algorithms in general, these parameters should be fairly self-explanatory. Nonetheless, you can have a look at [this page](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md) for explanations of the parameters specified for both PPO and SAC.
You then need to configure the trainer, which is just a named tuple defining parameters such as:
- the paths to the environment and your configuration file (above)
- how many environments to launch in parallel and how many agents to place in each environment
- the path to your curriculum
- and many more!
This is all done as follows:
%% Cell type:code id: tags:
``` python
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
...@@ -67,34 +68,33 @@
)
```
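%% Cell type:markdown id: tags:
The collapsed lines in the cell above are where the trainer options tuple is built. A rough sketch of that configuration is shown below; the module paths and field names (`RunOptionsAAI`, `num_envs`, `n_arenas_per_env`, ...) are assumptions about the `animalai_train` API rather than the exact elided code, so treat this as a guide and defer to the actual notebook.
%% Cell type:code id: tags:
``` python
# Sketch only - module paths and field names are assumptions; check them
# against the animalai-train version you have installed.
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai

args = RunOptionsAAI(
    trainer_config=load_config(
        "configurations/training_configurations/train_ml_agents_config_ppo.yaml"
    ),
    env_path="env/AnimalAI",                        # path to the environment binary
    run_id="self_control_curriculum",               # names the summaries/ and models/ folders
    base_port=5005,                                 # first port used by the parallel envs
    num_envs=4,                                     # environments launched in parallel
    n_arenas_per_env=8,                             # agents/arenas per environment
    curriculum_config="configurations/curriculum",  # curriculum from the previous notebook
)
```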
%% Cell type:markdown id: tags:
Once this is done, we're pretty much left with a one-liner! The training library isn't verbose, but you can monitor training via Tensorboard. The first few lines below just load Tensorboard; once it has launched and you can see the orange window below, click the refresh button in the top right of Tensorboard and graphs will appear after a few training steps.
_Note_: in case you don't want to wait for the model to train, you can jump ahead to the next step as we provide a pre-trained model for inference.
%% Cell type:code id: tags:
``` python
import os
# logging.getLogger('tensorflow').disabled = True
logs_dir = "summaries/"
os.makedirs(logs_dir, exist_ok=True)
%load_ext tensorboard
%tensorboard --logdir {logs_dir}
run_training_aai(0, args)
```
%% Cell type:markdown id: tags:
You can see the lessons increasing as the agent gets better at each level. That's pretty much it for training using the provided library.
Lastly, let's assess how well our agent, trained with only transparent walls present, can perform on the transparent cylinder task. To do so we can load the model and run it in inference:
%% Cell type:code id: tags:
``` python
from animalai.envs.arena_config import ArenaConfig
...@@ -114,35 +114,34 @@
run_training_aai(0, args)
```
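%% Cell type:markdown id: tags:
The collapsed lines in the cell above set up the inference run. Roughly speaking (again, the field names and the arena file path below are assumptions rather than the exact notebook code), the options differ from the training ones mainly in loading the saved model, switching off training, and pointing at the held-out transparent cylinder arena:
%% Cell type:code id: tags:
``` python
# Sketch only - field names and the arena configuration path are assumptions.
from animalai.envs.arena_config import ArenaConfig  # already imported in the cell above

inference_args = RunOptionsAAI(
    trainer_config=load_config(
        "configurations/training_configurations/train_ml_agents_config_ppo.yaml"
    ),
    env_path="env/AnimalAI",
    run_id="self_control_curriculum",  # same run_id so the trained model is found
    base_port=5007,                    # avoid clashing with ports still held by training
    load_model=True,                   # load the pre-trained weights
    train_model=False,                 # pure inference, no gradient updates
    arena_config=ArenaConfig(
        "configurations/arena_configurations/cylinder_task.yml"  # hypothetical path
    ),
)
run_training_aai(0, inference_args)
```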
%% Cell type:markdown id: tags:
You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remember, this problem is meant to be hard! You can now have a go at making your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!
%% Cell type:markdown id: tags:
## Using ML-Agents and AnimalAI for your algorithms
As mentioned earlier, AnimalAI is built on top of ML-Agents, and we strongly recommend that you have a look at the various bits and pieces which you can tinker with in order to implement your own agents. This part provides a brief overview of where to find the components at the heart of most RL algorithms. We'll start from the high-level controllers and work our way down to the basic building blocks of RL algorithms. Should you wish to modify them, you'll need to clone the [ml-agents repository](https://github.com/Unity-Technologies/ml-agents).
- `animalai_train.run_training`: contains the highest level of control for training an agent. You can find all the subroutines you need in order to do so. The most important ones are:
- `animalai_train.subprocess_env_manager_aai.SubprocessEnvManagerAAI`: an environment manager which derives from `mlagents.trainers.subprocess_env_manager.SubprocessEnvManager` and can run multiple environments in parallel. In practice you shouldn't need to change this part.
- `mlagents.trainers.trainer_util.TrainerFactory`: a factory method which is in charge of creating the trainers that manage the agents in the environment. In practice we only have a single type of agent across all the environments, therefore there will only be one trainer to manage all the agents. **You might need to change this code** if you add a new RL algorithm, as it was designed to handle PPO and SAC only.
- `animalai_train.trainer_controller_aai.TrainerControllerAAI`: derives from `mlagents.trainers.trainer_controller.TrainerController` and is where the training loop is.
The basic elements most likely to be of interest to you are:
- **Curriculum**: managed in `animalai_train.meta_curriculum_aai.MetaCurriculumAAI` and `animalai_train.meta_curriculum_aai.CurriculumAAI`.
- **RL algo**: you can find the implementations for PPO and SAC in `mlagents.trainers.ppo.trainer` and `mlagents.trainers.sac.trainer` respectively. They both implement the base class `mlagents.trainers.trainer.trainer`, which you can implement yourself and plug directly into the overall training setup (managing all the necessary model parameters in the `TrainerFactory` mentioned above); a skeleton sketch is given after this list.
- **Exploration**: there is a curiosity module already provided in `mlagents.trainers.components.reward_signals`.
- **Buffer**: the agent's replay buffer is in `mlagents.trainers.buffer`.
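%% Cell type:markdown id: tags:
As a starting point for the **RL algo** item above, the skeleton below shows the general shape of plugging a new trainer into this setup. The base-class path is the one listed above, but the exact abstract methods differ between ml-agents versions, so the method names here are placeholders to be checked against the version pinned by `animalai-train`; this is a sketch, not the real interface.
%% Cell type:code id: tags:
``` python
# Skeleton only - the base class path is the one referenced above, but the
# abstract methods vary between ml-agents versions; the method names below
# are placeholders, not the real interface.
from mlagents.trainers.trainer.trainer import Trainer  # module path may differ by version


class MyAlgoTrainer(Trainer):
    """Hypothetical trainer for a custom RL algorithm."""

    def _process_trajectory(self, trajectory):
        # Turn a finished trajectory into training data, e.g. push it to a buffer.
        raise NotImplementedError

    def _update_policy(self):
        # Sample from the buffer and take a gradient step on your model.
        raise NotImplementedError


# The new trainer then needs to be registered in TrainerFactory (see above)
# so the training loop knows how to construct it for the AnimalAI agents.
```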
There are many more components to explore; two which are not implemented for AnimalAI, but which are on our todo list, are imitation learning and the option to record player actions in the environment.
That's pretty much all there is to know - we hope you enjoy the environment!
%% Cell type:code id: tags:
``` python
```