Commit 4b17ea66 authored by Benjamin's avatar Benjamin

training jupyter notebook

parent f8fccea4
......@@ -5,6 +5,8 @@ examples/env/*
models/
summaries/
logs/
examples/models/*
!examples/models/self_control_curriculum_pre_trained
# Environment logfile
*Project.log
......
......@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup(
name="animalai",
version="2.0.0b0",
version="2.0.0b3",
description="Animal AI envronment Python API",
url="https://github.com/beyretb/AnimalAI-Olympics",
author="Benjamin Beyret",
......
......@@ -62,13 +62,22 @@ def run_training_aai(run_seed: int, options: RunOptionsAAI) -> None:
options.arena_config,
options.resolution,
)
engine_config = EngineConfig(
options.width,
options.height,
AnimalAIEnvironment.QUALITY_LEVEL.train,
AnimalAIEnvironment.TIMESCALE.train,
AnimalAIEnvironment.TARGET_FRAME_RATE.train,
)
if options.train_model:
engine_config = EngineConfig(
options.width,
options.height,
AnimalAIEnvironment.QUALITY_LEVEL.train,
AnimalAIEnvironment.TIMESCALE.train,
AnimalAIEnvironment.TARGET_FRAME_RATE.train,
)
else:
engine_config = EngineConfig(
AnimalAIEnvironment.WINDOW_WIDTH.play,
AnimalAIEnvironment.WINDOW_HEIGHT.play,
AnimalAIEnvironment.QUALITY_LEVEL.play,
AnimalAIEnvironment.TIMESCALE.play,
AnimalAIEnvironment.TARGET_FRAME_RATE.play,
)
env_manager = SubprocessEnvManagerAAI(
env_factory, engine_config, options.num_envs
)
......
......@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup(
name="animalai_train",
version="2.0.0b0",
version="2.0.0b3",
description="Animal AI training library",
url="https://github.com/beyretb/AnimalAI-Olympics",
author="Benjamin Beyret",
......@@ -16,6 +16,6 @@ setup(
],
packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
zip_safe=False,
install_requires=["animalai==2.0.0b0", "mlagents==0.15.0"],
install_requires=["animalai==2.0.0b3", "mlagents==0.15.0"],
python_requires=">=3.6.1",
)
TODO
# Examples
Detail the various files in this folder
- load_and_play:
- train_ml_agents
- ...
\ No newline at end of file
## Notebooks
To run the notebooks, simply install the requirements (we recommend using a virtual environment) by running:
```
pip install -r requirements.txt
```
Then you can start a jupyter notebook by running `jupyter notebook` from your terminal.
## Designing arenas
You can use `load_config_and_play.py` to visualize a `yml` configuration for an environment arena. Make sure `animalai`
is [installed](../README.md#requirements) and run `python load_config_and_play.py your_configuration_file.yml`, which will open the environment in
play mode (control with W, A, S, D or the arrow keys). Close the environment by pressing CTRL+C in the terminal.
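If you'd rather do this from your own script, the sketch below shows the general idea; the import path for `AnimalAIEnvironment` and the constructor arguments used here (`file_name`, `arenas_configurations`, `play`) are assumptions based on what `load_config_and_play.py` typically does, so check that script for the exact call:
```python
# Minimal sketch of playing a configuration from Python (import path and argument
# names are assumptions; see load_config_and_play.py for the exact call).
import sys

from animalai.envs.arena_config import ArenaConfig
from animalai.envs.environment import AnimalAIEnvironment

configuration_file = sys.argv[1]  # e.g. "configurations/arena_configurations/cylinder_task.yml"

environment = AnimalAIEnvironment(
    file_name="env/AnimalAI",                              # path to the environment executable
    arenas_configurations=ArenaConfig(configuration_file),
    play=True,                                             # human control: W, A, S, D or the arrows
)

try:
    while True:  # keep the script alive while you play
        pass
except KeyboardInterrupt:  # CTRL+C in the terminal closes the environment
    environment.close()
```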
## Animalai-train examples
We provide two scripts which show how to use `animalai_train` to train agents:
- `train_ml_agents.py` uses ml-agents' PPO implementation (or SAC) and can run multiple environments in parallel to speed up
the training process
- `train_curriculum.py` shows how you can add a curriculum to your training loop
To run either of these make sure you have `animalai-train` [installed](../README.md#requirements).
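For orientation, here is a condensed sketch of what `train_ml_agents.py` roughly boils down to, based on the training notebook included in this commit; the paths and parameter values are illustrative rather than the script's exact contents:
```python
# Condensed training sketch (illustrative values; see train_ml_agents.py and the
# training notebook for the real thing).
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai

args = RunOptionsAAI(
    trainer_config=load_config("configurations/training_configurations/train_ml_agents_config_ppo.yaml"),
    env_path="env/AnimalAI",
    run_id="example_run",
    base_port=5005,
    num_envs=4,                      # environments running in parallel
    curriculum_config="configurations/curriculum",
    n_arenas_per_env=8,              # arenas (and agents) per environment
)
run_training_aai(run_seed=0, options=args)
```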
## OpenAI Gym and Baselines
You can use the OpenAI Gym interface to train using Baselines or other similar libraries (including
[Dopamine](https://github.com/google/dopamine) and [Stable Baselines](https://github.com/hill-a/stable-baselines)). To
do so you'll need to install a few dependencies.
On Linux:
```
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev &&
pip install tensorflow==1.14 &&
pip install git+https://github.com/openai/baselines.git@master#egg=baselines-0.1.6
```
On Mac: TODO
You can then run `train_baselines_dqn.py` or `train_baselines_ppo2.py` for examples.
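As a rough illustration of the Gym route (assuming the dependencies above are installed), the sketch below trains a small DQN with Baselines; the Gym wrapper class, its import path (`AnimalAIGym`) and its constructor arguments are assumptions here, so refer to `train_baselines_dqn.py` for the exact interface:
```python
# Hedged sketch of training through the Gym interface with OpenAI Baselines.
# The AnimalAIGym import path and arguments are assumptions; see train_baselines_dqn.py.
from animalai.envs.gym.environment import AnimalAIGym
from baselines import deepq

env = AnimalAIGym(
    environment_filename="env/AnimalAI",  # path to the environment executable (assumed argument name)
    worker_id=0,
)

# Train a small DQN on the visual observations; hyperparameters are placeholders.
act = deepq.learn(
    env,
    network="cnn",
    total_timesteps=100_000,
    exploration_fraction=0.1,
)
act.save("dqn_animalai.pkl")
env.close()
```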
\ No newline at end of file
!ArenaConfig
arenas:
0: !Arena
pass_mark: 0
t: 250
items:
- !Item
name: CylinderTunnelTransparent
positions:
- !Vector3 {x: 20, y: 0, z: 20}
rotations: [90]
sizes:
- !Vector3 {x: 10, y: 10, z: 10}
- !Item
name: GoodGoal
positions:
- !Vector3 {x: 20, y: 0, z: 20}
sizes:
- !Vector3 {x: 1, y: 1, z: 1}
- !Item
name: Agent
positions:
- !Vector3 {x: 20, y: 0, z: 1}
rotations: [180]
\ No newline at end of file
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
......@@ -4,8 +4,8 @@
0.8,
0.8,
0.8,
0.8,
0.8
0.6,
0.2
],
"min_lesson_length": 100,
"signal_smoothing": true,
......
......@@ -32,11 +32,11 @@
```
%% Cell type:markdown id: tags:
## Can your agent self control?
## Can your agent self control? - Part I
Self-control is hard, we've all been there (looking at you, chocolate bar). But don't worry, this is something a lot of species struggle with. In [The evolution of self-control](https://www.pnas.org/content/111/20/E2140) MacLean et al. tested this ability in **36 different species**! In a very simple experiment, animals are offered food they can obtain easily by reaching out for it. Then they're shown the same food behind a transparent wall: they have to go around the wall to grab the food. They can see the food just as well, but they must refrain from reaching out as they did before.
Below are videos of such animals, as well as two participants' submissions to our competition, exhibiting similar behaviors (remember, these agents never encountered this task during training):
......@@ -78,10 +78,23 @@
%% Cell type:markdown id: tags:
This file contains the configuration of one arena (`!Arena`), with only the agent on the ground (`y=0`) in the center (`x=20`,`z=20`) and a `GoodGoal` (green sphere) of size 1 in front of it (`x=20`,`z=22`). Pretty simple, right?
One _little trick_ we used here: one environment can contain several arenas during training, each with its own configuration. This allows your training algorithm to collect more observations at once. You can just place the configurations one after the other, like this:
```
!ArenaConfig
arenas:
0: !Arena
......
1: !Arena
......
2: !Arena
......
```
But if you want all the arenas in the environment to have the same configuration, do as we did above: define a single configuration with the key `-1`.
You can now use this to load an environment and play yourself ([this script does that for you](./load_config_and_play.py)). Make sure you have followed the [installation guide](../README.md#requirements) and then create an `AnimalAIEnvironment` in play mode:
%% Cell type:code id: tags:
``` python
......@@ -177,10 +190,12 @@
%% Cell type:markdown id: tags:
This tells you that we'll switch from one level to the next once the reward per episode is above 0.8. Easy, right?
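To make the switching rule concrete, here is a toy sketch (this is not the library's actual code, and the dictionary values below are illustrative):
```python
# Toy illustration of lesson switching (not the library's implementation; values are made up).
curriculum = {
    "measure": "reward",            # what the thresholds are compared against (assumed key)
    "thresholds": [0.8, 0.8, 0.6],  # clear thresholds[i] to move from lesson i to lesson i + 1
    "min_lesson_length": 100,       # minimum number of episodes in a lesson before switching
    "signal_smoothing": True,       # smooth the measure before comparing it
}

def next_lesson(lesson: int, smoothed_reward: float, episodes_in_lesson: int) -> int:
    """Advance to the next lesson once the smoothed reward clears the current threshold."""
    long_enough = episodes_in_lesson >= curriculum["min_lesson_length"]
    has_next = lesson < len(curriculum["thresholds"])
    if long_enough and has_next and smoothed_reward > curriculum["thresholds"][lesson]:
        return lesson + 1
    return lesson

print(next_lesson(0, smoothed_reward=0.85, episodes_in_lesson=120))  # -> 1
```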
In the next notebook we'll use the above curriculum example to train an agent that can solve the tube task we saw in the videos earlier. Before that, it is worth looking at an extra feature of the environment (blackouts) and using it to interact with the environment from Python rather than playing manually.
%% Cell type:markdown id: tags:
## Interacting with the environment + bonus light switch!
In this final part, we look at the API for interacting with the environment. Namely, we want to take steps, and collect observations and rewards. For this part we'll load an environment which tests for a cognitive skill called **object permanence**: the capacity of an agent to understand that an object still exists even when it has moved out of sight. Think of a car turning a corner; we all know the car hasn't vanished from existence. This test introduces another feature of the environment, **the light switch**, which allows you to switch the light in the environment on and off. Let's have a look at the experiment:
......
tensorflow>=1.7,<2.0 # if you wish to run examples using tf>=2.0 change the baselines requirement accordingly
animalai==2.0.0b0
animalai_train==2.0.0b0
baselines # replace with git+https://github.com/openai/baselines.git@tf2 to use tf2
animalai==2.0.0b1
animalai_train==2.0.0b2
jupyter
matplotlib
\ No newline at end of file
%% Cell type:markdown id: tags:
## Training simple
# Training
The API provides both a [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents) and an[OpenAI Gym](https://github.com/openai/gym) interfaces. We include training examples for both in [the examples folder](./examples), the former uses the ml-agents' own training library which is optimised for the environment, the latter uses [OpenAI baselines](https://github.com/openai/baselines).
The API provides both a [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents) and an [OpenAI Gym](https://github.com/openai/gym) interface. We include training examples for both in [the examples folder](./examples): the former uses ml-agents' own training library, which is optimised for the environment, while the latter uses [OpenAI baselines](https://github.com/openai/baselines).
Our first training will aim at giving our agent the ability to simply collect "food" with nothing else in the environment. In your configuration file you can omit some parameters to randomize their values, for example:
In this notebook we show you how to run the `animal-ai` trainers, which are optimized to train on the AnimalAI environment. It's a powerful, modular library you can tinker with in order to implement your own algorithms. We strongly recommend you have a look at its various parts, described at the end of this tutorial, should you wish to make modifications.
## Can your agent self control? - Part II
If you haven't done so already, go through the [environment tutorial](environment_tutorial.pynb), where we describe the problem of self-control in animals. We created a curriculum of increasingly difficult levels in which the agent must retrieve food, gradually introducing items similar to those in the final experiment, without ever including the exact experiment in the training curriculum.
We created the curriculum in the previous notebook; now we need to configure the training environment. The `animalai-train` library provides all the tools you need to train with PPO or SAC. We'll use the former here.
First, we need to set all the hyperparameters of our model. This is done by creating a YAML file as follows:
%% Cell type:code id: tags:
``` python
with open("configurations/training_configurations/train_ml_agents_config_ppo.yaml") as f:
print(f.read())
```
%% Cell type:markdown id: tags:
If you're already familiar with RL algorithms in general, these should be fairly self-explanatory. Either way, you can have a look at [this page](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md) for details of both the PPO and SAC configuration files.
You then need to configure the trainer, which is just a named tuple that defines parameters such as:
- the paths to the environment and your configuration file (above)
- how many environments to launch in parallel and how many agents per environment
- the path to your curriculum
- and many more!
%% Cell type:code id: tags:
``` python
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai
trainer_config_path = (
"configurations/training_configurations/train_ml_agents_config_ppo.yaml"
)
environment_path = "env/AnimalAI"
curriculum_path = "configurations/curriculum"
run_id = "self_control_curriculum"
base_port = 5005
number_of_environments = 4
number_of_arenas_per_environment = 8
args = RunOptionsAAI(
trainer_config=load_config(trainer_config_path),
env_path=environment_path,
run_id=run_id,
base_port=base_port,
num_envs=number_of_environments,
curriculum_config=curriculum_path,
n_arenas_per_env=number_of_arenas_per_environment,
)
```
%% Cell type:markdown id: tags:
Once this is done we're pretty much left with a one-liner! The training library isn't verbose, but you can monitor training via TensorBoard. The first few lines just load TensorBoard; once it is launched and you can see the orange window below, click the refresh button in the top right of TensorBoard and graphs will appear after a few training steps.
_Note_: in case you don't want to wait for the model to train, you can jump ahead to the next step, as we provide a pre-trained model for inference.
%% Cell type:code id: tags:
``` python
import os
logs_dir = "summaries/"
os.makedirs(logs_dir, exist_ok=True)
%load_ext tensorboard
%tensorboard --logdir {logs_dir}
run_training_aai(0, args)
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
You can see the lessons increase as the agent gets better at each level. That's pretty much it for training with the provided library. One last thing we need to do is assess how well our agent, trained with only rewards and transparent walls, performs on the transparent cylinder task. To do so we can load the model and run it in inference mode.
%% Cell type:code id: tags:
``` python
from animalai.envs.arena_config import ArenaConfig
# Comment out the line below to load your own trained model instead of the pre-trained one
run_id = "self_control_curriculum_pre_trained"
args = RunOptionsAAI(
trainer_config=load_config(trainer_config_path),
env_path=environment_path,
run_id=run_id,
base_port=base_port+2,
load_model=True,
train_model=False,
arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml")
)
run_training_aai(0, args)
```
%%%% Output: error
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-2-04d80dec01d2> in <module>
13 arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml")
14 )
---> 15 run_training_aai(0, args)
~/AnimalAI/AnimalAI-Olympics/animalai_train/animalai_train/run_training_aai.py in run_training_aai(run_seed, options)
113 tc.start_learning(env_manager)
114 finally:
--> 115 env_manager.close()
116 write_timing_tree(summaries_dir, options.run_id)
117
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/subprocess_env_manager.py in close(self)
265 self.step_queue.join_thread()
266 for env_worker in self.env_workers:
--> 267 env_worker.close()
268
269 def _postprocess_steps(
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/subprocess_env_manager.py in close(self)
78 pass
79 logger.debug(f"UnityEnvWorker {self.worker_id} joining process.")
---> 80 self.process.join()
81
82
/usr/lib/python3.6/multiprocessing/process.py in join(self, timeout)
122 assert self._parent_pid == os.getpid(), 'can only join a child process'
123 assert self._popen is not None, 'can only join a started process'
--> 124 res = self._popen.wait(timeout)
125 if res is not None:
126 _children.discard(self)
/usr/lib/python3.6/multiprocessing/popen_fork.py in wait(self, timeout)
48 return None
49 # This shouldn't block if wait() returned successfully.
---> 50 return self.poll(os.WNOHANG if timeout == 0.0 else 0)
51 return self.returncode
52
/usr/lib/python3.6/multiprocessing/popen_fork.py in poll(self, flag)
26 while True:
27 try:
---> 28 pid, sts = os.waitpid(self.pid, flag)
29 except OSError as e:
30 # Child process not yet created. See #1731717
KeyboardInterrupt:
%% Cell type:markdown id: tags:
You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remember, this is a problem which is meant to be hard! You can now have a go at making your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!
%% Cell type:markdown id: tags:
## Using ML-Agents and AnimalAI for your algorithms
As mentioned earlier, AnimalAI is built on top of ML-Agents, and we strongly recommend you have a look at the various bits and pieces you can tinker with in order to implement your own agents. This part is a brief overview of where to find the parts at the heart of most RL algorithms, starting from the higher-level controllers down to the basic building blocks. Should you wish to modify them, you'll need to clone the [ml-agents repository](https://github.com/Unity-Technologies/ml-agents).
- `animalai_train.run_training_aai`: contains the highest level of control for training an agent, along with all the subroutines you need in order to do so. The most important ones are:
- `animalai_train.subprocess_env_manager_aai.SubprocessEnvManagerAAI`: an environment manager which derives from `mlagents.trainers.subprocess_env_manager.SubprocessEnvManager` and manages several environments running in parallel. In practice you shouldn't need to change this part.
- `mlagents.trainers.trainer_util.TrainerFactory`: a factory which is in charge of creating the trainers that manage the agents in the environment. In practice we only have a single type of agent across all the environments, therefore there will only be one trainer to manage all the agents. **You might need to change this code** if you add a new RL algorithm, as it is designed to handle PPO and SAC only.
- `animalai_train.trainer_controller_aai.TrainerControllerAAI`: derives from `mlagents.trainers.trainer_controller.TrainerController` and is where the training loop is.
The basic elements which are most likely to be of interest to you:
- **Curriculum**: managed in `animalai_train.meta_curriculum_aai.MetaCurriculumAAI` and `animalai_train.meta_curriculum_aai.CurriculumAAI`
- **RL algorithms**: you can find the implementations of PPO and SAC in `mlagents.trainers.ppo.trainer` and `mlagents.trainers.sac.trainer` respectively. They both implement the base class in `mlagents.trainers.trainer.trainer`, which you can implement yourself and plug directly into the overall training setup (you'll manage all the necessary model parameters in the `TrainerFactory` mentioned above).
- **Exploration**: there is a curiosity module already provided in `mlagents.trainers.components.reward_signals`
- **Buffer**: the agent's replay buffer is in `mlagents.trainers.buffer`
There are many more components you can find. One which is not yet implemented for AnimalAI, but is on our to-do list, is imitation learning together with the possibility to record a player's actions in the environment.
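For orientation, the imports below show where these pieces live; the module paths come from the list above, while the class names inside `mlagents` are assumptions that may differ between versions:
```python
# Where the main building blocks live (class names inside mlagents are assumptions).
from animalai_train.run_training_aai import run_training_aai                    # top-level training entry point
from animalai_train.subprocess_env_manager_aai import SubprocessEnvManagerAAI   # parallel environment manager
from animalai_train.trainer_controller_aai import TrainerControllerAAI          # the training loop
from animalai_train.meta_curriculum_aai import MetaCurriculumAAI                # curriculum management
from mlagents.trainers.trainer_util import TrainerFactory                       # builds one trainer per agent type
from mlagents.trainers.ppo.trainer import PPOTrainer                            # PPO implementation
from mlagents.trainers.sac.trainer import SACTrainer                            # SAC implementation
from mlagents.trainers.buffer import AgentBuffer                                # the agents' replay buffer
```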
That's pretty much all there is to know; we hope you enjoy the environment!
%% Cell type:code id: tags:
``` python
......