Commit 4b17ea66 authored by Benjamin's avatar Benjamin
Browse files

training jupyter notebook

parent f8fccea4
......@@ -5,6 +5,8 @@ examples/env/*
models/
summaries/
logs/
examples/models/*
!examples/models/self_control_curriculum_pre_trained
# Environment logfile
*Project.log
......
......@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup(
name="animalai",
version="2.0.0b0",
version="2.0.0b3",
description="Animal AI envronment Python API",
url="https://github.com/beyretb/AnimalAI-Olympics",
author="Benjamin Beyret",
......
......@@ -62,13 +62,22 @@ def run_training_aai(run_seed: int, options: RunOptionsAAI) -> None:
options.arena_config,
options.resolution,
)
engine_config = EngineConfig(
options.width,
options.height,
AnimalAIEnvironment.QUALITY_LEVEL.train,
AnimalAIEnvironment.TIMESCALE.train,
AnimalAIEnvironment.TARGET_FRAME_RATE.train,
)
if options.train_model:
engine_config = EngineConfig(
options.width,
options.height,
AnimalAIEnvironment.QUALITY_LEVEL.train,
AnimalAIEnvironment.TIMESCALE.train,
AnimalAIEnvironment.TARGET_FRAME_RATE.train,
)
else:
engine_config = EngineConfig(
AnimalAIEnvironment.WINDOW_WIDTH.play,
AnimalAIEnvironment.WINDOW_HEIGHT.play,
AnimalAIEnvironment.QUALITY_LEVEL.play,
AnimalAIEnvironment.TIMESCALE.play,
AnimalAIEnvironment.TARGET_FRAME_RATE.play,
)
env_manager = SubprocessEnvManagerAAI(
env_factory, engine_config, options.num_envs
)
......
......@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup(
name="animalai_train",
version="2.0.0b0",
version="2.0.0b3",
description="Animal AI training library",
url="https://github.com/beyretb/AnimalAI-Olympics",
author="Benjamin Beyret",
......@@ -16,6 +16,6 @@ setup(
],
packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
zip_safe=False,
install_requires=["animalai==2.0.0b0", "mlagents==0.15.0"],
install_requires=["animalai==2.0.0b3", "mlagents==0.15.0"],
python_requires=">=3.6.1",
)
TODO
# Examples
Detail the various files in this folder
- load_and_play:
- train_ml_agents
- ...
\ No newline at end of file
## Notebooks
To run the notebooks, install the requirements (we recommend using a virtual environment) by running:
```
pip install -r requirements.txt
```
Then you can start a jupyter notebook by running `jupyter notebook` from your terminal.
## Designing arenas
You can use `load_config_and_play.py` to visualize a `yml` configuration for an environment arena. Make sure `animalai`
is [installed](../README.md#requirements) and run `python load_config_and_play.py your_configuration_file.yml`, which will open the environment in
play mode (control with W,A,S,D or the arrows). Close the environment by pressing CTRL+C in the terminal.
## Animalai-train examples
We provide two scripts which show how to use `animalai_train` to train agents:
- `train_ml_agents.py` uses ml-agents' PPO implementation (or SAC) and can run multiple environments in parallel to speed up
the training process
- `train_curriculum.py` shows how you can add a curriculum to your training loop
To run either of these make sure you have `animalai-train` [installed](../README.md#requirements).
## OpenAI Gym and Baselines
You can use the OpenAI Gym interface to train using Baselines or other similar libraries (including
[Dopamine](https://github.com/google/dopamine) and [Stable Baselines](https://github.com/hill-a/stable-baselines)). To
do so you'll need to install the following dependencies.
On Linux:
```
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev &&
pip install tensorflow==1.14 &&
pip install git+https://github.com/openai/baselines.git@master#egg=baselines-0.1.6
```
On Mac: TODO
You can then run `train_baselines_dqn.py` or `train_baselines_ppo2.py` for examples.
\ No newline at end of file
!ArenaConfig
arenas:
0: !Arena
pass_mark: 0
t: 250
items:
- !Item
name: CylinderTunnelTransparent
positions:
- !Vector3 {x: 20, y: 0, z: 20}
rotations: [90]
sizes:
- !Vector3 {x: 10, y: 10, z: 10}
- !Item
name: GoodGoal
positions:
- !Vector3 {x: 20, y: 0, z: 20}
sizes:
- !Vector3 {x: 1, y: 1, z: 1}
- !Item
name: Agent
positions:
- !Vector3 {x: 20, y: 0, z: 1}
rotations: [180]
\ No newline at end of file
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
......
......@@ -4,8 +4,8 @@
0.8,
0.8,
0.8,
0.8,
0.8
0.6,
0.2
],
"min_lesson_length": 100,
"signal_smoothing": true,
......
%% Cell type:markdown id: tags:
# Animal-AI Environment tutorial
This tutorial is a step-by-step presentation of the new version of the Animal-AI library. The new Animal-AI environment is quite similar to the version used for the competition; however, the `animalai` and `animalai_train` APIs have been dramatically improved, reflecting the great progress made by [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents).
In this notebook, **we present the environment and how to design both your training and testing setups**. In the [second notebook (training)](./training_tutorial.ipynb) we'll show you how to train an agent to solve a task it has never seen before.
%% Cell type:markdown id: tags:
## Introducing animal cognition to the AI world
Our goal is to provide a tool for researchers to go beyond classical RL environments, allowing you to develop agents that possess cognitive skills similar to those of animals. The main idea is to be able to test and/or train your agents on **experiments taken or inspired from real-life animal experiments**. This repository holds 900 such experiments which cover a dozen cognitive skills. You can find more details on the test-bed on [our website](http://animalaiolympics.com/AAI/testbed).
The environment is a simple arena with an agent that can only move left, right, forward and backward, aiming to collect positive rewards and avoid negative ones. It can also hold [several objects](../documentation/definitionsOfObjects.md) which can be used to set up experiments. You can really put yourself in the shoes of an animal cognition scientist, building experiments with whatever you can find in a lab.
From the agent's perspective, a classical experiment called a Y-maze looks like this (the agent must explore a simple Y-shaped maze to find a reward, often food):
<img src="documentation/notebook_data/y_maze.png" width="40%">
The agent starts on an elevated platform (blue) and needs to move towards the reward (green ball) while avoiding going to the right, in which case it would be stuck (the platform is too high for the agent to climb back onto).
From an RL perspective this might seem like a trivial problem to solve! In a classical RL setup where you train and test on the same problem, it is indeed simple. However, an animal tested on such a task would be encountering it for the first time. And this is what we encourage you to do as well: **create your own training curriculum, and use our experiments as a test set your agent has never seen before**. We believe this is needed to truly test an agent's capacity to acquire cognitive skills.
But enough chit-chat, let's dive right in with an example!
%% Cell type:code id: tags:
``` python
from IPython.display import HTML
```
%% Cell type:markdown id: tags:
## Can your agent self control?
## Can your agent self control? - Part I
Self-control is hard, we've all been there (looking at you, chocolate bar). But don't worry, this is something a lot of species struggle with. In [The evolution of self-control](https://www.pnas.org/content/111/20/E2140) MacLean et al. tested this ability in **36 different species**! In a very simple experiment, animals are first offered food they can grab simply by reaching out for it. Then they are shown the same food behind a transparent wall: they can see the food just as well, but they need to refrain from reaching straight for it and go around the wall instead.
Below are videos of such animals, as well as two participants' submissions to our competition, exhibiting similar behaviors (remember, these agents never encountered this task during training):
%% Cell type:code id: tags:
``` python
HTML('<div><video width="24%" playsinline="" autoplay="" muted="" loop=""><source src="notebook_data/animal-cyl-pass.mp4" type="video/mp4"></video><video width="24%" playsinline="" autoplay="" muted="" loop=""><source src="notebook_data/agent-cyl-pass.mp4" type="video/mp4"></video><video width="24%" playsinline="" autoplay="" muted="" loop=""><source src="notebook_data/animal-cyl-fail.mp4" type="video/mp4"></video><video width="24%" playsinline="" autoplay="" muted="" loop=""><source src="notebook_data/agent-cyl-fail.mp4" type="video/mp4"></video></div>')
```
%%%% Output: execute_result
<IPython.core.display.HTML object>
%% Cell type:markdown id: tags:
In the following sections we'll design a training curriculum which does not include the exact "reward in a transparent cylinder" task, but which we can use to train an agent that can solve this same task. In the [next tutorial](./training_tutorial.ipynb), we'll train such an agent using this curriculum.
%% Cell type:markdown id: tags:
## Let's get started: experiment design
First things first, as rigorous researchers, we want to design a good training environment. To do so, we provide a [list of items](../documentation/definitionsOfObjects.md) you can include in your arena; you can have a look at the details later, as this section only highlights the basics.
To begin with, let's train an agent to collect food placed right in front of it, as simple as that! To do so, you'll need to write a `yaml` file which describes the experiment setup. It contains:
- experiment parameters (maximum steps, steps at which the light is turned on/off)
- a list of objects
- their specifications (positions, rotations, sizes, colors) which are randomized if not provided
Below is the simplest example possible:
%% Cell type:code id: tags:
``` python
with open('configurations/curriculum/0.yml') as f:
print(f.read())
```
%%%% Output: stream
!ArenaConfig
arenas:
0: !Arena
-1: !Arena
pass_mark: 0
t: 250
items:
- !Item
name: Agent
positions:
- !Vector3 {x: 20, y: 0, z: 20}
rotations: [0]
- !Item
name: GoodGoal
positions:
- !Vector3 {x: 20, y: 0, z: 22}
sizes:
- !Vector3 {x: 1, y: 1, z: 1}
%% Cell type:markdown id: tags:
This file contains the configuration of one arena (`!Arena`), with only the agent on the ground (`y=0`) in the center (`x=20`,`z=20`) and a `GoodGoal` (green sphere) of size 1 in front of it (`x=20`,`z=22`). Pretty simple, right?
One _little trick_ we used here: one environment can contain several arenas during training, each with its own configuration. This allows your training algorithm to collect more observations at once. You can just place the configurations one after the other like this:
```
!ArenaConfig
arenas:
0: !Arena
......
1: !Arena
......
2: !Arena
......
```
But if you want all the arenas in the environment to share the same configuration, then do as we did above: define a single configuration with key `-1`.
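If you prefer to inspect a configuration from Python rather than reading the `yaml` by hand, here is a minimal sketch; it assumes that `ArenaConfig` parses the file into an `arenas` dictionary keyed by arena index, with each entry exposing attributes such as `t` and `items` (worth double-checking against your installed `animalai` version):
``` python
from animalai.envs.arena_config import ArenaConfig

# Load the configuration shown above and list its arenas.
# Assumption: ArenaConfig exposes an `arenas` dict of index -> Arena, where each
# Arena carries the episode length `t` and the list of `items` defined in the yaml.
config = ArenaConfig('configurations/curriculum/0.yml')
for arena_index, arena in config.arenas.items():
    print(arena_index, arena.t, len(arena.items))
```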
You can now use this to load an environment and play yourself ([this script does that for you](./load_config_and_play.py)). Make sure you have followed the [installation guide](../README.md#requirements) and then create an `AnimalAIEnvironment` in play mode:
%% Cell type:code id: tags:
``` python
from animalai.envs.arena_config import ArenaConfig
from animalai.envs.environment import AnimalAIEnvironment
from mlagents_envs.exception import UnityCommunicationException
try:
environment = AnimalAIEnvironment(
file_name='env/AnimalAI',
base_port=5005,
arenas_configurations=ArenaConfig('configurations/curriculum/0.yml'),
play=True,
)
except UnityCommunicationException:
# you'll end up here if you close the environment window directly
# always try to close it from script
environment.close()
```
%% Cell type:markdown id: tags:
Press **C** to change the viewpoint (bird's eye, first person, third person), and move with **W,A,S,D** or the **arrows** on your keyboard. Once you're done, let's close this environment:
%% Cell type:code id: tags:
``` python
if environment:
environment.close() # takes a few seconds
```
%% Cell type:markdown id: tags:
## Creating a curriculum
Such a training set is not going to get us very far... The food is right in front of the agent, so it won't even learn any sort of exploration, not even turning around to see if the food is behind it. [Curriculum learning](https://lilianweng.github.io/lil-log/2020/01/29/curriculum-for-reinforcement-learning.html) uses a set of training configurations of increasing difficulty in order to learn a complex task. Think "stand up before you walk, walk before you run" type of learning.
To solve our problem, we might want to have the following consecutive learning steps:
1. food right in front of the agent (example above)
2. food in front of the agent, further away
3. food at the same distance as 2, but with the agent's rotation randomized (the food might be behind the agent)
4. agent and food randomly placed and rotated, each on a fixed z-axis, with a small transparent wall between the two
5. same as 4, with bigger and bigger walls
To design a curriculum, we need to place all the yaml files in a folder along with a json configuration file which specifies when to switch from one level to the next. The above curriculum can be found in [this folder](./examples/configurations/curriculum).
The second configuration is just like the first but with, say, `z: 35` for `GoodGoal`. The third one only requires randomizing the rotation; to do so, we can replace `rotations: [0]` with `rotations: [-1]`, as any parameter with a value of `-1` is randomized. Alternatively, you can remove the `rotations` line altogether and the rotation will be randomized automatically (this also works with positions, sizes and colors).
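If you want to double-check the intermediate steps, a quick sketch like the one below prints them the same way we printed `0.yml` (assuming the files are indeed named `1.yml` and `2.yml`, as listed in the curriculum's `AnimalAI.json` further down):
``` python
# Print the intermediate curriculum configurations for inspection.
# The file names are an assumption based on the AnimalAI.json shown below.
for path in ['configurations/curriculum/1.yml', 'configurations/curriculum/2.yml']:
    print(f'----- {path} -----')
    with open(path) as f:
        print(f.read())
```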
Putting all of the above together, we can have a look at step 4:
%% Cell type:code id: tags:
``` python
configuration = 'configurations/curriculum/3.yml'
with open(configuration) as f:
print(f.read())
try:
environment = AnimalAIEnvironment(
file_name='env/AnimalAI',
base_port=5005,
arenas_configurations=ArenaConfig(configuration),
play=True,
)
except UnityCommunicationException:
# you'll end up here if you close the environment window directly
# always try to close it from script
environment.close()
```
%%%% Output: stream
!ArenaConfig
arenas:
0: !Arena
pass_mark: 0
t: 250
items:
- !Item
name: Agent
positions:
- !Vector3 {x: -1, y: 0, z: 5}
- !Item
name: GoodGoal
positions:
- !Vector3 {x: -1, y: 0, z: 35}
sizes:
- !Vector3 {x: 1, y: 1, z: 1}
- !Item
name: WallTransparent
positions:
- !Vector3 {x: -1, y: 0, z: 20}
sizes:
- !Vector3 {x: 10, y: 5, z: 1}
rotations: [0]
%% Cell type:markdown id: tags:
Play a few runs with this configuration and you'll see the various items appearing at random positions along a given axis. You can **press R** to reset the environment. Once you're done, close the environment:
%% Cell type:code id: tags:
``` python
if environment:
environment.close() # takes a few seconds
```
%% Cell type:markdown id: tags:
Finally, the file `AnimalAI.json` holds the parameters for the curriculum process (the name of this file **must remain AnimalAI.json**); it looks like this:
%% Cell type:code id: tags:
``` python
with open('configurations/curriculum/AnimalAI.json') as f:
print(f.read())
```
%%%% Output: stream
{
"measure": "reward",
"thresholds": [
0.8,
0.8,
0.8,
0.8,
0.8
],
"min_lesson_length": 100,
"signal_smoothing": true,
"configuration_files": [
"0.yml",
"1.yml",
"2.yml",
"3.yml",
"4.yml",
"5.yml"
]
}
%% Cell type:markdown id: tags:
This tells you that we'll switch from one level to the next once the reward per episode is above 0.8. Easy, right?
In the next notebook we'll use the above curriculum example to train an agent that can solve the tube task we saw in the videos earlier. Before that, it is worth looking at an extra feature of the environment (blackouts) and using it as an opportunity to interact with the environment from Python rather than playing manually.
%% Cell type:markdown id: tags:
## Interacting with the environment + bonus light switch!
In this final part, we look at the API for interacting with the environment. Namely, we want to take steps and collect observations and rewards. For this part we'll load an environment which tests for a cognitive skill called **object permanence**. It tests the capacity of an agent to understand that an object still exists even when it has moved out of sight; think of a car turning a corner: we all know the car hasn't vanished from existence. This test introduces another feature of the environment, **the light switch**, which lets you turn the light in the environment on and off. Let's have a look at the experiment:
%% Cell type:code id: tags:
``` python
light_switch_conf = 'configurations/arena_configurations/light_switch.yml'
try:
environment = AnimalAIEnvironment(
file_name='env/AnimalAI',
base_port=5005,
arenas_configurations=ArenaConfig(light_switch_conf),
play=True,
)
except UnityCommunicationException:
# you'll end up here if you close the environment window directly
# always try to close it from script
environment.close()
```
%% Cell type:markdown id: tags:
Change the point of view by pressing **C** and then reset by pressing **R**. As you can see, the light switches off just before you would see the reward disappear, but I guess you figured out where it had gone, right?
To do this you just need to add the `blackouts` parameter to your configuration file. It's a list of frames at which the light switches off, then back on, and so on. Below we switch the light off at frame 20 and back on at frame 50. You can **also provide a negative number** such as `blackouts: [-20]` to switch on/off every 20 frames.
%% Cell type:code id: tags:
``` python
if environment:
environment.close() # takes a few seconds
with open(light_switch_conf) as f:
print(f.read())
```
%%%% Output: stream
!ArenaConfig
arenas:
0: !Arena
pass_mark: 0
t: 500
blackouts: [20,50]
items:
- !Item
name: GoodGoalBounce
positions:
- !Vector3 {x: 30, y: 0, z: 30}
rotations: [270]
sizes:
- !Vector3 {x: 1, y: 1, z: 1}
- !Item
name: Wall
positions:
- !Vector3 {x: 7.5, y: 0, z: 25}
rotations: [90]
sizes:
- !Vector3 {x: 1, y: 3, z: 15}
colors:
- !RGB {r: 153, g: 153, b: 153}
- !Item
name: Ramp
positions:
- !Vector3 {x: 10, y: 0, z: 30}
rotations: [90]
sizes:
- !Vector3 {x: 1, y: 0.2, z: 1}
colors:
- !RGB {r: 153, g: 153, b: 153}
- !Item
name: Agent
positions:
- !Vector3 {x: 20, y: 0, z: 5}
rotations: [0]
%% Cell type:markdown id: tags:
To finish this tutorial on the environment and its API, we look at how we can interact with it from Python. To do so we'll launch the environment without play mode; this opens a communicator between Python and Unity to exchange actions and observations. We will also set the camera resolution for our agent's observations:
%% Cell type:code id: tags:
``` python
resolution=256
try:
environment = AnimalAIEnvironment(
file_name='env/AnimalAI',
base_port=5006,
resolution=resolution,
)
except UnityCommunicationException:
# you'll end up here if you close the environment window directly
# always try to close it from script
environment.close()
```
%% Cell type:markdown id: tags:
We can first retrieve the various characteristics of the environment:
%% Cell type:code id: tags:
``` python
agent_groups = environment.get_agent_groups()
agent_group_spec = environment.get_agent_group_spec(agent_groups[0])
print(f'Here we only have {len(agent_groups)} agent group: {agent_groups[0]}')
print(f'''\nAnd you can get its characteristics: \n
visual observations shape: {agent_group_spec.observation_shapes[0]}
vector observations shape (velocity): {agent_group_spec.observation_shapes[1]}
actions are discrete: {agent_group_spec.action_type}
actions have shape: {agent_group_spec.action_shape}
''')
```
%%%% Output: stream
Here we only have 1 agent group: AnimalAI?team=0
And you can get its characteristics: 
visual observations shape: (256, 256, 3)
vector observations shape (velocity): (3,)
actions are discrete: ActionType.DISCRETE
actions have shape: (3, 3)
%% Cell type:markdown id: tags:
As you may have noticed, we did not pass an arena configuration file. You can actually pass one to the environment at any point during training, when you reset the environment. Let's do that now:
%% Cell type:code id: tags:
``` python
environment.reset(arenas_configurations=ArenaConfig(light_switch_conf))
```
%% Cell type:markdown id: tags:
Finally, we can now take actions and collect observations and rewards! This is done in two steps, as opposed to the classical Gym setup: Unity allows agents to request actions only when they need them, but that's not very relevant for us.
%% Cell type:code id: tags:
``` python
import numpy as np
actions = [[0,0]]*50 # Do nothing until the lights come back on
actions += [[1,0]]*40 # Go forward
actions += [[0,2]]*15 # Turn left
actions += [[1,0]]*50 # Go forward
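
# What follows is a minimal sketch of the two-step interaction loop, based on the
# ml-agents 0.15 low-level API used above (set_actions / step / get_step_result);
# it is an illustration rather than this notebook's exact loop.
group_name = agent_groups[0]
total_reward = 0
for action in actions:
    environment.set_actions(group_name, np.array([action]))  # queue the action for our single agent
    environment.step()                                        # advance the simulation
    step_result = environment.get_step_result(group_name)     # fetch observations and rewards
    total_reward += step_result.reward[0]
print(f'Total reward collected: {total_reward}')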