Commit 2cf5f974 authored by Benjamin's avatar Benjamin

rewrite documentation+improve watch and play

parent 2bf48e99
@@ -5,9 +5,5 @@ You can find here the following documentation:
- [The quickstart guide](
- [How to design configuration files](
- [How training works](
- [Add a curriculum to your training using animalai_train](
- [All the objects you can include in the arenas as well as their specifications](
- [How to submit your agent](
More will come before the competition launches.
@@ -2,24 +2,20 @@
## TL;DR
From the `examples` folder, run `python configs/configExample.yaml` to get an understanding of how the `YAML` files configure the
arenas for training. You will find a list of all objects you can add to an arena as well as the values for their
parameters in [the definitions]( You will find below all the technical details to create
more complex training configurations.
## Intro
To configure training arenas you can use a simple **YAML file** and/or the **ArenaConfig structure** provided in
`animalai.envs.arena_config`. This makes training quite flexible and allows for the following:
- load and save configurations for reusability
- on-the-fly changes of the configuration of one or more arenas between episodes, allowing for easy curriculum learning, for example
- share configurations between participants
We provide a few custom configurations, but we expect designing good environments will be an important component of doing
well in the competition.
We describe below the structure of the configuration files for an instance of the training environment, as well as all the
parameters and the values they can take.
## The Arenas
<p align="center">
@@ -30,6 +26,8 @@ A single arena is as shown above, it comes with a single agent (blue sphere, black dot showing the front) and
four walls. It is a square of size 40x40, the origin of the arena is `(0,0)`. You can provide coordinates for
objects in the range `[0,40]x[0,40]` as floats.
Note that in Unity the **y** axis is the vertical axis. In the above picture with the agent on the ground in the center of the environment its coordinates are (20, 0, 20).
For visualization you can only configure a single arena; during training, however, you can configure as many as you want,
each with its own local set of coordinates as described above.
@@ -37,9 +35,7 @@ For a single arena you can provide the following parameters:
- `t` an `int`, the length of an episode which can change from one episode to the other. A value of `0` means that the episode will
not terminate until a reward has been collected (setting `t=0` and having no reward will lead to an infinite episode)
- `blackouts` [see below](#blackouts)
<!-- TODO: show (x,y,z) referential -->
- `pass_mark` the score the agent needs to reach to pass the level
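Putting these parameters together, a single-arena configuration might look like the following sketch (the `GoodGoal` item and the specific values here are illustrative):

```yaml
arenas:
  0: !Arena
    pass_mark: 0        # score the agent needs to reach to pass the level
    t: 250              # episode length; t=0 would mean no time limit
    blackouts: [25, 50] # see the blackouts section below
    items:
    - !Item
      name: GoodGoal    # a green reward sphere
```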
## Objects
@@ -20,7 +20,7 @@ a meta-curriculum.
## Example
An example is provided in [the example folder](../examples/configurations/curriculum). The idea of this curriculum is to train
an agent to navigate a maze by creating maze like structures of perpendicular walls, starting with a single wall and food,
adding one more wall at each level. Below are samples from the 6 different levels.
@@ -74,7 +74,7 @@ except for the `configuration_files`. From the ml-agents documentation:
cumulative reward of the last `100` episodes exceeds the current threshold.
The mean reward logged to the console is dictated by the `summary_freq`
parameter in the
[trainer configuration file](../examples/configurations/training_configurations/trainer_ml_agents_config.yaml).
* `signal_smoothing` (true/false) - Whether to weight the current progress
measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
@@ -88,7 +88,7 @@ except for the `configuration_files`. From the ml-agents documentation:
Once the folder created, training is done in the same way as before but now we pass a `MetaCurriculum` object to the
`meta_curriculum` argument of a `TrainerController`.
We provide an example using the above curriculum in [examples/](../examples/
Training this agent, you can see the lessons switch using tensorboard:
# Final Submission Guidelines
As the end of the competition approaches we would like to provide guidance on the final submission process and evaluation. You will be able to submit two agents for the final testing. The first will be your best agent from the current track (this will be submitted automatically and there is nothing for you to do), and the second is an agent of your choice that you can submit to the track ‘Submission 2 (optional)’. We will run both your submissions on the final test set and your final result will be the best overall score out of these agents.
If you want to be considered for the WBA prize then you should submit a short write-up explaining how your entry is biologically inspired following the submission guidelines. Full details [here](
The deadline for submissions (to either track) and to the **WBA prize** is 23:59:59 1st November (Anywhere on Earth time). Please note that there may be heavier than usual traffic close to the deadline so we encourage you to submit early if possible.
In order to submit to the second phase please refer to [the submission page on evalAI]( Note that **there is no evaluation on this track**, you only upload a docker image that we will automatically retrieve at the time of final evaluation. **The status on the EvalAI website will show as “Running”, you can ignore this.** You can make up to 20 submissions but only the latest one will be used.
As was previously announced, the final evaluation test set is very similar to the one used throughout the competition, but contains extra tests, all minor variations of those in the current test set. This is to prevent overfitting to the exact tests on the current leaderboard.
To reiterate, the final evaluation process is the following:
- We will take your best submission (total score) from the “Competition phase” track. If several agents returned the same score equal to your best one, we only consider the last one submitted chronologically.
- We will also take your latest submission on the “Submission 2 (optional)” track, if you have submitted there, and evaluate it along with the former.
We will then take the best of the two in terms of “total” score, and use this as your final submission for both global and categories ranking.
Results and prizes will be announced once we have been able to run the final tests.
Good Luck!
# Quick Start Guide
The format of this competition may be a little different from the standard machine learning setup. We do not provide a single training set that you can train on out of the box and we do not provide full information about the testing set in advance. Instead, you will need to choose for yourself what you expect to be useful configurations of our training environment in order to train an agent capable of robust food retrieval behaviour.
To facilitate working with this new paradigm we have created tools you can use to easily set up and visualize your training environment.
You can find below some pointers to get started with the Animal-AI.
## Running the standalone arena
The basic environment is made of a single agent in an enclosed arena that resembles an environment that could be used for experimenting with animals. In this environment you can add objects the agents can interact with, as well as goals or rewards the agent must collect or avoid. To see what this looks like, you can run the executable environment directly. This will spawn an arena filled with randomly placed objects. Of course, this is a very messy environment to begin training on, so we provide a configuration file where you choose what to spawn (see below).
You can toggle the camera between first person, third person and bird's eye view using the `C` key on your keyboard. The agent can
then be controlled using `W,A,S,D` on your keyboard. Hitting `R` or collecting certain rewards (green or red) will reset the arena.
## Running a specific configuration file
Once you are familiarized with the environment and its physics, you can start building and visualizing your own. Assuming you followed the [installation instructions](../, go to the `examples/` folder and run
`python`. This loads a random configuration from the competition for the arena and lets you play as the agent.
Have a look at the [configuration files](../competition_configurations) which specify the objects to place. You can select
objects, their size, location, rotation and color, randomizing any of these parameters as you like. For more details on the configuration options and syntax please read the relevant documentation:
- The [configuration file documentation page]( which explains how to write the configuration files.
- The [definitions of objects page]( which contains a detailed list of all the objects and their
@@ -29,16 +26,8 @@ objects, their size, location, rotation and color, randomizing any of these parameters
Once you're happy with your arena configurations you can start training your agent. The `animalai` package includes several features to help with this:
- It is possible to **change the environment configuration between episodes** (allowing for techniques such as curriculum learning).
- You can **choose the length of each episode** as part of the configuration files, even having infinite episodes.
- You can **have several arenas in a single environment instance**, each with an agent you control independently from the others, and each with its own configuration, allowing for faster collection of observations.
We provide examples of training using the `animalai-train` package; you can of course start from scratch and submit agents that do not rely on this library. To see how to train in an `animalai` environment, we provide scripts in the
`examples/` folder:
- `` uses the `dopamine` implementation of Rainbow to train a single agent using the gym interface. This is a good starting point if you want to try another training algorithm that works plug-and-play with Gym. **Note that using the gym interface only allows for training with a single arena and agent in the environment at a time.** Training with several agents in a gym environment is possible, but it will require modifying your code to accept more than one observation at a time.
- `` uses the `ml-agents` implementation of PPO to train one or more agents at a time, using the `UnityEnvironment`. This is a great starting point if you don't mind reading some code, as it directly allows you to use the functionalities described above, out of the box.
You can find more details about this in the [examples folder](../examples), which contains various training examples and a readme file explaining how they work.
# Submission
In order to participate in the competition you will need to upload a [docker image](
containing your trained agent that interfaces with the `animalai` library. We detail the steps for participating below.
## Python agent and associated data
Submissions need to implement the [agent script provided](
This script must implement the methods present in the base script and keep the same class name. The methods are:
- `__init__`: this will only be called once when the agent is loaded first. It can contain loading of the model and other
related parameters.
- `reset(t)`: will be called each time the arena resets. At test time the length of episodes will vary across the 300
experiments we will run, therefore we provide the agent with the length of the episode to come. Test lengths are either 250, 500, or 1000.
- `step(obs, reward, done, info)`: the method that is called each time the agent has to take a step. The arguments
are the ones returned by the Gym environment `AnimalAIEnv` from `animalai.envs.environment`. If you wish to directly
work on the ML Agents `BrainInfo` you can access it via `info['brain_info']`.
~~**NEW (v1.0.4)**: you can now select the resolution of the observation your agent takes as input, this argument will be passed to the environment directly (must be between 4 and 256)~~ (this option was removed, for evaluation inputs are 84x84, [see discussion](
Make sure any data loaded in the docker image is referred to using **absolute paths** in the container or the form `/aaio/data/...` (see below). An example that you can modify is provided [here](
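As a sketch of what this interface looks like, here is a minimal random agent; the class name and method signatures follow the description above (the base script itself is only linked, not shown, so treat the details as an assumption), and the action encoding assumes the two-branch discrete actions used by the environment:

```python
import random


class Agent:
    def __init__(self):
        # Called once when the agent is first loaded; load your model and parameters here.
        self.episode_length = None

    def reset(self, t=250):
        # Called at each arena reset; t is the length of the coming episode (250, 500 or 1000).
        self.episode_length = t

    def step(self, obs, reward, done, info):
        # obs, reward, done, info are the values returned by the Gym environment AnimalAIEnv.
        # Return one value per action branch (forward/backward, left/right), each in {0, 1, 2}.
        return [random.randint(0, 2), random.randint(0, 2)]
```

A real submission would replace the random choice with a policy loaded in `__init__`, referring to its data with absolute paths of the form `/aaio/data/...` as described below.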
## Create an EvalAI account and add submission details
The competition is kindly hosted by EvalAI. Head over to [their website](, create an account, and enroll your team in the AnimalAI challenge. To be able to submit and be eligible for prizes you will also need to register your personal details using [this form](
**Any question related solely to the submission process on EvalAI should be posted to the** [EvalAI forum](
## Docker
Docker offers a containerized platform for running applications in a closed environment. You can install all the libraries your agent will require and we will use this to run the tests as they would run on your local machine. The hardware we're using to run the tests is an AWS [p2.xlarge instance](
Take the time to read the [Docker documentation]( and follow the install process.
### Adding CUDA capabilities to Docker (optional)
As part of the evaluation we offer GPU compute on an AWS
[p2.xlarge instance]( These compute instances will run an Amazon
[Deep Learning Base AMI]( with several CUDA libraries installed.
The native docker engine does not provide a pass-through to these libraries, rendering any use of GPU capable libraries (such as `tensorflow-gpu`) impossible. To overcome this issue, NVIDIA provides a specific version of docker. We can recommend [this tutorial]( for installing this version. Note we cannot provide help with installing these.
## Creating the docker image for submission
Once you have docker up and running, you can start building your submission. Head over to `examples/submission` and have a look at the `Dockerfile`. This script installs all the requirements for the environment; we do not recommend editing anything outside of the commented block saying `YOUR COMMANDS GO HERE`.
If your submission only requires the `animalai-train` library to run, you can use `Dockerfile` without any modification. While in `examples/submission` run:
```
docker build --tag=submission .
```
You can give your docker image the name you want, it does not have to be `submission`. Note that the Dockerfile creates two
folders `/aaio` and `/aaio/data` at the root of the image, and copies the `` file and `data` folder from your local machine into the image. Your submission must keep this architecture. References to these folders in
your code **should use absolute paths** (see the example agent provided in `examples/submission`).
## Test your docker image
As uploading and evaluating images takes a while, and you are only allowed a maximum of one submission per day, it is recommended to ensure your docker runs properly before submitting. If there is a failure during testing **you will only have access to abridged outputs** which may not be enough to debug on your own. If you cannot find a solution using the provided submission testing volume you will need to raise a question on [the forum]( and we will investigate for you (which might take time).
Bottom line: be sure to test your submission prior to uploading!
First, copy the AnimalAI **linux** environment (and AnimalAI_Data folder) to `examples/submission/test_submission/env`.
Next, you need to run the image by mounting the `test_submission` folder and its content as a volume, and execute the `` script. To do so, from the `submission` folder, run:
```
docker run -v "$PWD"/test_submission:/aaio/test submission python /aaio/test/
```
If your image and agent are set up properly, you should not get any errors, and the script should output the rewards for 5 simple tests and conclude with `SUCCESS`.
## Submit your docker image
You can now submit your image to EvalAI for evaluation as explained on the [EvalAI submission page](
**Note**: the phase name to use when pushing is: `animalai-main-396`. To push your image use `evalai push <image>:<tag> --phase animalai-main-396` (details are at the bottom of the EvalAI page linked above).
## Docker image evaluation and results
On the EvalAI page you will see that the number of valid submissions is limited to one a day. A submission is valid if it fulfils the following requirements:
- it does not crash at any point before the first two experiments are complete (this includes loading the agent, resetting it, and completing the two experiments)
- loading the agent takes less than 5 minutes
- running the first two experiments takes less than 10 minutes
If your submission meets these requirements it will be flagged as valid and you will not be able to submit again until the following day.
Completing the experiments cannot take longer than 80 minutes in total. If your submission goes over the time limit it will stop and you will score for any experiments that were completed.
Example scenarios:
- FAIL: agent loads in 2 minutes, crashes during test number 2
- FAIL: agent loads in 1 minute, takes more than 10 minutes to complete tests 1 and 2
- SUCCESS: your agent loads in 3 minutes, takes 30 seconds for test 1, takes 1 minute for test two, it therefore has 78.5 minutes to complete the remaining 298 experiments
- SUCCESS: agent loads in 4 minutes, completes tests 1 and 2 in 1 minute, uses all the 79 remaining minutes but completes only 100 more tests; you will get results based on the 102 experiments run
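The arithmetic behind these scenarios can be checked directly; this assumes, as the scenarios above imply, that the 80-minute budget covers the experiments only (loading time is counted separately):

```python
TOTAL_BUDGET_MINUTES = 80.0  # total time allowed for the 300 experiments

# Third scenario: test 1 takes 30 seconds, test 2 takes 1 minute
remaining = TOTAL_BUDGET_MINUTES - 0.5 - 1.0
print(remaining)  # 78.5 minutes left for the remaining 298 experiments
```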
# Training
## Overview
The `animalai` package offers two kinds of interfaces for training: a gym environment and an ml-agents one. We
also provide the `animalai-train` package to showcase how training and submissions work. This can serve as a starting
point for your own code, however, you are not required to use this package at all for submissions.
If you are not familiar with these algorithms, have a look at
[ML-Agents' PPO](, as well as
[dopamine's Rainbow](
## Observations and actions
Before looking at the environment itself, we define here the actions the agent can take and the observations it collects:
- **Actions**: the agent can move forward/backward and rotate left/right, just like in play mode. The
actions are discrete and of dimension `2`, each component can take values `0`,`1` or `2` (`(0: nothing, 1: forward, 2:
backward)` and `(0: nothing, 1: right, 2: left)`).
- **Observations** are made of two components: visual observations which are pixel based and of dimension `84x84x3`, as
well as the speed of the agent which is continuous of dimension `3` (speed along axes `(x,y,z)` in this order). Of course, you may want to process and/or scale down the input before use with your approach.
- **Rewards**: in case of an episode of finite length `T`, each step carries a small negative reward `-1/T`. In case of
an episode with no time limit (`T=0`), no reward is returned for each step. Other rewards come from the rewards objects
(see details [here](
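To make the action encoding and the step penalty concrete, here is a small plain-Python sketch (the dictionaries and the helper function are illustrative, not part of the `animalai` API):

```python
# The two discrete action branches and the meaning of each value:
FORWARD_BACKWARD = {0: "nothing", 1: "forward", 2: "backward"}
LEFT_RIGHT = {0: "nothing", 1: "right", 2: "left"}


def step_reward(t: int) -> float:
    """Per-step reward: -1/T for an episode of finite length T, 0 if there is no time limit."""
    return -1.0 / t if t > 0 else 0.0


action = [1, 2]  # move forward while turning left
print(FORWARD_BACKWARD[action[0]], LEFT_RIGHT[action[1]])  # forward left
print(step_reward(250))  # -0.004
```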
## The Unity Environment
Much like a gym environment, you can create a `UnityEnvironment` that manages all communications with
the environment. You will first need to instantiate the environment, you can then reset it, take steps and collect
observations. All the codebase for this is in `animalai/envs/`. Below is a quick description of these functionalities.
We provide an example of training using `UnityEnvironment` in `examples/`.
### Instantiation
For example, you can call:

```python
from animalai.envs import UnityEnvironment

env = UnityEnvironment(
    file_name='env/AnimalAI',  # Path to the environment
    worker_id=1,               # Unique ID for running the environment (used for connection)
    seed=0,                    # The random seed
    docker_training=False,     # Whether or not you are training inside a docker
    no_graphics=False,         # Always set to False
    n_arenas=4,                # Number of arenas in your environment
    play=False,                # Set to False for training
    inference=False,           # Set to true to watch your agent in action
    resolution=None            # Int: resolution of the agent's square camera (in [4,512], default 84)
)
```
Note that the path to the executable file should be stripped of its extension. The `no_graphics` parameter should always
be set to `False` as it impacts the collection of visual observations, which we rely on.
### Reset
We have modified this functionality compared to the mlagents codebase. Here we add the possibility to pass a new
`ArenaConfiguration` as an argument to reset the environment. The environment will use the new configuration for this
reset, as well as all the following ones until a new configuration is passed. The syntax is:
```python
env.reset(arenas_configurations=arena_config,  # A new ArenaConfig to use for reset; leave empty to use the last one provided
          train_mode=True)                     # True for training
```
**Note**: as mentioned above, the environment will change the configuration of an arena if it receives a new `ArenaConfig`
from a `env.reset()` call. Therefore, should you have created several arenas and want to only change one (or more) arena
configuration(s), you can do so by only providing an `ArenaConfig` that contains configuration(s) for the associated arena(s).
For example, if you only want to modify arena number 3, you could create an `ArenaConfig` from the following `YAML`:

```yaml
3: !Arena
  t: 0
  items:
  - !Item
```
### Step
Taking a step returns a data structure named `BrainInfo` which is defined in `animalai/envs/brain` and basically contains all the information returned by the environment including the observations. For example:
```python
info = env.step(vector_action=take_action_vector)
```
This line will return all the data needed for training, in our case where `n_arenas=4` you will get:
```python
brain = info['Learner']
brain.visual_observations  # list of 4 pixel observations, each of size (84x84x3)
brain.vector_observations  # list of 4 speeds, each of size 3
brain.rewards              # list of 4 float rewards
brain.local_done           # list of 4 booleans to flag if each agent is done or not
```
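As an illustration of how these per-arena lists line up, the sketch below uses plain-Python stand-ins for the `BrainInfo` fields (dummy values, not real observations):

```python
# Dummy stand-ins shaped like the four per-arena lists described above (n_arenas=4):
visual_observations = [f"frame_{i}" for i in range(4)]     # placeholders for the 84x84x3 frames
vector_observations = [[0.0, 0.0, 0.0] for _ in range(4)]  # speed along (x, y, z)
rewards = [0.0, -0.004, 1.0, 0.0]
local_done = [False, False, True, False]

# One (pixels, speed, reward, done) transition per arena:
transitions = list(zip(visual_observations, vector_observations, rewards, local_done))

# Arenas flagged as done have just finished their episode:
finished = [i for i, (_, _, _, done) in enumerate(transitions) if done]
print(finished)  # [2]
```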
You can pass more parameters to the environment depending on what you need for training. To learn about this and the
format of the `BrainInfo`, see the [official ml-agents documentation](
### Close
Don't forget to close the environment once training is done so that all communications are terminated properly and ports
are not left open (which can prevent future connections).
## Gym wrapper
We also provide a gym wrapper to implement the OpenAI interface in order to directly plug baselines and start training.
One limitation of this implementation is the use of a single agent per environment. This lets you collect fewer
observations per episode and therefore makes training slower. A later release might fix this and allow for multiple agents.
We provide an example of training using gym in `examples/`.
## Notes
Some important points to note for training:
- Instantiating an environment will open a window showing the environment from above. The size of this window will
influence the speed of training: the smaller the window, the faster the simulation and therefore the training. You can
resize the window during training but we advise keeping it as small as possible.
Coming soon as a jupyter notebook
Detail the various files in this folder
- load_and_play:
- train_ml_agents
- ...
```yaml
trainer: ppo
epsilon: 0.2
lambd: 0.95
learning_rate: 3.0e-4
learning_rate_schedule: linear
memory_size: 128
normalize: false
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
time_horizon: 128
batch_size: 64
buffer_size: 2024
hidden_units: 256
num_layers: 1
beta: 1.0e-2
max_steps: 1500
num_epoch: 0
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  curiosity:
    strength: 0.01
    gamma: 0.99
    encoding_size: 256
```
```python
import sys
import random
import os

from animalai.envs.arena_config import ArenaConfig
from animalai.envs.environment import AnimalAIEnvironment


def load_config_and_play(configuration_file: str) -> None:
    """
    Loads a configuration file for a single arena and lets you play manually
    :param configuration_file: str path to the yaml configuration
    :return: None
    """
    env_path = "env/AnimalAI"
    port = 5005 + random.randint(0, 100)  # use a random port to allow relaunching the script rapidly
    configuration = ArenaConfig(configuration_file)
    environment = AnimalAIEnvironment(
        # environment arguments elided in this view
    )
    try:
        while environment.proc1:
            continue
    except KeyboardInterrupt:
        pass
    finally:
        environment.close()


if __name__ == '__main__':
    if len(sys.argv) > 1:
        configuration_file = sys.argv[1]
    else:
        competition_folder = '../competition_configurations/'
        configuration_files = os.listdir(competition_folder)
        # randint is inclusive on both ends, hence len - 1
        configuration_random = random.randint(0, len(configuration_files) - 1)
        configuration_file = competition_folder + configuration_files[configuration_random]
    load_config_and_play(configuration_file=configuration_file)
```
```python
from mlagents.trainers.trainer_util import load_config

from animalai.envs.arena_config import ArenaConfig
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai

# TODO: add SAC trainer or maybe add directly to

trainer_config_path = "configurations/training_configurations/train_ml_agents_config.yaml"
environment_path = "env/AnimalAI"
arena_config_path = "configurations/arena_configurations/train_ml_agents_arenas.yml"
run_id = "train_ml_agents"
base_port = 5005
number_of_environments = 4
number_of_arenas_per_environment = 8

args = RunOptionsAAI(
    trainer_config=load_config(trainer_config_path),
    env_path=environment_path,
    run_id=run_id,
    base_port=base_port,
    num_envs=number_of_environments,
    arena_config=ArenaConfig(arena_config_path),
    n_arenas_per_env=number_of_arenas_per_environment,
)
run_training_aai(0, args)
```