Commit c6c18c91 authored by Benjamin

clean notebooks outputs

parent 9a1f5a0e
......@@ -20,7 +20,7 @@ We ran a competition using this environment and the associated tests, more detai
The environment is built using [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents/tree/master/docs) and contains an agent enclosed in a fixed-size arena. Objects can spawn in this arena, including positive
and negative rewards (green, yellow and red spheres) that the agent must obtain (or avoid). All of the hidden tests that will appear in the competition are made using the objects in the training environment.
To get started, install the requirements below, and then follow the [Quick Start Guide](documentation/quickstart.md).
To get started, install the requirements below, and then follow the jupyter notebook tutorials in the [examples folder](examples).
More in-depth documentation can be found on the [Documentation Page](documentation/README.md).
## Development Blog
......
......@@ -11,12 +11,16 @@ Then you can start a jupyter notebook by running `jupyter notebook` from your te
## Designing arenas
For a tutorial on how to design experiments and training configurations, we provide a [jupyter notebook](environment_tutorial.ipynb).
You can use `load_config_and_play.py` to visualize a `yml` configuration for an environment arena. Make sure `animalai`
is [installed](../README.md#requirements), then run `python load_config_and_play.py your_configuration_file.yml`, which will open the environment in
play mode (control with W, A, S, D or the arrow keys); close the environment by pressing CTRL+C in the terminal.
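For reference, here is a rough sketch of what `load_config_and_play.py` does under the hood. It is only a sketch: the `AnimalAIEnvironment` constructor arguments (`file_name`, `base_port`, `arenas_configurations`, `play`), the import paths and the `env/AnimalAI` binary path are assumptions about the v2 API, so check the script itself for the exact call.

```python
from animalai.envs.arena_config import ArenaConfig          # import path assumed
from animalai.envs.environment import AnimalAIEnvironment   # import path assumed

# Load the arena description from a YAML file and open the environment in play mode.
configuration = ArenaConfig("your_configuration_file.yml")
environment = AnimalAIEnvironment(
    file_name="env/AnimalAI",             # path to the Unity binary, adjust to your install
    base_port=5005,                       # any free port
    arenas_configurations=configuration,
    play=True,                            # human play mode: W, A, S, D or the arrow keys
)

try:
    while True:                           # keep the script alive while you play
        pass
except KeyboardInterrupt:                 # CTRL+C in the terminal...
    pass
finally:
    environment.close()                   # ...closes the environment window
```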
## Animalai-train examples
You will find a training tutorial in this [jupyter notebook](training_tutorial.ipynb).
We provide two scripts which show how to use `animalai_train` to train agents:
- `train_ml_agents.py` uses ml-agents' PPO implementation (or SAC) and can run multiple environments in parallel to speed up
the training process (a condensed sketch of this kind of call follows below)
......
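As a condensed preview of the kind of call these scripts assemble (the tutorial notebook below builds the same pieces step by step), here is a hypothetical single-environment training sketch; the binary path, the `run_id` and the `ArenaConfig` import path are assumptions to adapt to your setup.

```python
from mlagents.trainers.trainer_util import load_config
from animalai.envs.arena_config import ArenaConfig          # import path assumed
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai

# A single-environment PPO run; train_ml_agents.py wraps a call of this shape and
# adds the option to launch several environments in parallel.
args = RunOptionsAAI(
    trainer_config=load_config(
        "configurations/training_configurations/train_ml_agents_config_ppo.yaml"
    ),
    env_path="env/AnimalAI",              # path to the Unity binary, adjust to your install
    run_id="ppo_example",                 # hypothetical run name
    base_port=5005,
    arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml"),
)
run_training_aai(0, args)                 # first argument is the random seed
```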
......@@ -35,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
......@@ -56,7 +56,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [
{
......@@ -68,7 +68,7 @@
"<IPython.core.display.HTML object>"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
......
......@@ -33,17 +33,19 @@
- and many more!
%% Cell type:code id: tags:
``` python
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai
trainer_config_path = (
"configurations/training_configurations/train_ml_agents_config_ppo.yaml"
)
......@@ -72,24 +74,24 @@
_Note_: if you don't want to wait for the model to train, you can jump ahead to the next step, as we provide a pre-trained model for inference.
%% Cell type:code id: tags:
``` python
import os

# logging.getLogger('tensorflow').disabled = True
logs_dir = "summaries/"  # training summaries are written here
os.makedirs(logs_dir, exist_ok=True)

# Show TensorBoard inline so the training curves update while the run progresses.
%load_ext tensorboard
%tensorboard --logdir {logs_dir}

run_training_aai(0, args)
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
You can see the lesson number increase as the agent gets better at each level. That's pretty much it for training with the provided library. One last thing we need to do is assess how well our agent, trained with only rewards and transparent walls, performs on the transparent cylinder task. To do so, we can load the model and run it in inference mode.
%% Cell type:code id: tags:
......@@ -102,65 +104,18 @@
args = RunOptionsAAI(
trainer_config=load_config(trainer_config_path),
env_path=environment_path,
run_id=run_id,
base_port=base_port+2,
base_port=base_port+3,
load_model=True,
train_model=False,
arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml")
)
run_training_aai(0, args)
```
%%%% Output: error
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-2-04d80dec01d2> in <module>
13 arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml")
14 )
---> 15 run_training_aai(0, args)
~/AnimalAI/AnimalAI-Olympics/animalai_train/animalai_train/run_training_aai.py in run_training_aai(run_seed, options)
113 tc.start_learning(env_manager)
114 finally:
--> 115 env_manager.close()
116 write_timing_tree(summaries_dir, options.run_id)
117
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/subprocess_env_manager.py in close(self)
265 self.step_queue.join_thread()
266 for env_worker in self.env_workers:
--> 267 env_worker.close()
268
269 def _postprocess_steps(
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/subprocess_env_manager.py in close(self)
78 pass
79 logger.debug(f"UnityEnvWorker {self.worker_id} joining process.")
---> 80 self.process.join()
81
82
/usr/lib/python3.6/multiprocessing/process.py in join(self, timeout)
122 assert self._parent_pid == os.getpid(), 'can only join a child process'
123 assert self._popen is not None, 'can only join a started process'
--> 124 res = self._popen.wait(timeout)
125 if res is not None:
126 _children.discard(self)
/usr/lib/python3.6/multiprocessing/popen_fork.py in wait(self, timeout)
48 return None
49 # This shouldn't block if wait() returned successfully.
---> 50 return self.poll(os.WNOHANG if timeout == 0.0 else 0)
51 return self.returncode
52
/usr/lib/python3.6/multiprocessing/popen_fork.py in poll(self, flag)
26 while True:
27 try:
---> 28 pid, sts = os.waitpid(self.pid, flag)
29 except OSError as e:
30 # Child process not yet created. See #1731717
KeyboardInterrupt:
%% Cell type:markdown id: tags:
You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remember, this problem is meant to be hard! You can now have a go at designing your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!
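Before writing anything new, one way to see where the trained model stands on those tasks is to reuse the inference call from above in a loop. This is only a hypothetical sketch: the `competition_configurations` folder path, the `ArenaConfig` import and the port offset are assumptions, while `trainer_config_path`, `environment_path`, `run_id` and `base_port` are the variables defined earlier in this notebook.
%% Cell type:code id: tags:
``` python
import glob

# Evaluate the pre-trained model on every competition arena in turn.
config_files = sorted(glob.glob("configurations/competition_configurations/*.yml"))
for i, config_file in enumerate(config_files):
    args = RunOptionsAAI(
        trainer_config=load_config(trainer_config_path),
        env_path=environment_path,
        run_id=run_id,
        base_port=base_port + 10 + i,   # a fresh port for every run (offset assumed)
        load_model=True,                # load the trained model...
        train_model=False,              # ...and run it in inference only
        arena_config=ArenaConfig(config_file),
    )
    run_training_aai(0, args)
```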
%% Cell type:markdown id: tags:
......