{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training\n",
    "\n",
    "The API provides both a [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents) and an [OpenAI Gym](https://github.com/openai/gym) interfaces. We include training examples for both in [the examples folder](./examples), the former uses the ml-agents' own training library which is optimised for the environment, the latter uses [OpenAI baselines](https://github.com/openai/baselines).\n",
    "\n",
    "\n",
    "In this notebook we show you how to run the `animal-ai` trainers which are optimized to train on the AnimalAI environment. It's a powerfull modular library you can tinked with in order to implement your own algorithm. We strongly recommend you have a look at its various parts described at the end of this tutorial should you wish to make some modifications.\n",
    "\n",
    "## Can your agent self control? - Part II\n",
    "\n",
    "If you haven't done so already, go throught the environement tutorial where we decribe the problem of self-control in animals. We created a curriculum which includes increasingly difficult levels for the agent to retrieve food, while introducing items similar to those in the final experiment, without ever having the exact experiment in the training curriculum.\n",
    "\n",
    "We created the curriculum in the previous notebook, now we need to configure the training environemnt. The `animalai-train` library provides all the tool you need to train PPO or SAC. We'll use the former here.\n",
    "\n",
    "First, we need to set all the hyperparameters of our model, this is done by creating a yaml file as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"configurations/training_configurations/train_ml_agents_config_ppo.yaml\") as f:\n",
    "    print(f.read())"
   ]
  },
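  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file is plain YAML, so `load_config` (which we also use below when building the training options) returns it as a Python dictionary. The next cell is a minimal sketch, not part of the original pipeline, showing how you could inspect or tweak the hyperparameters programmatically before training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch: load the configuration into a dict and print each\n",
    "# section's hyperparameters.\n",
    "from mlagents.trainers.trainer_util import load_config\n",
    "\n",
    "config = load_config(\n",
    "    \"configurations/training_configurations/train_ml_agents_config_ppo.yaml\"\n",
    ")\n",
    "for section_name, hyperparameters in config.items():\n",
    "    print(section_name)\n",
    "    for key, value in sorted(hyperparameters.items()):\n",
    "        print(f\"    {key}: {value}\")\n",
    "\n",
    "# To experiment, you could modify a value in this dict and pass it as\n",
    "# trainer_config to RunOptionsAAI below instead of re-loading the file."
   ]
  },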
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're already familiar with RL algorithms in general, these should be quite explainatory. Either way, you can have a look at [this page](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md) for both PPO and SAC configuration files details.\n",
    "\n",
    "You then need to configure the trainer which is just a named tuple that defines parameters such as:\n",
    "- the paths to the environment and your configuration file (above) \n",
    "- how many environments to launch in parralel and with how many agent per environment\n",
    "- the path to your curriculum\n",
    "- and many more!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "import tensorflow as tf\n",
    "tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)\n",
    "\n",
    "from mlagents.trainers.trainer_util import load_config;\n",
    "from animalai_train.run_options_aai import RunOptionsAAI;\n",
    "from animalai_train.run_training_aai import run_training_aai;\n",
    "\n",
    "\n",
    "trainer_config_path = (\n",
    "    \"configurations/training_configurations/train_ml_agents_config_ppo.yaml\"\n",
    ")\n",
    "environment_path = \"env/AnimalAI\"\n",
    "curriculum_path = \"configurations/curriculum\"\n",
    "run_id = \"self_control_curriculum\"\n",
    "base_port = 5005\n",
    "number_of_environments = 4\n",
    "number_of_arenas_per_environment = 8\n",
    "\n",
    "args = RunOptionsAAI(\n",
    "    trainer_config=load_config(trainer_config_path),\n",
    "    env_path=environment_path,\n",
    "    run_id=run_id,\n",
    "    base_port=base_port,\n",
    "    num_envs=number_of_environments,\n",
    "    curriculum_config=curriculum_path,\n",
    "    n_arenas_per_env=number_of_arenas_per_environment,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once this is done we're pretty much left with a one liner! The training library isn't verbose, but you can monitor training via Tensorboard. The first few lines just load tensorboard, once it is launched and you can see the orange window below, just click on the refresh button in the top right of Tensorboard, graphs will appear after a few training steps.\n",
    "\n",
    "_Note_: in case you don't want to wait for the model to train, you can jump ahead to the next step as we provide a pre-trained model for inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import os\n",
    "# logging.getLogger('tensorflow').disabled = True\n",
    "\n",
    "logs_dir = \"summaries/\"\n",
    "os.makedirs(logs_dir, exist_ok=True)\n",
    "%load_ext tensorboard\n",
    "%tensorboard --logdir {logs_dir}\n",
    "\n",
    "run_training_aai(0, args)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see the lessons increasing as the agent gets better at each level. That's pretty much it for training using the provided library. One last thing we need to do is assess how well our agent trained with only rewards and transparent walls can perform on the transparent cylinder task. To do so we can load the model and run the model in inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from animalai.envs.arena_config import ArenaConfig\n",
    "\n",
    "#Comment the row below to load your trained model\n",
    "run_id = \"self_control_curriculum_pre_trained\"\n",
    "\n",
    "args = RunOptionsAAI(\n",
    "    trainer_config=load_config(trainer_config_path),\n",
    "    env_path=environment_path,\n",
    "    run_id=run_id,\n",
    "    base_port=base_port+3,\n",
    "    load_model=True,\n",
    "    train_model=False,\n",
    "    arena_config=ArenaConfig(\"configurations/arena_configurations/cylinder_task.yml\")\n",
    ")\n",
    "run_training_aai(0, args)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remeber it's a problem which is meant to be hard! You can now give a go at making your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using ML-Agents and AnimalAI for your algorithms\n",
    "\n",
    "As mentionned earlier AnimalAI is built on top of ML-Agents, and we strongly recommend you have a look at the various bits and pieces you can tinker with in order to implement your own agents. This part is a brief overview of where you can find these parts at the heart of most RL algortihms. We'll start from the higher level controllers down to the basic bricks of RL algorithms. Should you wish to modify them, you'll need to clone [ml-agents repository)(https://github.com/Unity-Technologies/ml-agents).\n",
    "\n",
    "- `animalai_train.run_training`: contains the highest level of control for training an agent. You can find all the subroutines you need in order to do so. The most import ones are:\n",
    "    - `animalai_train.subprocess_env_manager_aai.SubprocessEnvManagerAAI`: an evnironment manager which derives from `mlagents.trainers.subprocess_env_manager.SubprocessEnvManager` and manages several environments to run in parallel. In prcatice you shouldn't need to change this part.\n",
    "    - `mlagents.trainers.trainer_util.TrainerFactory`: a factory method which is in charge of creating trainer methos to manage the agents in the environment. In practice we only have a single type of agents in every all the environments, therefore there will only be one trainer to manage all the agents. **You might need to change this code** in case you add a new RL algorithm as it is designed to handle PPO and SAC only.\n",
    "    - `animalai_train.trainer_controller_aai.TrainerControllerAAI`: derives from `mlagents.trainers.trainer_controller.TrainerController` and is where the training loop is.\n",
    "\n",
    "The basic elements which are most likely to be of interest to you:\n",
    "\n",
    "- **Curriculum**: managed in `animalai_train.meta_curriculum_aai.MetaCurriculumAAI` and `animalai_train.meta_curriculum_aai.CurriculumAAI`\n",
    "- **RL algo**: you can find the implementations for PPO and SAC in `mlagents.trainers.ppo.trainer` and `mlagents.trainers.sac.trainer` respectively. They both implment the base class `mlagents.trainers.trainer.trainer` which you can implement and plug directly in the overall training setup (you'll manage all the necessary model parameters in the `TrainerFactory` mentionned above)\n",
    "- **Exploration**: there is a curiosity module already provided in `mlagents.trainers.components.reward_signals`\n",
    "- **Buffer**: the agent's replay buffer is in `mlagents.trainers.buffer`\n",
    "\n",
    "There are many more componnets you can find, one which is not implemented for AnimalAI but on our todo list is imitation learnign and the possibility to record a player's actions in the environmnent.\n",
    "\n",
    "That's pretty much all there is to know, we hope you enjoy the environment!\n"
   ]
  },
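  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a starting point for exploring these modules, here is a small sketch (assuming `animalai-train` and `mlagents` are installed in this kernel, as in the rest of this tutorial) that imports the building blocks listed above and prints where their source files live, so you can open and read them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print where each of the building blocks mentioned above lives on disk,\n",
    "# so you can open the source files and start tinkering.\n",
    "import animalai_train.run_training_aai\n",
    "import animalai_train.subprocess_env_manager_aai\n",
    "import animalai_train.trainer_controller_aai\n",
    "import animalai_train.meta_curriculum_aai\n",
    "import mlagents.trainers.trainer_util\n",
    "import mlagents.trainers.ppo.trainer\n",
    "import mlagents.trainers.sac.trainer\n",
    "import mlagents.trainers.components.reward_signals\n",
    "import mlagents.trainers.buffer\n",
    "\n",
    "modules = [\n",
    "    animalai_train.run_training_aai,\n",
    "    animalai_train.subprocess_env_manager_aai,\n",
    "    animalai_train.trainer_controller_aai,\n",
    "    animalai_train.meta_curriculum_aai,\n",
    "    mlagents.trainers.trainer_util,\n",
    "    mlagents.trainers.ppo.trainer,\n",
    "    mlagents.trainers.sac.trainer,\n",
    "    mlagents.trainers.components.reward_signals,\n",
    "    mlagents.trainers.buffer,\n",
    "]\n",
    "for module in modules:\n",
    "    print(f\"{module.__name__}: {module.__file__}\")"
   ]
  },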
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}