Commit 513ea281 authored by Benjamin Beyret

add AWS training + docker_training false dopamine

parent cefa3b06
# Training on AWS
Training an agent requires rendering the environment on a screen, which prevents training on out-of-the-box cloud compute
instances. You will need to follow one of the methods provided below in order to train in the cloud.
Both methods were tested on [AWS p2.xlarge](https://aws.amazon.com/ec2/instance-types/p2/) using a standard
[Deep Learning Base AMI](https://aws.amazon.com/marketplace/pp/B077GCZ4GR). We leave to participants the task of adapting
the information found here to different cloud providers and/or instance types.
**WARNING: using cloud services will incur costs. Carefully read your provider's terms of service.**
## Pre-requisite: set up an AWS p2.xlarge instance
Start by creating an account on [AWS](https://aws.amazon.com/), and then open the [console](https://console.aws.amazon.com/console/home?).
Compute engines on AWS are called `EC2` and offer a vast range of configurations in terms of number and type of CPUs, GPUs,
memory and storage. You can find more details about the different types and prices [here](https://aws.amazon.com/ec2/pricing/on-demand/).
In our case, we will use a `p2.xlarge` instance. In the console, select `EC2`:
![EC2](AWS/EC2.png)
By default, your account has a limit on the number of instances you can create. Check your limits by selecting `Limits` in the top
left menu:
![EC2](AWS/limits.png)
Request an increase for `p2.xlarge` if needed. Once you have at least a limit of 1, go back to the EC2 console and select launch instance:
![EC2](AWS/launch.png)
You can then choose from various images. Type in `Deep learning` to see what is on offer; for now, we recommend selecting `AWS Marketplace` in the left panel:
![EC2](AWS/marketplace.png)
and select either `Deep Learning Base AMI (Ubuntu)` if you want a basic Ubuntu install with CUDA capabilities, or `Deep Learning AMI (Ubuntu)` if you
require deep learning libraries installed as well (tensorflow, pytorch...). On the next page select `p2.xlarge`:
![EC2](AWS/p2.png)
Click review and launch, and then launch. You will then be asked to create or select an existing key pair, which will be used to SSH into your instance.
Once your instance has started, it will appear on the EC2 console. To SSH into your instance, right-click its line, select connect and follow the instructions.
We can now configure our instance for training. **Don't forget to shut down your instance once you're done using it, as you are charged for as long as it runs**.
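To get a feel for what a forgotten instance can cost, here is a minimal Python sketch of the arithmetic. The function name `estimate_cost` and the rate used are placeholders, not real AWS figures: look up the actual `p2.xlarge` price for your region on the pricing page linked above. The estimate also ignores storage and any other charges.

```python
def estimate_cost(hours_running: float, hourly_rate_usd: float) -> float:
    """Rough on-demand charge for one instance left running.

    Hypothetical helper for illustration: compute time only.
    """
    if hours_running < 0 or hourly_rate_usd < 0:
        raise ValueError("hours and rate must be non-negative")
    return round(hours_running * hourly_rate_usd, 2)


if __name__ == "__main__":
    # Placeholder rate, NOT the real price: check your region's pricing.
    placeholder_rate_usd = 1.0
    # An instance forgotten over a weekend (48 hours).
    print(estimate_cost(48, placeholder_rate_usd))
```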
## The easy way: docker
Basic Deep Learning Ubuntu images provide [NVIDIA docker](https://devblogs.nvidia.com/nvidia-docker-gpu-server-application-deployment-made-easy/)
pre-installed, which allows you to use CUDA within a container. SSH into your AWS instance, clone this repo and follow the instructions below.
In the [submission guide](submission.md) we describe how to build a docker container for submission. The same process
can be used to create a Docker image for training an agent. The [dockerfile provided](../examples/submission/Dockerfile) can
be adapted to include all the libraries and code needed for training.
For example, should you wish to train a standard Dopamine agent provided in `animalai-train` out of the box, using GPU compute, add the following
lines to your Dockerfile in the `YOUR COMMANDS GO HERE` part, below the line installing `animalai-train`:
```
# Fetch the training code
RUN git clone --single-branch --branch submission https://github.com/beyretb/AnimalAI-Olympics.git
# Swap the CPU build of TensorFlow for the GPU build
RUN pip uninstall --yes tensorflow
RUN pip install tensorflow-gpu==1.12.2
# Download and unpack the Linux environment (-y avoids an interactive prompt during the build)
RUN apt-get install -y unzip
RUN wget https://www.doc.ic.ac.uk/~bb1010/animalAI/env_linux_v1.0.0.zip
RUN mv env_linux_v1.0.0.zip AnimalAI-Olympics/env/
RUN unzip AnimalAI-Olympics/env/env_linux_v1.0.0.zip -d AnimalAI-Olympics/env/
WORKDIR /aaio/AnimalAI-Olympics/examples
# Switch the training script to docker mode
RUN sed -i 's/docker_training=False/docker_training=True/g' trainDopamine.py
```
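The `sed` substitution at the end of these lines flips the `docker_training` flag in `trainDopamine.py` from `False` to `True`. As a rough illustration of what it does to the file contents, here is a minimal Python sketch. The helper name `enable_docker_training` is hypothetical and not part of the repository.

```python
def enable_docker_training(source: str) -> str:
    """Return `source` with every `docker_training=False` set to True.

    Mirrors: sed 's/docker_training=False/docker_training=True/g'
    (hypothetical helper, for illustration only).
    """
    return source.replace("docker_training=False", "docker_training=True")


if __name__ == "__main__":
    snippet = "n_arenas=1,\ndocker_training=False,\nretro=True)"
    print(enable_docker_training(snippet))
```

Text that does not contain the flag is returned unchanged, so running the substitution twice is harmless.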
Build your Docker image. From the `examples/submission` folder, run:
```
docker build --tag=test-training .
```
Once built, you can start training straight away by running:
```
docker run test-training python /aaio/AnimalAI-Olympics/examples/trainDopamine.py
```
You should see the following TensorFlow line in the output, which confirms you are training using the GPU:
```
```
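As an additional sanity check, independent of the TensorFlow log, the sketch below asks whether `nvidia-smi` can list a GPU from wherever the script runs, for instance inside the container. It assumes the `nvidia-smi` tool is on the `PATH` when the GPU is exposed. The function name `gpu_visible` is hypothetical, and this is only a quick check, not part of the provided training code.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        # NVIDIA tools not found, so no GPU is exposed to this process
        # (or the drivers are not installed).
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Each detected device is printed on a line starting with "GPU ".
    return result.returncode == 0 and any(
        line.startswith("GPU ") for line in result.stdout.splitlines()
    )


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```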
@@ -18,6 +18,7 @@ def create_env_fn():
         worker_id=worker_id,
         n_arenas=1,
         arenas_configurations=arena_config_in,
+        docker_training=False,
         retro=True)
     return env