Commit c6c18c91 authored by Benjamin

clean notebooks outputs

parent 9a1f5a0e
@@ -20,7 +20,7 @@ We ran a competition using this environment and the associated tests, more detai
The environment is built using [Unity ml-agents]( and contains an agent enclosed in a fixed sized arena. Objects can spawn in this arena, including positive
and negative rewards (green, yellow and red spheres) that the agent must obtain (or avoid). All of the hidden tests that will appear in the competition are made using the objects in the training environment.
To get started, install the requirements below, and then follow the [Quick Start Guide](documentation/
To get started, install the requirements below, and then follow the jupyter notebook tutorials in the [examples folder](examples).
More in-depth documentation can be found on the [Documentation Page](documentation/
## Development Blog
@@ -11,12 +11,16 @@ Then you can start a jupyter notebook by running `jupyter notebook` from your te
## Designing arenas
For a tutorial on how to design experiments and training configurations, we provide a [jupyter notebook](environment_tutorial.ipynb).
You can use `` to visualize a `yml` configuration for an environment arena. Make sure `animalai`
is [installed](../ and run `python your_configuration_file.yml`, which will open the environment in
play mode (control with W, A, S, D or the arrow keys); close the environment by pressing CTRL+C in the terminal.
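To make the workflow concrete, here is a sketch of drafting such a `yml` arena configuration from Python. The tag and object names (`!ArenaConfig`, `!Arena`, `!Item`, `!Vector3`, `GoodGoal`) are assumptions based on the objects described above, so check them against the documentation before relying on them:

```python
# A sketch of a minimal arena configuration, held as a Python string.
# The tag names (!ArenaConfig, !Arena, !Item, !Vector3) and the object
# name GoodGoal are assumptions, not verified syntax.
arena_yaml = """\
!ArenaConfig
arenas:
  0: !Arena
    t: 250
    items:
    - !Item
      name: GoodGoal
      positions:
      - !Vector3 {x: 10, y: 0, z: 10}
"""

# Write the draft out so it can be passed to the visualization script.
with open("my_arena.yml", "w") as f:
    f.write(arena_yaml)

print("wrote", len(arena_yaml.splitlines()), "lines to my_arena.yml")  # wrote 9 lines to my_arena.yml
```

You could then open `my_arena.yml` in play mode with the visualization script mentioned above (its name is elided in this diff).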
## Animalai-train examples
You will find a training tutorial in this [jupyter notebook](training_tutorial.ipynb).
We provide two scripts which show how to use `animalai_train` to train agents:
- `` uses ml-agents' PPO implementation (or SAC) and can run multiple environments in parallel to speed up the training process
@@ -35,7 +35,7 @@
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -56,7 +56,7 @@
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [
@@ -68,7 +68,7 @@
"<IPython.core.display.HTML object>"
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
@@ -33,17 +33,19 @@
- and many more!
%% Cell type:code id: tags:
``` python
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai
import warnings
import tensorflow as tf
from mlagents.trainers.trainer_util import load_config;
from animalai_train.run_options_aai import RunOptionsAAI;
from animalai_train.run_training_aai import run_training_aai;
trainer_config_path = (
@@ -72,24 +74,24 @@
_Note_: if you don't want to wait for the model to train, you can jump ahead to the next step, as we provide a pre-trained model for inference.
%% Cell type:code id: tags:
``` python
import os
# logging.getLogger('tensorflow').disabled = True
logs_dir = "summaries/"
os.makedirs(logs_dir, exist_ok=True)
%load_ext tensorboard
%tensorboard --logdir {logs_dir}
run_training_aai(0, args)
%%%% Output: display_data
%% Cell type:markdown id: tags:
You can see the lessons increasing as the agent gets better at each level. That's pretty much it for training with the provided library. One last thing we need to do is assess how well our agent, trained with only rewards and transparent walls, performs on the transparent cylinder task. To do so, we can load the model and run it in inference mode.
%% Cell type:code id: tags:
@@ -102,65 +104,18 @@
args = RunOptionsAAI(
run_training_aai(0, args)
%%%% Output: error
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-2-04d80dec01d2> in <module>
13 arena_config=ArenaConfig("configurations/arena_configurations/cylinder_task.yml")
14 )
---> 15 run_training_aai(0, args)
~/AnimalAI/AnimalAI-Olympics/animalai_train/animalai_train/ in run_training_aai(run_seed, options)
113 tc.start_learning(env_manager)
114 finally:
--> 115 env_manager.close()
116 write_timing_tree(summaries_dir, options.run_id)
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/ in close(self)
265 self.step_queue.join_thread()
266 for env_worker in self.env_workers:
--> 267 env_worker.close()
269 def _postprocess_steps(
~/AnimalAI/AnimalAI-Olympics/venv/lib/python3.6/site-packages/mlagents/trainers/ in close(self)
78 pass
79 logger.debug(f"UnityEnvWorker {self.worker_id} joining process.")
---> 80 self.process.join()
/usr/lib/python3.6/multiprocessing/ in join(self, timeout)
122 assert self._parent_pid == os.getpid(), 'can only join a child process'
123 assert self._popen is not None, 'can only join a started process'
--> 124 res = self._popen.wait(timeout)
125 if res is not None:
126 _children.discard(self)
/usr/lib/python3.6/multiprocessing/ in wait(self, timeout)
48 return None
49 # This shouldn't block if wait() returned successfully.
---> 50 return self.poll(os.WNOHANG if timeout == 0.0 else 0)
51 return self.returncode
/usr/lib/python3.6/multiprocessing/ in poll(self, flag)
26 while True:
27 try:
---> 28 pid, sts = os.waitpid(, flag)
29 except OSError as e:
30 # Child process not yet created. See #1731717
%% Cell type:markdown id: tags:
You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remember, this problem is meant to be hard! You can now have a go at making your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!
%% Cell type:markdown id: tags: