The ReWork Deep Learning Summit in London in September has become one of my must-attend conferences. It's a great mix of academic talks and more practical sessions on applications of various types of AI in business, so I couldn't miss it this year either. Here's a summary of Day 1.
The honour of the first talk of the day went to Huma Lodhi, who had the intriguing talk title of “Tricks for Deep Learning”. She gave a great opening summary of the basics of deep learning to set the scene for the rest of the day, discussing the differences between statistics and deep learning.
The second talk was the one that many people were waiting for. Oriol Vinyals, a veteran of ReWork, gave a great summary of reinforcement learning and how it had been adapted to get much better results in StarCraft 2, a multiplayer strategy game from Blizzard. What was interesting was how a simple reward policy of “win” led to a local maximum, where the agent beat the in-game AI but failed to master the aspects of the game that would allow it to progress against the best human players. They solved this with some supervised pre-training, then letting the agents play each other and evolve using a genetic-algorithm-style approach. Eventually they had a system good enough to beat TLO and MaNa – two of the best human players. You can read more about the AlphaStar system on the DeepMind blog.
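To make the self-play idea a bit more concrete, here's a toy sketch of my own – nothing to do with DeepMind's actual code, and rock-paper-scissors rather than StarCraft – in which a learner repeatedly plays frozen snapshots of its past selves and updates its strategy from the win/loss signal alone:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def outcome(a, b):
    """Rock-paper-scissors payoff for player a: +1 win, 0 draw, -1 loss."""
    return [0, 1, -1][(a - b) % 3]

learner = np.array([1.0, 0.0, 0.0])   # starting policy logits (a biased "pre-trained" agent)
league = [learner.copy()]              # frozen past versions to train against

for generation in range(5):
    for _ in range(2000):
        opponent = league[rng.integers(len(league))]
        probs = softmax(learner)
        a = rng.choice(3, p=probs)
        b = rng.choice(3, p=softmax(opponent))
        reward = outcome(a, b)
        grad = -probs
        grad[a] += 1.0                 # REINFORCE: gradient of log pi(a) w.r.t. the logits
        learner = learner + 0.05 * reward * grad
    league.append(learner.copy())      # add a snapshot as a future opponent

print(np.round(softmax(learner), 2))   # the learner's mixed strategy after league training
```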
Finishing the first session was Jens Kober, discussing reinforcement learning in robotics. Traditionally, we have shown an end result and let the algorithms determine their own way to get there. The approach Jens suggested was much more along the lines of human task learning. When we want to learn how to drive, we don't get one demonstration; we get continual feedback and correction from our tutor. By taking this approach with robots, they were able to get higher accuracy in a shorter time.
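A toy illustration of the difference (my own sketch, not Jens's algorithm): rather than copying a single demonstration up front, the policy is nudged by a correction after every attempt.

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.array([1.0, -0.5, 0.25])   # the motion the tutor actually wants
policy = np.zeros(3)                    # the robot's current motion parameters

for attempt in range(10):
    executed = policy + rng.normal(scale=0.05, size=3)   # what the robot actually did
    correction = 0.5 * (target - executed)               # tutor's incremental feedback
    policy = policy + correction                          # learn from the correction, not a full demo
    print(attempt, np.round(np.abs(policy - target).sum(), 3))
```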
After a much-needed¹ coffee break, the second session started with another DeepMind talk, this time from Ali Eslami, who started off with Plato's allegory of the cave. This time, the reinforcement learning was directed at an agent that could paint using software such as Photoshop, rather than creating an image pixel by pixel. Only the end result was seen and the agent was rewarded based on the final portrait. There was some very impressive layering of colours in the finished results. It's debatable whether this is creativity, and Ali left that to us as a question. Some of the experiments showed that if time was limited, abstract faces appeared. Conversely, if the time was too long, the agent needed regular reinforcement to prevent it leaving everything until the last minute. If you've read this blog before then you will know I sympathise with this approach :). There was a website, but it is not online yet – I will update when it is live.
Richard Turner then stepped up to discuss two big problems with the deep learning toolkit: data inefficiency and the lack of continual learning. How can we create systems that can generalise from known structures to classify previously unseen items with only a few examples? He showed an approach that is described in this paper: FiLM for visual reasoning. By incrementally adding small numbers of examples of new classes to a previously trained system, they were able to get state-of-the-art accuracy. However, state of the art is still less than 40%, so it's not quite ready for business use cases yet.
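For anyone curious what FiLM actually is: a small network predicts a per-channel scale and shift from some conditioning input (here, a task embedding) and applies them to the feature maps of an existing network. This is just a minimal PyTorch sketch of that layer, not the code from the paper, and the dimensions are made up.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: predict a per-channel scale (gamma) and
    shift (beta) from a conditioning vector and apply them to feature maps."""
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features, conditioning):
        gamma, beta = self.to_gamma_beta(conditioning).chunk(2, dim=-1)
        # Broadcast over the spatial dimensions of a conv feature map.
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta

# Example: modulate a batch of 64-channel feature maps with a 32-d task embedding.
film = FiLM(cond_dim=32, num_channels=64)
feats = torch.randn(4, 64, 8, 8)
task_embedding = torch.randn(4, 32)
print(film(feats, task_embedding).shape)  # torch.Size([4, 64, 8, 8])
```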
The development of our own brains is a fascinating topic, and we rarely think to train machines in the same way. Pierre-Yves Oudeyer outlined some of the ways his group is taking this approach, looking for emergent skills using self-motivated reward mechanisms similar to those of infants. All children inherently know the scientific method, and it was great to see robot dogs in a baby gym, and a robotic hand learning to play a game with a ball, despite having no knowledge of the rules of the world around them.
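As a very rough flavour of intrinsic motivation (my own toy, and much cruder than Oudeyer's learning-progress-based curiosity), an agent can reward itself for surprise: the prediction error of a little forward model of the world, with no external reward at all.

```python
import numpy as np

rng = np.random.default_rng(2)

class ForwardModel:
    """Tiny linear model predicting the next observation; its prediction
    error is used as an intrinsic 'curiosity' reward."""
    def __init__(self, obs_dim, act_dim, lr=0.05):
        self.W = np.zeros((obs_dim + act_dim, obs_dim))
        self.lr = lr

    def intrinsic_reward(self, obs, action, next_obs):
        x = np.concatenate([obs, action])
        error = next_obs - x @ self.W
        self.W += self.lr * np.outer(x, error)   # the model improves as it explores
        return float(np.mean(error ** 2))        # surprise acts as the reward

model = ForwardModel(obs_dim=4, act_dim=2)
obs = rng.normal(size=4)
for step in range(5):
    action = rng.normal(size=2)
    next_obs = obs + 0.1 * action.sum()          # stand-in for the real world's dynamics
    print(step, round(model.intrinsic_reward(obs, action, next_obs), 4))
    obs = next_obs
```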
Gaming is always fun at AI conferences, and Katja Hofmann from Microsoft was up next, using Minecraft as a platform for reinforcement learning. She showed some examples of agents learning to navigate mazes and talked about the open-sourced Project Malmo. Part of the code provided with the project shows how the agent learns from its environment, which is great to see. There is a great paper on this work and also a competition for you to create your own agents.
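The basic observe/act loop in those examples looks roughly like this – adapted from memory of the Python tutorials shipped with Malmo, so attribute names may differ between releases, and the mission XML is omitted:

```python
import json
import time
import MalmoPython   # Python bindings shipped with Project Malmo

agent_host = MalmoPython.AgentHost()

# mission_xml would be one of the XML mission definitions bundled with Malmo;
# the full spec is omitted here for brevity.
mission = MalmoPython.MissionSpec(mission_xml, True)
agent_host.startMission(mission, MalmoPython.MissionRecordSpec())

world_state = agent_host.getWorldState()
while not world_state.has_mission_begun:         # wait for Minecraft to be ready
    time.sleep(0.1)
    world_state = agent_host.getWorldState()

while world_state.is_mission_running:
    world_state = agent_host.getWorldState()
    if world_state.number_of_observations_since_last_state > 0:
        obs = json.loads(world_state.observations[-1].text)   # the agent's view of its environment
        # ...pick an action based on obs here...
    agent_host.sendCommand("move 1")             # continuous "keep moving forward" command
    time.sleep(0.2)
```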
The last session before lunch was from Edward Grefenstette, looking at understanding language instructions. These systems need to learn generalisations, as language is practically unbounded, and it can be difficult to specify a reward function for this kind of task without drowning in hand-written code. Can we create a system that can understand natural human speech to answer a question or navigate a city? Edward presented Adversarial Goal-Induced Learning from Examples (AGILE) and the success it had at moving objects efficiently in a 5×5 2D grid; the same system also showed great generalisation when trained on city navigation. Their paper is available on arXiv.
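The core trick, as I understood it, is to learn the reward rather than write it: a small discriminator scores how well a state matches the instruction, trained on example goal states against states the policy actually reaches, and the policy is then rewarded by that score. A rough PyTorch sketch of one discriminator update – not the authors' architecture, and the encoders and sizes here are invented:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores how well a state matches a language instruction, so the reward
    is learned from example goal states rather than hand-written rules."""
    def __init__(self, instruction_dim, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instruction_dim + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, instruction, state):
        return torch.sigmoid(self.net(torch.cat([instruction, state], dim=-1)))

reward_model = RewardModel(instruction_dim=16, state_dim=25)  # e.g. a flattened 5x5 grid
optimiser = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# One discriminator step: example goal states are positives, states the
# current policy reaches are (provisionally) treated as negatives.
instr = torch.randn(8, 16)
goal_states = torch.randn(8, 25)
policy_states = torch.randn(8, 25)
loss = -(torch.log(reward_model(instr, goal_states) + 1e-6).mean()
         + torch.log(1 - reward_model(instr, policy_states) + 1e-6).mean())
optimiser.zero_grad()
loss.backward()
optimiser.step()
```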
I’ll update this post with a link to the second session when it’s written.
¹ For me at least! ↩