ReWork Deep Learning London September 2018 part 1

Entering the conference (c) ReWork

September is always a busy month in London for AI, but one of the events I always prioritise is ReWork – they manage to pack a lot into two days and I always come away inspired. I was live-tweeting the event, but also made quite a few notes, which I’ve made a bit more verbose below.  This is part one of at least three parts and I’ll add links between the posts as I finish them.

After a few problems with the trains I arrived right at the end of Fabrizio Silvesetri’s talk on how Facebook use data – very interesting question on avoiding self-referential updates to the algorithm although I lacked the context for the answer.

So the first full talk of the day for me was Agata Lapedriza from University Oberta de Catalunya and MIT who spoke about emotion in images. Generally in computer vision we look solely at the face. While this can give some good results, it can get it badly wrong when the emotion is difficult. She showed a picture of a face and how the same face in different contexts could show disgust, anger, fear or contempt. What was going on around the face is important for us humans to get the correct emotion. Similarly, facial expressions that look like emotions can be misinterpreted out of context. A second image of a child looking surprised turned out to be the boy blowing out candles on his birthday cake. We are good at interpreting what is going on in scenes, or at least we think we are. Agata showed a low resolution video of what appeared to be a man in an office working on a computer. We’d filled in the details by upsampling in our minds, only to discover that the actual video showed the man talking into a shoe, using a stapler as a mouse while wearing drinks cans as earphones and staring at a rubbish bin and not a monitor!

Emotion in a picture varies based on context

The first problem was training data. A Google search gave them faces but with limited labels so this was crowdsourced and averaged to get a full set of emotional descriptions. To put emotions in context they created a system with two parallel convolutional models: one for the scene and the other for the faces. By combining the two they got much better results than looking at the faces or scenes alone. Labelled data was sparse for training (approximately 20 thousand examples) so they used pretrained networks (image net and scenes). Even with this very complicated task they got a reasonable answer in about half of cases.

The biggest potential for this is to build empathy recognition into the machines that will inevitably surround us as we expand our technology. There was a great question on the data annotation and how variable humans are in this task. Agata suggested that an artificial system should match the average of human distribution in understanding emotion. She also noted that it’s difficult to manage the balance between what is seen and the context. More experiments to come in this area I think! You can see a demo at

Next up was someone I’ve heard speak before and who always has some interesting work to present; Raia Hadsell from Deepmind discussing deep reinforcement learning. Games really lend themselves well to reinforcement learning as they are microcosms with large amounts of detail and in build success metrics. Raia was particularly interested in navigation tasks and whether reinforcement learning could solve this problem.

Starting with a simulated maze, could an agent learn how to get to a goal from a random spawn point on the map. In two dimension with a top down view and full visibility this is an easy problem. In a three dimensional (simulated) maze where you can only see the view in front of you it is a much harder problem as you have to build the map. They had success with their technique which included a secondary LSTM component for longer term memory and hypothesised whether this technique could work in the real world.

Using Google Maps and Street View, they generated “StreetLearn” as a reinforcement training environment. Using the street view images cropped to 84 x 84 pixels they created a map of several key cities (London, Paris, New York) and gave the system a courier task. Navigate to a goal and then be given a further goal that requires more navigation. They tried the goal description with both lat/long coordinates and also with distances and direction from specific landmarks. Similar results were observed but both data options. They also applied a learning curriculum where the targets were close to the spawn point and then moved further away as the system learned.

Demo of navigation on Street Learn

There was a beautiful video demonstration showing this in action in both London and Paris, although traffic issues were not mapped. Raia then went on to discuss transference learning where the system was trained on subsections of New York and then the architecture was frozen and only the LSTM was retrained for the new area. This showed great results, with a lot of the task awareness encoded and transferable, leaving only the map to be a newly learned item.

Variety of uses of AI in NASA FDL

I always love hearing space talks, it’s an industry I would have loved to work in. James Parr of the Nasa FDL lab gave a great talk on how AI is essential as a keystone capacity across the board for the FDL. There are too many data sets and too many problems for humans alone. The TESS mission is searching for exoplanets across about 85% of the sky and produces enough data to fill up the Sears Tower (although he didn’t specify in what format 🙂 ). Analysing this data used to be done manually and is now changing to AI. Although there was little depth to the talk in terms of science, he gave a great overview of the many projects where AI is being used: moon mapping, identifying meteorites in a debris field and asteroid shape modelling. A human takes one month to model asteroid shape and NASA has a huge backlog. Other applications include assistance in disaster response, combining high detail pre-existing imagery with lower resolution satellite data that can see through the cloud cover in e.g. hurricane situations. If you want to know more about what they do check out

Which bird was painted by a human?

Qiang Huang from the University of Surrey followed with a fascinating talk on artificial image generation. Currently, we use Generative Adversarial Networks to create images from limited inputs and there’s been moderate success with this. Images can be described in three ways: shape, which is very distinctive and helps localise objects within the image, colour, which can be further split into hue, intensity and value, and finally texture, the arrangement and frequency of variation. So how do humans draw colour pictures? He showed us a slide of two images of a bird and asked us to pick which of them was painted by a human. We all fell for it and were surprised when he said that both images had. Artists tend to draw the contours of the image and then add detail. Could this approach improve artificial image generation? They created a 2-GAN approach – the first to create outlines of birds and the second to use these outlines for images. While the results he showed were impressive, there was still a way to go for realistic images. I wonder how many other AI tasks could be improved by breaking them down into the way humans solve the problem…

Tracking moving objects

Tracking moving objects is an old computer vision task and something that can quickly become challenging. Adam Kosiorek spoke about visual attention methods. With soft attention, the whole image is processed, even if it’s not relevant, which can be wasteful. With spatial transformers you can extract a glimpse and this can be more efficient and allow transformations. He demonstrated single object tracking just from initialising a box around a target object and then discussed a new approach: hierarchical attentive recurrent tracking. Looking at the object not the image gives higher efficiency and using a glimpse (soft attention) to find the object and mask it. The object is then analysed in feature space and its movements predicted. He showed a demo of this system for tracking single cars and it worked well. Can we track multiple objects at once, without supervision and generate moving objects using this technique? Yes but it needs some supervision. He also showed a different model for multiple objects: Sequention attend infer repeat (SQAIR) which was an unsupervised generative model with strong object prior to avoid further supervision. SQAIR has more consistency and common intuition: if it moves together then it belongs together. Adam showed toy results using animated MNIST and also CCTV data with a static background and it shows promise but is currently computationally inefficient.

Another Deepmind speaker was up next, Ali Eslami on scene representation and rendering. Usually this is done with a large data set and current models work well, but this is not representative of how we learn. Similarly the model is only as strong as the data set and it will not go beyond those boundaries. They sought to ask whether computers could learn representation so that the scene could be imagines from any view point. In essence, yes. Ali showed a lovely set of examples showed both the rendered scene and also the intermediate steps. What was fascinating was when the scene was rotated so the objects were behind a wall the network placed the objects first and then occluded them. The best results shown were with 5 training images, but even with one image good results could be seen. There was some uncertainty but it was bounded within reality (i.e. no dragons hiding behind walls!).

Scene understanding and generation of new views

They then asked the question could they take this same model and apply to other tasks? With a 3D tetris problem, it worked well. Again the more images that were given the lower the uncertainty. A further question is: Is this useful? When applied to training a robot arm they achieved better results than direct training alone. The same network could also be used to predict motion in time when trained on snooker balls, implicitly learning the dynamics and physics of the world. This seemed to far outperform variational autoencoder techniques.

One of the things I love about the reWork conferences is that they will have very theory based sessions and then throw in more AI in application talks. The second such talk of the day (after the FDL talk by James Parr) was Angel Serrano from Santander. He started off discussing different ways of organising teams within businesses and why they had gone for a central data science team (hub) with further data scientists (spokes) placed in specific teams to get the best of both worlds in terms of deep understanding of business functions and collaboration. He also noted that data management is a big issue for any company. A data lake created four years ago may not be fit for purpose now in terms of usability by current AI technologies or support for GDPR and this needs regular review and change if necessary. In Santander, the models are trained outside of the data lake for performance and this is a specific consideration. Angel then went on to discuss different uses of AI within Santander, including uses that are not obvious. Bank cards have a 6 year lifespan before they begin to degrade. If they have not been issued within 2 years of manufacture then they have to be destroyed. With over 200 types of card and management with minimum orders, predicting stock requirements is an easy way that AI has added value within the bank without the need for the regulatory overhead that modelling financial information carries.

Lots of ideas and stimulating discussion in the breaks and many more great speakers to summarise.

Link to part 2 will be here when available.


Democratising AI: Who defines AI for good?

At the ReWork Retail and AI Assistants summit in London I was lucky enough to interview Kriti Sharma, VP of AI and Robotics at Sage, in a fireside chat on AI for Good.  Kriti spoke a lot about her experiences and projects not only in getting more diverse voices heard within AI but also in using the power of AI as a force for good.

We discussed the current state of AI and whether we needed legislation.  It is clear that legislation will come if we do not self-police how we are using these new tools.  In the wake of the Cambridge Analytica story breaking, I expect that there will be more of a focus on data privacy laws accelerated, but this may bleed into artificial intelligent applications using such data. Continue reading Democratising AI: Who defines AI for good?

ReWork Deep Learning London 2016 Day 1 Morning

Entering the conference (c) ReWork
Entering the conference (c) ReWork

In September 2016, the ReWork team organised another deep learning conference in London.  This is the third of their conferences I have attended and each time they continue to be a fantastic cross section of academia, enterprise research and start-ups.  As usual, I took a large amount of notes on both days and I’ll be putting these up as separate posts, this one covers the morning of day 1.  For reference, the notes from previous events can be found here: Boston 2015, Boston 2016.

Day one began with a brief introduction from Neil Lawrence, who has just moved from the University of Sheffield to Amazon research in Cambridge.  Rather strangely, his introduction finished with him introducing himself, which we all found funny.  His talk was titled the “Data Delusion” and started with a brief history of how digital data has exploded.  By 2002, SVM papers dominated NIPs, but there wasn’t the level of data to make these systems work.  There was a great analogy with the steam engine, originally invented by Thomas Newcomen in 1712 for pumping out tin mines, but it was hugely inefficient due to the amount of coal required.  James Watt took the design and improved on it by adding the condenser, which (in combination with efficient coal distribution) led to the industrial revolution1.   Machine learning now needs its “condenser” moment.

Continue reading ReWork Deep Learning London 2016 Day 1 Morning

Rework DL Boston 2016 – Day 2

Me, networking at breakfast
Me, networking at breakfast

This is a summary of day 2 of ReWork Deep Learning summit 2016 that took place in Boston, May 12-13th.  If you want to read the summary of day 1 then you can read my notes here. Continue reading Rework DL Boston 2016 – Day 2

ReWork DL Boston 2016 – Day 1

brainLast year, I blogged about the rework Deep Learning conference in Boston and, being here for the second year in a row, I thought I’d do the same.  Here’s the summary of day 1.

The day started with a great intro from Jana Eggers with a positive message about nurturing this AI baby that is being created rather than the doomsday scenario that is regularly spouted.  We are a collaborative discipline of academia and industry and we can focus on how we use this for good. Continue reading ReWork DL Boston 2016 – Day 1

Whetlab and Twitter

Whetlab joins Twitter
Whetlab joins Twitter

At the ReworkDL conference in Boston last month I listened to a fantastic presentation by Ryan Adams of Whetlab on how they’d created a business to add some science to the art of tuning deep learning engines.  I signed up to participate to their closed beta and came back to the UK very excited to use their system once I’d got my architecture in place.  Yesterday they announced that they had signed a deal with Twitter and the beta would be closed.  I was delighted for the team – the business side of me is always happy when a start-up is successful enough to get attention of a big corporate, although I was personally gutted as it means I won’t be able to make use of their software to improve my own project.

This, for me, is a tragedy. Continue reading Whetlab and Twitter

ReWork DL Boston 2015 – Day 2

This post is a very high level summary of Day 2 at the Boston ReWork Deep Learning Summit 2015.  Day 1 can be found here.

The first session kicked off with Kevin O’Brian from GreatHorn.  There are 3 major problems facing the infosec community at the moment:

  1. Modern infrastructure is far more complex than it used to be – we are using AWS, Azure as extensions of our physical networks and spaces such as GitHub as code repositories and Docker for automation.  It is very difficult for any IT professional to keep up with all of the potential vulnerabilities and ensure that everything is secure.
  2. (Security) Technical debt – there is too much to monitor/fix even if business released the time and funds to address it.
  3. Shortfall in skilled people – there is a 1.5 million shortage in infosec  people – this isn’t going to be resolved quickly.

Continue reading ReWork DL Boston 2015 – Day 2

ReWork DL Boston 2015 – Day 1

brain-like computing
Brain-like computing

So, day one of the ReWork Deep Learning Summit Boston 2015 is over.  A lot of interesting talks and demonstrations all round.  All talks were recorded so I will update this post as they become available with the links to wherever the recordings are posted – I know I’ll be rewatching them.

Following a brief introduction the day kicked off with a presentation from Christian Szegedy of Google looking at the deep learning they had set up to analyse YouTube videos.  They’d taken the traditional networks used in Google and made them smaller, discovering that an architecture with several layers of small networks was more computationally efficient that larger ones, with a 5 level (inception-5) most efficient.  Several papers were referenced, which I’ll need to look up later, but the results looked interesting.

Continue reading ReWork DL Boston 2015 – Day 1

New Horizons

So, most people know by now that in a week’s time I start a new role.  After 12 years of working for established business both small and large I am joining a start up in an area at the current edge of what is possible in computer science.  I’m very much looking forward to having my technical and scientific abilities stretched as far as they’ll go and, not unsurprisingly, the immersion in a new venture where the focus is on the solution and not why things can’t be done (often the case in established companies).

I have a reading list as long as the references for my own thesis to get through in the next few weeks so I can become an expert in my new field: deep learning and artificial intelligence.  One of the first things I’ll be doing is attending the ReWorkDL summit in Boston, MA, which is just a fascinating line up of some of the leading people in this space.  All being well I will be presenting at the 2016 summit.

I’ll be tweeting throughout the event with thoughts and comments and will do a summary post afterwards.