ReWork Deep Learning London September 2018 part 1

Entering the conference (c) ReWork

September is always a busy month in London for AI, but one of the events I always prioritise is ReWork – they manage to pack a lot into two days and I always come away inspired. I was live-tweeting the event, but also made quite a few notes, which I’ve made a bit more verbose below.  This is part one of at least three parts and I’ll add links between the posts as I finish them.

After a few problems with the trains I arrived right at the end of Fabrizio Silvestri’s talk on how Facebook use data. There was a very interesting question on avoiding self-referential updates to the algorithm, although I lacked the context for the answer.

So the first full talk of the day for me was Agata Lapedriza from Universitat Oberta de Catalunya and MIT, who spoke about emotion in images. Generally in computer vision we look solely at the face. While this can give some good results, it can go badly wrong when the emotion is difficult to read. She showed a picture of a face and how the same face in different contexts could show disgust, anger, fear or contempt. What is going on around the face is important for us humans to get the correct emotion. Similarly, facial expressions that look like emotions can be misinterpreted out of context. A second image of a child looking surprised turned out to be the boy blowing out candles on his birthday cake. We are good at interpreting what is going on in scenes, or at least we think we are. Agata showed a low resolution video of what appeared to be a man in an office working on a computer. We’d filled in the details by upsampling in our minds, only to discover that the actual video showed the man talking into a shoe, using a stapler as a mouse while wearing drinks cans as earphones and staring at a rubbish bin and not a monitor!

Emotion in a picture varies based on context

The first problem was training data. A Google search gave them faces but with limited labels, so the labelling was crowdsourced and averaged to get a full set of emotional descriptions. To put emotions in context they created a system with two parallel convolutional models: one for the scene and the other for the faces. By combining the two they got much better results than looking at the faces or scenes alone. Labelled data was sparse for training (approximately 20 thousand examples) so they used pretrained networks (ImageNet and scenes). Even with this very complicated task they got a reasonable answer in about half of cases.
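To make the two-branch idea concrete, here is a minimal sketch in plain Python. It assumes the face and scene branches each produce a feature vector, and that the two are combined by simple concatenation followed by one dense layer and a softmax. The talk did not specify the fusion step, so that part, the feature sizes and the random weights are all my assumptions, and the function names are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output score per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(face_features, scene_features, weights, bias):
    """Assumed fusion: concatenate both feature vectors, then classify."""
    fused = face_features + scene_features  # list concatenation
    return softmax(linear(weights, bias, fused))


# Stand-ins for the outputs of the two pretrained convolutional branches.
# Sizes and values are hypothetical; real features would be far larger.
rng = random.Random(0)
face_features = [rng.random() for _ in range(4)]
scene_features = [rng.random() for _ in range(6)]

emotions = ["disgust", "anger", "fear", "contempt"]
weights = [[rng.uniform(-1, 1) for _ in range(10)] for _ in emotions]
bias = [0.0] * len(emotions)

probs = fuse_and_classify(face_features, scene_features, weights, bias)
for emotion, p in zip(emotions, probs):
    print(f"{emotion}: {p:.3f}")
```

The point of the sketch is only that the classifier sees face and context together, so the same face features can lead to a different emotion when the scene features change.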

The biggest potential for this is to build empathy recognition into the machines that will inevitably surround us as we expand our technology. There was a great question on the data annotation and how variable humans are in this task. Agata suggested that an artificial system should match the average of human distribution in understanding emotion. She also noted that it’s difficult to manage the balance between what is seen and the context. More experiments to come in this area I think! You can see a demo at

Next up was someone I’ve heard speak before and who always has some interesting work to present: Raia Hadsell from DeepMind, discussing deep reinforcement learning. Games really lend themselves well to reinforcement learning as they are microcosms with large amounts of detail and in-built success metrics. Raia was particularly interested in navigation tasks and whether reinforcement learning could solve this problem.

Starting with a simulated maze, could an agent learn how to get to a goal from a random spawn point on the map? In two dimensions with a top down view and full visibility this is an easy problem. In a three dimensional (simulated) maze where you can only see the view in front of you it is a much harder problem, as you have to build the map. They had success with their technique, which included a secondary LSTM component for longer term memory, and hypothesised that this technique could work in the real world.

Using Google Maps and Street View, they generated “StreetLearn” as a reinforcement training environment. Using the Street View images cropped to 84 x 84 pixels they created a map of several key cities (London, Paris, New York) and gave the system a courier task: navigate to a goal and then be given a further goal that requires more navigation. They tried the goal description with both lat/long coordinates and also with distances and direction from specific landmarks. Similar results were observed with both data options. They also applied a learning curriculum where the targets were close to the spawn point and then moved further away as the system learned.
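The learning curriculum is easy to picture in code. This is a rough sketch, not the StreetLearn implementation: it assumes a linear schedule, flat x/y coordinates in metres and made-up start and maximum radii, none of which were given in the talk.

```python
import math
import random


def curriculum_radius(step, total_steps, start_radius=500.0, max_radius=5000.0):
    """Allowed spawn-to-goal distance, growing linearly as training progresses."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return start_radius + progress * (max_radius - start_radius)


def sample_goal(spawn, step, total_steps, rng):
    """Pick a goal in a random direction, no further than the current radius."""
    radius = curriculum_radius(step, total_steps)
    distance = rng.uniform(0.0, radius)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (spawn[0] + distance * math.cos(angle),
            spawn[1] + distance * math.sin(angle))


rng = random.Random(42)
spawn = (0.0, 0.0)
for step in (0, 50, 100):
    goal = sample_goal(spawn, step, total_steps=100, rng=rng)
    print(step, curriculum_radius(step, 100), goal)
```

Early in training every goal is within a short walk of the spawn point, so the agent gets rewarded often enough to learn something. The radius then widens until goals can be anywhere within the maximum distance.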

Demo of navigation on StreetLearn

There was a beautiful video demonstration showing this in action in both London and Paris, although traffic issues were not mapped. Raia then went on to discuss transfer learning, where the system was trained on subsections of New York and then the architecture was frozen and only the LSTM was retrained for the new area. This showed great results, with a lot of the task awareness encoded and transferable, leaving only the map to be a newly learned item.

Variety of uses of AI in NASA FDL

I always love hearing space talks; it’s an industry I would have loved to work in. James Parr of NASA FDL gave a great talk on how AI is essential as a keystone capacity across the board for the FDL. There are too many data sets and too many problems for humans alone. The TESS mission is searching for exoplanets across about 85% of the sky and produces enough data to fill up the Sears Tower (although he didn’t specify in what format 🙂 ). Analysing this data used to be done manually and is now changing to AI. Although there was little depth to the talk in terms of science, he gave a great overview of the many projects where AI is being used: moon mapping, identifying meteorites in a debris field and asteroid shape modelling. A human takes one month to model an asteroid’s shape and NASA has a huge backlog. Other applications include assistance in disaster response, combining high detail pre-existing imagery with lower resolution satellite data that can see through the cloud cover in e.g. hurricane situations. If you want to know more about what they do check out

Which bird was painted by a human?

Qiang Huang from the University of Surrey followed with a fascinating talk on artificial image generation. Currently, we use Generative Adversarial Networks to create images from limited inputs and there’s been moderate success with this. Images can be described in three ways: shape, which is very distinctive and helps localise objects within the image; colour, which can be further split into hue, intensity and value; and finally texture, the arrangement and frequency of variation. So how do humans draw colour pictures? He showed us a slide of two images of a bird and asked us to pick which of them was painted by a human. We all fell for it and were surprised when he said that both images had been. Artists tend to draw the contours of the image and then add detail. Could this approach improve artificial image generation? They created a 2-GAN approach: the first GAN creates outlines of birds and the second uses these outlines to generate images. While the results he showed were impressive, there was still a way to go for realistic images. I wonder how many other AI tasks could be improved by breaking them down into the way humans solve the problem…

Tracking moving objects

Tracking moving objects is an old computer vision task and something that can quickly become challenging. Adam Kosiorek spoke about visual attention methods. With soft attention, the whole image is processed, even the parts that are not relevant, which can be wasteful. With spatial transformers you can extract a glimpse, which can be more efficient and also allows transformations. He demonstrated single object tracking just from initialising a box around a target object and then discussed a new approach: hierarchical attentive recurrent tracking. Looking at the object rather than the whole image gives higher efficiency, using a glimpse (soft attention) to find the object and mask it. The object is then analysed in feature space and its movements predicted. He showed a demo of this system for tracking single cars and it worked well. Can we track multiple objects at once, without supervision, and generate moving objects using this technique? Yes, but it needs some supervision. He also showed a different model for multiple objects: Sequential Attend, Infer, Repeat (SQAIR), an unsupervised generative model with a strong object prior to avoid further supervision. SQAIR has more consistency and follows a common intuition: if it moves together then it belongs together. Adam showed toy results using animated MNIST and also CCTV data with a static background. It shows promise but is currently computationally inefficient.
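As a rough illustration of why a glimpse is cheaper than processing the whole frame, here is a tiny sketch that crops a box out of an image stored as a list of rows. A real spatial transformer samples the glimpse differentiably and can scale or shift it; the integer crop, the function name and the toy image below are simplifying assumptions of mine.

```python
def extract_glimpse(image, top, left, height, width):
    """Crop a height x width box at (top, left), clamped to stay inside the image."""
    rows, cols = len(image), len(image[0])
    top = max(0, min(top, rows - height))
    left = max(0, min(left, cols - width))
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 6 x 8 "image" where each pixel value encodes its own position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(6)]

glimpse = extract_glimpse(image, top=2, left=3, height=2, width=3)
print(glimpse)  # [[23, 24, 25], [33, 34, 35]]

pixels_full = len(image) * len(image[0])
pixels_glimpse = len(glimpse) * len(glimpse[0])
print(pixels_full, pixels_glimpse)  # 48 6
```

Everything downstream of the crop only has to look at 6 pixels instead of 48, which is where the efficiency comes from. The tracker's job is then to predict where the next box should go.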

Another DeepMind speaker was up next, Ali Eslami, on scene representation and rendering. Usually this is done with a large data set and current models work well, but this is not representative of how we learn. Similarly, the model is only as strong as the data set and it will not go beyond those boundaries. They sought to ask whether computers could learn a representation so that the scene could be imagined from any viewpoint. In essence, yes. Ali showed a lovely set of examples that showed both the rendered scene and also the intermediate steps. What was fascinating was that when the scene was rotated so the objects were behind a wall, the network placed the objects first and then occluded them. The best results shown were with 5 training images, but even with one image good results could be seen. There was some uncertainty but it was bounded within reality (i.e. no dragons hiding behind walls!).

Scene understanding and generation of new views

They then asked whether they could take this same model and apply it to other tasks. With a 3D Tetris problem, it worked well. Again, the more images that were given, the lower the uncertainty. A further question is: is this useful? When applied to training a robot arm they achieved better results than direct training alone. The same network could also be used to predict motion in time when trained on snooker balls, implicitly learning the dynamics and physics of the world. This seemed to far outperform variational autoencoder techniques.

One of the things I love about the ReWork conferences is that they will have very theory based sessions and then throw in more AI in application talks. The second such talk of the day (after the FDL talk by James Parr) was Angel Serrano from Santander. He started off discussing different ways of organising teams within businesses and why they had gone for a central data science team (hub) with further data scientists (spokes) placed in specific teams to get the best of both worlds in terms of deep understanding of business functions and collaboration. He also noted that data management is a big issue for any company. A data lake created four years ago may not be fit for purpose now in terms of usability by current AI technologies or support for GDPR, and this needs regular review and change if necessary. In Santander, the models are trained outside of the data lake for performance and this is a specific consideration. Angel then went on to discuss different uses of AI within Santander, including uses that are not obvious. Bank cards have a 6 year lifespan before they begin to degrade. If they have not been issued within 2 years of manufacture then they have to be destroyed. With over 200 types of card and management with minimum orders, predicting stock requirements is an easy way that AI has added value within the bank without the need for the regulatory overhead that modelling financial information carries.

Lots of ideas and stimulating discussion in the breaks and many more great speakers to summarise.

Link to part 2 will be here when available.


ImageNet in 4 Minutes? What the paper really shows us

ImageNet has been a deep learning benchmark data set since it was created. It was the competition that showed that DL networks could outperform non-ML techniques and it’s been used by academics as a standard for testing new image classification systems. A few days ago an exciting paper was published on arXiv for training ImageNet in four minutes. Not weeks, days or hours but minutes. This is on the surface a great leap forward but it’s important to dig beneath the surface. The Register sub headline says all you need to know:

So if you don’t have a relaxed view on accuracy or thousands of GPUs lying around, what’s the point? Is there anything that can be taken from this paper?


Thinking machines – biological and artificial


How can we spot a thinking machine?

If you’ve read pretty much any other of my artificial intelligence blog posts on here then you’ll know how annoyed I am when the slightest advance in the achievements of AI spurs an onslaught of articles about “thinking machines” that can reason, and opens up the question of robots taking jobs and eventually destroying us all in some not-to-be-mentioned film franchise style. Before I get onto discussing if and when we’ll get to a Detroit: Become Human scenario, I’d like to cover where we are and the biggest problem in all this.

Presentations and speaking at conferences

Me presenting at Continuous Lifecycle London 2018

One of the things I’ve been doing more this year is speaking at conferences and meetups. I always take the time to speak to the audience afterwards to see if there were aspects they didn’t get or enjoy, so I can hone the presentation for the next time. Even when under embargo of product details, there are usually lots of things that you can talk about that the wider community will find interesting and I have been encouraging people to break their presentation fear by talking at meetups.

Following on from my “Being a Panellist” post, I’ve been asked a lot how I go about writing a presentation and what I do to prepare, so I’ve gathered my thoughts here. This isn’t the only way, but it is what works for me!

Cognitive Bias and Review of Bandwidth by Eliot Peper


Bandwidth is available now in multiple formats

One of the things I love about Kindle Unlimited is that I’m regularly finding new authors that I wouldn’t necessarily know about otherwise. At my reading rate I often find that I’m trying to pick a new book at odd hours of the day (or night) and will go with something new recommended by Amazon, and this is how I came across Bandwidth by Eliot Peper.

Kindle had this prominently as a Sci-Fi choice for me while I was in the middle of several different dragon-related fantasy series and I was very much motivated for something a little more thought provoking.

And this is.

Dammit I’m a Dr not a Stereotype

Actual responses… prompted by the “Immodest” tweet. Image credit @ralphharrington

I’m proud to call myself Dr Bastiman. It’s on my email signature (personal and professional), it’s in my twitter name, it’s the title I use when I have to give my details for just about anything. I’m proud of it and have never considered this to be immodest. My title shows to the world that I’ve achieved something considerable. I was both surprised and then immediately not surprised when a storm started on Twitter…

I’ve had similar rants myself over the years. Particularly at one company where using my title in my email signature didn’t fit their cultural “tone of voice” yet at the same time senior males with PhDs were allowed to use their titles… I now use mine everywhere. However, the reason that the tweet came to my attention was one of the bizarre responses…

Constant learning, commitment and determination


I read an interesting thread this morning that really resonated with me. I am continually ensuring that my team have a great work-life balance, encouraging them not to work too hard and ensuring that they have time with their families. There are occasions when things go wrong and everyone needs to pull together but this should always be the exception. I’ve written before about the expectations of some tech companies in excessive hours as the only way. However, I have got to where I am by working as hard as I could, being determined in what I wanted to achieve, and using my evenings to improve and learn new things so I have the knowledge and experience for each new step. I pushed myself really hard, because I knew I could always do more, do better. I still do.

Democratising AI: Who defines AI for good?

At the ReWork Retail and AI Assistants summit in London I was lucky enough to interview Kriti Sharma, VP of AI and Robotics at Sage, in a fireside chat on AI for Good.  Kriti spoke a lot about her experiences and projects not only in getting more diverse voices heard within AI but also in using the power of AI as a force for good.

We discussed the current state of AI and whether we needed legislation. It is clear that legislation will come if we do not self-police how we are using these new tools. In the wake of the Cambridge Analytica story breaking, I expect that the focus on data privacy laws will accelerate, and this may bleed into artificial intelligence applications using such data.

Cambridge Analytica: not AI’s ethics awakening

From the wonderful XKCD, research ethics

By now, the majority of people who keep up with the news will have heard of Cambridge Analytica, the whistle blower Christopher Wylie, and the news surrounding the harvesting of Facebook data and micro targeting, along with accusations of potentially illegal activity. In amongst all of this news I’ve also seen articles saying that this is the “awakening” moment for ethics and morals in AI and data science in general. The point where practitioners realise the impact of their work.

“Now I am become Death, the destroyer of worlds”, Oppenheimer


2 out of 3 ain’t bad: back to back TMAs

MS327 progress – not where it needs to be to do well on TMAs

It’s been a crazy month. From the lead up to the product launch at work to now, it seems like I’ve been doing nothing but back to back assignments for my Open University maths degree. So much so in fact that I’ve not had time to study, but only focus on the assignments themselves. It all started back with the second TMA for M337 (Complex analysis), which was a rush job and I got a much lower score on that than I would have liked.

I then had two weeks until a computer marked assignment for MS327 and was going into this without having looked at any of the material in the book. As usual, I spent my commute trying to get through it, but barely made it a quarter of the way through before I realised I’d have to start working through the questions for the assignment. Computer marked assignments are very different from the tutor marked ones. You either select an answer from a choice of 4-6 potential results or type a numerical result. Therefore your answers are either correct or incorrect. There are no marks for method.

If you find yourself in this position, my best advice is always to do the unit quizzes.  These are usually in a similar format and will get your brain in the right place for the assignment itself.  In combination with the handbook and text books you should be able to follow how to get the answers from the questions, although please make time to go back and fill in the gaps as soon as you can.

Noise levels at the local soft play centre. Earplugs don’t really help – it’s so high pitched you feel it through your teeth more than your ears 😉

With the iCMA out of the way, I then had a week for the third TMA of MS327. Fortunately on the same topics as the iCMA, but much more involved questions.  This was a lot trickier to pick up.  With the usual standing room only on the commute, I’ve had to spend a lot of evenings trying to do this around family time and study in the excessive noise of soft-play centres… not a great environment for thought!


So I’ve just submitted the MS327 TMA online and I’m pretty happy with it.  Now I have two weeks to get the M337 TMA3 done and I’m a little more nervous about that.  There isn’t time to go through the study materials and answer the questions, so I’ve given myself a week to get as far as I can and then I’ll dive into the TMA and see how far I can get…

On the plus side, the OU offers substitution on your assignments, so it’s okay to have a bad one and you’ll be allocated a score that’s the average of the other three.  So I can still pull up my average for M337, as long as I do well on this TMA.
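To see how much substitution can help, here is a small worked example. It follows the rule exactly as I’ve described it above (the worst score is replaced by the average of the others). The real OU rules may have more conditions than this, and the scores are made up for illustration.

```python
def apply_substitution(scores):
    """Replace the lowest score with the mean of the remaining scores."""
    lowest_index = min(range(len(scores)), key=scores.__getitem__)
    others = scores[:lowest_index] + scores[lowest_index + 1:]
    adjusted = list(scores)
    adjusted[lowest_index] = sum(others) / len(others)
    return adjusted


def average(scores):
    return sum(scores) / len(scores)


# Hypothetical TMA scores, with one rushed assignment dragging things down.
tma_scores = [80, 50, 84, 76]
adjusted = apply_substitution(tma_scores)

print(adjusted)              # [80, 80.0, 84, 76]
print(average(tma_scores))   # 72.5
print(average(adjusted))     # 80.0
```

With these numbers one bad assignment costs 7.5 points on the average without substitution and nothing with it, which is why doing well on the remaining TMAs matters so much.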

This level of progress isn’t looking good at the moment…

I’m really hoping that I can get time to get these to 100% before the exams in June!