Explainable AI

Last week, I attended the ReWork Explainable AI mini summit. I am really loving having so many great speakers accessible online, particularly in a three to four hour format, which is much easier to fit around work commitments than an in-person summit. Had it not been online I would have missed out on some great speakers.

Explainability is something I’ve been really focussing on recently. While it’s always been important, my frustration has been with research focussing on tools for machine learning engineers and not on clear explanations for the general public – the very people using, and being affected by, the systems we build. I was keen to attend this summit in particular as a refresher on current best practice.

Many current explainability solutions are flawed. When we are seeking to “explain” what is going on in models we look at e.g. SHAP values to show the contribution of features to the overall predictions. We make simplifications when the number of features gets too big for us to justify the time and effort involved. Our output? A table of numbers which we might blur into a visualisation. We write a paper, pat ourselves on the back for a well explained model and move on.
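To make that "table of numbers" concrete, here is a minimal sketch of the Shapley value calculation that SHAP values are based on. It is a brute-force illustration, not the SHAP library's implementation: the function `shapley_values`, the toy model `toy_score` and the feature names are all hypothetical, and "missing" features are simply swapped for a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value. The number of
    coalitions doubles with each feature, which is why simplifications are
    needed once the feature count grows.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    contributions = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        contributions[f] = total
    return contributions


# Hypothetical toy model with one interaction term (not a real scoring model).
def toy_score(x):
    return 2.0 * x["income"] + 1.0 * x["age"] + 0.5 * x["income"] * x["debt"]


instance = {"income": 3.0, "age": 2.0, "debt": 4.0}
baseline = {"income": 0.0, "age": 0.0, "debt": 0.0}
phi = shapley_values(toy_score, instance, baseline)

for name, contribution in phi.items():
    print(f"{name:>6}: {contribution:+.2f}")
```

The contributions add up to the difference between the prediction for this instance and the prediction for the baseline, with the interaction term split evenly between the two features involved. That is a neat mathematical property, but the output is still just a column of numbers.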

The general public are not so lucky. The majority of people affected by or using our systems are not mathematicians, and without further explanation these results are not tangible. We have a duty to ensure that people understand why they are getting the results they do, whether this is for regulatory reasons or a simple recommendation. This means that any explainability needs to be accessible for the layperson, in plain English, and more informative than Baldrick’s attempt to classify a dog…

E: What about ‘D’?

B: I’m quite pleased with ‘dog’.

E: Yes, and your definition of ‘dog’ is…?

B: “Not a cat.”

Blackadder the Third, Ink and Incapability, very similar to the over-simplified explanations we have for classification tasks 😉

The summit started with an introduction by Anusha Sethuraman of Fiddler, who also introduced each speaker and moderated the question sessions. The first session was from Tristan Ferne of the BBC’s R&D team. Starting with a clear “We should all be explaining AI and machine learning better”, he showed that AI is ubiquitous and opaque, can easily be deployed without notification, and can go wrong in unusual and unpredictable ways.

Tristan Ferne presenting at ReWork Explainable AI Summit – AI systems can be very complex – here’s the inner workings of Alexa. How can we begin to explain this in an accessible way?

Tristan argued that we need to focus on the purpose of the AI and the user needs – this will be different from system to system. Sometimes “what” is more important than “why”. He showed a garden bird identifier that had been built by his team. In these sorts of situations the user wants to know what has influenced the classification and wants some comfort around the uncertainty. The identifier had heat maps to show the important features that led to the classification and also offered alternatives for near matches. “Most likely to be …” and “it could be” gave the user comfort that even if the main prediction is wrong, the system is still performing well with the alternatives.
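The “Most likely to be …” and “it could be” wording is straightforward to produce from a classifier’s raw scores. The following is only a sketch under assumed inputs, not the BBC team’s code: `describe_prediction`, the bird scores and the `min_prob` cut-off are all made up for illustration.

```python
from math import exp


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {name: exp(s - highest) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def describe_prediction(scores, alternatives=2, min_prob=0.05):
    """Report the top class plus any near matches above a minimum probability."""
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    best, best_p = ranked[0]
    message = f"Most likely to be a {best} ({best_p:.0%})."
    near = [name for name, p in ranked[1:1 + alternatives] if p >= min_prob]
    if near:
        message += " It could also be a " + " or a ".join(near) + "."
    return message


# Hypothetical raw scores from an imagined bird classifier.
scores = {"sparrow": 2.0, "dunnock": 1.5, "starling": 0.2, "duck": -3.0}
message = describe_prediction(scores)
print(message)
```

The cut-off matters as much as the wording: listing every class with a non-zero probability would bury the user in unlikely alternatives rather than reassure them.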

The system also included a visual representation of the data used to train the system, showing bias towards ducks, sparrows and starlings. Tristan highlighted a paper released a few days earlier covering this topic: What Do We Want From Explainable Artificial Intelligence (XAI)? — A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research [1]. He ended by underlining that you need to understand why you are trying to explain a system and to whom, as this is key to deciding where and how to do the explanation.

Next was Rishabh Mehrotra from Spotify. With 60 million tracks, it’s practically impossible for users to find new content beyond what they already know. Spotify offers several ways of doing this in addition to a raw “search”. Their radio feature gives sequential predictions, there are mood hubs with common types of music, and recommendations based on listening patterns. They focus on “recsplanations” (recommended explanations) to help the user understand what they are seeing. He highlighted a great paper on this that he had co-authored a few years ago: Explore, Exploit, Explain: Personalising Explainable Recommendations with Bandits. Basically, the system is either exploiting what it knows (you like rock music, so here is more rock music) or exploring something related (you like this artist so you might like these artists as well), and then explaining why (you listen to this sort of music on Fridays so you might want more of that today).
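The explore/exploit/explain idea can be sketched with the simplest bandit strategy, epsilon-greedy, where each “arm” is a recommendation paired with its explanation. This is a deliberately simplified stand-in and not the method from the paper or anything Spotify runs; the class `ExplainedBandit`, the arms and the click rates are all invented for illustration.

```python
import random


class ExplainedBandit:
    """Epsilon-greedy bandit over (recommendation, explanation) pairs."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.plays = {arm: 0 for arm in self.arms}
        self.reward = {arm: 0.0 for arm in self.arms}

    def mean(self, arm):
        """Observed click rate for an arm (0.0 before it has been shown)."""
        return self.reward[arm] / self.plays[arm] if self.plays[arm] else 0.0

    def choose(self):
        """Explore a random arm with probability epsilon, else exploit the best so far."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms), "explore"
        return max(self.arms, key=self.mean), "exploit"

    def update(self, arm, clicked):
        self.plays[arm] += 1
        self.reward[arm] += 1.0 if clicked else 0.0


# Hypothetical shelves, each with the explanation shown to the user.
arms = [
    ("more rock", "because you listen to rock"),
    ("similar artists", "because you like this artist"),
    ("friday mix", "because you play this on Fridays"),
]
# Made-up click rates standing in for real user behaviour.
true_click_rate = {arms[0]: 0.30, arms[1]: 0.20, arms[2]: 0.45}

bandit = ExplainedBandit(arms, epsilon=0.2, seed=42)
user = random.Random(7)
for _ in range(2000):
    arm, mode = bandit.choose()
    bandit.update(arm, clicked=user.random() < true_click_rate[arm])

for item, explanation in arms:
    print(f"{item} ({explanation}): shown {bandit.plays[(item, explanation)]} times")
```

Treating the explanation as part of the arm means the same recommendation with different wording is learned about separately, which is what lets you ask whether the explanation itself changes how users react.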

How Spotify shelves work for recommendations – Rishabh Mehrotra at the ReWork Explainability Summit

Rather than go into the details of these different “shelves”, Rishabh talked about some of the science they’d done to understand whether these explanations helped users and whether there were any preferences as to the type of explanation. They discovered that users do react differently based on the explanation and, furthermore, that detailed explanations get more reaction. This didn’t hold for “mood”, though, which might be due to its breadth as a category. The difficulty here is that there are multiple objectives in the model – if the user does not like the recommendation it is difficult to ascribe that to a specific objective.

Having my Spotify account signed in on the Smart TV in the house leads to some interesting recommendations of what I like – sorry for messing with your algorithms 😀

Spotify use “human in the loop” AI and enlist a lot of editorial help to categorise the music and create relations, which makes plain English explainability easy as it is built into the data. The explanations have to be relatable, otherwise the user does not connect.

The graph Cynthia objected to – there’s more on this in her paper from Nature Machine Intelligence in 2019

The third talk was from Cynthia Rudin of Duke University, with a somewhat provocative argument: explainable AI perpetuates the problems we have, and for high stakes decisions we should use interpretable models instead. One of the first things she did was reference a 2016 report from DARPA XAI that showed a graph in which explainability had an inverse relationship with performance. She picked this apart completely, but the key point is that the graph was drawn to support an assumption (or possibly a single set of experiments) rather than to show a general result, yet it has been taken as a “truth” by the AI community ever since. A good reminder to look at the original data for any conclusions and if there is no data then assume it’s made up… even in Data Science!

Explainable ML is post hoc explanations for opaque models to justify the results. Interpretable ML is embedded in the design and was there as a field first! Part of this is down to the data and part is down to the choice of model. Sparse models can perform similarly to neural network models if properly designed. For discrete variable data (where each attribute makes sense on its own, like tabular information) sparse models are easily interpretable. Where you have raw data (sound or images, for example, where a single second or pixel does not make sense in isolation), it is more difficult but not impossible.
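One common shape for a sparse model on tabular data is a points-based scoring system, where the model and its explanation are the same thing. The sketch below only illustrates that idea: the rules, weights and cut-off are invented rather than learned, and none of the names come from Cynthia’s lab or any real risk score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    description: str   # plain-English wording shown to the user
    feature: str       # column in the tabular record
    threshold: float   # the rule fires when the value is >= threshold
    points: int        # whole-number weight, easy to add up by hand


# Invented rules and weights, purely for illustration.
RULES = [
    Rule("is aged 60 or over", "age", 60, 2),
    Rule("has had 3 or more previous admissions", "previous_admissions", 3, 3),
    Rule("is taking 5 or more medications", "medications", 5, 1),
]
CUTOFF = 4  # hypothetical: 4 points or more is flagged as higher risk


def score(record, rules=RULES, cutoff=CUTOFF):
    """Return the total points, whether the record is flagged, and the rules that fired."""
    fired = [r for r in rules if record[r.feature] >= r.threshold]
    total = sum(r.points for r in fired)
    return total, total >= cutoff, fired


def explain(record, rules=RULES, cutoff=CUTOFF):
    """Spell out the whole calculation in plain English."""
    total, flagged, fired = score(record, rules, cutoff)
    lines = [f"+{r.points} because the patient {r.description}" for r in fired]
    verdict = "higher risk" if flagged else "lower risk"
    maximum = sum(r.points for r in rules)
    lines.append(
        f"Total {total} of a possible {maximum}; "
        f"{cutoff} or more is flagged, so this is {verdict}."
    )
    return "\n".join(lines)


patient = {"age": 72, "previous_admissions": 1, "medications": 6}
print(explain(patient))
```

There is nothing post hoc here: every line of the explanation is a term in the model, so the person affected can check the arithmetic themselves.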

Cynthia had another bird demo [2], showing a “this bit looks like that” approach that made the concepts very simple for an end user to understand. It is covered in a paper from her lab.

Cynthia Rudin at the ReWork Explainable AI mini summit showing the “this looks like that” interpretability model for computer vision

She then discussed how interpretable concepts could be applied to complex neural networks to achieve explainability. There’s no clear concept for a node in the layers in the network – the concepts are entangled. The concept vector could be the same for two distinct concepts, making it impossible to use for interpretability directly. However, Cynthia’s lab released a paper in 2020 showing how their concept whitening (CW) layer could replace a batch normalisation layer to disentangle latent space and put concepts along the axes while not hurting performance. Adding this type of layer can make the opaque network become translucent and aid interpretability [3]. Code for the techniques she presented is on her lab website.

For an opposite view on this, Cassie Kozyrkov argues that you can’t have both understanding and performance, with a clear analogy about two spacecraft: one you know how it works and the other is thoroughly tested – which one do you choose? (See her video and the Hackernoon article “Explainable AI won’t deliver. Here’s why.”, which is more verbose.) Unlike Cynthia Rudin, she uses explainability and interpretability interchangeably. My problem with this is that the argument is reduced to an either/or, when you can test explainable (or interpretable) systems thoroughly… and there is no excuse for not testing! Cassie also goes on to use the example that you don’t ask your colleagues why they chose tea over coffee. Of course not (unless it’s unusual behaviour), but when presenting a legal case or a medical diagnosis we don’t just step back satisfied with a police officer’s or doctor’s “gut instinct” – we need facts. We need explainability. This is important. In high stakes decisions you need to know how as well as why. While I completely agree with Cassie Kozyrkov that you need to look at the business need, I think it’s potentially dangerous to dissuade people in the industry, or even the layperson, from wanting some justification. We’ve seen many examples where the testing is not up to scratch, and most people do not have the understanding of statistics to question test results, particularly when looking at percentages. The two are needed in combination.

Next up was Walter Crismareanu from Tipalo. His talk seemed slightly out of place in the summit as he wasn’t presenting anything on explainability as such, but instead a different way of designing neural nets, using logical subgroups. I’ve spoken previously on the differences between CNNs and biological neurons and how hard it is to assess intelligence outside of our own experience of it. I was slightly frustrated by the lack of detail here. While I understand that Tipalo want to keep their IP secret, he was making big claims about self-training network clusters. When asked how they were training these networks, he suggested they weren’t being trained. Possibly something was lost in translation, and I assumed from his statement that you “can’t have intelligence without a body” (sensory inputs) that they must be putting the networks in some sort of body in an environment… but it would have been nice to see this, or a video, or any sort of results. Their website does give slightly more detail than was presented at the conference and I will watch this company with interest.

Part of the Tipalo presentation at the ReWork Explainable AI summit showing their unique approach to training.

Rachel Alexander from Omina Technologies presented sentiments that really echo my own – explanations have been made primarily for the teams creating models and not the end users. She was looking at this from the needs of the medical industry where trust and explanations are critical. She was keen to stress that AI is for everyone, not just the happy few, and also made the distinction between explainability (why the decision was made) and interpretability (how the decision was made).

The type of information will depend on the use case. For preventative medicine, interpretability is key – models have to include this for medical professionals and the general public to trust the results. For other use cases, e.g. predicting the number of people who might be admitted, explainable models should be sufficient. SHAP values alone make models translucent, not transparent!

I had to take a break during the roundtable sessions, but rejoined in time for the final panel session with Mary Reagan (Fiddler), Merve Hickok (AIEthicist), Sara Hooker (Google Brain), and Narine Kokhlikyan (Facebook).

The panellists were mostly in agreement and discussed and extended the key points that had been raised in the preceding talks. As an industry, we need to build in interpretability as this makes better models. These tools need to be easy to use for everyone. Without transparency and accountability, we will continue to see live models with issues that become apparent only after they have done harm.

Explainability and interpretability need to be proactive and not reactive. While there are some challenges, many of these are engineering issues and not fundamental to the techniques.

I must admit, I tend to use explainability to include interpretability because the end user wants an explanation, and whether this is “how” or “why” depends on context. You need to deeply understand how your models are going to be used (the business problem) and the needs of the end user before you start creating something, rather than adding the explainability as an afterthought.

  1. Added to my list of things to read 🙂
  2. I guess irises are soooo last decade 😉
  3. She co-authored another paper in 2019 on Rashomon curves to help investigate models for simplicity vs accuracy

Published by


Dr Janet is a Molecular Biochemistry graduate from Oxford University with a doctorate in Computational Neuroscience from Sussex. I’m currently studying for a third degree in Mathematics with Open University. During the day, and sometimes out of hours, I work as a Chief Science Officer. You can read all about that on my LinkedIn page.

2 thoughts on “Explainable AI”

  1. In ML, people “feed” the neural nets from outside, with only one type of “food”, namely exactly the data which corresponds to the specialty of that net. BUT, how intelligent is it to build a neural DB with millions of pics of a dog, just to identify a dog in a new pic?

    What if you do not choose that special food, instead let the system of neural nets choose its food autonomously? Well, in this case, you need a body with sensors to identify the food and actors to move from one place to another, right?

    Such a pity you do not take notes or just download the documentation for this XAI Summit, otherwise you could recall the things you missed, did not understand, or that misled you into rash conclusions.

    BTW, a baby is as intelligent as a human can be, only it adapts during lifetime, by gathering knowledge via own experience, let alone that animals at that early stage are much more evolved.

    This also holds true to other species, even small ones like insects, as they also do have a brain which works real-time and 24/7, for the rest of their lives, in various environmental conditions.

    Math is a toolbox, based on attributes we can count in physical space, using certain units of measurement, single or combined. Measurement implies everything we can put into numbers is linear, same like moving step by step along a line, back and forth. Biological intelligence is located in a living brain, where each neuron acts non-linear, no numerical order of any feature we know. AI requires a living organism, which has to develop in time to accumulate knowledge by exploring its environment via own experience.

    1. Thanks for reading, I always take extensive notes at conferences and re-watch any available recordings before posting. In this case, I also was all over your website for more information and linked to it for reference. My criticism was not that I disagree with your approach, far from it – you are correct that we have completely restricted systems with their limited inputs so far and I personally believe that a more biological approach is necessary (I’ve blogged and also spoken on this). My personal frustration was that I wanted far more detail than you could provide in such a short session and I also understand why there’s nothing more than the high level component explanation publicly available at this time on your website (I’ll update the post to include the ANS structure slide). I would absolutely love to have more details on how you are letting your network “choose its food” – what sensors do you have, what environment is it in, how are you testing whether it is learning in a way that is comparable with a biological intelligence, and even how you made the decisions about the different levels… In a face to face conference I would have caught up with you for an in depth conversation 🙂

      I am 100% genuine with my comment that I will follow your company with interest – I think your approach is really interesting. We are currently making progress only with more power to networks – it will take something different, like your approach, for the next innovation in AI. If we happen to be in the same conference again I will definitely seek you out!
