Researchers are exploring many paths to building more explainable artificial intelligence models. I recently wrote about one of those approaches, Deep Explanations.
Today I want to dig into a second popular approach, Model Induction. Specifically, I’ll look at two techniques: LIME and Anchor LIME.
The goal of model induction is to observe the behavior of a trained system and infer from that behavior a simpler model that can be used to explain it.
Let’s first look at LIME, which stands for Local Interpretable Model-Agnostic Explanations. We can tease apart what these words actually refer to:
LOCAL - this refers to properties that hold for the model within a particular locality. Continuity is important; you want locality within a specific region of the image. An example shows a Labrador with a human body, playing a guitar that is part acoustic, part electric. It’s a very hard image to classify if you treat it globally. It’s a mash-up on purpose.
By using a local approach, and carving up the image into sections, you’re able to make explanations that make sense locally.
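To make "carving up the image" concrete, here is a toy stand-in for that segmentation step: a small grid "image" divided into contiguous blocks. This is an illustrative sketch only; real LIME implementations use a perceptual superpixel algorithm (such as quickshift in scikit-image) rather than a fixed grid.

```python
import numpy as np

# Toy stand-in for superpixel segmentation: carve a 6x6 "image" into
# 2x2 blocks, giving 9 contiguous groups. Each pixel gets the integer
# label of the group it belongs to.
h, w, block = 6, 6, 2
segments = np.zeros((h, w), dtype=int)
for i in range(h):
    for j in range(w):
        segments[i, j] = (i // block) * (w // block) + (j // block)

print(np.unique(segments).size)  # 9 contiguous groups
```

The key property is that each group is spatially contiguous, so an explanation expressed in terms of groups stays locally coherent.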
INTERPRETABLE - This means that it has to be readable by a non-expert human and provide a qualitative understanding of the relationship between the input values and the response.
MODEL AGNOSTIC - This means that we don’t want this to depend on the details of the model selected. A lot of the previous work is model dependent; this is model independent.
Using this method, instead of trying to generate a textual label, we take an input and highlight what about it made it an exemplar or not.
Doing this globally is not realistic, so we look at local superpixel groups. This underscores an important facet of explainability within these systems: part of what we want from building an explainable system is not just the explanation itself; it’s also a mental model that we can impart to the people using the system, so they have a measure of trust for why something is being done in a particular way.
The LIME technique starts to get at this because the model can look at superpixel groups and tell you what it thought the group was indicative of, and composite all of those into an explanation.
To further understand this, let’s look at the local explainability:
You take a sample, draw a weighted random sample of perturbations around it by removing some of the pixels, and feed each perturbed version back through the classifier to see whether it keeps getting the same classification. You do this over and over again. But there’s an additional constraint: the result has to be interpretable by a human, and the measure for this is continuity. The highlighted pixels need to cluster; it’s not OK to have 4 pixels here and 8 pixels there.
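The sampling loop above can be sketched in a few lines. This is a minimal numpy-only illustration, not the authors' implementation: the `black_box` classifier is a made-up stand-in that responds to two "superpixel" groups (imagine the Labrador's face), and the surrogate is a weighted linear fit over on/off masks.

```python
import numpy as np

# Hypothetical black-box classifier over 5 superpixel groups: it outputs a
# high "labrador" score only when groups 0 and 1 (say, the face region)
# are both present.
def black_box(mask):
    return 0.9 if mask[0] and mask[1] else 0.1

rng = np.random.default_rng(0)
n_groups = 5

# 1. Sample random on/off masks over the superpixel groups
#    (off = that group's pixels removed/grayed out).
masks = rng.integers(0, 2, size=(1000, n_groups))

# 2. Feed each perturbed image back through the classifier.
preds = np.array([black_box(m) for m in masks])

# 3. Weight each sample by proximity to the original (all groups on).
distances = n_groups - masks.sum(axis=1)
weights = np.exp(-(distances.astype(float) ** 2) / 2.0)

# 4. Fit an interpretable weighted linear surrogate to the local samples.
sw = np.sqrt(weights)
X = np.column_stack([np.ones(len(masks)), masks])
coef, *_ = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)
group_weights = coef[1:]

# Groups 0 and 1 should get the largest weights in the explanation.
print(np.argsort(group_weights)[-2:])
```

The surrogate's largest weights land on the groups the black box actually relies on, which is exactly what gets highlighted in the explanation.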
You can see examples from the Labrador experiment. It takes a portion from the Labrador face, but also picks up pixels near the bottom, which get an increased negative score because those groups are further away.
This process was done for both images and text. The overall configurations were picked by maximizing the differences between features: when you gather all the classifications together, you want to select the set of explanations that covers the widest possible distinctions in the image. Labrador vs. guitar is a pretty wide distinction between features, so those explanations are selected to be put together.
An interesting note - when these researchers brought in their test groups, they used data scientists and non-data scientists, and reported results separately to remove biases. (You can read the full paper here).
Anchor LIME is an approach out of the University of Washington that adds another facet to this work.
Using Anchor LIME, researchers are trying to create local explanations tied to if-then rules, yielding more precise explanations.
Again looking at the picture of the Labrador, the image is visually reasonable but hard to describe verbally. The researchers wanted to make it easier to verbally describe what was happening.
In the example below, you see many features being taken into account with various weights in the LIME process. In the Anchor LIME process, these are distilled down into exemplar rules, or anchors, which are used to distinguish what is actually driving the explanation for the given model.
The next example highlights a visual explanation. The system is very good; the derived anchor for the image of the zebra has latched onto the stripes along its body. Even substituting in different images, the model still consistently predicts that it’s a zebra. So this is a reasonable anchor according to the model. But is it a good explanation?
It’s fairly subjective. It mentions nothing about a horse’s head, a tail, or four legs. It’s only looking at the pattern of stripes. So it’s very possible to fake out the model by showing a striped zebra pattern on a jacket, and it will still predict the image is a zebra.
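The zebra-jacket failure can be made concrete with a toy sketch of how an anchor's precision is estimated. The `black_box` here is a hypothetical classifier that, like the model in the example, keys only on stripes; the feature layout and names are made up for illustration.

```python
import numpy as np

# Hypothetical black box that predicts "zebra" whenever the stripe
# feature is on, ignoring head, tail, and legs.
def black_box(x):
    return "zebra" if x[0] == 1 else "other"

# Candidate anchor: the if-then rule {stripes == 1} => "zebra".
anchor = {0: 1}  # feature index -> required value

# Estimate precision: hold the anchored features fixed, resample the
# rest, and measure how often the prediction survives.
# Features: stripes, head, tail, legs.
rng = np.random.default_rng(0)
n_features, n_samples = 4, 1000
samples = rng.integers(0, 2, size=(n_samples, n_features))
for idx, val in anchor.items():
    samples[:, idx] = val

precision = np.mean([black_box(x) == "zebra" for x in samples])
print(precision)  # 1.0 -- a perfect anchor by the model's lights...

# ...but a striped jacket (no head, tail, or legs) also satisfies it:
jacket = np.array([1, 0, 0, 0])
print(black_box(jacket))  # still "zebra"
```

The anchor has perfect precision as far as the model is concerned, yet it happily fires on the striped jacket, which is exactly the gap between a high-precision anchor and a satisfying explanation.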
This approach is certainly interesting and useful as an explanatory tool. It allows us to identify what the model’s predictive failure modes will be, and test them, so that we can refine and iterate models to eliminate those errors. But ultimately it would be more satisfying to have a model correlate more with our own personal understanding of what it means for something to be a zebra. (You can read the full paper here).
As explainable AI techniques evolve, it’s useful to remember that often these explanatory tools are useful in a debugging context more so than the context of being the ultimate explanation that the system will provide. I will touch on a third approach to explainable artificial intelligence - machine teaching - in my next post. In the meantime, you can watch my full O’Reilly AI talk on this topic at the link below or visit bons.ai to learn how we’re thinking about explainability in industrial AI systems.