3 Goals of Interpretability
Interpretability is not an end in itself, but a means to an end. It depends on your specific goals which interpretability approach to use, which deserves more discussion.1 Inspired by Adadi and Berrada (2018), I discuss three goals of interpretability: improve the model, justify the model and predictions, and discover insights.2
Improving the model
When determining your interpretability goals, evaluate your model’s performance metrics first. This can help you identify if your current goal should be model improvement.
You can use interpretability methods to improve the model. In the introduction, we talked about Clever Hans predictors, which refers to models that have learned to take “shortcuts,” like relying on non-causal features to make predictions. The tricky thing about these shortcuts is that they often don’t decrease model performance – they might actually increase it. Like relying on snow in the background to classify whether a picture shows a wolf or a dog (Ribeiro, Singh, and Guestrin 2016) or misleadingly predicting that asthma patients admitted to the emergency room are less likely to die of pneumonia (Caruana et al. 2015).3 Interpretability helps to debug the model by identifying when the model takes such unwanted shortcuts or makes other mistakes. Some of the bugs may be as simple as the wrong encoding of the target feature, or an error in feature engineering. Do feature effects contradict your domain knowledge? You may have switched target classes. A feature you know is important isn’t used by the model according to your investigations? You may have made a mistake in data processing or feature engineering.
I’ve also used interpretability methods in machine learning competitions to identify important features so that I can get ideas for feature engineering. For example, I participated in a competition to predict water supply, and through feature importance, I realized that the snow in the surrounding mountains was the most important feature. So I decided to try out alternative snow data sources that might make this feature even better, and invested time in feature engineering. In the end, it helped improve the model performance.
Justify model and predictions
Interpretable machine learning helps justify the model and its predictions to other people or entities. It’s helpful to think of the stakeholders of a machine learning system (Tomsett et al. 2018):
- Creators build the system and train the model.
- Operators interact with the system directly.
- Executors make decisions based on the outputs.
- Decision subjects are affected by the decisions.
- Auditors audit and investigate the system.
- Data subjects are people whose data the model is trained on.
These stakeholders want justification of the model and its predictions and may require very different types of justification.
A deliverable of the machine learning competition I participated in was to generate reports that explain the water supply forecasts. These reports and explanations are for hydrologists and officials who have to make decisions based on these water supply forecasts (Executors). Let’s say the model predicts an unusually low water supply; it would mean that officials would have to issue drought contingency plans and adjust water allocations. Rather than blindly trusting the predictions, the decision maker may want to verify the predictions by looking at the explanations of why this particular prediction was made. And if the explanation conflicts with domain knowledge, the decision maker might question the forecast and investigate.
In general, contesting a prediction made by a machine learning system requires interpretability of the system. Imagine that a machine learning system rejects a loan application. For a person (decision subject) to contest that rejection, there needs to be some justification for why that prediction was made. This concept is called recourse.
Another example: If a company wants to build a medical device, it has to go through a lot of regulatory processes to show that the device is safe and efficient. If the device relies on machine learning, things get more complicated since the company also has to show that the machine learning model works as intended. Interpretable machine learning is part of the solution to justify the model to the regulators (Auditors) who will either approve or reject the medical device.
Discover insights
Machine learning models are not only used for making predictions; they can also be used to make decisions or to study the relationship between features and the target. In both cases, the predictions are not enough, but we want to extract additional insights from the model.
A churn prediction model predicts how likely a person is to cancel their mobile phone contract, for example. The marketing team may rely on the model to make decisions about marketing campaigns. But without knowing why a person is likely to churn, it’s difficult to design an effective response.
More and more scientists are also applying machine learning to their research questions. For example, Zhang et al. (2019) used random forests to predict orchard almond yields based on fertilizer use. Prediction is not enough: they also used interpretability methods to extract how the different features, including fertilizer, affect the predicted yield. You need interpretability to extract the learned relationships between the features and the prediction. Otherwise, all you have is a function to make predictions.
Using machine learning in science is a much deeper philosophical question and requires more than just thinking about interpretability. That’s why Timo Freiesleben and I have written a book dedicated to justifying machine learning for science.
You can read it for free here: ml-science-book.com
Interpretable machine learning is useful not only for learning about the data, but also for learning about the model. For example, if you want to learn about how convolutional neural networks work, you can use interpretability to study what concepts individual neurons react to.
What are your goals in your machine learning project, and how can interpretability help you? Your goals will determine which interpretability approaches and methods to use. In the next chapter, we will take a look at the landscape of methods and discuss how they relate to your goals.
- When researchers propose new interpretable models or interpretability methods, they rarely discuss what specific goals they serve. I’m not excluding myself here: For example, I did not introduce this chapter on goals until the third edition of the book.↩︎ 
- The paper by Adadi and Berrada (2018) additionally introduced “control”, which refers to debugging the model and finding errors, but I subsumed this under “improvement”.↩︎ 
- A model predicted that patients who came to the emergency room with pneumonia were less likely to die from pneumonia when they had asthma, despite asthma being known as risk factor for pneumonia. Only thanks to using an interpretable model, the researchers found out that this model had learned this relationship. But it’s an unwanted “shortcut”: Asthma patients were treated earlier and more aggressively with antibiotics, so they were in fact less likely to develop severe pneumonia. The model learned the shortcut (asthma \(\Rightarrow\) lowered risk of dying from pneumonia), because it lacked features about the later treatment.↩︎ 
