Chapter 1 Introduction

This book will teach you how to make (supervised) machine learning models interpretable. The chapters contain some mathematical formulas, but you should be able to understand the ideas behind the methods even without the mathematics. This book is not for people trying to learn machine learning from scratch. If you are new to machine learning, there are plenty of books and other resources for learning the basics. I recommend the book The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (2009) 1 and Andrew Ng’s “Machine Learning” online course on Coursera to get started with machine learning. Both the book and the course are available free of charge!

New methods for machine learning interpretability are published at breakneck speed. Keeping up with all of them would be madness and simply impossible. That is why you will not find the most novel and shiny methods in this book, but rather the basic concepts of machine learning interpretability. These basics prepare you well for making machine learning models interpretable. Internalizing the basic concepts also empowers you to better understand and evaluate any new paper on interpretability published on arxiv.org in the 5 minutes since you began reading this book (I may be exaggerating).

This book starts with some (dystopian) short stories, which are not needed to understand the book, but hopefully are entertaining! Then the book explores the concepts of machine learning interpretability: it shows when interpretability is important and discusses different types of explanations. Terms used throughout the book can be looked up in the Terminology chapter. Most of the models and methods explained are presented using real data examples, which are described in the Data chapter.

One way to make machine learning interpretable is to use interpretable models, such as linear models or decision trees. The other option is to use model-agnostic interpretation tools, which can be applied to any supervised machine learning model. The model-agnostic chapter covers methods like partial dependence plots and permutation feature importance. Model-agnostic methods work by changing the input of the machine learning model and measuring changes in the output. Model-agnostic methods that return data instances as explanations are covered in the Example-Based Explanations chapter.

All model-agnostic methods can be further differentiated by whether they attempt to explain global model behaviour over all data or individual predictions. The following methods explain the overall behaviour of the model: Partial Dependence Plots, Accumulated Local Effects, Feature Interaction, Feature Importance, Global Surrogate Models, and Prototypes and Criticisms. For the explanation of individual predictions we have: Local Surrogate Models, Shapley Value Explanations, and Counterfactual Explanations (and the closely related Adversarial Examples). Some methods can be used to explain both global model behaviour and individual predictions: Individual Conditional Expectation and Influential Instances.
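To give you a first taste of the "change the input, measure the output" idea behind model-agnostic methods, here is a minimal sketch of permutation feature importance in Python. The dataset and model are made up purely for illustration; the methods themselves are explained in detail in their own chapters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Toy data: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500)

# Any fitted supervised model would work here -- that is exactly
# what "model-agnostic" means.
model = RandomForestRegressor(random_state=0).fit(X, y)
baseline = mean_squared_error(y, model.predict(X))

# Permutation feature importance: shuffle one feature at a time
# and measure how much the model's error increases.
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    increase = mean_squared_error(y, model.predict(X_perm)) - baseline
    print(f"feature {j}: error increase {increase:.3f}")
```

The model is never opened up; we only perturb its inputs and watch its predictions, which is why the same recipe works for a random forest, a neural network, or any other supervised model.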

The book finishes with an optimistic outlook on what the future of interpretable machine learning might look like.

You can either read the book from beginning to end or jump directly to the methods that interest you.

I hope you will enjoy the read!


  1. Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning. http://link.springer.com/content/pdf/10.1007/978-0-387-84858-7.pdf.