Chapter 1 Introduction

This book will teach you how to make (supervised) machine learning models interpretable. The chapters contain some mathematical formulas, but you should be able to understand the ideas behind the methods even without the mathematics. This book is not for people trying to learn machine learning from scratch. If you are new to machine learning, there are plenty of books and other resources for learning the basics. I recommend the book The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (2009) 1 and Andrew Ng’s “Machine Learning” online course on Coursera to get started with machine learning. Both the book and the course are available free of charge!

New methods for machine learning interpretability are published at breakneck speed. Keeping up with all of them would be madness and simply impossible. That is why you will not find the most novel and shiny methods in this book, but rather the basic concepts of machine learning interpretability. These basics prepare you well for making machine learning models interpretable. Internalizing the basic concepts also empowers you to better understand and evaluate any new paper on interpretability published on arxiv.org in the 5 minutes since you began reading this book (I may be exaggerating).

This book starts with some (dystopian) short stories, which are not needed to understand the book, but hopefully are entertaining! Then the book explores the concepts of machine learning interpretability: it shows when interpretability is important and discusses different types of explanations. Terms used throughout the book can be looked up in the Terminology chapter. Most of the models and methods explained are presented using real data examples, which are described in the Data chapter.

One way to make machine learning interpretable is to use interpretable models, such as linear models or decision trees. The other option is to use model-agnostic interpretation tools, which can be applied to any supervised machine learning model. The model-agnostic chapter covers methods like partial dependence plots and permutation feature importance. Model-agnostic methods work by changing the input of the machine learning model and measuring changes in the output. Model-agnostic methods that return data instances as explanations are covered in the Example-Based Explanations chapter.

All model-agnostic methods can be further differentiated by whether they attempt to explain global model behaviour over all data or individual predictions. The following methods explain the overall behaviour of the model: Partial Dependence Plots, Accumulated Local Effects, Feature Interaction, Feature Importance, Global Surrogate Models, and Prototypes and Criticisms. For the explanation of individual predictions we have: Local Surrogate Models, Shapley Value Explanations, and Counterfactual Explanations (and the closely related Adversarial Examples). Some methods can be used to explain both global model behaviour and individual predictions: Individual Conditional Expectation and Influential Instances.
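To give you a first taste of the "change the input, measure the output" idea behind model-agnostic methods, here is a minimal sketch of permutation feature importance in Python. The dataset and model are made up purely for illustration; the methods themselves are explained in detail in their own chapters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Toy data: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500)

# Any fitted supervised model would work here -- that is exactly
# what "model-agnostic" means.
model = RandomForestRegressor(random_state=0).fit(X, y)
baseline = mean_squared_error(y, model.predict(X))

# Permutation feature importance: shuffle one feature at a time
# and measure how much the model's error increases.
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    increase = mean_squared_error(y, model.predict(X_perm)) - baseline
    print(f"feature {j}: error increase {increase:.3f}")
```

The model is never opened up; we only perturb its inputs and watch its predictions, which is why the same recipe works for a random forest, a neural network, or any other supervised model.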

The book finishes with an optimistic outlook on what the future of interpretable machine learning might look like.

You can either read the book from beginning to end or jump directly to the methods that interest you.

I hope you will enjoy the read!


  1. Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning. http://link.springer.com/content/pdf/10.1007/978-0-387-84858-7.pdf.