Chapter 2 Interpretability

Throughout the book, I will use this rather simple, yet elegant definition of interpretability from Miller (2017) 3 : Interpretability is the degree to which a human can understand the cause of a decision. Another take is: Interpretability is the degree to which a human can consistently predict the model’s result 4. The higher the interpretability of a model, the easier it is for someone to comprehend why certain decisions (read: predictions) were made. A model has better interpretability than another model, if its decisions are easier to comprehend for a human than decisions from the second model. I will be using both the terms interpretable and explainable equally. Like Miller (2017), I believe it makes sense to distinguish between the terms interpretability/explainability and explanation. Making a machine learning interpretable can, but does not necessarily have to, imply providing a (human-style) explanation of a prediction. See the section about explanations to learn what we humans see as a good explanation.

  1. Miller, Tim. 2017. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” arXiv Preprint arXiv:1706.07269.

  2. Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. “Examples are not enough, learn to criticize! criticism for interpretability.” Advances in Neural Information Processing Systems. 2016.