5.7 Local Surrogate Models (LIME)

Local surrogate models are interpretable models that are used to explain individual predictions of black box machine learning models. Local interpretable model-agnostic explanations (LIME) (Ribeiro, Singh, and Guestrin 2016) is a paper in which the authors propose a concrete implementation of local surrogate models. Surrogate models are trained to approximate the predictions of the underlying black box model. Instead of trying to fit a global surrogate model, LIME focuses on fitting local surrogate models to explain why single predictions were made.

The idea is quite intuitive. First of all, forget about the training data and imagine you only have the black box model, where you can input data points and get the model's predicted outcome. You can probe the box as often as you want. Your goal is to understand why the machine learning model made a certain prediction. LIME tests what happens to the model's predictions when you feed variations of your data into the machine learning model. LIME generates a new dataset consisting of perturbed samples and the corresponding predictions of the black box model. On this new dataset LIME then trains an interpretable model, which is weighted by the proximity of the sampled instances to the instance of interest. The interpretable model can basically be anything from this chapter, for example LASSO or a decision tree. The learned model should be a good approximation of the machine learning model locally, but it does not have to be a good approximation globally. This kind of accuracy is also called local fidelity.

Mathematically, local surrogate models with interpretability constraint can be expressed as follows:

\[\text{explanation}(x)=\arg\min_{g\in{}G}L(f,g,\pi_x)+\Omega(g)\] The explanation model for instance x is the model g (e.g. a linear regression model) that minimizes loss L (e.g. mean squared error), which measures how close the explanation is to the prediction of the original model f (e.g. an xgboost model), while the model complexity \(\Omega(g)\) is kept low (e.g. prefer fewer features). G is the family of possible explanations, for example all possible linear regression models. The proximity measure \(\pi_x\) defines how large the neighbourhood around instance x is that we consider for the explanation. In practice, LIME only optimizes the loss part. The user has to determine the complexity, e.g. by selecting the maximum number of features that the linear regression model may use.
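
Concretely, the loss used in the LIME paper is a proximity-weighted squared error over the perturbed samples \(z\) (with \(z'\) denoting their interpretable representation); reading the formula above with this loss in mind makes the role of \(\pi_x\) explicit:

\[L(f,g,\pi_x)=\sum_{z,z'\in\mathcal{Z}}\pi_x(z)\left(f(z)-g(z')\right)^2\]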

The recipe for fitting local surrogate models:

  • Choose your instance of interest for which you want to have an explanation of its black box prediction.
  • Perturb your dataset and get the black box predictions for these new points.
  • Weight the new samples by their proximity to the instance of interest.
  • Fit a weighted, interpretable model on the dataset with the variations.
  • Explain the prediction by interpreting the local model (see the sketch below).
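
A minimal sketch of this recipe for tabular data, using numpy and scikit-learn. This is not the official lime package; the function name, the defaults and the use of ridge regression are illustrative, and the sampling and kernel follow the description in this chapter.

    import numpy as np
    from sklearn.linear_model import Ridge

    def explain_instance(predict_fn, x, X_train, n_samples=5000, kernel_width=None):
        """Illustrative local surrogate: returns coefficients of a weighted linear model."""
        n_features = X_train.shape[1]
        if kernel_width is None:
            # Default described later in the text: 0.75 * sqrt(number of columns)
            kernel_width = 0.75 * np.sqrt(n_features)

        # 1) Perturb: sample from a normal distribution with the training data's
        #    feature means and standard deviations (ignores feature correlations).
        mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
        Z = np.random.normal(mu, sigma, size=(n_samples, n_features))

        # 2) Get the black box predictions for the new points
        #    (for classification: the predicted probability of the class of interest).
        y_z = predict_fn(Z)

        # 3) Weight the samples by proximity to the instance of interest
        #    (exponential kernel on the standardized distances).
        dist = np.sqrt((((Z - x) / sigma) ** 2).sum(axis=1))
        weights = np.exp(-(dist ** 2) / (kernel_width ** 2))

        # 4) Fit a weighted, interpretable model on the perturbed data.
        surrogate = Ridge(alpha=1.0)
        surrogate.fit(Z, y_z, sample_weight=weights)

        # 5) Interpret the local model: the coefficients are the explanation.
        return surrogate.coef_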

In the current implementations (R and Python), for example, linear regression can be chosen as the interpretable surrogate model. Upfront you have to choose \(K\), the number of features that you want to have in your interpretable model. The lower \(K\) is, the easier the model is to interpret; a higher \(K\) potentially creates models with higher fidelity. There are different methods for fitting models with exactly \(K\) features. A solid choice is Lasso. A Lasso model with a high regularisation parameter \(\lambda\) yields a model with only the intercept. By refitting the Lasso models with slowly decreasing \(\lambda\), one after the other, the features get weight estimates different from zero. Once \(K\) features are in the model, you have reached the desired number of features. Other strategies are forward or backward selection of features. This means you either start with the full model (= containing all features) or with a model with only the intercept, and then test which feature would create the biggest improvement when added or removed, until a model with \(K\) features is reached. Other interpretable models, such as decision trees, are also possible.
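
A sketch of the Lasso path strategy with scikit-learn. lars_path computes the whole regularisation path at once, so instead of refitting repeatedly we can read off the first point at which \(K\) features are non-zero; folding the proximity weights in via a square-root rescaling is my assumption of how one might do it, not a prescribed step.

    import numpy as np
    from sklearn.linear_model import lars_path

    def first_k_lasso_features(Z, y, weights, k):
        """Indices of the first K features that become non-zero on the Lasso path."""
        # Fold the proximity weights into the data (weighted least squares trick)
        sw = np.sqrt(weights)
        Zw, yw = Z * sw[:, None], y * sw
        # coefs has shape (n_features, n_steps); columns go from strong to weak regularisation
        _, _, coefs = lars_path(Zw, yw, method="lasso")
        for step in range(coefs.shape[1]):
            active = np.flatnonzero(coefs[:, step])
            if len(active) >= k:
                return active[:k]
        return np.flatnonzero(coefs[:, -1])  # fewer than K features ever entered the model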

How do you get the variations of the data? This depends on the type of data, which can be either text, an image or tabular data. For text and images, the solution is to turn single words or super-pixels on or off. In the case of tabular data, LIME creates new samples by perturbing each feature individually, drawing from a normal distribution with the mean and standard deviation taken from that feature.

5.7.1 LIME for Tabular Data

Tabular data is any data that comes in tables, where each row represents an instance and each column a feature. LIME sampling is not done around the instance of interest, but from the training data's mass centre, which is problematic. However, it increases the likelihood that the predictions for some of the sampled points differ from the prediction for the data point of interest and that LIME can learn at least some explanation.

It’s best to visually explain how the sampling and local model fitting works:

FIGURE 5.33: How LIME sampling works: A) The black box model predicts one of two classes given features x1 and x2. Most data points have class 0 (darker colour), and the ones with class 1 are grouped in an upside-down V-shape (lighter colour). The plot displays the decision boundaries learned by a machine learning model. In this case it was a Random Forest, but it does not matter, because LIME is model-agnostic and we only care about the decision boundaries. B) The yellow point is the instance of interest, which we want to explain. The black dots are data sampled from a normal distribution around the means of the features in the training sample. This needs to be done only once and can be reused for other explanations. C) Introducing locality by giving points near the instance of interest higher weights. D) The colours and signs of the grid display the classifications of the local model learned from the weighted samples. The white line marks the decision boundary (P(class) = 0.5) at which the classification of the local model changes.

As always, the devil’s in the detail. The definition of a meaningful neighbourhood around a point is difficult. LIME currently uses an exponential smoothing kernel to define the neighbourhood. A smoothing kernel is a function that takes two data instances and returns a proximity measure. The kernel width determines how large the neighbourhood is: A small kernel width means that an instance must be very close to impact the local model, a larger width means that instances that are farther away also influence the model. If you look at LIME’s Python implementation (file lime/lime_tabular.py) you will see that it uses an exponential smoothing kernel (on the normalized data) and the kernel width is 0.75 times the square root of the number of columns of the training data. It looks like an innocent line of code, but it’s like an elephant sitting in your living room next to the good porcelain you got from your grandparents. The big problem is that we don’t have a good way to find the best kernel or the optimal width. And where the hell does the 0.75 even come from? In certain scenarios, you can easily flip your explanation by changing the kernel width, as shown in the following figure:
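
Written out, the kernel just described corresponds to something like the following, where \(d\) is the Euclidean distance on the normalized features and \(p\) is the number of columns (the exact scaling may differ slightly between implementations and versions):

\[\pi_x(z)=\exp\left(-\frac{d(x,z)^2}{\sigma^2}\right),\qquad\sigma=0.75\cdot\sqrt{p}\]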

FIGURE 5.34: Explanation of the prediction of instance x = 1.6. The prediction of the black box model, which depends on a single feature, is shown as a black line, and the distribution of the data is indicated with rugs. Three local surrogate models with different kernel widths are computed. The resulting linear regression model depends heavily on the kernel width: Does the feature have a negative, positive or no effect for x = 1.6? Decide for yourself, I don't know the answer.

The example showed only one feature. In high-dimensional feature spaces it gets much worse. It’s also very unclear whether the distance measure should treat all features equally. Is one distance unit for feature x1 the same as one unit for feature x2? Distance measures are quite arbitrary and distances in different dimensions (aka features) might not be comparable at all.

5.7.1.1 Example

Let's look at a concrete example. We go back to the bike rental data and turn the prediction problem into a classification: After accounting for the trend that bike rentals became more popular over time, we want to know on a given day whether the number of rented bikes will be above or below the trend line. You can also interpret 'above' as being above the mean bike count, but adjusted for the trend.

First we train a Random Forest with 100 trees on the classification task. Given seasonal and weather information, on which day will the number of rented bikes be above the trend-free average?

The explanations are created with 2 features. The following shows the results of the sparse local linear models fitted for two instances with different predicted classes:

FIGURE 5.35: LIME explanations for two instances of the bike rental dataset. Warmer temperature and good weather situation have a positive effect on the prediction. The x-axis shows the feature effect: The weight times the actual feature value.

It becomes clear from the figure that it is easier to interpret categorical features than numerical features. One solution is to categorize the numerical features into bins.
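
A quick sketch of such a binning with pandas. The quartile split and the column names are illustrative; the Python lime implementation discretizes continuous features in a similar spirit, but do not take this as its exact behaviour.

    import pandas as pd

    # Discretize the numerical feature 'temp' into quartile bins; the explanation
    # then refers to interpretable statements such as "temp in (a, b]" instead of raw values.
    bike["temp_binned"] = pd.qcut(bike["temp"], q=4)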

5.7.2 LIME for Text

LIME for text differs from LIME for tabular data. Variations of the data are created differently: Starting from the original text, new texts are created by randomly removing words from it. The dataset is represented with binary features for each word. A feature is 1 if the respective word is included and 0 if it was removed.
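
A sketch of how such variations could be generated (plain Python/numpy, not the internal representation of the lime package):

    import numpy as np

    def perturb_text(text, n_samples=100, p_keep=0.7, seed=0):
        """Create variations of a text by randomly removing words.
        Returns the binary feature matrix and the corresponding sentences."""
        rng = np.random.default_rng(seed)
        words = text.split()
        # Each row is one variation: 1 = word kept, 0 = word removed
        mask = rng.random((n_samples, len(words))) < p_keep
        sentences = [" ".join(w for w, keep in zip(words, row) if keep) for row in mask]
        return mask.astype(int), sentences

    mask, sentences = perturb_text("For Christmas Song visit my channel! ;)")
    # The black box classifier is queried with `sentences`, and a weighted
    # interpretable model is then fitted on the binary matrix `mask`.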

5.7.2.1 Example

In this example, we classify YouTube comments as spam or ham.

The black box model is a decision tree trained on the document word matrix. Each comment is one document (= one row) and each column is the number of occurrences of a specific word. Decision trees are easy to understand, but in this case the tree is very deep. In place of this tree there could also have been a recurrent neural network or a support vector machine trained on word2vec embeddings. From the remaining comments, two were selected to show the explanations.

Let’s look at two comments of this dataset and the corresponding classes:

      CONTENT                                    CLASS
267   PSY is a good guy                          0
173   For Christmas Song visit my channel! ;)    1

In the next step, we create some variations of the dataset, which are used to fit a local model. For example, here are some variations of one of the comments:

     For  Christmas  Song  visit  my  channel!  ;)  prob  weight
2    1    0          1     1      0   0         1   0.09  0.57
3    0    1          1     1      1   0         1   0.09  0.71
4    1    0          0     1      1   1         1   0.99  0.71
5    1    0          1     1      1   1         1   0.99  0.86
6    0    1          1     1      0   0         1   0.09  0.57

Each column corresponds to one word in the sentence. Each row is a variation; 1 indicates that the word is part of this variation, and 0 indicates that the word has been removed. The corresponding sentence for the first variation is “Christmas Song visit my ;)”.

And here are the two sentences (one spam, one no spam) with their estimated local weights found by the LIME algorithm:

case  label_prob  feature    feature_weight
1     0.0872151   good       0.000000
1     0.0872151   a          0.000000
1     0.0872151   PSY        0.000000
2     0.9939759   channel!   6.908755
2     0.9939759   visit      0.000000
2     0.9939759   Christmas  0.000000

The word “channel” points to a high probability of spam.

5.7.3 LIME for Images

This section was written by Verena Haunschmid.

LIME for images works differently than LIME for tabular data and text. Intuitively, it would not make much sense to perturb single pixels, since a lot more than one pixel contributes to one class. Randomly changing individual pixels would probably not change the predictions much. Therefore, variations of the samples (i.e. images) are created by segmenting the image into superpixels and switching superpixels off. Superpixels are connected pixels with similar colours and can be turned off by replacing each pixel with a user-provided colour (e.g., a reasonable value would be gray). The user can also provide a probability for turning off a superpixel in each permutation.
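
A sketch of this perturbation step with scikit-image. The quickshift segmenter, its parameters and the replacement colour are choices made for illustration, not something LIME prescribes.

    import numpy as np
    from skimage.segmentation import quickshift

    def perturb_image(image, p_off=0.5, replacement=0.5, seed=0):
        """Switch random superpixels off by replacing them with a constant colour.
        `image` is assumed to be an RGB array with values in [0, 1]."""
        rng = np.random.default_rng(seed)
        segments = quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
        n_segments = segments.max() + 1
        # Binary vector: 1 = superpixel kept, 0 = superpixel switched off
        on_off = (rng.random(n_segments) > p_off).astype(int)
        perturbed = image.copy()
        for seg_id in np.flatnonzero(on_off == 0):
            perturbed[segments == seg_id] = replacement
        return perturbed, on_off

    # The black box image classifier is then queried with many such perturbed images,
    # and a weighted interpretable model is fitted on the on/off vectors.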

5.7.3.1 Example

Since the computation of image explanations is rather slow, the lime R package contains a precomputed example, which we will also use to show the output of the method. Image data is great for visualization; the explanations can be displayed directly on the image samples. Since we can have several predicted labels per image (ordered by probability), we can explain the top n_labels. For the following image the top 3 predictions were strawberry; candle, taper, wax light; and Granny Smith. The prediction and the explanation in the first case are very reasonable. For the second prediction it is quite interesting to see which part of the image contributed to this class. We could conclude that there were objects labeled as candle, taper, wax light in the training set that looked shiny like the tomato.

FIGURE 5.36: LIME explanations for the top 3 classes of an image classification made by Google's Inception neural network. The example is taken from the LIME paper (Ribeiro 2016).

5.7.4 Advantages

  • Even if you exchange the underlying machine learning model, you can still use the same local, interpretable model for the explanations. Suppose the people looking at the explanations understand decision trees best. Because you use local surrogate models, you can use decision trees as explanations without actually having to use a decision tree to make the predictions. For example, you could use an SVM. And if it turns out that an xgboost model works better, you can replace the SVM and still keep the decision trees to explain the predictions.
  • Local surrogate models benefit from the literature and experience of training and interpreting interpretable models.
  • When using LASSO or short trees, the resulting explanations are short (=selective) and possibly contrastive. Therefore, they make human-friendly explanations. That’s why I see LIME more in applications where the recipient of the explanation is a lay person or someone with very little time. It is not sufficient for complete causal attributions, so I don’t see LIME in compliance scenarios where you might be legally required to fully explain a prediction. Also for debugging machine learning models, it is useful to have all the reasons instead of a few.
  • LIME is one of the few methods that works for tabular data, texts and images.
  • The fidelity measure (how well the interpretable model approximates the black box predictions) gives us a good idea of how reliable the interpretable model is in explaining the black box predictions in the neighbourhood of the data instance of interest.
  • LIME is implemented in Python and R (lime package and iml package) and is very easy to use.
  • The explanations created with local surrogate models can use other features than the original model. This can be a big advantage over other methods, especially if the original features cannot be interpreted. A text classifier can rely on abstract word embeddings as features, but the explanation can use the presence or absence of words in a sentence. A regression model can rely on a non-interpretable transformation of some attributes, but the explanations can be created with the original attributes.

5.7.5 Disadvantages

  • The correct definition of the neighbourhood is a very big, unsolved problem when using LIME with tabular data. IMHO it is the biggest problem with LIME and the reason why I would advise using it only with great care. For each application you have to try out different kernel settings and see for yourself whether they make sense. Unfortunately, that is the best advice I can give for finding good kernel widths.
  • Sampling could be improved in the current implementation of LIME. Data points are sampled from a Gaussian distribution, ignoring the correlation between features. This can lead to unlikely data points which are then used to learn local explanation models.
  • The complexity of the explanation model has to be defined in advance. That’s just a small complaint, because in the end I see no way around the user having to define the tradeoff between fidelity and sparsity.
  • Another really big issue is the instability of the explanations. In one paper (Alvarez-Melis and Jaakkola 2018), the authors showed that the explanations of two very close points varied greatly in a simulated setting. Also, in my experience, if you repeat the sampling process, the explanations that come out can be different. Instability means that it is hard to trust the explanations, and you should be very critical.

Conclusion: Local surrogate models, with LIME as a concrete implementation, are very promising. But the method is still in its development phase and many problems need to be solved before it can be safely applied.


  1. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016): 1135-1144. ACM.

  2. Alvarez-Melis, David, and Tommi S. Jaakkola. “On the Robustness of Interpretability Methods.” arXiv preprint arXiv:1806.08049 (2018).