An Introduction to Machine Learning Interpretability

Lately there has been a lot of interest in explainable AI/ML. Nobody wants to feel discriminated against by an algorithm and when we don't like its prediction or decision we want to know why it made that decision. Plus there is an added sense of security when we feel we understand (or could understand) how something works.

The concept of how well a human can understand the prediction of a machine learning model is often called interpretability or explainability. The two terms are often used interchangeably though I'm starting to strongly believe that interpretability is the better word for what we can do today.

There are several related but different concept here so let me point out that interpretability is related to Fairness which concerns a model being (socially) unbiased and not discriminating.

... It is related to Safety and Reliability which attempts to ensure that errors are not catastrophic and are expected and properly handled.

... It is related to Privacy which is concerned with protecting sensitive information.

... It is related to Justification which examines why a decision is a good decision.

... And it is related to and often used interchangeably with Explainabilty. However, I posit that explainability more closely means why something happened while interpretability tells us more about how things are correlated.

That is, when something goes wrong or we make an unexpected prediction your boss/client/user wants to know WHY and they want the explanation to be:

Machine learning (in its current form) is really good and finding hidden correlations in our data and using those correlations to make predictions. Interpretability examines how data features are related/correlated not why. It is not perscriptive or causal and it can't explain why something is correlated just that it is.


Radical Thesis : Explainability and interpretability are two different things.

There is a huge difference between how things are correlated and why one thing causes another. As many of us learned in school correlation is not causation. Humans are built to ask why but we can only answer how.

So, what can we do?

We may not be able to get the holy grail of causation but we can certainly examine our models and explore the correlations they find. And though it is certainly fair and useful to bring our own experiences and human brain power to bear we need to constantly be careful about jumping to unsupported conclusions that is unsupported causation.

To better understand the correlations in our models lets take a look at the tools we have to examine correlations.

First we have the 'explainable' models. These are traditionally linear models and simple decision trees. With a linear regression you can read off the coefficients directly. The bigger the coefficient the more important that feature is. With a simple decision tree we can visualize the nodes and follow the rules the decision tree creates from the data.

However, often those methods are not powerful enough to create accurate models and we turn to ensemble methods or perhaps neural nets. Some implementations provide feature importances to help examine interpretability. For all models though we can use permutation importances, partial dependency plots (PDP), individual conditional expectations (ICE), Local Interpretable Model-agnostic Explanations (lime) and Shapley value analysis.

Linear modelscoefficientsglobal
Decision trees / Rules nodes global
Tree ensemble feature importance global
Feature exploration permutation importance global
Partial Dependence Plot (PDP) single feature* / global
Indiv. Cond. Expectation (ICE) single feature / subsample
Surrogate Models Local Inter. Model-agnostic Exp. (lime) multi feature / local
Shapley values multi feature / local

A Practical Example

Lets examine these approaches with a practical hands on application.

Consider the King County Washington home sales data from Kaggle. This data set has home sales in King County (containing Seattle) WA, from May 2014 and May 2015. There are 19 features and 21,613 observations and we'll try to predict sale price.

King County, Washington

Data and Features

The data has 19 features listed below. Of these we'll use 'price' as the target variable and also drop 'id', 'date' and 'zipcode'. Date and zipcode may have additional predictive power but require additional data prep and we want to keep this example simple for now.

There is a notebook on Github so you can walk through the example if you are interested in more details.

id - a notation for a house date - Date house was sold
price - Price is prediction target bedrooms - Number of Bedrooms/House
bathrooms - Number of bathrooms/bedrooms sqft_living - square footage of the home
sqft_lot - square footage of the lot floors - Total floors (levels) in house
waterfront - House which has a view to a waterfront view - Has been viewed
condition - How good the condition is ( Overall ) grade - overall grade given to the housing unit, based on King County grading system
sqft_above - square footage of house apart from basement sqft_basement - square footage of the basement
yr_built - Built Year yr_renovated - Year when house was renovated
zipcode - zip lat - Latitude coordinate
long - Longitude coordinate sqft_living - 15Living room area in 2015(implies-- some renovations) This might or might not have affected the lotsize area
sqft_lot15 - lotSize area in 2015(implies-- some renovations)

If we check the correlations between features and price we see that sqft_living, grade, sqft_above, sqft_living15, and bathrooms are strongly correlated. Note that lat is in the middle of the pack and long near the end.

Feature Correlation
sqft_living 0.702035
grade 0.667434
sqft_above 0.605567
sqft_living15 0.585379
bathrooms 0.525138
view 0.397293
sqft_basement 0.323816
bedrooms 0.308350
lat 0.307003
waterfront 0.266369
floors 0.256794
yr_renovated 0.126434
sqft_lot 0.089661
sqft_lot15 0.082447
yr_built 0.054012
condition 0.036362
long 0.021626
zipcode -0.053203

Linear Regression

We can train a linear regression model and examine the coefficients easily enough but when we look at the performance (R2 score of 0.696) we wonder if we can do better.

model = LinearRegression(normalize=True), y_train)
model.score(X_val, y_val)

Decision Trees

A large decision tree turns out to be too difficult to visualize (you'll see why in a minute) so lets see if we can use a simple decision tree and visualize the nodes.

model = DecisionTreeRegressor(max_depth=4), y_train)
export_graphviz(model,out_file='', feature_names=X_train.columns)
with open("") as f:
    dot_graph =

This is a small decision tree, with a depth of 4, but has a lower score of 0.61 and is still hard to understand visually.


Note that the feature importances are interesting in that the top 5 include grade, sqft_living, lat, long and yr_built. All of these make sense but we have to note that lat and long were not strongly correlated with price. This is because they are not 'linearly' correlated with price and shows why a non linear model is important in this case.

0.470 grade
0.295 sqft_living
0.136 lat
0.053 long
0.027 yr_built

Next we could try a more complex model: A GradientBoostingRegressor. This model has better performance, 0.861, and also happens to have feature importance capabilities. The top 5 features are pretty similar to the smaller decision tree but there are some position changes and yr_built has been replaced by waterfront.

0.325 sqft_living
0.298 grade
0.164 lat
0.068 long
0.036 waterfront

Permutation Importance

What if a model did not have feature importance capability? For example if you wanted to use a neural net or a hand crafted ensemble of various kinds of models. Or perhaps we want to measure feature importance on different datasets such as in practice (the test set) versus just the importances discovered during training.

For this we can use a permutation importance tool such as ELI5. The idea is that you look at each feature in turn and shuffle the values in each column. This should destroy the predictive power of the feature while maintaining the same feature value distribution. The features who's shuffled values affect the model performance the most are most important. This simple technique works with any kind of model as long as you can shuffle the features individually and intelligently (ie. look out for one hot encodings).

perm = PermutationImportance(model).fit(X_train, y_train)
explore_feature_importances(perm.feature_importances_, X_train.columns)

Doing this yields a slightly different feature importance score. Notice that lat (proximity to Seattle) is even more important now.

0.281 lat
0.188 grade
0.165 sqft_living
0.120 long
0.036 sqft_living15

Partial Dependence Plot (PDP)

Along with randomly shuffling the values in a feature we can systematically change the value of one feature and keep the rest the same and note the affect on the prediction. For example if we do this with latitude we'd keep everything but lat the same and make predictions on all houses as if they were at one latitude and then at the next latitude and the next, etc. And then plot the result in a Partial Dependence Plot (PDP)

pdp_lat = pdp.pdp_isolate(model=model, dataset=X_val, model_features=X_train.columns, feature='lat')
fig, axes = pdp.pdp_plot(pdp_lat, 'Latitude')

The PDP shows us that the price of homes increases as it gets closer to 47.65 and then starts to drop off. Coincidentally Seattle's geo coordinates are 47.6062° N, 122.3321° W. The PDP even drops off sharply as we run out of data north of Seattle where King County ends. Typically PDP plots show a single line representing the mean change but PDPBox also shows a shaded area to represent variance. And it can even do multiple lines where each line represents an individual sample being moved through the space.

PDPBox even lets you plot two features at once. If we use lat and long together we can clearly see that houses would sell the most if they were in the north west corner of King county. Right where Seattle is. So if you want to get top dollar for your house move it there.

Local Interpretable Model-Agnostic Explanations (lime)

Next we look at approaches that claim that simple models can be used to 'locally' interpret more complex models. One recent approach that has gotten a lot of attention is Local Interpretable Model-Agnostic Explanations (lime). Given a trained model and a particular sample:

This lets you look at a particular sample and evaluate what led to the particular prediction by examining the local area around the sample. It also has the added benefit of estimation how much each feature contributed or detracted from the particular house.

In this example we examine one sample from the data and see that not being on the water front and not being viewed was a negative but the grade, sqft_living and lat was a positive.

Shapley Values

Finally, Shapley values, use a game theoretic approach to assigning attribution to features. The Shapley value is the average expected marginal contribution of one player after all possible combinations have been considered. Calculating Shapley values can be quite expensive but estimates have been developed, and implemented in SHAP, that speed the process with only a minimal loss in accuracy.

The chart can be a little tricky to read but it says that for this particular house the lat cost it ~$46k while the grade boost the price ~$250k and the long ~$93k.


If you have any questions or comments get in touch.


Want to be automatically notified of more articles like this?