Model Understanding with Captum

# Understanding PyTorch Model Interpretability with Captum: A Comprehensive Guide

## Introduction

Welcome to the next video in the PyTorch training series! This video provides an overview of **Captum**, PyTorch's toolset for model interpretability. We will explore the basic concepts of Captum, including **attribution algorithms** and **visualizations**. The tutorial demonstrates how to perform and visualize feature attributions for a computer vision classifier, applies layer attribution to examine the activity of the model's hidden layers, and introduces **Captum Insights**, an API for creating visualization widgets for images, text, and other features.

Captum offers a deep set of tools for explaining the behavior of your PyTorch models. This video and its accompanying interactive notebook provide only an overview of core features. For more in-depth tutorials, documentation, and an API reference, visit [captum.ai](https://captum.ai).

---

## Prerequisites

To run the interactive notebook associated with this video, ensure you have:

- Python version **3.6** or higher

- Flask version **1.1** or higher

- The latest versions of **PyTorch**, **TorchVision**, and **Captum** installed

You can install Captum easily using `pip` or the **Anaconda** distribution by specifying the PyTorch channel.

---

## Getting Started with a Pre-trained Model

We will begin with a pre-trained image classifier, **ResNet**, trained on the **ImageNet dataset**, and use the tools within **Captum** to gain insight into how the model responds to a particular input image when producing its prediction.

The first step is to import the necessary libraries for attribution methods and visualization tools from Captum:

```python
# Sample imports (as shown in the video)
import numpy as np
import matplotlib.pyplot as plt

import torch
import torchvision.transforms as transforms
from torchvision.models import resnet50

from captum.attr import IntegratedGradients, Occlusion, LayerGradCam, LayerAttribution
from captum.attr import visualization as viz
```

Next, we load our pre-trained model and pull up an image to work with. The interactive notebook includes a folder of images for use in the tutorial; in this case, we will use a picture of a cat.

We then define some image transforms to prepare the image for consumption by the model and bring in the human-readable labels of the 1,000 ImageNet classes.
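
Here is a minimal sketch of that setup. The image path (`img/cat.jpg`) and the label-loading step are illustrative placeholders for whatever the notebook ships with, and the normalization constants are the standard ImageNet values:

```python
# Load the pre-trained ResNet and switch it to evaluation mode
model = resnet50(pretrained=True)
model.eval()

# Open the sample image (the path is illustrative; the notebook provides its own images folder)
from PIL import Image
img = Image.open("img/cat.jpg")

# Resize, crop, convert, and normalize the image for the ImageNet-trained model
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
transformed_image = transform(img).unsqueeze(0)  # add a batch dimension

# Run the model and keep the top predicted class index (the "cat" label used below)
output = model(transformed_image)
cat_label = int(output.argmax(dim=1))

# The human-readable labels for the 1,000 ImageNet classes would be loaded here,
# e.g. from a label file shipped with the notebook (format omitted).
```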

---

## Understanding Model Predictions: Feature Attribution

The core abstraction in Captum is **attribution**, a quantitative method of attributing a particular output or activity of a model to its input. The first kind of attribution we will explore is **feature attribution**, which helps answer questions like:

- Which parts of the input were most important in determining the model's prediction?

- Which pixels in an image drove the model's classification of that image?

### Integrated Gradients

The first feature attribution algorithm we will look at is **Integrated Gradients**. This gradient-based algorithm numerically approximates the integral of the gradients of the model's output with respect to its inputs, finding the most important paths through the model for a given input-output pair.

To use Integrated Gradients:

1. Create an IntegratedGradients object initialized with your model.

2. Call the `attribute` method on it, feeding in the input, output label, and an optional number of steps (note that this process can be computationally intensive).

Once the cell finishes running, we obtain a numerical importance map of the cat image with respect to the "cat" label. To visualize this map relative to the image itself, Captum's `captum.attr.visualization` module provides tools such as `visualize_image_attr`.

Here’s how you might set it up:

```python
# Visualizing the original image and its Integrated Gradients attributions
# (a sketch; variable names follow the notebook's earlier cells)

# Convert the input tensor to an H x W x C numpy array for display
original_image = np.transpose(transformed_image.squeeze().cpu().detach().numpy(), (1, 2, 0))

integrated_gradients = IntegratedGradients(model)
# n_steps is optional; higher values are slower but approximate the integral more closely
attributions_ig = integrated_gradients.attribute(transformed_image, target=cat_label, n_steps=200)

# First call: show just the original image (attributions set to None)
viz.visualize_image_attr(None, original_image, method="original_image", title="Original Image")

# Second call: heat map of the positive attributions over the image
viz.visualize_image_attr(
    np.transpose(attributions_ig.squeeze().cpu().detach().numpy(), (1, 2, 0)),
    original_image, method="heat_map", sign="positive", title="Integrated Gradients")
```

### Occlusion Method

Next, we try another feature attribution algorithm: **Occlusion**. Unlike Integrated Gradients, which is gradient-based, Occlusion is a perturbation-based method: it screens out portions of the image and observes how doing so affects the model's output.

To use Occlusion:

1. Specify your input image and output label.

2. Define parameters like the sliding window size and stride length (analogous to configuration options in a convolutional neural network).

3. Set a baseline value used to fill the occluded regions (e.g., zero for zero-centered data).

After running the `attribute` call, we use `visualize_image_attr_multiple` to display several visualizations of the Occlusion attribution side by side: heat maps of both positive and negative attributions, and a masked image that highlights the regions the model focused on.
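
Below is a minimal sketch of that workflow, reusing `model`, `transformed_image`, `original_image`, and `cat_label` from the earlier snippets; the window shape, stride, and baseline values are illustrative choices rather than ones prescribed by the video:

```python
# Occlusion attribution: slide a window over the image, blank it out, and measure the effect
occlusion = Occlusion(model)
attributions_occ = occlusion.attribute(
    transformed_image,
    target=cat_label,
    sliding_window_shapes=(3, 15, 15),  # occlude 15x15 patches across all 3 color channels
    strides=(3, 8, 8),                  # move the window 8 pixels at a time
    baselines=0,                        # occluded pixels are replaced with zeros
)

# Display the original image, positive and negative heat maps, and a masked image
viz.visualize_image_attr_multiple(
    np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1, 2, 0)),
    original_image,
    methods=["original_image", "heat_map", "heat_map", "masked_image"],
    signs=["all", "positive", "negative", "positive"],
    titles=["Original", "Positive Attribution", "Negative Attribution", "Masked"],
)
```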

---

## Exploring Model Hidden Layers: Layer Attribution

Feature attribution only covers inputs and outputs, but what about the activity inside the model? For this, we use **layer attribution**, which attributes the activity of a hidden layer to the model's input.

### Grad-CAM

One popular gradient-based algorithm for layer attribution, designed for convolutional networks, is **Grad-CAM** (Gradient-weighted Class Activation Mapping). It computes the gradients of the output with respect to a specified model layer, averages those gradients over each channel, and weights the layer activations by the averaged gradients to measure the importance of that layer's output.

Here’s how you might implement Grad-CAM:

```python
# Example code for using LayerGradCam
# (the examined layer, model.layer3[1].conv2, is one illustrative choice of ResNet layer)
layer_gradcam = LayerGradCam(model, model.layer3[1].conv2)
attributions_lgc = layer_gradcam.attribute(transformed_image, target=cat_label)

# Visualize the low-resolution, layer-level attribution as a heat map
viz.visualize_image_attr(attributions_lgc[0].cpu().permute(1, 2, 0).detach().numpy(),
                         sign="all", title="Layer Grad-CAM")

# Up-sample the attribution map to the input resolution (224 x 224) for comparison
upsampled_attr = LayerAttribution.interpolate(attributions_lgc, (224, 224))
```

### Visualizing Layer Contributions

Since the output of a convolutional layer is spatially correlated with the input, we can up-sample the activation map and compare it directly with the original image. Captum's `LayerAttribution` parent class provides a convenience method for exactly this: `interpolate`.

By requesting a blended heat map, which superimposes the attribution on the original image, along with a masked image, we gain insight into how a hidden layer contributes to the model's output.
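
Here is a short sketch of that comparison, using `upsampled_attr` and `original_image` from the snippets above; the titles are illustrative:

```python
# Overlay the up-sampled Grad-CAM attribution on the original image,
# then mask the image by its positive attributions
viz.visualize_image_attr_multiple(
    np.transpose(upsampled_attr[0].cpu().detach().numpy(), (1, 2, 0)),
    original_image,
    methods=["original_image", "blended_heat_map", "masked_image"],
    signs=["all", "positive", "positive"],
    titles=["Original", "Blended Heat Map", "Masked Image"],
)
```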

---

## Interactive Visualizations with Captum Insights

Finally, we explore **Captum Insights**, an advanced visualization tool that lets you create in-browser widgets for images, text, and arbitrary data. This tool allows you to experiment with different attribution methods and understand the activity that led to your model's predictions, both correct and incorrect, with minimal code.

### Setting Up Captum Insights

To use Captum Insights:

1. Create an `AttributionVisualizer` object and configure it with your model, a scoring function for the model's outputs (e.g., softmax), a list of the classes the model recognizes, and the feature type being visualized (here, images).

2. Provide the dataset as an iterable that returns batches of images and labels. Note that the attribution algorithm and visualization settings are not specified in code; they are configured later in the in-browser widget itself.

Here’s a sample setup:

```python
# A sketch of setting up Captum Insights; the feature and dataset plumbing shown here
# (the ImageFeature arguments and the image_batches name) follows common usage, not the video verbatim
from captum.insights import AttributionVisualizer, Batch
from captum.insights.attr_vis.features import ImageFeature

visualizer = AttributionVisualizer(
    models=[model],
    score_func=lambda out: torch.nn.functional.softmax(out, dim=1),  # scoring function for outputs
    classes=imagenet_classes,   # ordered list of the model's human-readable class names
    features=[ImageFeature("Photo", baseline_transforms=[], input_transforms=[transform])],
    dataset=image_batches,      # an iterable yielding Batch(inputs=..., labels=...) of images and labels
)

# The attribution algorithm and its parameters are chosen interactively in the widget
visualizer.render()
```

---

## Conclusion

This video and its accompanying tutorial provide a comprehensive introduction to Captum's tools for model interpretability. By exploring feature attribution algorithms like Integrated Gradients and Occlusion, layer attribution techniques such as Grad-CAM, and interactive visualization tools like Captum Insights, you can gain deeper insights into how your PyTorch models work.

For more detailed tutorials, documentation, an API reference, and access to the source code on GitHub, visit [captum.ai](https://captum.ai).

"WEBVTTKind: captionsLanguage: enwelcome to the next video in the pytorch training series this video gives an overview of captain pi torches toolset for model interpretability in this video we'll discuss the basic concepts of captain that we'll be covering attributions attribution algorithms and visualizations we'll demonstrate how to perform and visualize feature attributions for a computer vision classifier we'll apply layer attribution to the same classifier to examine the activity of a model's hidden layers and finally we'll look at captive insights an api for creating visualization widgets for images text and other features captain provides a deep set of tools for explaining the behavior of your pi torch models this video and the accompanying interactive notebook provide only an overview of core features the website at captain ai contains more in-depth tutorials documentation and an api reference to run the interactive notebook associated with this video you'll want to install python version 3.6 or higher flask 1.1 or higher and the latest versions of pi torch torch vision and captain captain can be easily installed with pip or with anaconda by specifying the pi torch channel to start with we're going to take a pre-trained image classifier resnet trained against the imagenet dataset and we're going to use the tools within captum to gain insight into how the model responds to a particular input image to give its prediction this first cells a bunch of imports including attribution methods and visualization tools from captain which we'll examine shortly next we'll get our pre-trained model then we'll pull up an image to work with wherever you've got this video in the interactive notebook should also include a folder of images for use in this tutorial in our case it's going to be a cat next we'll define some image transforms to prepare the image for consumption by the model and bring in the human readable labels of the thousand imagenet classes now let's see what the model thinks this is and thinks our cat is a cat but why does the model think this is a picture of a cat for the answer to that we can look under the hood of the model with captain the core abstraction in captain is the attribution that is a quantitative method of attributing a particular output or activity of a model with its input the first kind of attribution is feature attribution this lets us ask which parts of the input were most important in determining a model's prediction it lets us find answers to questions like which words in this input question were most significant in deciding the answer which pixels in this input image drove the model's classification of the image which features of the input data were most significant to my regression model's prediction feature attribution just covers inputs and outputs though what if we want to see what's happening inside the model for that we have layer attribution this attributes the activity of a hidden layer of a model to the model's input it lets us answer questions like which neurons in this layer were most active given this input which neurons in this layer were most important to how the input influenced a particular output neuron how is the activation map output by this convolutional layer correlated to my input image finally there's neuron attribution this is similar to layer attribution but goes down to the level of individual neurons in the model in this tutorial we're going to look at feature attribution and layer attribution first feature attribution attributions 
are realized by an attribution algorithm a particular method of mapping model activity to inputs the first feature attribution algorithm we'll look at is called integrated gradients this algorithm numerically approximates the integral of the gradients of the model's output with respect to its inputs essentially finding the most important paths through the model for a given input output pair we'll go ahead and create an integrated gradients object initializing it with our model then we'll call the attribute method on it we'll feed it our input our output label and an optional number of steps to run note that running this cell can take a couple of minutes the process of integrating the gradients is computationally intensive once that cell finishes running we have a sort of numerical importance map of the cat image with respect to the cat label generated by the model for a simple regression model with few output categories we might just print that out as a table but for a more complicated cv model with a large input like an image it would help to be able to relate the importance map to the image visually captain's got you covered the visualization module gives you tools for exactly that here we're going to make two calls to visualize image adder the first displays the original image first we need to make some adjustments to the image we call squeeze to remove the batch dimension on the image we make sure we're running on cpu we detach the image tensor from computation history otherwise the image tension will keep tracking its computation history unnecessarily and finally we make it a numpy array and switch the dimensions around and put the color channels last the first argument of this method would normally be the attributions but for this call we're going to make that none we're just displaying the original image the second argument is our transformed image the third argument is a visualization method a string that indicates how you want the visualization to work here we told captain we just want to display the original image finally we give our visualization an instructive title the second call will make a visual mapping of the important regions of our image the first argument is the attributions we got from integrated gradients and the second is our transformed image for a method we'll specify heat map where color intensity maps to the importance of an image region captain allows you to use custom color maps from matplotlib and we've made one here that will slightly enhance the contrast of our heat map we specify sine as positive we're only looking at positive attributions running the cell we can see that the model is paying attention to the outline of the cat as well as the region around the center of the cat's face let's try another feature attribution algorithm next we'll try occlusion integrated gradients was a gradient-based attribution algorithm occlusion is different it's a based method that involves screening out portions of the image and seeing how that affects the output as before we're going to specify our input image and our output labeled the attribution algorithm for occlusion we're going to specify a few more items the first are the sliding window and the stride length and these are analogous to similar configuration options in a convolutional neural network we're also going to set our baseline that is our representation of an occluded image cell is zero depending on how your data are normalized you may wish to specify a different baseline but for zero centered data it makes 
sense to use zero we'll run the attribute call and give it a minute and in the next cell we're doing something new we're calling visualize image adder multiple to show multiple visualizations of the occlusion attribution besides the original image we'll show three visualizations the first two are heat maps of both positive and negative attributions you can see that we're providing a list of methods with heat map being the second and third we're also specifying a sign for each visualization and here you can see that we've asked for positive attributions on one heat map and negative on the other these indicate which for our final visualization we'll use the mask method this uses positive attributions to selectively screen the original image giving a striking visual representation of the areas of the image the model paid most attention to for this input output pair running the cell you can see that this maps well to what we learned from integrated gradients most of the activities around the cat's outline and the center of its face what about what the model is doing under the hood let's use a layer attribution algorithm to check the activity of one of the hidden layers grad cam is another gradient-based attribution algorithm designed for confidence it computes the gradients of the output with respect to the specified model layer averages the gradients for each channel and multiplies this average by the layer activations and uses this as a measure of the importance of a layer's output to get started with layer attribution we'll create a layer gradcam object and initialize it with our model and the layer we wish to examine then we'll give it the input output pair and ask it to do attribution we can visualize this with a heat map as we did before in this way you can visually examine which areas of a confidence activation map relate to your output we can do better than this though since the output of a convolutional layer is usually spatially correlated to the input we can take advantage of that by up-sampling that activation map and comparing it directly with the input the layer attribution parent class has a convenience method for upsampling the lower resolution confident activation map up to the input size we'll do that with the interpolate method here and ask the visualizer for a blended heat map showing the original image with a heat map superimposed and a masked image visualizations like this can give you insight into how hidden layers contribute to a particular output from your model captain comes with an advanced visualization tool called captain insights which lets you put together multiple visualizations in an in-browser widget that lets you configure the attribution algorithm and its parameters captain insights lets you visualize text image and arbitrary data we're going to try three images now the cat a teapot and a trilobite fossil again these images should be available wherever you've got the interactive notebook that goes with this video first we'll query the model to see what it thinks each of these are and it seems to be doing okay now let's set up captain insights we're going to use the attribution visualizer object and we'll configure it with our model a scoring function for the model's outputs here softmax a list of the classes the model recognizes here i'm stripping out an ordered list of the imagenet class names we'll tell it that we're looking at image features captive insights also handles text and arbitrary data as well and we'll give it a data set which is just an 
iterable that returns a batch of images and labels note that we haven't specified an algorithm or a visualization method these are things that you set up in the in browser widget now we ask the visualizer to render it starts off empty but we can set up configuration parameters and ask it to fetch our visualized attributions with the fetch button i'm going to leave things at the default setting for integrated gradients captain needs a few minutes to generate the attributions but now we can see that it ranks the first few predictions for each image with their probabilities and provides heat map attribution for the important regions of the image in this way captain insights lets you experiment with attribution methods and understand the activity that led to your model's predictions both correct and incorrect and lets you do it visually with minimal code finally don't forget to look at captain.ai for documentation tutorials an api reference and access to the source on github youwelcome to the next video in the pytorch training series this video gives an overview of captain pi torches toolset for model interpretability in this video we'll discuss the basic concepts of captain that we'll be covering attributions attribution algorithms and visualizations we'll demonstrate how to perform and visualize feature attributions for a computer vision classifier we'll apply layer attribution to the same classifier to examine the activity of a model's hidden layers and finally we'll look at captive insights an api for creating visualization widgets for images text and other features captain provides a deep set of tools for explaining the behavior of your pi torch models this video and the accompanying interactive notebook provide only an overview of core features the website at captain ai contains more in-depth tutorials documentation and an api reference to run the interactive notebook associated with this video you'll want to install python version 3.6 or higher flask 1.1 or higher and the latest versions of pi torch torch vision and captain captain can be easily installed with pip or with anaconda by specifying the pi torch channel to start with we're going to take a pre-trained image classifier resnet trained against the imagenet dataset and we're going to use the tools within captum to gain insight into how the model responds to a particular input image to give its prediction this first cells a bunch of imports including attribution methods and visualization tools from captain which we'll examine shortly next we'll get our pre-trained model then we'll pull up an image to work with wherever you've got this video in the interactive notebook should also include a folder of images for use in this tutorial in our case it's going to be a cat next we'll define some image transforms to prepare the image for consumption by the model and bring in the human readable labels of the thousand imagenet classes now let's see what the model thinks this is and thinks our cat is a cat but why does the model think this is a picture of a cat for the answer to that we can look under the hood of the model with captain the core abstraction in captain is the attribution that is a quantitative method of attributing a particular output or activity of a model with its input the first kind of attribution is feature attribution this lets us ask which parts of the input were most important in determining a model's prediction it lets us find answers to questions like which words in this input question were most significant in 
deciding the answer which pixels in this input image drove the model's classification of the image which features of the input data were most significant to my regression model's prediction feature attribution just covers inputs and outputs though what if we want to see what's happening inside the model for that we have layer attribution this attributes the activity of a hidden layer of a model to the model's input it lets us answer questions like which neurons in this layer were most active given this input which neurons in this layer were most important to how the input influenced a particular output neuron how is the activation map output by this convolutional layer correlated to my input image finally there's neuron attribution this is similar to layer attribution but goes down to the level of individual neurons in the model in this tutorial we're going to look at feature attribution and layer attribution first feature attribution attributions are realized by an attribution algorithm a particular method of mapping model activity to inputs the first feature attribution algorithm we'll look at is called integrated gradients this algorithm numerically approximates the integral of the gradients of the model's output with respect to its inputs essentially finding the most important paths through the model for a given input output pair we'll go ahead and create an integrated gradients object initializing it with our model then we'll call the attribute method on it we'll feed it our input our output label and an optional number of steps to run note that running this cell can take a couple of minutes the process of integrating the gradients is computationally intensive once that cell finishes running we have a sort of numerical importance map of the cat image with respect to the cat label generated by the model for a simple regression model with few output categories we might just print that out as a table but for a more complicated cv model with a large input like an image it would help to be able to relate the importance map to the image visually captain's got you covered the visualization module gives you tools for exactly that here we're going to make two calls to visualize image adder the first displays the original image first we need to make some adjustments to the image we call squeeze to remove the batch dimension on the image we make sure we're running on cpu we detach the image tensor from computation history otherwise the image tension will keep tracking its computation history unnecessarily and finally we make it a numpy array and switch the dimensions around and put the color channels last the first argument of this method would normally be the attributions but for this call we're going to make that none we're just displaying the original image the second argument is our transformed image the third argument is a visualization method a string that indicates how you want the visualization to work here we told captain we just want to display the original image finally we give our visualization an instructive title the second call will make a visual mapping of the important regions of our image the first argument is the attributions we got from integrated gradients and the second is our transformed image for a method we'll specify heat map where color intensity maps to the importance of an image region captain allows you to use custom color maps from matplotlib and we've made one here that will slightly enhance the contrast of our heat map we specify sine as positive we're only 
looking at positive attributions running the cell we can see that the model is paying attention to the outline of the cat as well as the region around the center of the cat's face let's try another feature attribution algorithm next we'll try occlusion integrated gradients was a gradient-based attribution algorithm occlusion is different it's a based method that involves screening out portions of the image and seeing how that affects the output as before we're going to specify our input image and our output labeled the attribution algorithm for occlusion we're going to specify a few more items the first are the sliding window and the stride length and these are analogous to similar configuration options in a convolutional neural network we're also going to set our baseline that is our representation of an occluded image cell is zero depending on how your data are normalized you may wish to specify a different baseline but for zero centered data it makes sense to use zero we'll run the attribute call and give it a minute and in the next cell we're doing something new we're calling visualize image adder multiple to show multiple visualizations of the occlusion attribution besides the original image we'll show three visualizations the first two are heat maps of both positive and negative attributions you can see that we're providing a list of methods with heat map being the second and third we're also specifying a sign for each visualization and here you can see that we've asked for positive attributions on one heat map and negative on the other these indicate which for our final visualization we'll use the mask method this uses positive attributions to selectively screen the original image giving a striking visual representation of the areas of the image the model paid most attention to for this input output pair running the cell you can see that this maps well to what we learned from integrated gradients most of the activities around the cat's outline and the center of its face what about what the model is doing under the hood let's use a layer attribution algorithm to check the activity of one of the hidden layers grad cam is another gradient-based attribution algorithm designed for confidence it computes the gradients of the output with respect to the specified model layer averages the gradients for each channel and multiplies this average by the layer activations and uses this as a measure of the importance of a layer's output to get started with layer attribution we'll create a layer gradcam object and initialize it with our model and the layer we wish to examine then we'll give it the input output pair and ask it to do attribution we can visualize this with a heat map as we did before in this way you can visually examine which areas of a confidence activation map relate to your output we can do better than this though since the output of a convolutional layer is usually spatially correlated to the input we can take advantage of that by up-sampling that activation map and comparing it directly with the input the layer attribution parent class has a convenience method for upsampling the lower resolution confident activation map up to the input size we'll do that with the interpolate method here and ask the visualizer for a blended heat map showing the original image with a heat map superimposed and a masked image visualizations like this can give you insight into how hidden layers contribute to a particular output from your model captain comes with an advanced visualization tool called 
captain insights which lets you put together multiple visualizations in an in-browser widget that lets you configure the attribution algorithm and its parameters captain insights lets you visualize text image and arbitrary data we're going to try three images now the cat a teapot and a trilobite fossil again these images should be available wherever you've got the interactive notebook that goes with this video first we'll query the model to see what it thinks each of these are and it seems to be doing okay now let's set up captain insights we're going to use the attribution visualizer object and we'll configure it with our model a scoring function for the model's outputs here softmax a list of the classes the model recognizes here i'm stripping out an ordered list of the imagenet class names we'll tell it that we're looking at image features captive insights also handles text and arbitrary data as well and we'll give it a data set which is just an iterable that returns a batch of images and labels note that we haven't specified an algorithm or a visualization method these are things that you set up in the in browser widget now we ask the visualizer to render it starts off empty but we can set up configuration parameters and ask it to fetch our visualized attributions with the fetch button i'm going to leave things at the default setting for integrated gradients captain needs a few minutes to generate the attributions but now we can see that it ranks the first few predictions for each image with their probabilities and provides heat map attribution for the important regions of the image in this way captain insights lets you experiment with attribution methods and understand the activity that led to your model's predictions both correct and incorrect and lets you do it visually with minimal code finally don't forget to look at captain.ai for documentation tutorials an api reference and access to the source on github you\n"