# Understanding PyTorch Model Interpretability with Captum: A Comprehensive Guide
## Introduction
Welcome to the next video in the PyTorch training series! This video provides an overview of **Captum**, a powerful toolset for model interpretability in PyTorch. We will explore the basic concepts of Captum, including **attribution algorithms** and **visualizations**. The tutorial will demonstrate how to perform and visualize feature attributions for a computer vision classifier, apply layer attribution to examine the activity of a model's hidden layers, and introduce **Captum Insights**, an API for creating visualization widgets for images, text, and other features.
Captum offers a deep set of tools for explaining the behavior of your PyTorch models. This video and its accompanying interactive notebook provide only an overview of core features. For more in-depth tutorials, documentation, and an API reference, visit [captum.ai](https://captum.ai).
---
## Prerequisites
To run the interactive notebook associated with this video, ensure you have:
- Python version **3.6** or higher
- Flask version **1.1** or higher
- The latest versions of **PyTorch**, **TorchVision**, and **Captum** installed
You can install Captum with `pip install captum`, or with the **Anaconda** distribution by specifying the PyTorch channel: `conda install captum -c pytorch`.
---
## Getting Started with a Pre-trained Model
We will begin by taking a pre-trained image classifier, **ResNet**, trained on the **ImageNet dataset**. Using the tools within **Captum**, we will gain insight into how the model responds to a particular input image and gives its prediction.
The first step is to import the necessary libraries for attribution methods and visualization tools from Captum:
```python
# Sample imports: attribution methods and visualization tools from Captum
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import models, transforms
from captum.attr import IntegratedGradients, Occlusion, LayerGradCam, LayerAttribution
from captum.attr import visualization as viz
```
Next, we load our pre-trained model and pull up an image to work with. The interactive notebook should include a folder of images for use in the tutorial—in this case, it will be a cat.
We then define some image transforms to prepare the image for consumption by the model and bring in the human-readable labels of the 1,000 ImageNet classes.
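Continuing from the imports above, this setup might look like the following sketch; the file names are placeholders for whatever the notebook bundles, and the normalization constants are the standard ImageNet statistics:
```python
import json
from PIL import Image

# Load a pre-trained ResNet and put it in evaluation mode
model = models.resnet50(weights="IMAGENET1K_V1")
model.eval()

# Resize and crop for display; normalize separately for the model
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
transform_normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
)

# Placeholder paths: substitute the image and label file shipped with the notebook
img = Image.open("img/cat.jpg").convert("RGB")
transformed_image = transform(img)                               # CHW tensor in [0, 1]
input_img = transform_normalize(transformed_image).unsqueeze(0)  # batched, normalized input

with open("imagenet_class_index.json") as f:
    idx_to_labels = json.load(f)  # placeholder label file for the 1,000 ImageNet classes
```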
---
## Understanding Model Predictions: Feature Attribution
The core abstraction in Captum is **attribution**, a quantitative method of attributing a particular output or activity of a model to its input. The first kind of attribution we will explore is **feature attribution**, which helps answer questions like:
- Which parts of the input were most important in determining the model's prediction?
- Which pixels in an image drove the model's classification of that image?
### Integrated Gradients
The first feature attribution algorithm we will look at is **Integrated Gradients**. This gradient-based algorithm numerically approximates the integral of the gradients of the model's output with respect to its inputs along a path from a baseline to the actual input, identifying which input features matter most for a given input-output pair.
To use Integrated Gradients (see the sketch after this list):
1. Create an IntegratedGradients object initialized with your model.
2. Call the `attribute` method on it, feeding in the input, output label, and an optional number of steps (note that this process can be computationally intensive).
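For example, assuming the `model` and normalized `input_img` from the setup above, the Integrated Gradients call might look like this:
```python
# Run the model once to find the predicted class index to attribute against
output = model(input_img)
pred_label_idx = output.argmax(dim=1).item()

# Integrated Gradients: approximate the integral of gradients along a path
# from a baseline (all zeros by default) to the actual input
integrated_gradients = IntegratedGradients(model)
attributions_ig = integrated_gradients.attribute(
    input_img,
    target=pred_label_idx,
    n_steps=200,  # more steps give a better approximation but cost more compute
)
```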
Once the cell finishes running, we obtain a numerical importance map of the cat image with respect to the "cat" label. To visualize this map relative to the image itself, Captum's visualization module provides tools like `visualize_image_attr`.
Here’s how you might set it up:
```python
# Example code for visualizing the original image and its attributions.
# visualize_image_attr expects HWC numpy arrays, so transpose from CHW first.
original_image = np.transpose(transformed_image.cpu().detach().numpy(), (1, 2, 0))
attr_ig = np.transpose(attributions_ig.squeeze(0).cpu().detach().numpy(), (1, 2, 0))

plt.imshow(original_image)
plt.title("Original Image")
plt.axis("off")
plt.show()

# Overlay a heat map of the positive attributions on the original image
viz.visualize_image_attr(attr_ig, original_image, method="blended_heat_map",
                         sign="positive", show_colorbar=True,
                         title="Integrated Gradients")
```
### Occlusion Method
Next, we try another feature attribution algorithm: **Occlusion**. Unlike Integrated Gradients, which is gradient-based, Occlusion involves screening out portions of the image and observing how that affects the model's output.
To use Occlusion:
1. Specify your input image and output label.
2. Define parameters like the sliding window size and stride length (analogous to configuration options in a convolutional neural network).
3. Set a baseline for occluded images (e.g., zero for zero-centered data).
After running the `attribute` call, we use `visualize_image_attr_multiple` to display several views of the Occlusion attributions side by side, including heat maps of both the positive and negative attributions and a masked image that highlights the model's areas of focus, as sketched below.
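Here is a minimal sketch, reusing `model`, `input_img`, `original_image`, and `pred_label_idx` from the earlier cells; the window and stride sizes are only illustrative values:
```python
# Occlusion: slide a window over the image, replace the covered region with a
# baseline value, and measure how the target class score changes
occlusion = Occlusion(model)
attributions_occ = occlusion.attribute(
    input_img,
    target=pred_label_idx,
    sliding_window_shapes=(3, 15, 15),  # occlude 15x15 patches across all 3 channels
    strides=(3, 8, 8),                  # step size of the sliding window
    baselines=0,                        # value substituted into the occluded region
)

# Show the original image, positive/negative heat maps, and a masked view side by side
attr_occ = np.transpose(attributions_occ.squeeze(0).cpu().detach().numpy(), (1, 2, 0))
viz.visualize_image_attr_multiple(
    attr_occ,
    original_image,
    methods=["original_image", "heat_map", "heat_map", "masked_image"],
    signs=["all", "positive", "negative", "positive"],
    titles=["Original", "Positive Attribution", "Negative Attribution", "Masked"],
    show_colorbar=True,
)
```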
---
## Exploring Model Hidden Layers: Layer Attribution
Feature attribution only covers inputs and outputs, but what about the activity inside the model? For this, we use **layer attribution**, which attributes the activity of a hidden layer to the model's input.
### Grad-CAM
One popular gradient-based algorithm for layer attribution is **Grad-CAM** (Gradient-weighted Class Activation Mapping). It computes the gradients of the output with respect to a specified model layer, averages the gradients for each channel, and multiplies this average by the layer activations to measure importance.
Here’s how you might implement Grad-CAM:
```python
# Example code for using LayerGradCam.
# The target layer is an assumption: a late convolutional layer of the ResNet.
layer_gradcam = LayerGradCam(model, model.layer3[1].conv2)
attributions_lgc = layer_gradcam.attribute(input_img, target=pred_label_idx)

# Visualize the raw attribution map, which has the spatial size of the chosen layer
viz.visualize_image_attr(
    np.transpose(attributions_lgc.squeeze(0).cpu().detach().numpy(), (1, 2, 0)),
    sign="all",
    title="Layer GradCAM (layer resolution)",
)

# Up-sample the layer attribution to the input resolution for direct comparison
upsampled_attr = LayerAttribution.interpolate(attributions_lgc, (224, 224))
```
### Visualizing Layer Contributions
Since the output of a convolutional layer is spatially correlated with the input, we can up-sample the activation map and compare it directly with the original image. Captum's `LayerAttribution` base class provides a convenience method for exactly this purpose: `interpolate`.
By requesting a blended heat map showing the original image with an overlay, we gain insight into how hidden layers contribute to the model's output.
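Continuing the sketch above, with `upsampled_attr` and `original_image` already in hand, the blended heat map can be requested like this:
```python
# Compare the original image with the up-sampled layer attribution overlaid
# as a blended heat map
viz.visualize_image_attr_multiple(
    np.transpose(upsampled_attr.squeeze(0).cpu().detach().numpy(), (1, 2, 0)),
    original_image,
    methods=["original_image", "blended_heat_map"],
    signs=["all", "positive"],
    titles=["Original", "Layer GradCAM Overlay"],
)
```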
---
## Interactive Visualizations with Captum Insights
Finally, we explore **Captum Insights**, an advanced visualization tool that lets you create in-browser widgets for images, text, and arbitrary data. This tool allows you to experiment with different attribution methods and understand the activity that led to your model's predictions, both correct and incorrect, with minimal code.
### Setting Up Captum Insights
To use Captum Insights:
1. Create an `AttributionVisualizer` object and configure it with your model, a scoring function for outputs (e.g., softmax), and a list of recognized classes.
2. Provide the dataset as an iterable that returns batches of images and labels.
Here’s a sample setup:
```python
# Example setup for Captum Insights (a sketch; image_loader and imagenet_classes are placeholders)
from captum.insights import AttributionVisualizer
from captum.insights.attr_vis.features import ImageFeature
visualizer = AttributionVisualizer(
    models=[model],
    score_func=lambda out: torch.nn.functional.softmax(out, dim=1),
    classes=imagenet_classes,
    features=[ImageFeature("Photo", baseline_transforms=[lambda x: x * 0], input_transforms=[transform_normalize])],
    dataset=image_loader)  # an iterable of captum.insights.Batch objects
visualizer.render()  # displays the widget in a notebook; visualizer.serve() launches it via Flask
```
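The `image_loader` above is a placeholder. One way to supply the dataset, assuming an ordinary `DataLoader` of (image, label) batches, is a small generator that wraps each batch in Captum's `Batch` object:
```python
from captum.insights import Batch

def formatted_batches(data_loader):
    # Wrap (images, labels) pairs from a DataLoader in the Batch objects
    # that AttributionVisualizer iterates over
    for images, labels in data_loader:
        yield Batch(inputs=images, labels=labels)

# Hypothetical usage: pass dataset=formatted_batches(data_loader) to AttributionVisualizer
```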
---
## Conclusion
This video and its accompanying tutorial provide a comprehensive introduction to Captum's tools for model interpretability. By exploring feature attribution algorithms like Integrated Gradients and Occlusion, layer attribution techniques such as Grad-CAM, and interactive visualization tools like Captum Insights, you can gain deeper insights into how your PyTorch models work.
For more detailed tutorials, documentation, and access to the source code, visit [captum.ai](https://captum.ai).