InputIBA: Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

1Technical University of Munich, 2Johns Hopkins University, 3Kyung Hee University

Video

InputIBA generates saliency maps with input-level resolution for CNNs and RNNs.

Abstract

One principal approach to illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network’s prediction. The predictive information of features has recently been proposed as a proxy for their importance.

So far, predictive information has only been identified for latent features, by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain.

The method results in fine-grained identification of input features’ information and is agnostic to the network architecture. The core idea of our method is to leverage a bottleneck on the input that only lets input features associated with predictive latent features pass through.

We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

How does it work?

Previously, information-theory-based attribution (IBA) inserts an information bottleneck into the latent features. We see the result of applying the information bottleneck at different layers of a VGG16 network (from conv4_1 to conv1_1).

The approximations in IBA (averaging across channels) assign information to irrelevant areas of the image (the trashcan and the regions around the image). Upscaling via interpolation also makes the attribution map blurry and inaccurate.
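For illustration, this is roughly what the upscaling step looks like: a coarse latent attribution map (e.g. 28×28 at conv4_1 of VGG16 for a 224×224 input) is interpolated to the input resolution, which spreads attribution over neighboring pixels. A minimal sketch, not IBA’s actual post-processing code:

```python
import torch
import torch.nn.functional as F

# Hypothetical channel-averaged attribution map from a deep layer
# (e.g. 28x28 at conv4_1 of VGG16, assuming a 224x224 input).
latent_attribution = torch.rand(1, 1, 28, 28)

# Upscale to input resolution by bilinear interpolation; the interpolation
# smears attribution over neighboring pixels, one source of the blurriness
# discussed above.
input_attribution = F.interpolate(
    latent_attribution, size=(224, 224), mode="bilinear", align_corners=False
)
print(input_attribution.shape)  # torch.Size([1, 1, 224, 224])
```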

As we move towards earlier layers (conv1_1), the information is distributed equally between features, and less information is assigned to the most relevant feature (the broom), due to the overestimation of mutual information in IBA.

InputIBA aims at a more reasonable approximation of the bottleneck variable, one that is applicable to the input. In addition, InputIBA computes the mutual information directly in the input space and avoids the approximation issues inherent in IBA.

We take advantage of the fact that the Gaussian approximation of the bottleneck distribution is reasonable for deep layers. Thus, we first find a bottleneck variable, \(Z^*\), that corresponds to predictive deep features, by solving the IBA optimization problem on a hidden layer. The bottleneck \(Z^*\) restricts the flow of information through the network and keeps only deep features with predictive information.
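As a rough sketch of this first step (placeholder names and a placeholder \(\beta\); not the exact implementation), IBA learns a per-feature mask \(\lambda\) at the chosen layer by trading off classification loss against a Gaussian bound on the mutual information:

```python
import torch
import torch.nn.functional as F

def latent_bottleneck(r_norm, lamb):
    # Z = lambda * R + (1 - lambda) * eps, with R normalized per channel and
    # eps ~ N(0, 1); lambda near 1 lets a feature pass, near 0 replaces it by noise.
    eps = torch.randn_like(r_norm)
    return lamb * r_norm + (1.0 - lamb) * eps

def information_term(r_norm, lamb, tiny=1e-8):
    # KL[ N(lambda * r, (1 - lambda)^2) || N(0, 1) ]: the Gaussian upper bound
    # on I(R; Z) used in the IBA formulation.
    var = (1.0 - lamb) ** 2
    kl = 0.5 * (var + (lamb * r_norm) ** 2 - 1.0 - torch.log(var + tiny))
    return kl.mean()

def iba_step(model_tail, r_norm, lamb_logits, target, beta=10.0):
    # model_tail: the part of the network after the chosen layer (placeholder).
    lamb = torch.sigmoid(lamb_logits)
    z = latent_bottleneck(r_norm, lamb)
    ce = F.cross_entropy(model_tail(z), target)
    return ce + beta * information_term(r_norm, lamb)
```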

Then, InputIBA uses a Wasserstein GAN to fit a bottleneck variable \(Z_G\) on the input that corresponds to the deep bottleneck \(Z^*\). This translates to finding a mask on the input that admits input features corresponding to informative deep features.
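A minimal sketch of this adversarial fitting (a weight-clipping WGAN variant is shown for simplicity; the critic, the masked-input features, and all names here are assumptions): the critic separates samples of \(Z^*\) from deep features of the masked input, and the input mask is trained to close that gap.

```python
import torch

def critic_loss(critic, z_star, z_fake):
    # Wasserstein critic: separate samples of the deep bottleneck Z* ("real")
    # from deep features obtained by running the masked input through the
    # network up to the bottleneck layer ("fake").
    return -(critic(z_star).mean() - critic(z_fake).mean())

def generator_loss(critic, z_fake):
    # The input mask (generator) is trained so that features of the masked
    # input become indistinguishable from Z*.
    return -critic(z_fake).mean()

def clip_critic(critic, c=0.01):
    # Weight clipping to (roughly) enforce the Lipschitz constraint of WGAN.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```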

Since the final goal is to keep only the predictive input features, the GAN-fitted bottleneck \(Z_G\) can be used as prior knowledge for the distribution of the input bottleneck \(Z_I\). We then solve the optimization using \(Z_G\) as a prior for \(Z_I\). We refer to this methodology as InputIBA.
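Put together, the final optimization mirrors the IBA objective, but the bottleneck lives on the input and the information term is measured against the \(Z_G\) prior rather than a generic Gaussian. A minimal sketch under the assumption that the prior is summarized by per-feature Gaussian statistics (placeholder names and \(\beta\)):

```python
import torch
import torch.nn.functional as F

def input_bottleneck(x, lamb, prior_mean, prior_std):
    # Z_I = lambda * x + (1 - lambda) * eps, with eps drawn from the Z_G prior.
    eps = prior_mean + prior_std * torch.randn_like(x)
    return lamb * x + (1.0 - lamb) * eps

def kl_to_prior(x, lamb, prior_mean, prior_std, tiny=1e-8):
    # KL between p(Z_I | x) and the Gaussian fitted to Z_G, per input feature.
    mean_z = lamb * x + (1.0 - lamb) * prior_mean
    var_z = ((1.0 - lamb) * prior_std) ** 2
    kl = 0.5 * (
        torch.log((prior_std ** 2 + tiny) / (var_z + tiny))
        + (var_z + (mean_z - prior_mean) ** 2) / (prior_std ** 2 + tiny)
        - 1.0
    )
    return kl.mean()

def inputiba_loss(model, x, target, lamb_logits, prior_mean, prior_std, beta=10.0):
    lamb = torch.sigmoid(lamb_logits)
    z_i = input_bottleneck(x, lamb, prior_mean, prior_std)
    ce = F.cross_entropy(model(z_i), target)
    return ce + beta * kl_to_prior(x, lamb, prior_mean, prior_std)
```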

Overview of InputIBA

Model-Agnostic Attribution

InputIBA can generate attributions for different model architectures across domains, thus serving as a general framework for explaining neural network models.

We applied InputIBA to a sentiment classification task trained on the IMDB dataset. The model is a multi-layer LSTM, a recurrent neural network.
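Because the bottleneck acts on the input (here, the word embeddings), no architecture-specific changes are needed. A hypothetical sketch of such a classifier, not the exact model used in the paper:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    # Hypothetical multi-layer LSTM sentiment classifier; the input bottleneck
    # is applied to the word embeddings, so the attribution does not depend on
    # the recurrent architecture itself.
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)

    def forward(self, tokens, mask=None, noise=None):
        emb = self.embedding(tokens)
        if mask is not None:
            # Input bottleneck on embeddings: keep masked-in features and
            # replace the rest with noise (standard normal here for simplicity).
            if noise is None:
                noise = torch.randn_like(emb)
            emb = mask * emb + (1.0 - mask) * noise
        out, _ = self.lstm(emb)
        return self.classifier(out[:, -1])
```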

BibTeX

If you find our work useful, please cite our paper:

@inproceedings{zhang2021finegrained,
	title={Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information},
	author={Yang Zhang and Ashkan Khakzar and Yawei Li and Azade Farshad and Seong Tae Kim and Nassir Navab},
	booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
	year={2021},
	url={https://openreview.net/forum?id=HglgPZAYhcG}
}