Proper Network Interpretability Helps Adversarial Robustness in Classification

Overview

This repository contains the code for Proper Network Interpretability Helps Adversarial Robustness in Classification. In this work, we demonstrate that interpretability sneaking attacks (ISA) cannot reduce interptetation discrepancy below a certain level when using an appropriate discrepancy metric. We leverage this to design a interpretabilit-aware robust training method based directly on robust interpretation. Empirically, we demonstrate that interpretability-aware robust training can achieve robust classifications when facing large-perturbation attacks, as well as other benefits including robust interpretations and perceptually-aligned internal features.

Setup

The following packages are required.

pip install numpy scipy scikit-learn scikit-image matplotlib opencv-python tensorflow-gpu

For running UAR attacks, pytorch is also required:

pip install torch torchvision

Clone this repository:

git clone https://github.com/AkhilanB/Interpretability.git
cd Interpretability

Train networks:

python3 train.py <number>

For <number> in 1-7. This will save all networks to networks/. Alternatively, use our pretrained networks in networks/.

How to Run

Running AAI, UAR, and PGD attacks

eval.py contains functions to evaluate networks under AAAI, UAR, and PGD attacks and visualize interpretation maps. The following functions are included:

vis_cam #Visualizes CAM interpretation maps for a network
intr_topk #Computes top-k intersection between the CAM interpretations of two different networks
aai_topk # Runs to-pk AAI and returns Tau rank correlation between original and perturbed interpretations
pgd_attack # Runs standard pgd attacks for a given perturbation size epsilon and returns PGD accuracy
min_pgd_attack # Runs minimum epsilon successful adversarial attacks and returns classification margins and interpretation discrepancies
uar_attack # Runs unforseen adversarial attacks and returns accuracy

All main experiment results can be run with:

python3 eval.py 1

and additional experiment results can be run with:

python3 eval.py 2

The commands either print numerical results or generates plot in gen_images\.

Running ISA

Results can be run with:

python3 eval_cmd.py --task <task> --dataset <dataset> --num_class <num_class> --num_image <num_image> --norm <norm> --network_type <network_type> --inp_method <inp_method> --network <network> --output_dir <output_dir> --penalties <penalties>

where <task> can be chosen from ["isa_cam", "tradeoff_cam", "tradeoff_ig", "tradeoff_repr", "isa_ig", "attack_and_display"], each corresponding to a different ISA evaluation. The command either prints numerical results or generates a plot in gen_images\.

Files

setup_*.py contains code to import datasets. Currently, there is code to import mnist, cifar and restricted imagenet.

load_model.py contains a function load_model that takes a model architecture specification and a model file, and loads it into tensorflow. The function returns a Model object and contains functions that compute prediction output, CAM, GradCAM++ and IG.

train.py contains functions to train networks with different training methods. Each function trains a network with a different method. Running a training function will save a trained model in networks\. The bottom of the file contains scripts to train all networks. These scripts can be run with:

python3 train.py <number>

where each choice of <number> corresponds to a different set of networks.

eval.py contains functions to evaluate networks under AAI, UAR and PGD attacks. Functions either return numerical results or generates plots in gen_images\.

eval_cmd.py provides a command line interface to evaluate networks with ISA. The script either prints numerical results or generates plots in gen_images\.

advex_uar/ contains code to run UAR attacks, adapted from the source repository.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
advex_uar		advex_uar
networks		networks
.gitignore		.gitignore
Interpretability_Summary.PNG		Interpretability_Summary.PNG
LICENSE		LICENSE
README.md		README.md
eval.py		eval.py
eval_cmd.py		eval_cmd.py
funcs.py		funcs.py
load_model.py		load_model.py
setup_cifar.py		setup_cifar.py
setup_mnist.py		setup_mnist.py
setup_restrictedimagenet.py		setup_restrictedimagenet.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Proper Network Interpretability Helps Adversarial Robustness in Classification

Overview

Setup

How to Run

Running AAI, UAR, and PGD attacks

Running ISA

Files

About

Releases

Packages

Languages

License

AkhilanB/Proper-Interpretability

Folders and files

Latest commit

History

Repository files navigation

Proper Network Interpretability Helps Adversarial Robustness in Classification

Overview

Setup

How to Run

Running AAI, UAR, and PGD attacks

Running ISA

Files

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages