Assessing Neural Network Robustness via Adversarial Pivotal Tuning

1University of Copenhagen, 2KTH Stockholm

APT uses the full capacity of a pretrained generator to produce semantic adversarial manipulations


Visualizations

Overview of the Generated Manipulations


Row 1 shows the input images. Row 2 shows the images resulting from our manipulations. Rows 3 and 4 show the results of Dual manifold adversarial robustness: row 3 uses pixel-space adversarial manipulations applied to StyleGAN-XL's reconstructions, and row 4 uses latent-space manipulations applied via StyleGAN-XL. Our method manipulates images in a non-trivial but class-preserving manner, using the full capacity of a pretrained StyleGAN generator. For example, it removes the eye of the mantis (second column), changes the type of race car (third column), changes the color of the crab tail (fifth column), removes the text on a spaceship (seventh column), and removes some of the ropes (eighth column). All of these are class-preserving examples that fool a pretrained PRIME-ResNet50 classifier. In contrast, Dual manifold adversarial robustness either generates noisy, less realistic images (row 3) or images that differ significantly in semantics and do not preserve the input class (row 4).

Abstract

The ability to assess the robustness of image classifiers to a diverse set of manipulations is essential to their deployment in the real world. Recently, semantic manipulations of real images have been considered for this purpose, as they may not arise under standard adversarial settings. However, such semantic manipulations are often limited to style, color, or attribute changes. While expressive, these manipulations do not exploit the full capacity of a pretrained generator to effect adversarial image manipulations. In this work, we aim to leverage the full capacity of a pretrained image generator to generate highly detailed, diverse, and photorealistic image manipulations. Inspired by recent GAN-based image inversion methods, we propose a method called Adversarial Pivotal Tuning (APT). APT first finds a pivot latent-space input to a pretrained generator that best reconstructs an input image. It then adjusts the weights of the generator to create small but semantic manipulations that fool a pretrained classifier. Crucially, APT changes both the input and the weights of the pretrained generator while preserving its expressive latent editing capability, thus allowing the use of its full capacity in creating semantic adversarial manipulations. We demonstrate that APT generates a variety of semantic image manipulations which preserve the input image class but fool a variety of pretrained classifiers. We further demonstrate that classifiers trained to be robust on other robustness benchmarks are not robust to our generated manipulations, and we propose an approach to improve robustness towards them.

TL;DR: We propose a framework for generating photorealistic images that fool a classifier using automatic semantic manipulations.


The Adversarial Pivotal Tuning (APT) framework


In the first step, we optimize a style code w_p using standard latent optimization with loss L_o, while keeping the generator G frozen. The loss is computed between the ground-truth image x_gt and the generated image x_gen. In the second step, we freeze w_p and finetune G (shown in red) using three objectives: a reconstruction objective L_rec; the Projected GAN objective L_PG, computed using the discriminator D; and our fooling objective L_CE, computed using the classifier C. An asterisk (∗) indicates a frozen component.
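The sketch below illustrates how these two steps could look in PyTorch. It is a minimal, illustrative outline, not the paper's implementation: G (pretrained StyleGAN-XL generator), D (Projected GAN discriminator), C (pretrained classifier), and lpips_fn (a perceptual distance such as LPIPS) are assumed handles, and all step counts and loss weights are placeholder values.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the two APT steps. `G`, `D`, `C` and `lpips_fn` are
# assumed handles (see lead-in); weights and step counts are illustrative.

def latent_optimization(G, x_gt, w_init, lpips_fn, steps=500, lr=0.01):
    """Step 1: find a pivot style code w_p reconstructing x_gt, with G frozen."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_gen = G(w)
        loss = F.mse_loss(x_gen, x_gt) + lpips_fn(x_gen, x_gt)  # L_o
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # the frozen pivot w_p

def pivotal_tuning(G, D, C, w_p, x_gt, y, lpips_fn, steps=300, lr=3e-4,
                   lam_rec=1.0, lam_pg=0.1, lam_ce=0.5):
    """Step 2: freeze w_p and finetune G's weights with L_rec, L_PG and L_CE."""
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        x_gen = G(w_p)
        l_rec = F.mse_loss(x_gen, x_gt) + lpips_fn(x_gen, x_gt)  # stay close to the input
        l_pg = F.softplus(-D(x_gen)).mean()                      # realism via the discriminator
        l_ce = -F.cross_entropy(C(x_gen), y)                     # push C away from the true label
        loss = lam_rec * l_rec + lam_pg * l_pg + lam_ce * l_ce
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G
```

Note that the fooling objective enters with a negative sign: minimizing the total loss maximizes the classifier's cross-entropy on the true label, while L_rec and L_PG keep the manipulation small and photorealistic.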




Manipulations using different classifiers


The top row shows input images. The middle row shows APT manipulations for a ResNet-50 classifier, and the bottom row shows APT manipulations for a FAN-ViT classifier. Columns 1–4 and 7 illustrate similar manipulations for both classifiers, columns 5–6 show texture and spatial manipulations, and the last column showcases a fooling image without a clear APT manipulation.




Transferability of APT generated samples


For the ImageNet-1k validation set, we consider samples generated to fool a PRIME-ResNet50 (PRIME) and a FAN-ViT (FAN) pretrained classifier. We then test the accuracy (Acc) and the mean softmax probability of the labelled class (Conf) on those samples. The left column indicates the classifier on which we tested the accuracy of the real or generated samples. ∗ indicates the accuracy and confidence of samples generated and tested using the same classifier.
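For clarity, the two reported metrics could be computed as in the sketch below. This is an assumed implementation: `loader` is any iterable of (image, label) batches, e.g. APT samples generated against the other classifier.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def acc_and_conf(classifier, loader, device="cuda"):
    """Accuracy (Acc) and mean softmax probability of the labelled class
    (Conf) over (image, label) pairs. Sketch under assumed data handling."""
    classifier.eval()
    correct, conf_sum, n = 0, 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(classifier(x), dim=1)
        correct += (probs.argmax(dim=1) == y).sum().item()            # Acc numerator
        conf_sum += probs.gather(1, y.unsqueeze(1)).sum().item()      # Conf numerator
        n += y.numel()
    return correct / n, conf_sum / n
```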




Average accuracy and confidence on APT samples using PRIME-ResNet50 before and after fine-tuning.


We investigate the effect of fine-tuning a PRIME-ResNet50 model on our generated fooling images. We find that accuracy on the fooling images increases after fine-tuning.
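A minimal sketch of such a fine-tuning loop is shown below, assuming the APT fooling images are mixed into ordinary supervised batches alongside real ImageNet data. The mixing strategy, optimizer, and hyperparameters here are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F
from itertools import cycle

def finetune_on_apt(model, real_loader, apt_loader, epochs=1, lr=1e-4):
    """Fine-tune a classifier on a mix of real batches and APT fooling
    images, so robustness to APT manipulations improves without
    forgetting the original data. Hyperparameters are placeholders."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        # cycle() lets the (smaller) APT set be reused within an epoch
        for (xr, yr), (xa, ya) in zip(real_loader, cycle(apt_loader)):
            x = torch.cat([xr, xa])
            y = torch.cat([yr, ya])
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```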




Acknowledgement

This research was supported by the Pioneer Centre for AI, DNRF grant number P1.



BibTeX

@article{christensen2022apt,
    author  = {Christensen, Peter Ebert and Snæbjarnarson, Vésteinn and Dittadi, Andrea and Belongie, Serge and Benaim, Sagie},
    title   = {Assessing Neural Network Robustness via Adversarial Pivotal Tuning},
    journal = {arXiv preprint},
    year    = {2022},
}