Exploring Defenses for Security Vulnerabilities in Neural Networks
Deep neural networks perform extremely well on image classification
tasks and are therefore widely used for such problems.
However, these networks are vulnerable to adversarial samples: inputs
that appear to belong to one class but are classified with high
confidence as a different, and sometimes specifically targeted, class.
These samples can be generated in many ways, but most commonly an
attack makes imperceptible changes to an image from one class that
cause the network to classify it as another.
This is a major security problem because it means an adversary could
make small changes to their face to evade facial recognition
algorithms, or subtly modify street signs to confuse self-driving
cars.
The difficulty of exploiting this vulnerability can be increased by
stacking an autoencoder on top of a standard network. We examine how
adding different types of autoencoders affects the accuracy of a
network and the generation of adversarial samples against it. We
explore a generalized class of adversaries and show that stacking an
autoencoder on top of a network increases the complexity required to
generate a successful attack, with no significant reduction in the
accuracy of the network.
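
To make the stacked architecture concrete, here is a minimal sketch in PyTorch. The layer sizes, the MNIST-sized input, and the input-to-autoencoder-to-classifier ordering (the autoencoder reconstructs the input before the classifier sees it) are our assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn

# Hypothetical illustration: a simple autoencoder stacked in front of a
# classifier, so inputs are reconstructed before being classified.
# Sizes assume 28x28 grayscale (MNIST-like) images.

autoencoder = nn.Sequential(
    nn.Flatten(),                      # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(784, 128), nn.ReLU(),    # encoder
    nn.Linear(128, 784), nn.Sigmoid()  # decoder reconstructs the input
)

classifier = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10)                 # logits for 10 classes
)

# Stacked model: reconstruct first, then classify the reconstruction.
stacked = nn.Sequential(autoencoder, classifier)

x = torch.rand(1, 1, 28, 28)           # dummy image batch
logits = stacked(x)
print(logits.shape)                     # torch.Size([1, 10])
```

The intuition behind this design is that the autoencoder's reconstruction tends to pull inputs back toward the training-data manifold, so the small adversarial perturbations described above must survive the reconstruction step as well, raising the complexity of a successful attack.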