
Abstract

Adversarial inputs that fool deep neural networks have exposed a significant vulnerability in current state-of-the-art systems across multiple domains. Both black-box and white-box approaches have been used either to replicate the model itself or to craft examples that cause the model to fail. In this work, we propose a framework that uses multi-objective evolutionary optimization to perform both targeted and un-targeted black-box attacks on Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR systems, Deepspeech and Kaldi-ASR, and increase their Word Error Rates (WER) by up to 980%, indicating the potency of our approach. During un-targeted and targeted attacks, the adversarial samples maintain a high acoustic similarity of 0.98 and 0.97, respectively, with the original audio.

Below are some adversarial samples generated against Kaldi-ASR and Deepspeech using the proposed framework, in both targeted and un-targeted settings.

Set-1 Un-targeted Attack

Actual Text: I have got to go him
Generated Text: it got girl
ASR: Deepspeech

Actual Text: I have got to go him
Generated Text: i get ill
ASR: Deepspeech

Actual Text: I have got to go him
Generated Text: the good girl to have
ASR: Kaldi-ASR

Actual Text: I have got to go him
Generated Text: the scottish go to him
ASR: Kaldi-ASR

Set-2 Un-targeted Attack

Actual Text: he is the man that are written for
Generated Text: he is the man the tired
ASR: Deepspeech

Actual Text: he is the man that are written for
Generated Text: hes the man their coverage
ASR: Deepspeech

Actual Text: he is the man that are written for
Generated Text: these the man that's all right
ASR: Kaldi-ASR

Actual Text: he is the man that are written for
Generated Text: he's the man that are ready and four
ASR: Kaldi-ASR

Set-3 Un-targeted Attack

Actual Text: this is for you
Generated Text: is it all
ASR: Deepspeech

Actual Text: this is for you
Generated Text: is it all
ASR: Deepspeech

Actual Text: this is for you
Generated Text: this is all you
ASR: Kaldi-ASR

Actual Text: this is for you
Generated Text: this is all you
ASR: Kaldi-ASR

Set-4 Targeted Attack

Actual Text: ive got to go to him
Target Text: the caterpillar to have
Generated Text: the caterpillar to have
ASR: Deepspeech

Actual Text: ive got to go to him
Target Text: the caterpillar to have
Generated Text: the caterpillar to him
ASR: Deepspeech

Actual Text: ive got to go to him
Target Text: the caterpillar to have
Generated Text: the caterpillar to have
ASR: Kaldi-ASR

Actual Text: ive got to go to him
Target Text: the caterpillar to have
Generated Text: the caterpillar to have
ASR: Kaldi-ASR

Set-5 Targeted Attack

Actual Text: follow the instructions here
Target Text: all of these
Generated Text: all of these
ASR: Deepspeech

Actual Text: follow the instructions here
Target Text: all of these
Generated Text: all of these
ASR: Deepspeech

Actual Text: follow the instructions here
Target Text: all of these
Generated Text: all the institutions here
ASR: Kaldi-ASR

Actual Text: follow the instructions here
Target Text: all of these
Generated Text: all these russians year
ASR: Kaldi-ASR

Set-6 Targeted Attack

Actual Text: never mind about that
Target Text: they will mind
Generated Text: they will mind
ASR: Deepspeech

Actual Text: never mind about that
Target Text: they will mind
Generated Text: they were minodat
ASR: Deepspeech

Actual Text: never mind about that
Target Text: they will mind
Generated Text: they reminded us that
ASR: Kaldi-ASR

Actual Text: never mind about that
Target Text: they will mind
Generated Text: how her mind a out that
ASR: Kaldi-ASR
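
Scoring a transcript pair

The Word Error Rate quoted in the abstract is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is one minimal way to score the transcript pairs listed above. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for illustration, not necessarily the scoring procedure used in this work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: transcripts are lowercased and split on whitespace;
    the original evaluation may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Set-3 samples: one substituted word out of four reference words.
    print(word_error_rate("this is for you", "this is all you"))  # 0.25
    # One deletion and two substitutions out of four reference words.
    print(word_error_rate("this is for you", "is it all"))        # 0.75
    # Set-5 targeted sample, scored against the target text.
    print(word_error_rate("all of these", "all of these"))        # 0.0
```

For an un-targeted sample, the score is taken against the Actual Text, so a higher value means a stronger attack. For a targeted sample, scoring the Generated Text against the Target Text gives 0.0 when the attack reproduces the target exactly.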