Fooling deep neural networks with adversarial inputs has exposed a significant vulnerability in current state-of-the-art systems across multiple domains. Both black-box and white-box approaches have been used either to replicate the model itself or to craft examples that cause the model to fail. In this work, we propose a framework that uses multi-objective evolutionary optimization to perform both targeted and un-targeted black-box attacks on Automatic Speech Recognition (ASR) systems.
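The attack loop can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: it assumes a black-box `transcribe` function (audio in, text out), scalarizes the two objectives (transcript change vs. acoustic closeness) into one fitness value, and uses plain mutation-and-selection; all names and parameters here are hypothetical.

```python
import random

def untargeted_attack(audio, transcribe, generations=30, pop_size=8,
                      eps=0.01, seed=0):
    """Toy multi-objective evolutionary un-targeted attack (illustrative only).

    `transcribe` is a black-box oracle: list[float] -> str. Fitness trades off
    (a) how much the transcript changes and (b) closeness to the original audio.
    """
    rng = random.Random(seed)
    original_text = transcribe(audio)

    def similarity(cand):
        # 1 minus mean absolute perturbation: a crude stand-in for a real
        # acoustic similarity measure.
        return 1.0 - sum(abs(a - b) for a, b in zip(audio, cand)) / len(audio)

    def text_diff(cand):
        # Fraction of reference words no longer present in the output.
        out = set(transcribe(cand).split())
        ref = original_text.split()
        return sum(w not in out for w in ref) / max(len(ref), 1)

    def fitness(cand):
        # Scalarized multi-objective score: maximize both terms.
        return text_diff(cand) + similarity(cand)

    def mutate(cand):
        # Bounded random perturbation of every sample.
        return [x + rng.uniform(-eps, eps) for x in cand]

    # Evolve: keep the fitter half, refill with mutated copies of the parents.
    population = [mutate(list(audio)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        population = parents + [mutate(list(p)) for p in parents]
    return max(population, key=fitness)
```

Because only `transcribe` outputs are queried, no gradients of the model are needed, which is what makes the attack black-box.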
We apply this framework to two ASR systems, Deepspeech and Kaldi-ASR, increasing their Word Error Rate (WER) by up to 980%, indicating the potency of our approach. In both un-targeted and targeted attacks, the adversarial samples maintain a high acoustic similarity with the original audio of 0.98 and 0.97, respectively.
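For reference, WER is the word-level Levenshtein distance between the reference and the hypothesis transcript (substitutions, insertions, and deletions), normalized by the number of reference words. A minimal sketch, assuming a non-empty reference:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Because insertions count as errors, WER can exceed 100%, which is why relative increases as large as 980% are possible.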
Below are some of the adversarial samples generated on Kaldi-ASR and Deepspeech using the proposed framework in both the targeted and un-targeted settings.
Set-1 Un-targeted Attack
Actual Text: I have got to go him
Generated Text: it got girl
ASR: Deepspeech
Actual Text: I have got to go him
Generated Text: i get ill
ASR: Deepspeech
Actual Text: I have got to go him
Generated Text: the good girl to have
ASR: Kaldi-ASR
Actual Text: I have got to go him
Generated Text: the scottish go to him
ASR: Kaldi-ASR
Set-2 Un-targeted Attack
Actual Text: he is the man that are written for
Generated Text: he is the man the tired
ASR: Deepspeech
Actual Text: he is the man that are written for
Generated Text: hes the man their coverage
ASR: Deepspeech
Actual Text: he is the man that are written for
Generated Text: these the man that's all right
ASR: Kaldi-ASR
Actual Text: he is the man that are written for
Generated Text: he's the man that are ready and four