Fraudsters can record a person's voice commands to voice assistants like Amazon Alexa or Google Assistant and replay them to impersonate that individual. They can also stitch samples together to mimic a person's voice in order to spoof, or trick, third parties.
The new solution, called Void (Voice liveness detection), can be embedded in a smartphone or voice assistant software. It detects spoofing attempts by identifying differences in spectral power between a live human voice and a voice replayed through a loudspeaker.
Consumers use voice assistants to shop online, make phone calls, send messages, control smart home appliances and access banking services.
Muhammad Ejaz Ahmed, Cybersecurity Research Scientist at CSIRO's Data61 and lead author of the research paper, said privacy preserving technologies are becoming increasingly important in enhancing consumer privacy and security as voice technologies become part of daily life.
"Voice spoofing attacks can be used to make purchases using a victim's credit card details, control Internet of Things connected devices like smart appliances and give hackers unsolicited access to personal consumer data such as financial information, home addresses and more," Mr Ahmed said.
"Although voice spoofing is known as one of the easiest attacks to perform, as it simply involves a recording of the victim's voice, it is incredibly difficult to detect because the recorded voice has similar characteristics to the victim's live voice. Void is game-changing technology that allows for more efficient and accurate detection, helping to prevent people's voice commands from being misused."
Unlike existing voice spoofing detection techniques, which typically rely on deep learning models, Void was designed around insights from spectrograms, visual representations of a signal's frequency spectrum as it varies over time, to detect the 'liveness' of a voice.
This technique provides a highly accurate outcome, detecting attacks eight times faster than deep learning methods, and uses 153 times less memory, making it a viable and lightweight solution that could be incorporated into smart devices.
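The press release does not publish the algorithm itself, but the core idea, that live and replayed voices distribute spectral power differently, can be illustrated with a minimal sketch. This is not the authors' implementation; the feature names and the 1 kHz band split below are illustrative assumptions, chosen only to show how a spectrogram can be summarised into lightweight power-distribution features:

```python
# Hypothetical sketch of spectrogram-based power features (not the Void paper's
# actual feature set): summarise how signal power is spread over frequency,
# the kind of low-cost statistic a liveness classifier could use.
import numpy as np
from scipy import signal

def spectral_power_features(audio, sr=16000):
    """Return the cumulative power distribution over frequency and the
    share of power below 1 kHz (illustrative band choice)."""
    freqs, _times, spec = signal.spectrogram(audio, fs=sr, nperseg=512)
    power_per_freq = spec.sum(axis=1)        # total power in each frequency bin
    power_per_freq /= power_per_freq.sum()   # normalise to a distribution
    cumulative = np.cumsum(power_per_freq)   # cumulative power over frequency
    low_band_ratio = power_per_freq[freqs < 1000].sum()
    return cumulative, low_band_ratio

# Toy input: a synthetic voice-like tone with energy at 200 Hz and 400 Hz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
voice_like = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
cdf, ratio = spectral_power_features(voice_like, sr)
```

Because features like these are a handful of summary statistics rather than a deep network's activations, a classifier built on them can plausibly run faster and in far less memory, which is consistent with the speed and footprint figures reported above.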
The Void research project was funded by Samsung Research and tested using datasets from the Automatic Speaker Verification Spoofing and Countermeasures challenges, achieving accuracies of 99 per cent and 94 per cent on the two datasets respectively.
Research estimates that by 2023, as many as 275 million voice assistant devices will be used to control homes across the globe, a growth of 1,000 per cent since 2018.
Data security expert Dr Adnene Guabtni, Senior Research Scientist at CSIRO's Data61, shares tips for consumers on how to protect their data when using voice assistants:
- Always change your voice assistant settings to only activate the assistant using a physical action, such as pressing a button.
- On mobile devices, make sure the voice assistant can only activate when the device is unlocked.
- Turn off all home voice assistants before you leave the house, to reduce the risk of successful voice spoofing while you are out.
- Voice spoofing requires hackers to get samples of your voice. Make sure you regularly delete any voice data that Google, Apple or Amazon store.
- Try to limit the use of voice assistants to commands that do not involve online purchases or authorisations – hackers or people around you might record you issuing payment commands and replay them at a later stage.
This research was a collaboration with Samsung Research and Sungkyunkwan University in South Korea. The research paper will be presented at the USENIX Security Symposium, a flagship security conference, in Boston in August 2020.
About the research
The paper, "Void: A fast and light voice liveness detection system", was co-authored by Muhammad Ejaz Ahmed (CSIRO's Data61), Il-Youp Kwak (Chung-Ang University), Jun Ho Huh and Iljoo Kim (Samsung Research), Taekkyung Oh (KAIST and Sungkyunkwan University), and Hyoungshick Kim (Sungkyunkwan University), and was published at USENIX Security 2020.
De-identified datasets from Samsung and the Automatic Speaker Verification Spoofing and Countermeasures challenges included:
- 255,173 voice samples generated with 120 participants, 15 playback devices and 12 recording devices
- 18,030 publicly available voice samples generated with 42 participants, 26 playback devices and 25 recording devices