Combating Replay Attacks Against Voice Assistants

Slides:



Advertisements
Similar presentations
We Can Hear You with Wi-Fi !
Advertisements

Voiceprint System Development Design, implement, test unique voiceprint biometric system Research Day Presentation, May 3 rd 2013 Rahul Raj (Team Lead),
Whole-Home Gesture Recognition Using Wireless Signals —— MobiCom’13 Author: Qifan Pu et al. University of Washington Presenter: Yanyuan Qin & Zhitong Fei.
Hierarchy of Design Voice Controlled Remote Voice Input Control Path Speech Processing IR Interface.
Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors
Tan Zhang, Ning Leng, Suman Banerjee University of Wisconsin Madison
Toward Semantic Indexing and Retrieval Using Hierarchical Audio Models Wei-Ta Chu, Wen-Huang Cheng, Jane Yung-Jen Hsu and Ja-LingWu Multimedia Systems,
F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)
A Wireless Spectrum Analyzer in Your Pocket
Liveness Testing Shivankush Aras. Threats to Biometric System Artificially created biometrics: e.g. image of a face or iris, lifted latent fingerprints,
Fig. 2 – Test results Personal Memory Assistant Facial Recognition System The facial identification system is divided into the following two components:
Enfold: Downclocking OFDM in WiFi
Authors: Anastasis Kounoudes, Anixi Antonakoudi, Vasilis Kekatos
Physical-layer Identification of RFID Devices Authors: Boris Danev, Thomas S. Heyde-Benjamin, and Srdjan Capkun Presented by Zhitao Yang 1.
V-Scope: An Opportunistic Wardriving Approach to Augmenting TV Whitespace Databases Tan Zhang, Suman Banerjee University of Wisconsin Madison 1Tan Zhang.
„Bandwidth Extension of Speech Signals“ 2nd Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction 22nd and 23rd June.
SoundSense by Andrius Andrijauskas. Introduction  Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc.  Arguably,
Snooping Keystrokes with mm-level Audio Ranging on a Single Phone
Sound-Event Partitioning and Feature Normalization for Robust Sound-Event Detection 2 Department of Electronic and Information Engineering The Hong Kong.
MULTIMEDIA INPUT / OUTPUT TECHNOLOGIES INTRODUCTION 6/1/ A.Aruna, Assistant Professor, Faculty of Information Technology.
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
Speech Recognition Feature Extraction. Speech recognition simplified block diagram Speech Capture Speech Capture Feature Extraction Feature Extraction.
VOCODERS. Vocoders Speech Coding Systems Implemented in the transmitter for analysis of the voice signal Complex than waveform coders High economy in.
Look who’s talking? Project 3.1 Yannick Thimister Han van Venrooij Bob Verlinden Project DKE Maastricht University.
Performance Comparison of Speaker and Emotion Recognition
Predicting Voice Elicited Emotions
It Starts with iGaze: Visual Attention Driven Networking with Smart Glasses It Starts with iGaze: Visual Attention Driven Networking with Smart Glasses.
Turning a Mobile Device into a Mouse in the Air
Speaker Verification System Middle Term Presentation Performed by: Barak Benita & Daniel Adler Instructor: Erez Sabag.
Emotion Recognition from Speech: Stress Experiment Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow, Friedhelm Schwenker,
Accurate WiFi Packet Delivery Rate Estimation and Applications Owais Khan and Lili Qiu. The University of Texas at Austin 1 Infocom 2016, San Francisco.
BIOMETRICS VOICE RECOGNITION. Meaning Bios : LifeMetron : Measure Bios : LifeMetron : Measure Biometrics are used to identify the input sample when compared.
More Security and Programming Language Work on SmartPhones Karthik Dantu and Steve Ko.
Voice Activity Detection Based on Sequential Gaussian Mixture Model Zhan Shen, Jianguo Wei, Wenhuan Lu, Jianwu Dang Tianjin Key Laboratory of Cognitive.
When CSI Meets Public WiFi: Inferring Your Mobile Phone Password via WiFi Signals Adekemi Adedokun May 2, 2017.
My Smartphone knows what you print exploring smartphone-based side-channel attacks against 3d Printers Chen Song, feng lin, zongjie ba, kui ren, chi zhou,
<author>, <company>
MobiCom’13 Jie Xiong and Kyle Jamieson University College London
Research on Machine Learning and Deep Learning
Swathi Chandrashekar - Loukas Lazos
網路環境中通訊安全技術之研究 Secure Communication Schemes in Network Environments
ARTIFICIAL NEURAL NETWORKS
Advised by Professor Baird Soules
Inaudible Voice Commands Ultrasound Modulation
Lecture 5 Smaller Network: CNN
Sharat.S.Chikkerur S.Anand Mantravadi Rajeev.K.Srinivasan
Orthogonal Frequency Division Multiplexing ...
Urban Sound Classification with a Convolution Neural Network
Leigh Anne Clevenger Pace University, DPS ’16
Context-Free Fine-Grained Motion Sensing using WiFi
WiFinger: Talk to Your Smart Devices with Finger-grained Gesture
Digital Music Audio Processing
Hardware and system development:
Sfax University, Tunisia
Acoustic Eavesdropping through Wireless Vibrometry
Audio and Speech Computers & New Media.
Keystroke Recognition using Wi-Fi Signals
Denial-of-Service Jammer Detector Training Course Worldsensing
John H.L. Hansen & Taufiq Al Babba Hasan
A maximum likelihood estimation and training on the fly approach
Advances in Deep Audio and Audio-Visual Processing
QGesture: Quantifying Gesture Distance and Direction with WiFi Signals
3D Localization for Sub-Centimeter Sized Devices
Presented by Chen-Wei Liu
Smartphone-based Acoustic Indoor Space Mapping
Presenter: Shih-Hsiang(士翔)
Music Signal Processing
Auditory Morphing Weyni Clacken
Huawei CBG AI Challenges
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems NDSS 2019 Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick.
Presentation transcript:

Combating Replay Attacks Against Voice Assistants Swadhin Pradhan, Wei Sun, Ghufran Baig, Lili Qiu UbiComp 2019

Voice is a popular interface

Voice assistants are ubiquitous Amazon Echo Google Home Apple Homepod

Voice is an open channel with different attack modalities … Human-based voice impersonation attack [ Johnson et al., BATS ‘13] Mangled voice attack [Vaidya et al., WOOT ‘15] Inaudible voice attack [Roy et al. , MobiSys ‘17]

Voice is an open channel with different attack modalities … Voice replay attack

Our threat model of replay attack An attacker can collect voice samples or synthesize the voice commands Remote Voice Replay Attacker has access to a speaker inside home and can remotely replay Local Voice Replay Attacker is physically close to the smart speaker and can replay Sophisticated Voice Replay Attacker not only replays but also manipulates his physiological signals

Design a long range device-free voice replay detection system Goal Design a long range device-free voice replay detection system

Three key observations Live voice has different spectral characteristics in both frequency and phase domain Breathing pattern changes distinctly during speech Breathing patterns during speech are distinct across users

Different characteristics of live voice More Energy in higher frequency bands

Different characteristics of live voice More phase structure present in spectrogram

Breathing have distinct patterns Breathing rate decreases during speech

Breathing have distinct patterns Different users have distinct breathing patterns

REplayed VOice Legitimacy Tester (REVOLT) workflow Human Absent ( Remote Replay ) ( Physical Replay ) Speech based Replay Detection WiFi based Breathing Detection Breathing & Speech Synchronization Human Absent WiFi Spectrogram CNN based Manipulation Detection ( Physical Replay ) Manipulated Genuine Voice ( Sophisticated Physical Replay )

Voice module of REVOLT Voice Feature Extraction Replayed ( Remote Replay ) Speech based Replay Detection MFCC, LFCC, RPS Voice Feature Extraction Voice Liveness Detection SVM MFCC: Mel Frequency Cepstral Coefficients LFCC: Linear Frequency Cepstral Coefficients RPS: Relative Phase Shift b/w two harmonics

WiFi based breathing detection Human Absent ( Physical Replay ) WiFi based Breathing Detection CSI Traces across subcarriers FFT (phase and amp.) Breathing rate Noise removal Peak detect

Breathing-Speaking synchronization Breathing & Speech Synchronization Breathing and speaking traces Breathing rates during and between speech Synch. Score > ∆ Segmentation Human Absent Thresholding & matching ( Physical Replay )

CNN based Manipulation Detection Recorded WiFi Spectrogram during Speaking Train CNN Model Voice Manipulation Detector WiFi Spectrogram CNN based Manipulation Detection Convolutional Neural Net (CNN): 3 conv. layers -> Max-pool -> 2 FC layers Manipulated Genuine Voice ( Sophisticated Physical Replay )

REVOLT prototype

Voice module evaluation Collected (~200 speaker, mic, location, user config.) Both frequency and phase structure help In voice only replay detection

helps in breathing detection WiFi breathing detection evaluation Exploiting both CSI phase and amplitude helps in breathing detection

End-to-end model evaluation Very low false-positive rate ensuring almost replay prevention

Related works Voice liveness detection Ultrasonic based lip movement (Lu et al., Infocom ’18; Tan et al., IMWUT ‘18) WiFi based lip utterance detection (Wei et al., MobiCom ‘15) Voice localization (Zhang et al., CCS ‘16) … Wearable based approach (Feng et al., MobiCom ‘17) Speaker detection via magnetometer (Chen et al., ICDCS ‘17) Needs extra hardware / small range (~10 cm) / extensive finger-printing

Future works Adding state-of-the-art COTS Wi-Fi setup based respiration sensing ( up to 8 m, IMWUT 2019 ) Incorporating other wirelessly sensed soft biometrics Heart rate, mouth movement pattern etc. Moving from mechanical directional antenna to phased array Experimenting with more users and more diverse environments to evaluate the robustness

Key takeaways Proposing a wearable-free long range voice replay detection system Combining both wireless interfaces (Audio & WiFi) present in smart speakers for combating replay Exploiting distinctiveness of breathing while speech for enhancing the security of voice assistants

Thanks a lot ! swadhin@utexas.edu Thanks a lot for you attention. For questions and queries please mail the author Swadhin Pradhan at swadhin@utexas.edu