Combating Replay Attacks Against Voice Assistants

Slides:

Advertisements

Similar presentations

We Can Hear You with Wi-Fi !

Advertisements

Voiceprint System Development Design, implement, test unique voiceprint biometric system Research Day Presentation, May 3 rd 2013 Rahul Raj (Team Lead),

Whole-Home Gesture Recognition Using Wireless Signals —— MobiCom’13 Author: Qifan Pu et al. University of Washington Presenter: Yanyuan Qin & Zhitong Fei.

Hierarchy of Design Voice Controlled Remote Voice Input Control Path Speech Processing IR Interface.

Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors

Tan Zhang, Ning Leng, Suman Banerjee University of Wisconsin Madison

Toward Semantic Indexing and Retrieval Using Hierarchical Audio Models Wei-Ta Chu, Wen-Huang Cheng, Jane Yung-Jen Hsu and Ja-LingWu Multimedia Systems,

F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)

A Wireless Spectrum Analyzer in Your Pocket

Liveness Testing Shivankush Aras. Threats to Biometric System Artificially created biometrics: e.g. image of a face or iris, lifted latent fingerprints,

Fig. 2 – Test results Personal Memory Assistant Facial Recognition System The facial identification system is divided into the following two components:

Enfold: Downclocking OFDM in WiFi

Authors: Anastasis Kounoudes, Anixi Antonakoudi, Vasilis Kekatos

Physical-layer Identification of RFID Devices Authors: Boris Danev, Thomas S. Heyde-Benjamin, and Srdjan Capkun Presented by Zhitao Yang 1.

V-Scope: An Opportunistic Wardriving Approach to Augmenting TV Whitespace Databases Tan Zhang, Suman Banerjee University of Wisconsin Madison 1Tan Zhang.

„Bandwidth Extension of Speech Signals“ 2nd Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction 22nd and 23rd June.

SoundSense by Andrius Andrijauskas. Introduction  Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc.  Arguably,

Snooping Keystrokes with mm-level Audio Ranging on a Single Phone

Sound-Event Partitioning and Feature Normalization for Robust Sound-Event Detection 2 Department of Electronic and Information Engineering The Hong Kong.

MULTIMEDIA INPUT / OUTPUT TECHNOLOGIES INTRODUCTION 6/1/ A.Aruna, Assistant Professor, Faculty of Information Technology.

Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.

Speech Recognition Feature Extraction. Speech recognition simplified block diagram Speech Capture Speech Capture Feature Extraction Feature Extraction.

VOCODERS. Vocoders Speech Coding Systems Implemented in the transmitter for analysis of the voice signal Complex than waveform coders High economy in.

Look who’s talking? Project 3.1 Yannick Thimister Han van Venrooij Bob Verlinden Project DKE Maastricht University.

Performance Comparison of Speaker and Emotion Recognition

Predicting Voice Elicited Emotions

It Starts with iGaze: Visual Attention Driven Networking with Smart Glasses It Starts with iGaze: Visual Attention Driven Networking with Smart Glasses.

Turning a Mobile Device into a Mouse in the Air

Speaker Verification System Middle Term Presentation Performed by: Barak Benita & Daniel Adler Instructor: Erez Sabag.

Emotion Recognition from Speech: Stress Experiment Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow, Friedhelm Schwenker,

Accurate WiFi Packet Delivery Rate Estimation and Applications Owais Khan and Lili Qiu. The University of Texas at Austin 1 Infocom 2016, San Francisco.

BIOMETRICS VOICE RECOGNITION. Meaning Bios : LifeMetron : Measure Bios : LifeMetron : Measure Biometrics are used to identify the input sample when compared.

More Security and Programming Language Work on SmartPhones Karthik Dantu and Steve Ko.

Voice Activity Detection Based on Sequential Gaussian Mixture Model Zhan Shen, Jianguo Wei, Wenhuan Lu, Jianwu Dang Tianjin Key Laboratory of Cognitive.

When CSI Meets Public WiFi: Inferring Your Mobile Phone Password via WiFi Signals Adekemi Adedokun May 2, 2017.

My Smartphone knows what you print exploring smartphone-based side-channel attacks against 3d Printers Chen Song, feng lin, zongjie ba, kui ren, chi zhou,

<author>, <company>

MobiCom’13 Jie Xiong and Kyle Jamieson University College London

Research on Machine Learning and Deep Learning

Swathi Chandrashekar - Loukas Lazos

網路環境中通訊安全技術之研究 Secure Communication Schemes in Network Environments

ARTIFICIAL NEURAL NETWORKS

Advised by Professor Baird Soules

Inaudible Voice Commands Ultrasound Modulation

Lecture 5 Smaller Network: CNN

Sharat.S.Chikkerur S.Anand Mantravadi Rajeev.K.Srinivasan

Orthogonal Frequency Division Multiplexing ...

Urban Sound Classification with a Convolution Neural Network

Leigh Anne Clevenger Pace University, DPS ’16

Context-Free Fine-Grained Motion Sensing using WiFi

WiFinger: Talk to Your Smart Devices with Finger-grained Gesture

Digital Music Audio Processing

Hardware and system development:

Sfax University, Tunisia

Acoustic Eavesdropping through Wireless Vibrometry

Audio and Speech Computers & New Media.

Keystroke Recognition using Wi-Fi Signals

Denial-of-Service Jammer Detector Training Course Worldsensing

John H.L. Hansen & Taufiq Al Babba Hasan

A maximum likelihood estimation and training on the fly approach

Advances in Deep Audio and Audio-Visual Processing

QGesture: Quantifying Gesture Distance and Direction with WiFi Signals

3D Localization for Sub-Centimeter Sized Devices

Presented by Chen-Wei Liu

Smartphone-based Acoustic Indoor Space Mapping

Presenter: Shih-Hsiang(士翔)

Music Signal Processing

Auditory Morphing Weyni Clacken

Huawei CBG AI Challenges

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems NDSS 2019 Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick.

Presentation transcript:

Combating Replay Attacks Against Voice Assistants Swadhin Pradhan, Wei Sun, Ghufran Baig, Lili Qiu UbiComp 2019

Voice is a popular interface

Voice assistants are ubiquitous Amazon Echo Google Home Apple Homepod

Voice is an open channel with different attack modalities … Human-based voice impersonation attack [ Johnson et al., BATS ‘13] Mangled voice attack [Vaidya et al., WOOT ‘15] Inaudible voice attack [Roy et al. , MobiSys ‘17]

Voice is an open channel with different attack modalities … Voice replay attack

Our threat model of replay attack An attacker can collect voice samples or synthesize the voice commands Remote Voice Replay Attacker has access to a speaker inside home and can remotely replay Local Voice Replay Attacker is physically close to the smart speaker and can replay Sophisticated Voice Replay Attacker not only replays but also manipulates his physiological signals

Design a long range device-free voice replay detection system Goal Design a long range device-free voice replay detection system

Three key observations Live voice has different spectral characteristics in both frequency and phase domain Breathing pattern changes distinctly during speech Breathing patterns during speech are distinct across users

Different characteristics of live voice More Energy in higher frequency bands

Different characteristics of live voice More phase structure present in spectrogram

Breathing have distinct patterns Breathing rate decreases during speech

Breathing have distinct patterns Different users have distinct breathing patterns

REplayed VOice Legitimacy Tester (REVOLT) workflow Human Absent ( Remote Replay ) ( Physical Replay ) Speech based Replay Detection WiFi based Breathing Detection Breathing & Speech Synchronization Human Absent WiFi Spectrogram CNN based Manipulation Detection ( Physical Replay ) Manipulated Genuine Voice ( Sophisticated Physical Replay )

Voice module of REVOLT Voice Feature Extraction Replayed ( Remote Replay ) Speech based Replay Detection MFCC, LFCC, RPS Voice Feature Extraction Voice Liveness Detection SVM MFCC: Mel Frequency Cepstral Coefficients LFCC: Linear Frequency Cepstral Coefficients RPS: Relative Phase Shift b/w two harmonics

WiFi based breathing detection Human Absent ( Physical Replay ) WiFi based Breathing Detection CSI Traces across subcarriers FFT (phase and amp.) Breathing rate Noise removal Peak detect

Breathing-Speaking synchronization Breathing & Speech Synchronization Breathing and speaking traces Breathing rates during and between speech Synch. Score > ∆ Segmentation Human Absent Thresholding & matching ( Physical Replay )

CNN based Manipulation Detection Recorded WiFi Spectrogram during Speaking Train CNN Model Voice Manipulation Detector WiFi Spectrogram CNN based Manipulation Detection Convolutional Neural Net (CNN): 3 conv. layers -> Max-pool -> 2 FC layers Manipulated Genuine Voice ( Sophisticated Physical Replay )

REVOLT prototype

Voice module evaluation Collected (~200 speaker, mic, location, user config.) Both frequency and phase structure help In voice only replay detection

helps in breathing detection WiFi breathing detection evaluation Exploiting both CSI phase and amplitude helps in breathing detection

End-to-end model evaluation Very low false-positive rate ensuring almost replay prevention

Related works Voice liveness detection Ultrasonic based lip movement (Lu et al., Infocom ’18; Tan et al., IMWUT ‘18) WiFi based lip utterance detection (Wei et al., MobiCom ‘15) Voice localization (Zhang et al., CCS ‘16) … Wearable based approach (Feng et al., MobiCom ‘17) Speaker detection via magnetometer (Chen et al., ICDCS ‘17) Needs extra hardware / small range (~10 cm) / extensive finger-printing

Future works Adding state-of-the-art COTS Wi-Fi setup based respiration sensing ( up to 8 m, IMWUT 2019 ) Incorporating other wirelessly sensed soft biometrics Heart rate, mouth movement pattern etc. Moving from mechanical directional antenna to phased array Experimenting with more users and more diverse environments to evaluate the robustness

Key takeaways Proposing a wearable-free long range voice replay detection system Combining both wireless interfaces (Audio & WiFi) present in smart speakers for combating replay Exploiting distinctiveness of breathing while speech for enhancing the security of voice assistants

Thanks a lot ! swadhin@utexas.edu Thanks a lot for you attention. For questions and queries please mail the author Swadhin Pradhan at swadhin@utexas.edu