Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Charles V. Wright Scott E. Coull Gerald M. Masson Lucas Ballard Fabian Monrose.

Presentation transcript:

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Charles V. Wright, Scott E. Coull, Gerald M. Masson, Lucas Ballard, Fabian Monrose
Presented by Paul DiOrio and Rachel Lathbury

The Rise of Voice Over IP
Projected this year: $3.19 billion; 16.6 million subscribers
This year alone: +24.3% revenue; +21.2% subscribers
(statistics from IBISWorld)
Examples:
● Xbox Live
● Vonage, Skype, etc.
● U.S. Army Land Warrior System
● Transport for traditional telephone signals
● Many users in a chat room setting

SIP and (S)RTP
SIP: connection set-up and connection tear-down
RTP: actual transmission of audio data
SRTP is increasingly used for secure RTP transmission
● Transports the actual voice data
SRTP uses the Advanced Encryption Standard in one of two cipher modes that turn the block cipher into a stream cipher
It will become clear that this encryption gives a false sense of security
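Why a stream cipher preserves payload length can be sketched in a few lines. This is a toy illustration only: the keystream below is built from SHA-256 in a counter construction as a stand-in for AES, and the names `keystream`, `xor_encrypt` and the demo key are hypothetical, not taken from the slides or from any SRTP library.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (a stand-in, NOT AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_encrypt(key: bytes, payload: bytes) -> bytes:
    """XOR the payload with the keystream; output is exactly as long as input."""
    return bytes(p ^ k for p, k in zip(payload, keystream(key, len(payload))))


# Hypothetical key and frame sizes, chosen only for illustration.
# A real protocol would use a distinct keystream per packet; this demo
# reuses one keystream because only the lengths matter here.
key = b"demo key (hypothetical)"
frames = [bytes(20), bytes(28), bytes(38)]
cipher_lengths = [len(xor_encrypt(key, frame)) for frame in frames]
print(cipher_lengths)
```

The ciphertext hides the bytes but not the sizes, so whatever the frame sizes reveal about the audio survives encryption.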

Audio Codec
Codec: program designed to encode/decode a digital signal
For audio, converts an analog signal into a digital stream
● Good for storage or transport
Audio files lend themselves nicely to lossy compression
● Eliminate inaudible sounds; “easy” vs. “hard” sounds
Generally, the codec searches a collection of sounds and selects the closest match

Speex Audio Codec
Code-Excited Linear Prediction
Variable Bit Rates (VBR)
Encodes a window of audio samples as a frame
● Sample rates of 8 kHz, 16 kHz or 32 kHz
● Bit rates range from 2 to 44 kbps

VBR Encoding
Goal: high sound quality with less information
Sounds that are easier to encode require fewer bits per frame
VBR encoder selects the best bit rate for each frame
Vowels and fricatives encode at different bit rates
∴ Packet lengths are very good indicators of what bit rate was used
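The link between bit rate and packet length is simple arithmetic, sketched below. The 20 ms frame duration and the three bit rates are assumptions made for illustration (they do not come from the slides), and `payload_bytes` and `infer_bit_rates` are hypothetical helper names.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds

# Illustrative bit rates in bits per second (assumed, not from the slides).
ILLUSTRATIVE_BIT_RATES = [8000, 11000, 15000]


def payload_bytes(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Bytes needed to carry one frame encoded at the given bit rate."""
    bits = bit_rate_bps * frame_ms // 1000
    return -(-bits // 8)  # ceiling division: round up to whole bytes


# An eavesdropper can invert the mapping: payload length -> bit rate.
length_to_rate = {payload_bytes(rate): rate for rate in ILLUSTRATIVE_BIT_RATES}


def infer_bit_rates(observed_lengths):
    """Map observed payload lengths back to bit rates (None if unknown)."""
    return [length_to_rate.get(n) for n in observed_lengths]


print([payload_bytes(r) for r in ILLUSTRATIVE_BIT_RATES])
print(infer_bit_rates([20, 38, 28, 20]))
```

Because each bit rate yields a distinct frame size, a sequence of observed packet lengths is effectively a sequence of bit-rate choices made by the encoder.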

Encrypted packets?
[Figure: Packet Length, Bit Rate, Encoding (Wright et al.)]

Ramifications
Exploit VBR encoding and length-preserving encryption
● Determining language spoken
● Searching for specific phrases
Our research
● What accent is being spoken?
● Who is speaking?

Some linguistic background
Vowels: research, telephone, voice
Fricatives: research, telephone, voice
Phoneme: smallest unit of language capable of distinguishing meaning
Every language has its own native phonetic inventory and its own phonetic distribution
Vowels are “harder” to encode than fricatives

Language Recognition (Wright et al.)

Language Identification of Encrypted VoIP Traffic: Alejandro y Roberto or Alice and Bob? (Wright et al.)
2,066 speakers, 21 languages
66% accuracy (14x better than random guessing)
14 languages achieve 90%
Binary decisions average 86%
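The "14x" figure follows from the numbers on the slide, assuming random guessing means picking one of the 21 languages uniformly. A quick check (variable names are illustrative):

```python
languages = 21
accuracy = 0.66

random_guess = 1 / languages          # chance of a uniform random guess being right
improvement = accuracy / random_guess  # how many times better than chance

print(f"random guessing: {random_guess:.1%}")
print(f"improvement: {improvement:.1f}x")
```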

Our Research
We hope to use similar techniques to discover which accent of English is being spoken
We predict that individual accents will leak information despite encryption
Our motivations:
● Part of a person's voiceprint
● Discover how much information can be exploited
● Save the world (or at least help)

Accents and Individuals
The ultimate goal: search for an individual profile
To accomplish this we will begin by making accent profiles
● Find linguistic differences between English accents
● Examine these differences and their effect on packet length
Ideal: create sufficiently dissimilar packet length distributions for each accent
Likely reality: combination of packet length distributions and other techniques

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations (Wright et al.)

Gathering the training data
Wright et al. used a large corpus of native English speakers
We're using a corpus of non-native English speakers
● Speech Accent Archive (George Mason)
● 909 available samples
Two utterances of a word will not be the same
● Intonation, rhythm, stress, etc.
Use Hidden Markov Models for variation tolerance

Basic Hidden Markov Model
Example: a blind hermit meteorologist using seaweed
[Figure labels: model, seaweed, weather]
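The seaweed example can be made concrete with a small sketch: the weather is the hidden state, the condition of the seaweed is what the hermit observes, and the Viterbi algorithm recovers the most likely weather sequence. Every probability below is invented for illustration, and the two-state model is an assumption; nothing here is taken from the slide's figure.

```python
STATES = ["sunny", "rainy"]

# All probabilities are made-up illustrative values.
START_P = {"sunny": 0.5, "rainy": 0.5}
TRANS_P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}
EMIT_P = {
    "sunny": {"dry": 0.70, "damp": 0.25, "soggy": 0.05},
    "rainy": {"dry": 0.05, "damp": 0.35, "soggy": 0.60},
}


def viterbi(observations, states=STATES, start_p=START_P,
            trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               state that path came from at step t-1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


print(viterbi(["dry", "dry", "soggy"]))
```

In the attack setting the same structure applies with packet lengths as the observations and speech units as the hidden states.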

Hidden Markov Model (Wright et al.)

Testing techniques
Limited corpus: target is “the bike” (dh ah b ay k)
● “the” (dh ah)
● “a bird” (ah b er d)
● “bicameral” (b ay k ae m ax r ax l)
Pieces used to build the target: [ (dh ah) (ah b) (b ay k) ]
This technique achieved recall and precision of 0.28
● More realistic pronunciations achieved ~0.50
Our hope: this difference shows up with accents as well
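The way the target phrase is assembled from pieces of other words can be sketched as follows. The phoneme strings come from the slide; the helper names `contains`, `find_sources` and `stitch`, and the rule that neighbouring pieces overlap by exactly one phoneme, are assumptions made for this illustration.

```python
# Phoneme transcriptions from the slide.
LEXICON = {
    "the": ["dh", "ah"],
    "a bird": ["ah", "b", "er", "d"],
    "bicameral": ["b", "ay", "k", "ae", "m", "ax", "r", "ax", "l"],
}
TARGET = ["dh", "ah", "b", "ay", "k"]  # "the bike"
CHUNKS = [["dh", "ah"], ["ah", "b"], ["b", "ay", "k"]]


def contains(sequence, chunk):
    """True if chunk occurs as a contiguous run inside sequence."""
    n = len(chunk)
    return any(sequence[i:i + n] == chunk for i in range(len(sequence) - n + 1))


def find_sources(chunks, lexicon):
    """For each chunk, list the training words that contain it."""
    return [[word for word, phones in lexicon.items() if contains(phones, chunk)]
            for chunk in chunks]


def stitch(chunks):
    """Join chunks that overlap by one phoneme into a single sequence."""
    result = list(chunks[0])
    for chunk in chunks[1:]:
        if chunk[0] != result[-1]:
            raise ValueError("chunks do not overlap by one phoneme")
        result.extend(chunk[1:])
    return result


print(find_sources(CHUNKS, LEXICON))
print(stitch(CHUNKS) == TARGET)
```

The point of the example is that the target phrase never has to appear in the training data; only its pieces do.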

The Experiment and Results
Attacker has a 1 in 3 chance of finding target phrase
Results depended on specific phrases:
● “Young children should avoid exposure to contagious diseases”: precision 1.0, recall 0.99
● “The fog prevented them from arriving on time”: precision 0.84, recall 0.72
Median true positive rate was 63%, but 20% of speakers had a true positive rate under 50%
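For readers unfamiliar with the two metrics, here is how precision and recall are computed from detection counts. The counts below are hypothetical, chosen only so that the result matches the 0.84 / 0.72 pair quoted for the second phrase; they are not data from the experiment.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported detections that were really the target phrase."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual occurrences of the phrase that were detected."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for illustration only.
tp, fp, fn = 126, 24, 49
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.2f} recall={r:.2f}")
```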

Mitigation and Success
Languages can be recognized with as much as 90% accuracy
Phrases can be located with 63% accuracy
We hope accents can be found at similar rates
Default SRTP encryption methods are not sufficient
Padding mitigates risk
● Performance decrease
Adding noise is not effective
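The padding trade-off can be sketched as rounding every payload up to a multiple of a block size: larger blocks make more packets look alike but cost more bandwidth. The payload lengths and block sizes below are assumed values for illustration, and `padded_length` and `overhead` are hypothetical helper names.

```python
def padded_length(length: int, block: int) -> int:
    """Round a payload length up to the next multiple of the block size."""
    return -(-length // block) * block  # ceiling division, then scale back up


def overhead(lengths, block: int) -> float:
    """Extra bytes sent because of padding, as a fraction of the original."""
    original = sum(lengths)
    padded = sum(padded_length(n, block) for n in lengths)
    return (padded - original) / original


# Assumed payload lengths in bytes (illustrative, not measured).
observed = [20, 28, 38]

for block in (16, 64):
    padded = [padded_length(n, block) for n in observed]
    distinct = len(set(padded))
    print(block, padded, distinct, f"{overhead(observed, block):.0%}")
```

With the larger block every packet has the same length, so length no longer distinguishes bit rates, at the price of sending considerably more data.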

Communicate at your own risk
Paul DiOrio, Rachel Lathbury, Prof. Dave Evans