2 Privacy Protection for Life-log Video Jayashri Chaudhari November 27, 2007 Department of Electrical and Computer Engineering University of Kentucky, Lexington, KY 40507

3 Outline: Motivation and Background; Proposed Life-Log System; Privacy Protection Methodology (Face Detection and Blocking, Voice Segmentation and Distortion); Experimental Results (Segmentation Algorithm Analysis, Audio Distortion Analysis); Conclusions

4 What is a Life-Log System? "A system that records everything, at every moment and everywhere you go." Applications include law enforcement (police questioning), tourism, medical questioning, and journalism. Existing systems/work: 1) the "MyLifeBits" project at Microsoft Research; 2) the "WearCam" project by Steve Mann at the University of Toronto; 3) "Cylon Systems" (http://cylonsystems.com), a portable body-worn surveillance system from the UK.

5 Technical Challenges Security and Privacy Information management and storage Information Retrieval Knowledge Discovery Human Computer Interface

7 Why Privacy Protection? Privacy is a fundamental right of every citizen. Emerging technologies threaten this right. There are no clear and uniform rules and regulations regarding video recording. People are resistant to technologies like life-logging. Without tackling these issues, the deployment of such emerging technologies is impossible.

8 Research Contributions: A practical audio-visual privacy protection scheme for life-log systems, with performance measurement (audio) of both privacy protection and usability.

9 Proposed Life-log System “A system that protects the audiovisual privacy of the persons captured by a portable video recording device”

10 Privacy Protection Scheme Design Objectives. Privacy: hide the identity of the subjects being captured. Privacy versus usefulness: the recording should convey sufficient information to remain useful. The possible outcomes are usefulness without privacy (√ usefulness, × privacy), privacy without usefulness (× usefulness, √ privacy), and the design goal of both (√ usefulness, √ privacy).

11 Design Objectives. Anonymity or ambiguity: the scheme should give the recorded subjects an ambiguous identity; every individual will look and sound identical, which reduces correlation attacks. Speed: the protection scheme should work in real time. Interview scenario: the producer is speaking with a single subject in a relatively quiet room.

12 Privacy Protection Scheme Overview. The audio stream passes through audio segmentation and then audio distortion; the video stream passes through face detection and blocking; the two streams are then synchronized, multiplexed, and stored. S: subject (the person who is being recorded); P: producer (the person who is the user of the system).

13 Voice Segmentation and Distortion. For each window k, the windowed power P_k is computed and compared against two thresholds, T_S and T_P; depending on the comparisons, State_k is set to Subject, set to Producer, or carried over from State_k-1. The segmented audio then passes through pitch shifting before storage. We use the PitchSOLA time-domain pitch shifting method. * "DAFX: Digital Audio Effects" by Udo Zölzer et al.
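The two-threshold state machine above can be sketched as follows. The slide gives neither the threshold values, the window size, the comparison order, nor which speaker corresponds to which power level, so all of those are illustrative assumptions here (microphone placement on the wearer determines whose voice is louder):

```python
import numpy as np

def segment_speakers(audio, win=1024, t_subject=0.01, t_producer=0.001):
    """Label each analysis window as 'subject' or 'producer' from its
    windowed power P_k, carrying the previous state over when the signal
    is below both thresholds (threshold values are illustrative)."""
    states = []
    state = "producer"  # assumed initial state
    for start in range(0, len(audio) - win + 1, win):
        p_k = np.mean(audio[start:start + win] ** 2)  # windowed power P_k
        if p_k >= t_subject:
            state = "subject"    # high power: assumed to be the subject
        elif p_k >= t_producer:
            state = "producer"   # intermediate power: the producer
        # below both thresholds: keep State_{k-1}
        states.append(state)
    return states
```

Each window costs only one mean of squares, which is why this style of segmentation is cheap enough for the real-time constraint stated above.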

14 Pitch Shifting Algorithm. Pitch shifting by Synchronous Overlap and Add (SOLA) has two steps. Step 1) Time-stretch by a factor α using a window of size N and analysis step size Sa: successive input windows x1(n), x2(n) taken Sa samples apart are overlap-added at synthesis spacing α·Sa, with the overlap position refined by the lag Km of maximum correlation, which reduces discontinuities in phase and pitch during mixing. Step 2) Re-sample by a factor 1/α to change the pitch.
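A minimal sketch of the two steps. The parameter defaults, the search range `seek`, and the plain unnormalized correlation with a linear crossfade are simplifying assumptions; the actual system follows the PitchSOLA recipe from DAFX:

```python
import numpy as np

def time_stretch_sola(x, alpha, N=1024, Sa=256, seek=64):
    """Step 1: SOLA time-stretch by alpha. Analysis windows of size N are
    taken every Sa samples and overlap-added every Ss = alpha*Sa samples,
    after a correlation search (up to `seek` lags) to align phase/pitch."""
    Ss = int(round(alpha * Sa))                 # synthesis step size
    n_frames = (len(x) - N) // Sa
    out = np.zeros(Ss * n_frames + N + seek)
    out[:N] = x[:N]
    end = N                                     # samples written so far
    for m in range(1, n_frames):
        frame = x[m * Sa : m * Sa + N]
        pos = m * Ss                            # nominal synthesis position
        ov = end - pos                          # nominal overlap length
        # pick the lag of maximum correlation over the overlap region
        best_k = max(range(min(seek, ov)),
                     key=lambda k: np.dot(out[pos + k : end], frame[:ov - k]))
        p = pos + best_k
        fade = np.linspace(0.0, 1.0, end - p)   # linear crossfade in overlap
        out[p:end] = (1 - fade) * out[p:end] + fade * frame[:end - p]
        out[end : p + N] = frame[end - p :]     # append the rest of the frame
        end = p + N
    return out[:end]

def pitch_shift(x, alpha, N=1024, Sa=256):
    """Step 2: resample the stretched signal by 1/alpha, so the duration
    returns to roughly the input length while the pitch scales by alpha."""
    y = time_stretch_sola(x, alpha, N, Sa)
    idx = np.arange(0, len(y) - 1, alpha)
    return np.interp(idx, np.arange(len(y)), y)
```

Because only a short correlation search and one crossfade run per hop, this time-domain approach is what makes the real-time requirement from the design objectives plausible.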

15 Face Detection and Blocking. Camera frames pass through face detection (based on Viola &amp; Jones, 2001), face tracking, subject selection, and selective blocking. The audio segmentation results (subject talking vs. producer talking) drive the subject selection.
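The detection step is Viola-Jones per the slide; the blocking step itself is independent of the detector, so here is a sketch of just the selective blocking, pixelating a detected bounding box (the box coordinates and block size are illustrative):

```python
import numpy as np

def block_face(frame, box, block=16):
    """Obscure a detected face by pixelating its bounding box in place.
    `box` is (x, y, w, h) in the form a face detector would return it."""
    x, y, w, h = box
    roi = frame[y:y + h, x:x + w]          # view into the frame
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cell = roi[by:by + block, bx:bx + block]
            cell[...] = cell.mean()        # flatten each block to its mean
    return frame
```

Pixelation (rather than, say, reversible scrambling) matches the scheme's anonymity goal: every blocked face looks alike.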

16 Initial Experiments 1: Analysis of the segmentation algorithm, and analysis of the audio distortion algorithm for 1) accuracy in hiding identity and 2) usability after distortion. 1: Chaudhari, J., S.-C. Cheung, and M. V. Venkatesh. Privacy protection for life-log video. In IEEE Signal Processing Society SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, 2007.

17 Segmentation Experiment. Experimental data: interview scenario in a quiet meeting room; three interview recordings, each about 1 minute and 30 seconds long, with transitions between P (producer speaking), S (subject speaking), and silence.

18 Segmentation Results

Meeting  Transitions (ground truth)  Correctly identified  Falsely detected  Precision  Recall
1        7                           6                     10                0.375      0.857
2        7                           7                     5                 0.583      1
3        6                           6                     10                0.353      1
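The precision and recall columns follow the standard definitions; for example, meeting 1's 6 correct detections out of 6 + 10 total detections and 7 ground-truth transitions reproduce the tabulated values:

```python
def precision_recall(n_true, n_correct, n_false):
    """Precision = correct detections / all detections;
    recall = correct detections / ground-truth transitions."""
    return n_correct / (n_correct + n_false), n_correct / n_true
```

With the table's counts, `precision_recall(7, 6, 10)` gives 0.375 and roughly 0.857, and `precision_recall(7, 7, 5)` gives roughly 0.583 and 1, matching meetings 1 and 2.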

19 Comparison With CMU Segmentation Algorithm. The CMU audio segmentation algorithm 1 is used as a benchmark.

Meeting  Our Algorithm (Precision / Recall)  CMU Algorithm (Precision / Recall)
1        0.375 / 0.857                       0.667 / 0.57
2        0.583 / 1                           1 / 0.57
3        0.353 / 1                           0.4 / 0.5

1: Matthew A. Siegler, Uday Jain, Bhiksha Raj, and Richard M. Stern. Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the Ninth Spoken Language Systems Technology Workshop, Harriman, New York, 1997.

20 Speaker Identification Experiment. Experimental data: 11 test subjects, 2 voice samples from each subject; one voice sample is used for training and the other for testing, with public-domain speaker recognition software. Script 1 is used to train the speaker recognition software; Script 2 is used to test how well the audio distortion hides identity.

21 Speaker Identification Results (person ID returned by the recognizer)

Person  Without distortion  Distortion 1  Distortion 2  Distortion 3
1       1                   5             8             5
2       2                   6             8             6
3       3                   5             3             5
4       4                   6             6             5
5       5                   3             10            6
6       6                   8             6             5
7       7                   5             2             5
8       8                                 11            5
9       9                   5             8             5
10      10                  5             2             5
11      11                  4             8             5
Error rate:  0%             100%          90.9%         100%

Distortion 1: (N=2048, Sa=256, α=1.5); Distortion 2: (N=2048, Sa=300, α=1.1); Distortion 3: (N=1024, Sa=128, α=1.5)

22 Usability Experiments. Experimental data: 8 subjects, 2 voice samples from each subject; one voice sample is kept undistorted and the other is distorted. Manual transcription by 5 human testers: transcription 1 is of the undistorted voice (stored in the 1.wav file), and transcription 2 is of the distorted voice sample (in 2.wav); unrecognized words are marked in the manual transcription.

23 Usability after Distortion. Word error rate (WER) is the standard measure of word recognition error for speech recognition systems: WER = (S + D + I) / N, where S = number of substitutions, D = number of deletions, I = number of insertions, and N = number of words in the reference sample. Tool used: the NIST tool SCLITE.
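SCLITE's WER score boils down to a word-level edit distance; a minimal sketch (without SCLITE's alignment report), assuming whitespace-tokenized transcripts:

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, computed as the Levenshtein distance over
    words divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit cost to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                              # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

One substitution plus one deletion against a six-word reference, for instance, yields WER = 2/6.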

24 Extended Experiments. Data set: the TIMIT (Texas Instruments and Massachusetts Institute of Technology) speech corpus. Experimental setup: allowable range of alpha (α) is 0.2-2.0; five alpha values are used (α = 0.5, 0.75, 1, 1.25, 1.40). The scope of the experiments is increased with "subjective experiments" that use human testers to assess privacy and usability, and with privacy experiments (speaker identification).

25 Experimental Setup. The TIMIT corpus has 630 speakers with 10 audio clips per speaker; our experiments use 30 speakers with 5 audio clips per speaker, giving five sets of 30 audio clips each: Set A (α=1), Set B (α=0.5), Set C (α=0.75), Set D (α=1.25), and Set E (α=1.40). The audio clips from each set are re-divided into five groups (1-5); each group consists of 6 audio clips randomly selected from each set. Each group was assigned to three testers, who were asked to do 3 tasks.

26 Subjective Experiments. Task 1: transcribe the audio clips in the assigned group. Purpose: determine the usability of the recording after distortion. Metrics: WER for each transcription by each tester; average WER for each clip across the 3 testers; WER per speaker for the given alpha (α) value.

27

28 [Figure: average WER per speaker for each alpha value]

29 Average WER per Set: Set A: 14.2; Set B: 100; Set C: 22.4; Set D: 15.3; Set E: 14.4.

30 Statistical Analysis: Z-test. Null hypothesis: the average WER does not change from Set A (before distortion) after distortion for a given value of the pitch-scaling parameter (alpha). H0: p1 = p2 (null hypothesis); Ha: p1 != p2. Z-test parameters: population size 12*30 = 360; α = 0.05; confidence level 95%; critical value |Z_α/2| = 1.96; H0 is rejected when Z >= Z_α/2 or Z <= -Z_α/2. Z-test results: Set A vs. B (0.50): 46.71 >= 1.96 (reject H0); Set A vs. C (0.75): 2.873 >= 1.96 (reject H0); Set A vs. D (1.25): 0.419 < 1.96 (fail to reject); Set A vs. E (1.40): 0.0695 < 1.96 (fail to reject).
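The comparisons above are consistent with a two-proportion z-test with a pooled variance estimate; a sketch of that test follows. The underlying per-set error counts are not on the slide, so the numbers in the test below are illustrative, not the experiment's:

```python
from math import sqrt

def two_proportion_z(err1, n1, err2, n2):
    """Z statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = err1 / n1, err2 / n2
    p = (err1 + err2) / (n1 + n2)                  # pooled estimate
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # pooled standard error
    return (p1 - p2) / se

def reject_h0(z, z_crit=1.96):
    """Two-sided test at alpha = 0.05: reject H0 when |Z| >= 1.96."""
    return abs(z) >= z_crit
```

Equal error proportions give Z = 0 and H0 stands, matching the Set D and Set E outcomes; a large gap, as with Set B, pushes |Z| far past 1.96.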

31 Subjective Experiments. Task 2: identify the number of distinct voices in each subset in the assigned group. Purpose: estimate the ambiguity created by pitch shifting. Results (average number of distinct voices per subset; each subset consists of 6 audio clips):

Group    Subset of A  Subset of B  Subset of C  Subset of D  Subset of E
1        6.0          3.33         4.33         4.0          3.33
2        6.0          3.0          3.33         4.0
3        6.0          2.0          4.0          3.0          4.0
4        6.0          2.67         4.0          3.67         2.67
5        6.0          3.0                       3.67         4.0
Average  6.0          2.75         3.92         3.67         3.50

32 Subjective Experiments. Task 3: for each clip from the subset of Set A (the original, undistorted speech set), identify a clip in the other subsets in which the same speaker may be speaking. Purpose: qualitatively measure the assurance of privacy protection achieved by distortion. Results: none of the speakers from Set A was identified in the other, distorted sets (100% recognition error rate).

33 Privacy Experiments. Speaker identification experiments with the speaker recognition tools LIA_SpkDet and ALIZE 1 from the LIA lab at the University of Avignon. The speaker verification tool uses GMM-UBM (Gaussian Mixture Model - Universal Background Model) with a single speaker-independent background model; the decision is a likelihood-ratio test between the target speaker model and the background model. 1: Bonastre, J.-F., Wild, F., ALIZE: a free, open tool for speaker recognition, http://www.lia.univ-avignon.fr/heberges/ALIZE/
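The GMM-UBM decision compares the average frame log-likelihood under the target model against the universal background model. A toy numpy sketch with diagonal-covariance mixtures follows; the model parameters and threshold are illustrative, not ALIZE's:

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T x D) under a
    diagonal-covariance GMM given as parallel component arrays."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        # log w + log N(x | mu, diag(var)), summed over dimensions
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var,
                           axis=1)
        comps.append(np.log(w) + ll)
    return np.logaddexp.reduce(np.vstack(comps), axis=0).mean()

def accept_speaker(X, target, ubm, threshold=0.0):
    """Decision: accept when the log-likelihood ratio between the target
    speaker model and the background model reaches the threshold."""
    return gmm_avg_loglik(X, *target) - gmm_avg_loglik(X, *ubm) >= threshold
```

The privacy question in these experiments is then simply whether distorted test frames still score higher under the true speaker's model than under the background.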

34 LIA_RAL SpkDet. Front processing: feature extraction with the SPRO4 tool (32 coefficients = 16 LFCC + 16 derivative coefficients), silence frame removal (EnergyDetector), parameter normalization (NormFeat), and feature warping. Training: initialization and world modeling (TrainWorld) produce 2 GMMs of 2048 components each (1: male, 2: female); target speaker modeling (TrainTarget) adapts the world model by Bayesian adaptation (MAP). Speaker detection (ComputeTest) then scores the feature vectors.

35 Experimental Setup. World model: number of male speakers = 325; number of female speakers = 135. Target speaker models: number of male test clips = 20; number of female test clips = 10. Two sets of experiments: Same Model, in which the world model and individual speaker models are trained on distorted speech with the corresponding alpha; and Cross Model, in which the world model and individual speaker models are trained on undistorted speech.

36 Privacy Results (each entry is the average rank of the true speaker over the test clips for the corresponding alpha value)

Set  Sex  Same Model  Cross Model
A    M    1.0
A    F    4.4
B    M    2.5         150.75
B    F    1.75        7.80
C    M    8.65        170.90
C    F    5.44        6.40
D    M    -           185.75
D    F    20.30       67.80
E    M    52.05       157.45
E    F    29.20       79.80

Conclusions. Cross Model: distorted speech, no matter what alpha value is used, is very different from the original speech. Same Model: Set B and Set C do not provide adequate protection, as the rank is still very near the top.

37 Example Video

38 Conclusions. Proposed a real-time implementation of voice distortion and face blocking for privacy protection in life-log video; analyzed the audio segmentation algorithm; analyzed the audio distortion for usability; analyzed the audio distortion for privacy protection.

39 Acknowledgment. Prof. Samson Cheung; the people at the Center for Visualization and Virtual Environments; Prof. Donohue and Prof. Zhang. Thank you!

40

41 Voice Distortion. Voice identity comes from the vocal tract (formants), which acts as a filter, and the vocal cords (pitch), which act as the excitation source. Different ways to distort audio: random mixture makes the recording useless; voice transformation is more complex and not suitable for real-time applications; pitch shifting changes the pitch of the voice, keeps the recording useful, and is simple with low complexity. We use the PitchSOLA time-domain pitch shifting method. * "DAFX: Digital Audio Effects" by U. Zölzer et al.

42 Cross Model: world model and individual speaker models trained on undistorted speech. Same Model: world model and individual speaker models trained on distorted speech.

