Cross-Modal (Visual-Auditory) Denoising
Dana Segev, Yoav Y. Schechner, Michael Elad
Technion – Israel Institute of Technology
Digits sequence. Noisy digits sequence. Denoised by the state-of-the-art algorithm of Cohen & Berdugo. Segev, Schechner, Elad, Cross-Modal Denoising
Use one modality to denoise another? Use video to denoise a soundtrack?
Noise: very intense, non-stationary, from an unknown, unseen source. Single microphone.
Input: video and very noisy audio (time in sec). Algorithm: cross-modal, example-based. Output: denoised audio, for human and machine hearing.
Training: example set. Input: test set.
~syllable (0.25 sec)
Xylophone
Xylophone: sound
Examples...
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics.
Input video: video feature-space. Input audio: audio feature-space. Time axis in sec.
Input audio-video: audio-video feature-space.
Training audio-video examples: audio-video feature-space.
Feature-space
Nearest neighbor in feature-space
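The nearest-neighbor step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `nearest_example` and `denoise_by_lookup` are hypothetical, the l2 distance is taken from the parameter slide of this talk, and rendering the output by plain concatenation of clean segments is an assumption made here for brevity.

```python
import numpy as np

def nearest_example(query, examples):
    """Index of the example feature vector closest to `query` in l2 distance."""
    examples = np.asarray(examples, dtype=float)
    query = np.asarray(query, dtype=float)
    dists = np.linalg.norm(examples - query, axis=1)
    return int(np.argmin(dists))

def denoise_by_lookup(input_feats, example_feats, example_clean_audio):
    """For each input segment, copy the clean audio of its nearest example.

    Assumption: segments are simply concatenated (no overlap or smoothing).
    """
    picked = [nearest_example(f, example_feats) for f in input_feats]
    return np.concatenate([example_clean_audio[k] for k in picked])
```

Each input segment is replaced by the clean audio of the training example whose feature vector is closest to it.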
Examples...
Noisy audio. Clean segment. Denoised.
Examples... Input...
Bartender experiment
Examples... Input...
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition (NN). Rendering a denoised signal. Learning feature statistics.
Feature-space. For the k-th example segment:
Syllable segments in feature-space: bi, fif, ty, two, ar (as in "bi-fif-ty-two").
Current cluster and next cluster, over the syllable clusters bi, ty, fif, two, ar.
Syllable consecutive probability: current cluster and next cluster, over the syllable clusters bi, ty, fif, two, ar. The probability of a transition between clusters is estimated from the number of examples in the training set.
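One way to estimate such transition probabilities is to count how often an example in one cluster is immediately followed by an example in another. The sketch below is illustrative only: `transition_matrix` is a hypothetical name, and normalizing each row by the number of transitions leaving that cluster is an assumption, since the exact normalization is not recoverable from the slide.

```python
import numpy as np

def transition_matrix(labels, n_clusters):
    """Estimate P[q, r]: probability that cluster q is followed by cluster r.

    `labels` holds the cluster index of each consecutive training segment.
    Assumption: each row is normalized by the count of transitions out of q.
    """
    counts = np.zeros((n_clusters, n_clusters))
    for q, r in zip(labels[:-1], labels[1:]):
        counts[q, r] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Clusters with no outgoing transition keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

For the label sequence 0, 1, 2, 0, 1, 0, cluster 1 is followed once by cluster 2 and once by cluster 0, so both of those transitions get probability 0.5.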
Hidden Markov Model: transition probability P with a time delay, over the syllables bi, fif, ty, two, bi, plus audio noise.
Examples... Input...
Input video. Vector of indices.
Cost function: a data term and a regularization term.
The optimal vector of indices minimizes the cost function.
Nodes and edges (examples, input). Complexity: dynamic programming.
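A cost made of a per-segment data term and a pairwise regularization term over a vector of example indices can be minimized by dynamic programming of the Viterbi kind. The sketch below assumes that additive form; the exact terms used in the talk are not recoverable from the slides. `best_index_path`, `data_cost`, and `trans_cost` are hypothetical names: `data_cost[t, k]` is the cost of assigning example k to input segment t, and `trans_cost[i, j]` is the cost of following example i with example j.

```python
import numpy as np

def best_index_path(data_cost, trans_cost):
    """Minimize sum_t data_cost[t, k_t] + sum_t trans_cost[k_(t-1), k_t].

    data_cost: array of shape (T, K); trans_cost: array of shape (K, K).
    Returns the list of T example indices with the lowest total cost.
    """
    data_cost = np.asarray(data_cost, dtype=float)
    trans_cost = np.asarray(trans_cost, dtype=float)
    T, K = data_cost.shape
    cost = data_cost[0].copy()          # best cost of a path ending at each k
    back = np.zeros((T, K), dtype=int)  # best predecessor for each (t, k)
    for t in range(1, T):
        total = cost[:, None] + trans_cost      # total[i, j]: reach j from i
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + data_cost[t]
    # Trace the best path backwards from the cheapest final state.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

This sketch runs in time proportional to T·K², whereas an exhaustive search over all index vectors would have to consider K^T candidates.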
Examples... Input...
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics.
Audio features. Requirements: sensitivity to sound perception; dimension reduction. Speech features: MFCCs. Music features: spectrogram of each segment.
Visual features. Requirements: focusing on the motion of interest; dimension reduction. Speech features: DCT coefficients. Music features: the spatial trajectory of a hitting rod.
MFCCs (Mel-frequency Cepstral Coefficients): audio signal, signal spectrum, Mel-frequency filter bank, log(·), DCT, MFCCs.
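The MFCC chain listed above (spectrum, Mel-frequency filter bank, log, DCT) can be sketched for a single audio frame as below. This is a generic textbook-style illustration using NumPy only, not the authors' code. The Hamming window, the number of filters (20), the number of coefficients kept (12), and all function names are assumptions made for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, mid):        # rising edge of the triangle
            bank[i, b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):        # falling edge of the triangle
            bank[i, b] = (hi - b) / (hi - mid)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """MFCCs of one frame: power spectrum -> mel filter bank -> log -> DCT-II."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    energies = mel_filter_bank(n_filters, n_fft, sample_rate) @ spectrum
    log_energies = np.log(energies + 1e-10)   # small offset avoids log(0)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return dct @ log_energies
```

Keeping only the first few DCT coefficients is what gives the dimension reduction named in the feature requirements.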
Spectrogram of each segment: xylophone signal, spectrogram, spectrogram accumulation.
Speech: the given movie.
Speech: locking on the object of interest.
Speech: extracting global motion by tracking.
Speech: extracting features. DCT coefficients which highly represent motion between frames.
Xylophone: the given movie.
Xylophone: locking on the object of interest.
Xylophone: extracting global motion by tracking (X, Y, Z).
Xylophone: extracting features. Hitting rod spatial coordinates X, Y, Z.
Speech: a corpus of a limited number of words and syllables (digits and bar beverages). Video rate 25 fps, audio rate 8000 Hz. K-means clustering, 350 clusters. Distance measure: l2 norm.
Xylophone: a corpus of a limited number of sounds. Video rate 25 fps, audio rate 16000 Hz. Distance measure: l2 norm.
Xylophone. Training duration: 103 sec. Testing duration: 100 sec. Music from song by GNR: SNR = 0.9. Xylophone melody: SNR = 1.
Speech: digits. Training duration: 60 sec. Testing duration: 240 sec. Noisy, denoised. SNR = 0.07.
Speech: bartender. Training duration: 48 sec. Testing duration: 350 sec. Noise: music from song by Phil Collins, male speech, white Gaussian. SNR = 0.59, SNR = 0.3, SNR = 0.38.
Input: video and very noisy audio (time in sec). Algorithm: example-based, Hidden Markov Model. Output: denoised audio, for human and machine hearing.