Published by Lesley Stephens. Modified over 9 years ago.
2
Cross-Modal (Visual-Auditory) Denoising. Dana Segev, Yoav Y. Schechner, Michael Elad. Technion – Israel Institute of Technology.
3
Digits sequence. Noisy digits sequence. Denoised by the state-of-the-art algorithm of Cohen & Berdugo. Segev, Schechner, Elad, Cross-Modal Denoising
4
Use one modality to denoise another? Use video to denoise a soundtrack?
5
Noise: very intense, non-stationary, unknown, unseen source. Single microphone.
6
Input: video and very noisy audio, over time (sec). Algorithm: cross-modal, example-based. Output: denoised audio, for human and machine hearing.
9
Training: example set. Input: test set.
11
~syllable (0.25 sec)
12
Xylophone
13
Xylophone. Sound.
14
Examples ...
18
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics.
19
Input video: video feature-space. Input audio: audio feature-space. Axis: time (sec).
20
Input audio-video: audio-video feature-space. Axis: time (sec).
21
Training audio-video examples: audio-video feature-space. Axis: time (sec).
22
Feature-space
25
Nearest neighbor. Feature-space.
27
Examples ...
29
Noisy audio. Clean segment.
30
Noisy audio. Clean segment. Denoised.
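The nearest-neighbor and rendering steps on these slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each segment is already described by a feature vector, that matching uses the l2 distance, and that the output is built by concatenating the clean audio of the selected examples. The function name, array names, and toy values are hypothetical.

```python
import numpy as np

def nearest_neighbor_denoise(input_feats, example_feats, example_clean_audio):
    """Replace each input segment by the clean audio of its nearest example.

    input_feats:         (T, d) features of the T noisy input segments
    example_feats:       (K, d) features of the K training examples
    example_clean_audio: (K, n) clean audio samples of each example segment
    """
    # Squared l2 distance between every input segment and every example.
    diff = input_feats[:, None, :] - example_feats[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)        # shape (T, K)
    indices = dist2.argmin(axis=1)         # nearest example per input segment
    # Concatenate the selected clean segments into one output track.
    denoised = example_clean_audio[indices].reshape(-1)
    return indices, denoised

# Toy data (assumed values, for illustration only).
example_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
example_clean_audio = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
input_feats = np.array([[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]])

indices, denoised = nearest_neighbor_denoise(
    input_feats, example_feats, example_clean_audio)
print(indices)   # [1 2 0]
print(denoised)  # [1. 1. 2. 2. 0. 0.]
```

Each input segment is matched independently here, so the sketch ignores how likely one segment is to follow another.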
31
Examples ...
32
Examples ... Input ...
36
Bartender experiment
37
Examples ... Input ...
38
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition (NN). Rendering a denoised signal. Learning feature statistics.
39
Feature-space
40
Feature-space. For the k-th example segment: [equation image]
41
Feature-space with clusters labeled by syllable: bi, fif, ty, two, ar. Example sequence: bi-fif-ty-two. For the k-th example segment: [equation image]
42
[Figure: feature-space clusters bi, fif, ty, two, ar, with a table of transition counts from the current cluster to the next cluster]
43
[Figure: table of transition counts from the current cluster to the next cluster, over bi, ty, fif, two, ar] Syllable consecutive probability. The probability of a transition between clusters is estimated from these counts, normalized by the number of examples in the training set.
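A minimal sketch of the counting step, assuming every training segment has already been assigned a cluster label and the labels are in temporal order. Dividing by the number of training examples follows the wording of the slide; the function and variable names are hypothetical.

```python
import numpy as np

def transition_probabilities(labels, n_clusters):
    """Count consecutive cluster pairs and normalize the counts.

    labels: cluster index of each training segment, in temporal order.
    Returns (counts, probs); counts[q, r] is the number of times a segment
    in cluster q is immediately followed by a segment in cluster r.
    """
    counts = np.zeros((n_clusters, n_clusters), dtype=int)
    for current, nxt in zip(labels[:-1], labels[1:]):
        counts[current, nxt] += 1
    # Normalize by the number of training examples, as worded on the slide.
    probs = counts / len(labels)
    return counts, probs

# Toy label sequence (assumed values): clusters 0, 1, 2.
labels = [0, 1, 0, 1, 2]
counts, probs = transition_probabilities(labels, 3)
print(counts)
print(probs[0, 1])  # 0.4
```

Transitions never seen in training get probability zero in this sketch; a practical system might smooth the counts.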
44
Hidden Markov Model. [Diagram labels: P, time delay; state sequence: bi, fif, ty, two, bi]
46
Hidden Markov Model. [Diagram labels: P, time delay; state sequence: bi, fif, ty, two, bi; plus audio noise]
47
Examples ... Input ...
50
Input video
52
Input video. Vector of indices.
53
Cost function: a data term and a regularization term. [equation image]
54
Cost function: a data term and a regularization term. Optimal vector of indices. [equation image]
55
Nodes, edges. Examples ... Input ... Complexity: [equation image]. Complexity: [equation image]. Dynamic programming.
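The search for an optimal vector of indices can be sketched with a standard Viterbi-style dynamic program. This is an illustration under stated assumptions, not the authors' exact formulation: `data_cost[t, k]` is taken to be the cost of assigning example k to input segment t (for instance a feature distance), and `trans_cost[i, j]` the regularization cost of following example i with example j (for instance a negative log transition probability). Both names are hypothetical. For T input segments and K examples this recursion runs in O(T·K²) time.

```python
import numpy as np

def viterbi_min_cost(data_cost, trans_cost):
    """Minimize sum_t data_cost[t, m_t] + sum_t trans_cost[m_{t-1}, m_t].

    data_cost:  (T, K) cost of each example at each input segment
    trans_cost: (K, K) cost of each consecutive pair of examples
    Returns the optimal index vector and its total cost.
    """
    T, K = data_cost.shape
    best = data_cost[0].astype(float)      # best cost of paths ending in each k
    back = np.zeros((T, K), dtype=int)     # back-pointers
    for t in range(1, T):
        total = best[:, None] + trans_cost # total[i, j]: arrive at j from i
        back[t] = total.argmin(axis=0)
        best = total.min(axis=0) + data_cost[t]
    # Trace the optimal path backwards from the cheapest final state.
    path = np.empty(T, dtype=int)
    path[-1] = best.argmin()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path, float(best.min())

# Toy costs (assumed values): switching examples is expensive.
data_cost = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
trans_cost = np.array([[0.0, 10.0], [10.0, 0.0]])
path, cost = viterbi_min_cost(data_cost, trans_cost)
print(path, cost)  # [0 0 0] 1.0
```

With the regularization term set to zero the same toy data yields the path `[0 1 0]`, which is the independent nearest-neighbor choice per segment.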
56
Examples ... Input ...
59
Cross-modal representation. Generating multimodal features. Cross-modal pattern recognition. Rendering a denoised signal. Learning feature statistics.
60
Requirements. Audio features: sensitivity to sound perception; dimension reduction. Visual features: focusing on the motion of interest; dimension reduction.
Speech features. Audio: MFCCs. Visual: DCT coefficients.
Music features. Audio: spectrogram of each segment. Visual: the spatial trajectory of a hitting rod.
61
MFCCs – Mel-frequency Cepstral Coefficients. Audio signal → signal spectrum → Mel-frequency filter bank → log(·) → DCT → MFCCs.
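The pipeline on this slide can be sketched for a single audio frame as below. The frame length, Hamming window, 20 triangular filters, 12 output coefficients, and the common 2595·log10(1 + f/700) mel formula are assumptions of this sketch; the slides do not give these parameters.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            fb[i, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II of a 1-D vector, first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return x @ basis.T

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Audio frame -> spectrum -> mel filter bank -> log -> DCT."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    return dct_ii(np.log(energies + 1e-10), n_coeffs)

# Toy frame (assumed): 256 samples of white noise at 8000 Hz.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
coeffs = mfcc(frame, 8000)
print(coeffs.shape)  # (12,)
```

Scaling the frame's amplitude shifts only the first coefficient, since the log turns gain into an additive constant that the higher DCT basis functions cancel.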
62
Spectrogram of each segment. Xylophone signal. Spectrogram. Spectrogram accumulation.
63
Speech: the given movie ...
64
Speech: locking on the object of interest ...
65
Speech: extracting global motion by tracking ...
67
Speech: extracting features. DCT coefficients that strongly represent motion between frames.
68
Xylophone: the given movie ...
69
Xylophone: locking on the object of interest ...
70
Xylophone: extracting global motion by tracking (X, Y, Z) ...
72
Xylophone: extracting features. Hitting rod spatial coordinates X, Y, Z.
73
Speech: a corpus with a limited number of words and syllables (digits and bar beverages). Video rate 25 fps, audio rate 8000 Hz. K-means clustering, 350 clusters. Distance measure: l2 norm.
Xylophone: a corpus with a limited set of sounds. Video rate 25 fps, audio rate 16000 Hz. Distance measure: l2 norm.
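A minimal sketch of K-means clustering with the l2 norm (Lloyd's algorithm), to illustrate how example segments could be grouped into clusters. The slides report 350 clusters for speech; the toy data, the value k = 2, the random initialization, and the function names here are assumptions for illustration.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center (l2 norm) for every point."""
    diff = points[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean update."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centers = points[start].astype(float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:           # keep the old center if empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)

# Toy feature vectors (assumed values): two well-separated groups.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful.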
74
Xylophone. Training duration: 103 sec. Testing duration: 100 sec. Noise: music from a song by GNR, SNR = 0.9; xylophone melody, SNR = 1.
75
Speech: Digits. Training duration: 60 sec. Testing duration: 240 sec. Noisy. Denoised. SNR = 0.07.
76
Speech: Bartender. Training duration: 48 sec. Testing duration: 350 sec. Noise: music from a song by Phil Collins, male speech, white Gaussian. SNR = 0.59, SNR = 0.3, SNR = 0.38.
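The slides do not state the units of the SNR values. If they are linear power ratios (an assumption of this sketch, not something the slides confirm), they can be computed and converted to decibels as follows. The function names and toy signals are hypothetical.

```python
import math

def snr_linear(signal, noise):
    """Ratio of mean signal power to mean noise power."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return p_signal / p_noise

def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)

# Toy signals (assumed values): the noise has twice the signal amplitude.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [2.0, -2.0, 2.0, -2.0]
ratio = snr_linear(signal, noise)
print(ratio)  # 0.25
print(round(to_db(ratio), 2))
```

Under this assumption any SNR below 1 means the noise carries more power than the signal.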
77
Input: video and very noisy audio, over time (sec). Algorithm: example-based, Hidden Markov Model. Output: denoised audio, for human and machine hearing.