Voice Removal from Music


1 Voice Removal from Music
Benjamas Panomruttanarug and Adam Berenzweig
Final Project for EE4810 DSP
Prof. Dan Ellis

2 Introduction
What? Remove singing from music.
Why? Karaoke; research tool.
How? Stereo center-channel subtraction, plus enhancements to reduce side effects.

3 Stereo
Two ears, two channels.
Audio perception: spatialization via IID and ITD (interaural intensity and time differences).
Audio engineering techniques: “stereo field”, panning.
Voice is recorded in mono and usually panned center.
Exceptions: stereo reverb, vocal doubling, etc.
[Figure: stereo field with labels voice, bass, guitar, hi-hat.]

4 Overview of Experiments
[Diagram: original sound and spectrogram, with sound outputs for the Simple, Filter_yui, Filter_adam, and Mask methods.]

5 Phase Reversal or Center-channel Subtraction
Cheap trick: a signal panned center contributes equally to L and R. Phase-reversing R (180°) and adding it to L is equivalent to subtraction.
L(t) = 0.5 x1(t) + 0.8 x2(t)
R(t) = 0.5 x1(t) + 0.2 x2(t)
--> L(t) - R(t) = 0.6 x2(t)
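
The subtraction can be checked numerically. The sketch below uses the mixing gains from this slide; the sine tones standing in for x1 (the centered voice) and x2 (an instrument panned left), the sample rate, and the function name are assumptions made for illustration only.

```python
import numpy as np

def center_channel_subtract(left, right):
    """Return L - R, which cancels any source mixed equally into both channels."""
    return np.asarray(left, dtype=float) - np.asarray(right, dtype=float)

# Hypothetical test signals: x1 stands in for the centered voice,
# x2 for an instrument panned toward the left.
fs = 8000
t = np.arange(fs) / fs
x1 = np.sin(2 * np.pi * 440 * t)
x2 = np.sin(2 * np.pi * 220 * t)

# Mixing gains taken from the slide.
left = 0.5 * x1 + 0.8 * x2
right = 0.5 * x1 + 0.2 * x2

# The x1 terms cancel; what remains is 0.6 * x2, a single (mono) signal.
residual = center_channel_subtract(left, right)
```

Note that the result is one channel, not two, which is why the stereo image is lost.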

6 Side Effects
Doesn’t work if the voice is not centered.
Stereo is lost.
Stereo reverb.
Where’s the beef? Lost bass.
Other strangeness.

7 Restoring Stereo: Phase decorrelation
The auditory system uses phase for localization.
Correlated phase may be used to attribute multiple reflections to the same sound source (precedence effect).
Decorrelating phase tricks the auditory system into “externalizing” sound.
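
One common way to decorrelate phase is to filter the mono signal with two different all-pass filters whose phase is random. The sketch below assumes that approach; the slides do not say which filters the project used, and the filter length, seeds, and function names are hypothetical.

```python
import numpy as np

def random_phase_allpass(n_taps, rng):
    """FIR filter with unit magnitude and random phase at every FFT bin."""
    n_bins = n_taps // 2 + 1
    phase = rng.uniform(-np.pi, np.pi, n_bins)
    phase[0] = 0.0    # DC bin must be real
    phase[-1] = 0.0   # Nyquist bin must be real when n_taps is even
    return np.fft.irfft(np.exp(1j * phase), n=n_taps)

def decorrelate(mono, n_taps=512, seed=0):
    """Turn one mono signal into two channels with decorrelated phase."""
    rng = np.random.default_rng(seed)
    h_left = random_phase_allpass(n_taps, rng)
    h_right = random_phase_allpass(n_taps, rng)
    return np.convolve(mono, h_left), np.convolve(mono, h_right)

# White noise stands in for the mono output of center-channel subtraction.
mono = np.random.default_rng(1).standard_normal(4096)
left, right = decorrelate(mono)

# Correlation between the two output channels (1.0 would mean identical).
corr = np.corrcoef(left, right)[0, 1]
```

Each filter leaves the magnitude spectrum unchanged, so the two channels keep the same spectral content and differ only in phase.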

8 Simple Filtering to Reduce Side Effects
Only perform center-channel subtraction on frequencies where voice is present.
Some stereo is restored.
[Audio examples: original, plain CCS (center-channel subtraction).]
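
A minimal sketch of band-limited subtraction follows. It assumes a brick-wall band-pass applied in the FFT domain and a voice band of 200 to 4000 Hz; the filters actually used in the project (the Filter_yui and Filter_adam variants) are not described in the slides, so every name and parameter here is hypothetical.

```python
import numpy as np

def band_limit(x, fs, f_lo, f_hi):
    """Zero every FFT bin outside [f_lo, f_hi] (a brick-wall band-pass)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def band_limited_ccs(left, right, fs, f_lo=200.0, f_hi=4000.0):
    """Subtract the opposite channel only inside the assumed voice band."""
    out_left = left - band_limit(right, fs, f_lo, f_hi)
    out_right = right - band_limit(left, fs, f_lo, f_hi)
    return out_left, out_right

# Hypothetical test mix: a centered "voice" tone and a centered bass tone.
fs = 16000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 1000 * t)   # inside the assumed voice band
bass = np.sin(2 * np.pi * 60 * t)      # below the assumed voice band
left = voice + bass
right = voice + bass

# The 1000 Hz tone cancels; the 60 Hz tone is left untouched.
out_left, out_right = band_limited_ccs(left, right, fs)
```

Outside the band both channels pass through unchanged, so centered bass and any stereo content there survive. Inside the band the two outputs are still the same difference signal with opposite sign.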

9 Masking Method
Selectively filter in frequency and time.
Combine masked |STFT| of L-R with original |STFT|.
Preserve original phase.
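
The combination step might look like the sketch below. It assumes that the mask selects the magnitude of L-R in time-frequency cells flagged as voice and the original magnitude everywhere else; the slides do not give the exact rule. The frame size, hop, and function names are hypothetical, and resynthesis by inverse STFT is omitted.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Hann-windowed frames to complex spectra, shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + frame]) for s in starts])

def apply_mask(stft_left, stft_right, mask):
    """Use |L - R| where mask is True and |L| elsewhere; keep the phase of L."""
    mag_orig = np.abs(stft_left)
    mag_diff = np.abs(stft_left - stft_right)
    mag = np.where(mask, mag_diff, mag_orig)
    return mag * np.exp(1j * np.angle(stft_left))

# Hypothetical stereo input: random noise in each channel.
rng = np.random.default_rng(0)
left = rng.standard_normal(2048)
right = rng.standard_normal(2048)
SL, SR = stft(left), stft(right)

# Two extreme masks: subtract everywhere, and subtract nowhere.
out_all = apply_mask(SL, SR, np.ones(SL.shape, dtype=bool))
out_none = apply_mask(SL, SR, np.zeros(SL.shape, dtype=bool))
```

With an all-False mask the left channel passes through unchanged; with an all-True mask the magnitudes are those of plain center-channel subtraction. The right channel would be processed the same way with the roles of L and R swapped.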

10 Determining the Mask
Crude heuristic: three thresholds
Absolute magnitude (in voiceband): |L|
Magnitude difference: |L| - |L-R|
Phase difference: angle(L) - angle(R)
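
The three tests can be combined into a boolean time-frequency mask, as in the sketch below. The threshold values, the voice band limits, and the direction of each comparison are assumptions made for illustration; the slides name the three quantities but not how they were thresholded.

```python
import numpy as np

def voice_mask(stft_left, stft_right, freqs, band=(200.0, 4000.0),
               min_mag=0.1, min_drop=0.05, max_phase=0.3):
    """Flag time-frequency cells that look like centered voice.

    stft_left, stft_right: complex arrays of shape (n_frames, n_bins).
    freqs: bin center frequencies in Hz, shape (n_bins,).
    All thresholds are hypothetical tuning parameters.
    """
    mag_left = np.abs(stft_left)
    mag_diff = np.abs(stft_left - stft_right)
    # Phase difference angle(L) - angle(R), wrapped to (-pi, pi].
    phase_diff = np.angle(stft_left * np.conj(stft_right))

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    loud = mag_left > min_mag                      # absolute magnitude |L|
    cancels = (mag_left - mag_diff) > min_drop     # |L| - |L-R|
    in_phase = np.abs(phase_diff) < max_phase      # angle(L) - angle(R)
    return in_band & loud & cancels & in_phase

# Toy example: one frame, three bins.
freqs = np.array([100.0, 1000.0, 2000.0])
SL = np.array([[1.0 + 0j, 1.0 + 0j, 1.0 + 0j]])
SR = np.array([[1.0 + 0j, 1.0 + 0j, -1.0 + 0j]])
mask = voice_mask(SL, SR, freqs)
```

In the toy example only the middle bin is flagged: the first bin lies outside the assumed voice band, and in the third the two channels are in anti-phase, so subtraction does not cancel anything.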

11 Future Work
Better mask?
Source separation… very hard.
Time-domain segmentation first.
Machine learning techniques.
Adaptive optimal threshold determination? The current heuristic method has too many parameters to tweak.

12 References
Kendall, G. “The Decorrelation of Audio Signals and Its Impact on Spatial Imagery”. Computer Music Journal, 19:4, Winter 1995.
Erik Larsen, Philips. Private communication.

