Voice Removal from Music

Benjamas Panomruttanarug and Adam Berenzweig
Final Project for EE4810 DSP
Prof. Dan Ellis

Introduction
What? Remove singing from music.
Why? Karaoke; research tool.
How? Stereo center-channel subtraction, plus enhancements to reduce side effects.

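Stereo
Two ears, two channels.
Audio perception: spatialization via IID (interaural intensity difference) and ITD (interaural time difference).
Audio engineering techniques: the “stereo field” and panning.
Voice is recorded in mono and usually panned center.
Exceptions: stereo reverb, vocal doubling, etc.
[Figure: stereo field showing voice, bass, guitar, and hi-hat]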

Overview of Experiments
[Flow diagram: the original sound and its spectrogram, processed by the Simple, Filter_yui, Filter_adam, and Mask methods, each with its own sound output.]

Phase Reversal or Center-channel Subtraction
Cheap trick: a signal panned center contributes equally to L and R.
Phase-reversing R (180º) and adding it to L is equivalent to subtraction.
Example, with x1 panned center and x2 panned toward the left:
L(t) = 0.5 x1(t) + 0.8 x2(t)
R(t) = 0.5 x1(t) + 0.2 x2(t)
--> L(t) - R(t) = 0.6 x2(t)
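
The example above can be checked numerically. The following is a minimal sketch in plain Python, assuming two sine tones stand in for the centered voice x1 and the off-center instrument x2; the tone frequencies, the sample rate, and the function name `center_channel_subtract` are illustrative choices, not taken from the project.

```python
import math

def center_channel_subtract(left, right):
    """Subtract R from L sample by sample (same as adding phase-reversed R)."""
    return [l - r for l, r in zip(left, right)]

# Hypothetical test signals: x1 plays the centered voice, x2 an instrument.
sample_rate = 8000
t = [i / sample_rate for i in range(sample_rate // 10)]
x1 = [math.sin(2 * math.pi * 220 * s) for s in t]
x2 = [math.sin(2 * math.pi * 330 * s) for s in t]

# Mix with the gains from the slide: x1 equal in both channels, x2 off-center.
left = [0.5 * a + 0.8 * b for a, b in zip(x1, x2)]
right = [0.5 * a + 0.2 * b for a, b in zip(x1, x2)]

# The centered x1 cancels; what remains is 0.6 * x2.
residual = center_channel_subtract(left, right)
```

Note that the result is a single (mono) signal, which is one of the side effects discussed next.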

Side Effects
Doesn’t work if the voice is not centered.
Stereo is lost.
Stereo reverb.
Where’s the beef? Lost bass.
Other strangeness.

Restoring Stereo: Phase Decorrelation
The auditory system uses phase for localization.
Correlated phase may be used to attribute multiple reflections to the same sound source (precedence effect).
Decorrelating phase tricks the auditory system into “externalizing” sound.
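
As a rough illustration of phase decorrelation, the sketch below turns one mono signal into two channels by rotating each frequency bin by a random phase while leaving magnitudes untouched. This is only one possible approach and is an assumption here; the slides do not say which decorrelation filter the project used. The helper names (`dft`, `idft_real`, `decorrelate`) are hypothetical, and the naive DFT is for clarity only; real code would use an FFT library and block-wise processing.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Naive inverse DFT, returning the real part."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def decorrelate(mono, seed):
    """Randomize the phase of each bin, keeping its magnitude unchanged."""
    rng = random.Random(seed)
    n = len(mono)
    spectrum = dft(mono)
    out = list(spectrum)
    # DC (and Nyquist, for even n) are left alone so the output stays real.
    for k in range(1, (n + 1) // 2):
        rotated = spectrum[k] * cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        out[k] = rotated
        out[n - k] = rotated.conjugate()  # keep conjugate symmetry
    return idft_real(out)

def correlation(a, b):
    """Normalized inner product of two equal-length signals."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den
```

Calling `decorrelate(mono, seed=1)` and `decorrelate(mono, seed=2)` gives a left and right channel with the same magnitude spectrum as the input but low mutual correlation. Because the whole signal is transformed at once, the filtering is circular, which a practical implementation would avoid.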

Simple Filtering to Reduce Side Effects
Only perform center-channel subtraction on frequencies where voice is present.
Some stereo is restored.
[Figure panels: original; plain CCS (center-channel subtraction)]
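
A minimal sketch of band-limited center-channel subtraction follows. It assumes a fixed voice band of 200 to 3400 Hz and a frequency-domain implementation; both are illustrative assumptions, since the slides do not give the band edges or the filter design behind the Filter_yui and Filter_adam methods. Inside the band each output channel is the channel difference; outside it, the original channels pass through, so low bass and some stereo survive.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Naive inverse DFT, returning the real part."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def band_limited_ccs(left, right, sample_rate, band=(200.0, 3400.0)):
    """Subtract channels only inside `band` (Hz); pass the rest unchanged.

    The default band edges are an assumed voice band, not project values.
    """
    n = len(left)
    spec_l, spec_r = dft(left), dft(right)
    out_l, out_r = list(spec_l), list(spec_r)
    for k in range(n):
        # Bins k and n - k represent the same frequency for real signals.
        freq = min(k, n - k) * sample_rate / n
        if band[0] <= freq <= band[1]:
            out_l[k] = spec_l[k] - spec_r[k]
            out_r[k] = spec_r[k] - spec_l[k]
    return idft_real(out_l), idft_real(out_r)
```

With a centered voice at 1000 Hz, a centered bass at 80 Hz, and a guitar only in the left channel, the voice cancels while the bass is kept in both outputs. The sharp band edges of this sketch would cause audible artifacts on real music; a smoother crossover is the obvious refinement.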

Masking Method
Selectively filter in frequency and time.
Combine the masked |STFT| of L-R with the original |STFT|.
Preserve the original phase.
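
One way to read the combination step, sketched per STFT frame: where the mask marks a time-frequency bin as voice, take the magnitude from L-R; elsewhere keep the original magnitude; in both cases reuse the original phase. This reading, the use of the left channel as "the original", and the names `combine_bin` and `combine_frame` are assumptions for illustration. The STFT itself is taken as given (lists of complex bins).

```python
import cmath

def combine_bin(left_bin, right_bin, is_voice):
    """Return one output bin: masked magnitude with the original phase."""
    difference = left_bin - right_bin
    # Voice bins take the magnitude of L-R; other bins keep |L|.
    magnitude = abs(difference) if is_voice else abs(left_bin)
    # The phase always comes from the original (left) bin.
    return cmath.rect(magnitude, cmath.phase(left_bin))

def combine_frame(left_frame, right_frame, mask):
    """Apply combine_bin across one STFT frame (lists of equal length)."""
    return [combine_bin(l, r, m)
            for l, r, m in zip(left_frame, right_frame, mask)]
```

The right channel would be processed the same way with the roles swapped, so each output channel keeps its own phase.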

Determining the Mask
Crude heuristic: three thresholds
Absolute magnitude (in voiceband): |L|
Magnitude difference: |L| - |L-R|
Phase difference: angle(L) - angle(R)
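
The three tests can be sketched as a per-bin decision. The threshold values, the voiceband edges, and the direction of each comparison below are assumptions chosen for illustration (the slides list the quantities but not the values); the function name `is_voice_bin` is hypothetical. The reasoning assumed here is that a centered voice is loud in the voiceband, mostly cancels in L-R, and has nearly equal phase in both channels.

```python
import cmath

def is_voice_bin(left_bin, right_bin, freq_hz,
                 voiceband=(200.0, 4000.0),
                 min_magnitude=0.1,
                 min_cancellation=0.05,
                 max_phase_diff=0.2):
    """Decide whether one time-frequency bin looks like centered voice.

    All default parameter values are placeholders, not project settings.
    """
    # Threshold 1: enough energy, and only inside the assumed voiceband.
    in_voiceband = voiceband[0] <= freq_hz <= voiceband[1]
    loud_enough = abs(left_bin) > min_magnitude
    # Threshold 2: |L| - |L-R| is large when the content cancels in L-R.
    cancels = abs(left_bin) - abs(left_bin - right_bin) > min_cancellation
    # Threshold 3: wrapped phase difference angle(L) - angle(R), in radians.
    phase_diff = abs(cmath.phase(left_bin * right_bin.conjugate()))
    in_phase = phase_diff < max_phase_diff
    return in_voiceband and loud_enough and cancels and in_phase
```

For example, a bin with identical L and R values at 1000 Hz passes all three tests, while a bin present only in L fails the cancellation test.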

Future Work
Better mask?
Source separation… very hard.
Time-domain segmentation first.
Machine learning techniques.
Adaptive optimal threshold determination? The current heuristic method has too many parameters to tweak.

References
Kendall, G. “The Decorrelation of Audio Signals and Its Impact on Spatial Imagery”. Computer Music Journal, 19:4, Winter 1995.
Erik Larsen, Philips. Private communication.
