A System for Hybridizing Vocal Performance
By Kim Hang Lau
Parameters of the singing voice
Parameters of the singing voice can be loosely classified as:
–Timbre
–Pitch contour
–Time contour (rhythm)
–Amplitude envelope (projections)
Vocal Modification
Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre
Commercially available units include:
–Intonation corrector
–Pitch/formant processor
–Harmonizer
–Vocoder
Objectives
Prototype a system for vocal modification
Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample
This simulates a transfer of singing technique from a target vocalist to a source vocalist, hence a hybridized vocal performance
Order of Presentation
System overview
Individual components
System evaluation
System limitations
Conclusions and recommendations
System Overview
Three components
–Pitch-marking
–Time-alignment
–Time/pitch/amplitude modification engine
Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances
Targeted System Specifications
Vocal performance: Commercial singing
Vocal pitch range: … Hz
Detection accuracy/resolution: 10 cents
Detection dynamic range: 40 dB
Sampling rate: 44.1 kHz and 48 kHz
Time-scale modification: ±20%
Pitch-scale modification: ±600 cents
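The cent-based figures in the specification can be related to frequency ratios with the standard definition of 1200 cents per octave. The sketch below is only an illustration; the function names are hypothetical and not part of the presented system.

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio: float) -> float:
    """Pitch interval in cents for a given frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Pitch-scale modification range of +/-600 cents (half an octave each way):
lower, upper = cents_to_ratio(-600), cents_to_ratio(600)  # about 0.707 and 1.414

# A 10-cent detection resolution is a frequency change of roughly 0.58 percent:
resolution = cents_to_ratio(10)  # about 1.0058
```

In other words, the ±600 cent target asks the modification engine to scale pitch by a factor between roughly 0.71 and 1.41.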
Component No.1 Pitch-marking
Pitch-marking and Glottal Closure Instants (GCIs)
Information generated from pitch-marking
–Pitch period
–Amplitude envelope
–Voiced/unvoiced segment boundaries
[Figure: pitch-marks on a waveform; labels: 5 ms, P, P’]
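As a rough illustration of how pitch period and amplitude envelope follow from a list of pitch-marks, consider the sketch below. It assumes the marks are sorted sample indices inside a single voiced segment, and it uses the peak absolute value per period as the envelope; both choices are assumptions made for the example, not details given in the slides.

```python
def pitch_periods(marks, sample_rate):
    """Period (seconds) and fundamental frequency (Hz) between consecutive marks."""
    periods = [(b - a) / sample_rate for a, b in zip(marks, marks[1:])]
    f0 = [1.0 / p for p in periods]
    return periods, f0

def amplitude_envelope(signal, marks):
    """Peak absolute amplitude within each pitch period (one value per period)."""
    return [max(abs(x) for x in signal[a:b]) for a, b in zip(marks, marks[1:])]
```

For example, marks spaced 100 samples apart at a 44.1 kHz sampling rate correspond to a fundamental of 441 Hz.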
Pitch-marking applying Dyadic Wavelet Transform (DyWT)
Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals
The adaptation assumes that GCIs in a speech signal behave like edges in an image signal
DyWT computation for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking
If a peak detected in the DyWT occurs at the same time-instant in two consecutive scales, starting from a lower scale, that time-instant is taken as a GCI
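The decision rule on this slide (a peak that persists across two consecutive scales is taken as a GCI) can be sketched as follows. The sketch covers only the cross-scale matching, not the DyWT itself. It assumes the peak positions per scale are already available, and the matching tolerance in samples is a hypothetical parameter that the slides do not specify.

```python
def match_across_scales(peaks_lower, peaks_higher, tolerance):
    """Peaks of the lower scale that reappear (within tolerance) at the next scale."""
    return [p for p in peaks_lower
            if any(abs(p - q) <= tolerance for q in peaks_higher)]

def detect_gcis(peaks_by_scale, tolerance=2):
    """peaks_by_scale maps a scale exponent (e.g. 3 for 2^3) to peak positions.

    Scales are visited from the lowest upwards. A peak is accepted when it
    matches a peak at the next scale and is not a duplicate of a GCI that was
    already accepted from a lower scale.
    """
    scales = sorted(peaks_by_scale)
    gcis = []
    for lo, hi in zip(scales, scales[1:]):
        for p in match_across_scales(peaks_by_scale[lo], peaks_by_scale[hi], tolerance):
            if all(abs(p - g) > tolerance for g in gcis):
                gcis.append(p)
    return sorted(gcis)
```

Visiting the lowest scale first keeps the position estimate from the scale with the finest time resolution whenever the same event is seen at several scales.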
[Figure: filter-bank outputs used by Mallat and by Kadambe; labels: original signal, scales 2^1, 2^2, 2^3, 2^4, 2^5, base-band]
The proposed pitch-marking scheme
Detection principle
–Detect the scale that contains the fundamental period
–Scanning downwards from a higher scale (of lower frequency), frame power jumps considerably when this scale is encountered
Features
–4X decimation to support high sampling rates
–Frame-based processing and error correction for possible quasi-real-time detection
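The detection principle above can be sketched as a scan over per-scale frame powers. The jump threshold (`jump_ratio`) and the function names are assumptions made for illustration; the slides do not state how large a "considerable jump" is.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of a DyWT output."""
    return sum(x * x for x in frame) / len(frame)

def select_fundamental_scale(power_by_scale, jump_ratio=4.0):
    """Return the scale exponent at which frame power jumps, or None.

    power_by_scale maps a scale exponent to the frame power at that scale.
    Scales are scanned from the highest (lowest frequency) downwards; the
    first scale whose power exceeds jump_ratio times the power of the
    previous, higher scale is taken to contain the fundamental period.
    """
    scales = sorted(power_by_scale, reverse=True)
    for prev, cur in zip(scales, scales[1:]):
        if power_by_scale[cur] > jump_ratio * max(power_by_scale[prev], 1e-12):
            return cur
    return None
```

A frame with no clear jump returns `None`, which a frame-based error-correction stage could treat as unvoiced or fill in from neighbouring frames.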
The proposed pitch-marking system
Comparisons of results with Auto-Tune
[Figure panels: Proposed system; Auto-Tune]
Component No.2 The Modification Engine
Time/pitch/amplitude modification engine
(n): time-modification factor
(n): pitch-modification factor
(n): amplitude-modification factor
D(n): time-warping function
[Figure: block diagram of the engine with inputs (n), (n), (n) and D(n)]
TD-PSOLA (Time-domain Pitch Synchronous Overlap-Add)
Time-domain splicing overlap-add method
Used in prosodic modification of speech
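A minimal sketch of the TD-PSOLA idea follows: two-period, Hann-windowed grains centred on analysis pitch-marks are overlap-added at new synthesis instants. It is heavily simplified. It assumes constant factors, whereas the engine in this presentation takes time-varying ones, and the names `alpha` (time-scale factor) and `beta` (pitch-scale factor) are chosen here for illustration only. Amplitude modification and unvoiced segments are not handled.

```python
import numpy as np

def td_psola(x, marks, alpha=1.0, beta=1.0):
    """Simplified TD-PSOLA with constant factors (illustrative only).

    x     : input samples of one voiced segment
    marks : sorted analysis pitch-mark positions (sample indices, at least two)
    alpha : output duration / input duration
    beta  : output pitch / input pitch
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    periods = np.append(periods, periods[-1])  # one local period per mark
    out = np.zeros(int(round(alpha * len(x))))
    t_syn = alpha * marks[0]                   # first synthesis instant
    while t_syn < alpha * marks[-1]:
        # Analysis mark closest to this synthesis instant on the input time axis.
        i = int(np.argmin(np.abs(marks - t_syn / alpha)))
        p = int(periods[i])
        window = np.hanning(2 * p + 1)         # two-period grain
        centre = int(round(t_syn))
        for k in range(-p, p + 1):
            src, dst = marks[i] + k, centre + k
            if 0 <= src < len(x) and 0 <= dst < len(out):
                out[dst] += window[k + p] * x[src]
        t_syn += p / beta                      # higher pitch: grains closer together
    return out
```

With `alpha = beta = 1` and evenly spaced marks, the overlapping Hann windows sum to one, so the interior of the segment is reproduced unchanged; this is a convenient sanity check before trying other factors.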
Evaluation of the modification engine
[Panels: Original; TD-PSOLA; Auto-Tune]
Component No.3 Time-alignment
Time-alignment
Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)
Verhelst claimed that the basic local constraint produces the most accurate time-warping path
Computation grows rapidly (quadratically for unconstrained DTW) as the length of comparison increases
Accuracy deteriorates as the length of comparison increases
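For reference, DTW with the basic local constraint (each step moves one frame forward in the source, in the target, or in both) can be sketched as below. The sketch assumes scalar features and an absolute-difference distance purely for illustration; the features used in Verhelst's system or in this prototype are not given on the slide.

```python
def dtw_path(a, b):
    """DTW between sequences a and b using the basic local constraint.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = abs(a[0] - b[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,                 # vertical step
                cost[i][j - 1] if j > 0 else inf,                 # horizontal step
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,   # diagonal step
            )
            cost[i][j] = abs(a[i] - b[j]) + best
    # Backtrack from the end point to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

The two nested loops fill an n-by-m grid, which is why the cost of comparing long passages grows so quickly and why shorter comparison lengths are attractive.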
Adaptations from Verhelst’s method
Time-alignment is performed on a voiced/unvoiced segmental basis
–DTW for voiced segments
–Linear Time Warping (LTW) for unvoiced segments
Global constraints are introduced to further reduce computation
Synchronization of voiced/unvoiced segments is required; it is edited manually in the current implementation
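The two adaptations can be illustrated with a short sketch. Linear time warping maps one segment onto another with a constant slope, and a global constraint restricts the DTW search to a band around the diagonal. The band shape used here (a fixed radius around the linear path, in the style of a Sakoe-Chiba band) and the direction of the mapping are assumptions; the slides do not say which global constraint or convention was used.

```python
def linear_time_warp(src_start, src_end, tgt_start, tgt_end):
    """Map each target index in [tgt_start, tgt_end) to a source position.

    The mapping is a straight line, so the whole segment is stretched or
    compressed by one constant factor.
    """
    scale = (src_end - src_start) / (tgt_end - tgt_start)
    return [src_start + (n - tgt_start) * scale for n in range(tgt_start, tgt_end)]

def in_band(i, j, n, m, radius):
    """True if grid cell (i, j) of an n-by-m DTW grid lies within `radius`
    frames of the straight line from (0, 0) to (n, m)."""
    return abs(i * m / n - j) <= radius
```

Skipping cells for which `in_band` is false reduces the DTW work from the full grid to a strip whose width is set by `radius`.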
Manipulation of modification parameters
Simple smoothing of (n), (n) with linear-phase FIR low-pass filters is performed before feeding them to the modification engine
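One way to smooth a modification-factor sequence with a linear-phase FIR low-pass filter is sketched below. The Hann-shaped taps, the filter length, and the edge handling (repeating the end values) are assumptions made for the example; the slides do not give the filter design. Symmetric taps give linear phase, and centring the filter on each sample removes the resulting constant delay.

```python
import math

def hann_lowpass_taps(num_taps):
    """Symmetric (hence linear-phase) low-pass taps with unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the filter has a centre tap")
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (num_taps + 1))
         for k in range(num_taps)]
    total = sum(w)
    return [v / total for v in w]

def smooth(values, num_taps=9):
    """Filter a parameter sequence, compensating the filter's group delay."""
    taps = hann_lowpass_taps(num_taps)
    half = num_taps // 2
    # Repeat the end values so the output has the same length as the input.
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [sum(t * padded[n + k] for k, t in enumerate(taps))
            for n in range(len(values))]
```

Because the taps sum to one, a constant factor passes through unchanged, while abrupt jumps in the factor are spread over roughly `num_taps` samples.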
The Prototype System
System Evaluation: case 1
System Evaluation: case 2
System Limitations
Segmentation
–Lack of a reliable technique for voiced/unvoiced segmentation
–Segmentation and classification of different vocal sounds is the key to devising rules for modification
Modification engine
–Lacks the capability to handle pitch transitions, and depends entirely on the pitch-marking stage
System Limitations
Pitch-marking
–Proposed system lacks robustness
–Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently
Time-alignment
–The DTW basic local constraint allows unbounded time expansion and compression
–This often causes distortions in the synthesized vocal sample
Conclusions and Recommendations
The current system works well for slow and continuous singing
Further improvements to the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles
Questions & Answers
Wavelet filter bank
Dyadic Spline Wavelet
Wide-band analysis
DTW local constraints
Calculation of pitch-marks
DyWT