A System for Hybridizing Vocal Performance By Kim Hang Lau.


Parameters of the singing voice
Parameters of the singing voice can be loosely classified as:
– Timbre
– Pitch contour
– Time contour (rhythm)
– Amplitude envelope (projections)

Vocal Modification
Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre.
Commercially available units include:
– Intonation corrector
– Pitch/formant processor
– Harmonizer
– Vocoder

Objectives
Prototype a system for vocal modification.
Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample.
This simulates a transfer of singing techniques from a target vocalist to a source vocalist, thus hybridizing the vocal performance.

Order of Presentation
System overview
Individual components
System evaluation
System limitations
Conclusions and recommendations

System Overview
Three components:
– Pitch-marking
– Time-alignment
– Time/pitch/amplitude modification engine
Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances.

Targeted System Specifications
Vocal performance               Commercial singing
Vocal pitch range               60-1200 Hz
Detection accuracy/resolution   10 cents
Detection dynamic range         40 dB
Sampling rate                   44.1 kHz and 48 kHz
Time-scale modification         ±20%
Pitch-scale modification        ±600 cents
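As a quick check on the units in this specification, an interval in cents maps to a frequency ratio of 2^(cents/1200). The sketch below only illustrates that conversion; the helper name `cents_to_ratio` is hypothetical and not part of the presented system.

```python
def cents_to_ratio(cents: float) -> float:
    """Convert a pitch interval in cents to a frequency ratio (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


# Pitch-scale modification range of +/-600 cents (half an octave each way).
low, high = cents_to_ratio(-600), cents_to_ratio(600)
print(f"pitch-scale factor range: {low:.3f} to {high:.3f}")

# What a 10-cent resolution means in Hz at both ends of the 60-1200 Hz range.
for f0 in (60.0, 1200.0):
    step_hz = f0 * (cents_to_ratio(10) - 1.0)
    print(f"{f0:.0f} Hz: 10 cents is about {step_hz:.2f} Hz")
```

A ±600-cent range therefore corresponds to pitch-scale factors between roughly 0.707 and 1.414, and a fixed resolution in cents is much finer in Hz at the low end of the vocal range than at the high end.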

Component No.1 Pitch-marking

Pitch-marking and Glottal Closure Instants (GCIs)
Information generated from pitch-marking:
– Pitch period
– Amplitude envelope
– Voiced/unvoiced segment boundaries
[Figure: pitch-marks on a waveform, 5 ms time scale, labels P and P’]
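To make the link between pitch-marks and the pitch period concrete, the following minimal sketch derives periods and fundamental frequency from consecutive marks. It assumes the marks are sample indices inside a single voiced segment; the function name and the example marks are hypothetical.

```python
def pitch_from_marks(marks, sample_rate):
    """Return (periods_in_samples, f0_in_hz) for consecutive pitch-marks.

    `marks` is a sorted list of sample indices, one per glottal cycle,
    all taken from the same voiced segment.
    """
    periods = [b - a for a, b in zip(marks, marks[1:])]
    f0 = [sample_rate / p for p in periods]
    return periods, f0


# Hypothetical marks spaced 441 samples apart at a 44.1 kHz sampling rate.
marks = [0, 441, 882, 1323]
periods, f0 = pitch_from_marks(marks, 44100)
print(periods, f0)
```

Gaps between marks that are much longer than a plausible pitch period would, under the same assumption, indicate a voiced/unvoiced boundary rather than a period.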

Pitch-marking applying Dyadic Wavelet Transform (DyWT)
Kadambe adapted Mallat’s algorithm for edge detection in image signals to the detection of GCIs in speech signals.
He assumed that GCIs in a speech signal correspond to edges in an image signal.
Computing the DyWT for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking.
If a peak detected in the DyWT matches across two consecutive scales, starting from a lower scale, that time instant is taken as a GCI.
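The cross-scale matching rule can be sketched as follows. This is an illustration only: it assumes the DyWT peaks have already been located at each scale, and the matching window `tolerance` and all names are hypothetical rather than taken from Kadambe's method.

```python
def match_peaks(peaks_by_scale, tolerance=2):
    """Pick GCI candidates: peaks that agree across two consecutive scales.

    `peaks_by_scale` lists peak positions (sample indices) per dyadic scale,
    ordered from the lowest scale to the highest. `tolerance` (in samples)
    is an assumed matching window.
    """
    gcis = []
    for lower, upper in zip(peaks_by_scale, peaks_by_scale[1:]):
        for p in lower:
            matched = any(abs(p - q) <= tolerance for q in upper)
            already_found = any(abs(p - g) <= tolerance for g in gcis)
            if matched and not already_found:
                gcis.append(p)
    return sorted(gcis)


# Hypothetical peak positions; 150 and 760 each appear at one scale only.
peaks = [
    [100, 150, 320, 540],  # scale 2^3
    [101, 321, 541],       # scale 2^4
    [102, 322, 542, 760],  # scale 2^5
]
print(match_peaks(peaks))
```

Peaks that appear at a single scale only are discarded, which is how the rule rejects spurious detections.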

[Figure: Mallat and Kadambe decompositions. Panels: original signal, scales 2^1, 2^2, 2^3, 2^4, 2^5, base-band]

The proposed pitch-marking scheme
Detection principle:
– Detect the scale that contains the fundamental period
– Scanning from a higher scale (lower frequency) downwards, frame power jumps considerably when this scale is reached
Features:
– 4× decimation to support high sampling rates
– Frame-based processing and error correction, making quasi-real-time detection possible
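One way to read the detection principle is sketched below: compute the frame power at each scale, then scan from the highest scale downwards until the power jumps. The threshold `jump_ratio`, the function names and the example values are assumptions for illustration, not parameters of the proposed system.

```python
def frame_power(frame):
    """Mean squared value of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def find_fundamental_scale(powers_by_scale, jump_ratio=4.0):
    """Return the index of the first scale whose power jumps, scanning downwards.

    `powers_by_scale[k]` is the frame power of the wavelet output at scale
    index k (larger k = higher scale = lower frequency). `jump_ratio` is an
    assumed threshold. Returns None when no jump is found.
    """
    for k in range(len(powers_by_scale) - 2, -1, -1):
        previous = max(powers_by_scale[k + 1], 1e-12)  # avoid a zero reference
        if powers_by_scale[k] > jump_ratio * previous:
            return k
    return None


# Hypothetical frame powers for scale indices 0..4.
powers = [0.30, 0.25, 0.20, 0.01, 0.005]
print(find_fundamental_scale(powers))
```

In this example the two highest scales carry little energy, and the jump at index 2 marks the scale taken to contain the fundamental.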

The proposed pitch-marking system

Comparison of results with Auto-Tune
[Figure panels: Proposed system; Auto-Tune]

Component No.2 The Modification Engine

Time/pitch/amplitude modification engine
Control inputs, each a function of n:
– Time-modification factor
– Pitch-modification factor
– Amplitude-modification factor
– D(n): time-warping function

TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
Time-domain splicing overlap-add method.
Used in prosodic modification of speech.
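A heavily simplified TD-PSOLA sketch is given below to show the overlap-add idea. It assumes constant time and pitch factors and a single voiced segment with known pitch-marks, whereas the engine in this presentation takes time-varying factors; the function and parameter names are hypothetical.

```python
import math


def td_psola(x, marks, time_factor=1.0, pitch_factor=1.0):
    """Simplified TD-PSOLA for one voiced segment with constant factors.

    x: list of samples; marks: sorted pitch-mark sample indices.
    time_factor > 1 lengthens the output; pitch_factor > 1 raises the pitch.
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch-marks")
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])  # reuse the last period for the final mark
    out_len = int(round(len(x) * time_factor))
    y = [0.0] * out_len
    t = marks[0] * time_factor  # current synthesis mark, in output samples
    end = marks[-1] * time_factor
    i = 0
    while t <= end:
        # Find the analysis mark nearest to the matching input time.
        src = t / time_factor
        while i + 1 < len(marks) and abs(marks[i + 1] - src) <= abs(marks[i] - src):
            i += 1
        p = periods[i]
        centre = int(round(t))
        # Overlap-add a Hann-windowed, two-period grain centred on the mark.
        for k in range(-p, p + 1):
            s, d = marks[i] + k, centre + k
            if 0 <= s < len(x) and 0 <= d < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * k / p))
                y[d] += w * x[s]
        t += p / pitch_factor  # the synthesis period sets the new pitch
    return y


# Hypothetical test signal: a sine with a 100-sample period, marked once per cycle.
x = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
marks = list(range(100, 901, 100))
same = td_psola(x, marks)                       # factors of 1: input is reproduced
longer = td_psola(x, marks, time_factor=1.2)    # 20% longer, same pitch
higher = td_psola(x, marks, pitch_factor=1.25)  # same length, grains spaced closer
```

With both factors at 1 the overlapping Hann windows sum to one between the first and last mark, so the input is reproduced there. Grains are repeated or skipped to change duration, and re-spaced to change pitch, which is why the method depends so strongly on accurate pitch-marks.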

Evaluation of the modification engine
[Figure panels: Original; TD-PSOLA; Auto-Tune]

Component No.3 Time-alignment

Time-alignment
Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW).
He claimed that the basic local constraint produces the most accurate time-warping path.
Computation grows steeply as the length of comparison increases.
Accuracy deteriorates as the length of comparison increases.
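A minimal DTW sketch follows. It assumes that the "basic local constraint" means each cell may be reached from its left, lower, or diagonal neighbour, and that each frame is described by a single scalar feature; both are assumptions for illustration, as are the names.

```python
def dtw_path(a, b):
    """Align sequences a and b; return (total_cost, path of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = abs(a[0] - b[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,                 # step in a only
                cost[i][j - 1] if j > 0 else inf,                 # step in b only
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,   # diagonal step
            )
            cost[i][j] = abs(a[i] - b[j]) + best
    # Backtrack from the end to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


# Hypothetical feature sequences: b is a time-stretched version of a.
total, path = dtw_path([1, 2, 3, 4], [1, 1, 2, 3, 3, 4])
print(total, path)
```

The cost table has one cell per pair of frames, so the work grows with the product of the two sequence lengths. Nothing in this step pattern limits how long the path may run horizontally or vertically.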

Adaptations from Verhelst’s method
Time-alignment is performed on a voiced/unvoiced segmental basis:
– DTW for voiced segments
– Linear Time Warping (LTW) for unvoiced segments
Global constraints are introduced to further reduce computation.
The voiced/unvoiced segments of the two samples must be synchronized; in the current implementation this is done by manual editing.
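The two ideas on this slide can be sketched as below: a linear warp for an unvoiced segment, and a band around the diagonal as one common form of global constraint. The presentation does not say which global constraint is used, so the band, its `width` parameter and all names here are assumptions.

```python
def linear_time_warp(src_len, tgt_len):
    """Map each source frame index to a target frame index along a straight line."""
    if src_len < 1 or tgt_len < 1:
        raise ValueError("segments must be non-empty")
    if src_len == 1:
        return [0]
    scale = (tgt_len - 1) / (src_len - 1)
    return [int(round(n * scale)) for n in range(src_len)]


def within_band(i, j, src_len, tgt_len, width):
    """Global constraint: accept DTW cell (i, j) only within `width` frames of the diagonal."""
    diagonal_j = i * (tgt_len - 1) / max(src_len - 1, 1)
    return abs(j - diagonal_j) <= width


# A 5-frame unvoiced source segment mapped onto a 9-frame target segment.
print(linear_time_warp(5, 9))
# Cells far from the diagonal are skipped, which cuts the DTW workload.
print(within_band(0, 0, 10, 10, 2), within_band(0, 5, 10, 10, 2))
```

Restricting DTW to cells that pass such a check reduces the number of cells evaluated and also bounds how far the path can stray from a uniform stretch.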

Manipulation of modification parameters
Two of the modification-factor sequences are smoothed with linear-phase FIR low-pass filters before being fed to the modification engine.
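A sketch of this kind of smoothing is shown below. It assumes a Hann-shaped set of symmetric taps (symmetry is what gives linear phase) and compensates the filter delay so the smoothed sequence stays aligned with the original. The tap count, the edge handling and the names are assumptions, not details from the presentation.

```python
import math


def hann_lowpass(num_taps):
    """Symmetric (hence linear-phase) low-pass FIR taps with unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the delay is a whole number of samples")
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (num_taps + 1))
           for k in range(num_taps)]
    total = sum(raw)
    return [c / total for c in raw]


def smooth(values, taps):
    """Filter `values` with `taps`, removing the (len(taps) - 1) / 2 sample delay."""
    half = len(taps) // 2
    last = len(values) - 1
    out = []
    for n in range(len(values)):
        acc = 0.0
        for k, c in enumerate(taps):
            idx = min(max(n + k - half, 0), last)  # hold the edge values
            acc += c * values[idx]
        out.append(acc)
    return out


# Hypothetical modification-factor sequence with one abrupt outlier.
factors = [1.0] * 5 + [1.5] + [1.0] * 5
taps = hann_lowpass(5)
smoothed = smooth(factors, taps)
print([round(v, 3) for v in smoothed])
```

Because the taps are symmetric, every frequency component is delayed by the same amount, so the smoothed contour keeps its shape and only needs a fixed shift to stay in step with the audio.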

The Prototype System

System Evaluation: case 1

System Evaluation: case 2

System Limitations
Segmentation:
– No reliable technique for voiced/unvoiced segmentation
– Segmentation and classification of different vocal sounds is the key to devising rules for modification
Modification engine:
– Cannot handle pitch transitions and depends entirely on the pitch-marking stage

System Limitations
Pitch-marking:
– The proposed system lacks robustness
– Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently
Time-alignment:
– The DTW basic local constraint allows unbounded time expansion and compression
– This often causes distortions in the synthesized vocal sample

Conclusions and Recommendations
The current system works well for slow and continuous singing.
Further improvements to the individual components are recommended to handle greater dynamic changes in the vocal signal, thereby extending the current good results to a wider range of singing styles.

Questions & Answers

Wavelet filter bank

Dyadic Spline Wavelet

Wide-band analysis

DTW local constraints

Calculation of pitch-marks
