Published by Jodie Poole. Modified over 9 years ago.
1
A System for Hybridizing Vocal Performance
By Kim Hang Lau
2
Parameters of the singing voice
Parameters of the singing voice can be loosely classified as:
–Timbre
–Pitch contour
–Time contour (rhythm)
–Amplitude envelope (projections)
3
Vocal Modification
Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre.
Commercially available units include:
–Intonation corrector
–Pitch/formant processor
–Harmonizer
–Vocoder
4
Objectives
Prototype a system for vocal modification
Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample
Simulate a transfer of singing technique from a target vocalist to a source vocalist, thus hybridizing the vocal performance
5
Order of Presentation
System overview
Individual components
System evaluation
System limitations
Conclusions and recommendations
6
System Overview
Three components:
–Pitch-marking
–Time-alignment
–Time/pitch/amplitude modification engine
Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances
7
Targeted System Specifications
–Vocal performance: commercial singing
–Vocal pitch range: 60-1200 Hz
–Detection accuracy/resolution: 10 cents
–Detection dynamic range: 40 dB
–Sampling rate: 44.1 kHz and 48 kHz
–Time-scale modification: ±20%
–Pitch-scale modification: ±600 cents
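The cents figures can be turned into frequency ratios with ratio = 2^(cents/1200). The short Python sketch below (helper names are illustrative, not from the system) checks that ±600 cents is half an octave either way and shows what a 10-cent resolution amounts to in Hz at the two ends of the 60-1200 Hz range.

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for an interval given in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio: float) -> float:
    """Interval in cents for a frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Pitch-scale range of ±600 cents: half an octave down to half an octave up
low, high = cents_to_ratio(-600.0), cents_to_ratio(600.0)
print(f"pitch-scale ratios: {low:.3f} to {high:.3f}")

# Size of one 10-cent step in Hz at both ends of the 60-1200 Hz pitch range
for f0 in (60.0, 1200.0):
    step_hz = f0 * (cents_to_ratio(10.0) - 1.0)
    print(f"at {f0:6.1f} Hz, 10 cents = {step_hz:.2f} Hz")
```

A 10-cent step is roughly 0.35 Hz at 60 Hz but about 7 Hz at 1200 Hz, so a fixed resolution in cents corresponds to very different resolutions in Hz across the vocal range.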
8
Component No.1 Pitch-marking
9
Pitch-marking and Glottal Closure Instants (GCIs)
Information generated from pitch-marking:
–Pitch period
–Amplitude envelope
–Voiced/unvoiced segment boundaries
Figure: pitch-marks on a waveform (5 ms scale), with pitch periods labeled P and P′
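Once pitch-marks are available, the three kinds of information listed above follow almost directly. A minimal sketch, assuming pitch-marks are sorted sample indices; the helper name, the RMS-per-period amplitude measure and the rule that treats any gap longer than one period at 60 Hz as a voiced/unvoiced boundary are all illustrative assumptions:

```python
import numpy as np

def analyse_pitch_marks(x, marks, fs, min_f0_hz=60.0):
    """Per-period information from sorted pitch-mark sample indices (hypothetical helper).

    Returns, for each interval between consecutive marks:
      periods - pitch period P in samples
      f0      - fundamental frequency fs / P in Hz
      amp     - RMS amplitude of the signal over that period (amplitude envelope)
      voiced  - False where the gap is longer than one period at min_f0_hz,
                which is treated here as a voiced/unvoiced boundary (an assumption)
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    f0 = fs / periods
    amp = np.array([np.sqrt(np.mean(x[a:b] ** 2)) for a, b in zip(marks[:-1], marks[1:])])
    voiced = periods <= fs / min_f0_hz
    return periods, f0, amp, voiced

# Example: a 100 Hz sine at 44.1 kHz with one artificial 2000-sample gap between marks
fs = 44100
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
periods, f0, amp, voiced = analyse_pitch_marks(x, [0, 441, 882, 1323, 3323, 3764], fs)
```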
10
Pitch-marking applying the Dyadic Wavelet Transform (DyWT)
Kadambe adapted Mallat’s algorithm for edge detection in images to the detection of GCIs in speech
Kadambe assumed a correspondence between edges in an image and GCIs in a speech signal
Computing the DyWT at dyadic scales 2^3 to 2^5 was sufficient for pitch-marking
If a peak detected in the DyWT occurs at the same time instant in two consecutive scales (searching from the lowest scale upward), that instant is taken as a GCI
11
Figure: DyWT decompositions compared, Mallat vs. Kadambe (panels: original signal; scales 2^1, 2^2, 2^3, 2^4, 2^5; base-band)
12
The proposed pitch-marking scheme
Detection principle:
–Detect the scale that contains the fundamental period
–Starting from a higher scale (lower frequency), there is a considerable jump in frame power when this scale is encountered
Features:
–4X decimation to support high sampling rates
–Frame-based processing and error correction, allowing possible quasi-real-time detection
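The detection principle can be illustrated on per-frame subband powers. In this sketch the subbands are assumed to be ordered from the coarsest (lowest-frequency) scale to the finest, and the jump ratio of 10 and the 31-tap decimation filter are hypothetical values rather than the system's parameters; the error-correction stage is not shown.

```python
import numpy as np

def decimate_by_4(x, taps=31):
    """4x decimation: windowed-sinc low-pass at the new Nyquist (fs/8), then keep every 4th sample."""
    n = np.arange(taps) - (taps - 1) / 2
    h = 0.25 * np.sinc(n / 4) * np.hamming(taps)
    h /= h.sum()                                     # unity gain at DC
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")[::4]

def frame_powers(subbands, start, length):
    """Mean power of one frame in each subband; subbands are ordered coarse (low) to fine (high)."""
    return np.array([np.mean(np.square(b[start:start + length])) for b in subbands])

def fundamental_scale(powers, jump_ratio=10.0, floor=1e-12):
    """Scan from the coarsest subband toward finer ones and return the index of the first
    subband whose frame power exceeds the previous one by jump_ratio (None if no jump)."""
    for k in range(1, len(powers)):
        if powers[k] >= jump_ratio * max(powers[k - 1], floor):
            return k
    return None

# Example with synthetic constant-amplitude subbands (coarse to fine)
subbands = [np.full(1024, a) for a in (0.001, 0.002, 1.0, 0.8)]
print(fundamental_scale(frame_powers(subbands, start=0, length=256)))
```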
13
The proposed pitch-marking system
14
Comparison of results with Auto-Tune
Figure: proposed system vs. Auto-Tune
15
Component No.2 The Modification Engine
16
Time/pitch/amplitude modification engine
–(n): time-modification factor
–(n): pitch-modification factor
–(n): amplitude-modification factor
–D(n): time-warping function
17
TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
–Time-domain splicing overlap-add method
–Used in prosodic modification of speech
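A minimal TD-PSOLA pitch-scaling sketch, assuming the pitch-marks cover a single voiced region and using Hann-windowed grains two local periods long, each taken from the nearest analysis mark. `beta` is a pitch-modification factor (beta > 1 raises pitch); time-scaling and amplitude scaling are omitted, and the function name is illustrative.

```python
import numpy as np

def td_psola_pitch(x, marks, beta):
    """Pitch-scale x by beta (beta > 1 raises pitch) while keeping its duration.

    Sketch assumptions: marks are sorted analysis pitch marks (sample indices)
    covering a single voiced region; each grain is a Hann-windowed segment of
    two local pitch periods centred on the nearest analysis mark.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    y = np.zeros_like(x)
    t = float(marks[0])                                  # current synthesis mark
    while t < marks[-1]:
        k = int(np.argmin(np.abs(marks[:-1] - t)))       # nearest analysis mark with a following period
        m, P = int(marks[k]), int(periods[k])
        offsets = np.arange(-P, P + 1)
        window = np.hanning(2 * P + 1)
        src = m + offsets                                # analysis samples of the grain
        dst = int(round(t)) + offsets                    # where the grain is added in the output
        ok = (src >= 0) & (src < len(x)) & (dst >= 0) & (dst < len(y))
        y[dst[ok]] += window[ok] * x[src[ok]]
        t += P / beta                                    # synthesis marks spaced by P / beta
    return y

# Example: 100 Hz signals at 16 kHz with a pitch-mark every 160 samples
fs = 16000
n = np.arange(fs)
marks = np.arange(160, fs - 160, 160)
x1 = np.sin(2 * np.pi * 100 * n / fs)
y1 = td_psola_pitch(x1, marks, 1.0)                      # identity
x2 = np.exp(-(n % 160) / 20.0) * np.sin(2 * np.pi * 1000 * (n % 160) / fs)
y2 = td_psola_pitch(x2, marks, 2.0)                      # pitch raised by an octave
```

With beta = 1 and evenly spaced marks the Hann grains sum to one, so the interior of the signal is reconstructed exactly; with beta = 2 the grains are re-spaced at half the period.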
18
Evaluation of the modification engine
Figure: original vs. TD-PSOLA vs. Auto-Tune
19
Component No.3 Time-alignment
20
Time-alignment
Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)
Verhelst claimed that the basic local constraint produces the most accurate time-warping path
Computation grows with the product of the two sequence lengths (quadratically for equal lengths), so it rises quickly as the length of comparison increases
Accuracy deteriorates as the length of comparison increases
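For reference, DTW with the basic local constraint can be written as below (1-D or multi-dimensional frames, Euclidean local distance; an illustrative sketch, not Verhelst's code). The double loop visits every one of the N * M cells, which is where the growth in computation with comparison length comes from.

```python
import numpy as np

def dtw(a, b):
    """DTW with the basic local constraint: cell (i, j) may be reached from
    (i-1, j), (i, j-1) or (i-1, j-1). Returns (total cost, warping path)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)     # N frames x d features
    b = np.asarray(b, dtype=float).reshape(len(b), -1)     # M frames x d features
    N, M = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # N x M local distances
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):                              # N * M cells are evaluated
        for j in range(1, M + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end point to (0, 0), preferring the diagonal on ties
    path, i, j = [], N, M
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: D[c])
    return D[N, M], path[::-1]

total, path = dtw([0, 1, 2, 3], [0, 1, 1, 2, 3])
```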
21
Adaptations from Verhelst’s method
Time-alignment is performed on a voiced/unvoiced segmental basis:
–DTW for voiced segments
–Linear Time Warping (LTW) for unvoiced segments
Global constraints are introduced to further reduce computation
The voiced/unvoiced segments of the two samples must be synchronized; in the current implementation this is edited manually
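The two adaptations can be sketched as follows. LTW maps unvoiced target frames proportionally onto source frames. The slide does not say which global constraint is used, so a band of ±band frames around the line joining the segment end points is assumed here (in the spirit of a Sakoe-Chiba band); the cell counter shows how much of the N * M grid is skipped.

```python
import numpy as np

def linear_time_warp(src_len, tgt_len):
    """LTW for an unvoiced segment: target frame k maps proportionally onto a source frame."""
    if tgt_len == 1:
        return np.zeros(1, dtype=int)
    return np.round(np.arange(tgt_len) * (src_len - 1) / (tgt_len - 1)).astype(int)

def banded_dtw_cost(a, b, band):
    """Basic-local-constraint DTW over 1-D features, evaluated only inside a band of
    +/- band frames around the straight line joining the two end points (an assumed
    global constraint). Returns (total cost, number of cells evaluated)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    N, M = len(a), len(b)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    cells = 0
    for i in range(1, N + 1):
        # consecutive rows overlap by construction, so the end point stays reachable
        lo = max(1, int(np.floor((i - 1) * M / N - band)))
        hi = min(M, int(np.ceil(i * M / N + band)))
        for j in range(lo, hi + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            cells += 1
    return D[N, M], cells

seq = np.arange(50.0)
total, cells = banded_dtw_cost(seq, seq, band=3)
```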
22
Manipulation of modification parameters
Before being fed to the modification engine, (n) and (n) are smoothed with simple linear-phase FIR low-pass filters
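A sketch of the smoothing step, assuming each modification factor is stored as a per-frame track. The windowed-sinc design, 51 taps and cutoff of 0.05 (as a fraction of the frame rate) are illustrative choices, not the system's values. Because the filter is symmetric, padding each end by (taps - 1)/2 samples and keeping the "valid" part of the convolution cancels the filter's constant group delay, so the smoothed track stays time-aligned with the original.

```python
import numpy as np

def lowpass_fir(cutoff, taps=51):
    """Windowed-sinc linear-phase low-pass FIR. cutoff is a fraction of the track's
    frame rate (0 < cutoff < 0.5); taps is odd so the filter is symmetric."""
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
    return h / h.sum()                                   # unity gain at DC

def smooth_track(track, cutoff=0.05, taps=51):
    """Smooth a per-frame modification-factor track without shifting it in time."""
    track = np.asarray(track, dtype=float)
    pad = (taps - 1) // 2
    padded = np.pad(track, pad, mode="edge")             # repeat end values to avoid edge droop
    # 'valid' output has len(track) samples, each centred on the matching input frame
    return np.convolve(padded, lowpass_fir(cutoff, taps), mode="valid")

track = 1.0 + 0.1 * np.random.default_rng(0).standard_normal(500)
smoothed = smooth_track(track)
```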
23
The Prototype System
24
System Evaluation: case 1
25
System Evaluation: case 2
26
System Limitations
Segmentation
–Lack of a reliable technique for voiced/unvoiced segmentation
–Segmentation and classification of different vocal sounds is the key to devising rules for modification
Modification engine
–Lacks the capability to handle pitch transitions; depends entirely on the pitch-marking stage
27
System Limitations
Pitch-marking
–The proposed system lacks robustness
–Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently
Time-alignment
–The DTW basic local constraint allows unbounded time expansion and compression
–This often causes distortions in the synthesized vocal sample
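One common way to bound expansion and compression is a slope-constrained local path. In the sketch below (an illustration, not part of the described system) every step contains a diagonal move, (1,1), (1,2) or (2,1) with the intermediate cell counted, so the local time-scaling factor stays between 1/2 and 2; the particular step set is an assumption, and tighter limits would need other step patterns.

```python
import numpy as np

def slope_limited_dtw_cost(a, b):
    """DTW in which every step contains a diagonal move: (1,1), (1,2) or (2,1),
    with the intermediate cell's distance included. Local slope is therefore held
    between 1/2 and 2. Returns inf when the lengths make the end point unreachable."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    N, M = len(a), len(b)
    c = np.abs(a[:, None] - b[None, :])                  # local distances
    D = np.full((N, M), np.inf)
    D[0, 0] = c[0, 0]
    for i in range(N):
        for j in range(M):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i >= 1 and j >= 1:                        # single diagonal step
                best = D[i - 1, j - 1] + c[i, j]
            if i >= 1 and j >= 2:                        # diagonal to (i, j-1), then horizontal
                best = min(best, D[i - 1, j - 2] + c[i, j - 1] + c[i, j])
            if i >= 2 and j >= 1:                        # diagonal to (i-1, j), then vertical
                best = min(best, D[i - 2, j - 1] + c[i - 1, j] + c[i, j])
            D[i, j] = best
    return D[N - 1, M - 1]
```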
28
Conclusions and Recommendations
The current system works well for slow and continuous singing
Further improvements to the individual components are recommended to handle greater dynamic changes in the vocal signal, thereby extending the current good results to a wider range of singing styles
29
Questions & Answers
30
Wavelet filter bank
31
Dyadic Spline Wavelet
32
Wide-band analysis
33
DTW local constraints
34
Calculation of pitch-marks
35
DyWT