Published by Edwin Caldwell. Modified over 9 years ago.
1
Chapter 16 Speech Synthesis Algorithms
16.1 Synthesis based on LPC
16.2 Synthesis based on formants
16.3 Synthesis based on homomorphic processing
16.4 PSOLA (Pitch Synchronous Overlap-Add) Algorithm for Synthesis
16.5 Synthesis based on addition of sine functions
2
16.1 Synthesis based on LPC (1)
$x(n) = \sum_{i=1}^{p} a_i\, x(n-i)$
For every frame of the original speech, the $p$ coefficients $a_i$ are extracted by the LPC algorithm and stored in memory together with the first $p$ samples of the frame. When synthesis is required, the later samples are generated recursively from the formula above.
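The recursion on this slide can be sketched in a few lines of NumPy. This is only an illustration of the prediction formula as stated (no excitation or residual term); the function name `lpc_synthesize` and the example coefficients are assumptions chosen so that a pure cosine is predicted exactly, not values taken from the slides.

```python
import numpy as np

def lpc_synthesize(a, first_samples, n_total):
    """Generate n_total samples from p stored samples and p coefficients a_i."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    x = np.zeros(n_total)
    x[:p] = first_samples                      # the first p stored samples
    for n in range(p, n_total):
        # x(n) = sum_{i=1..p} a_i * x(n-i); the slice is reversed so that
        # a[0] multiplies x(n-1), a[1] multiplies x(n-2), and so on.
        x[n] = np.dot(a, x[n - p:n][::-1])
    return x

# Hypothetical example: cos(w n) = 2 cos(w) cos(w (n-1)) - cos(w (n-2)),
# so a = [2 cos(w), -1] predicts a cosine exactly with p = 2.
omega = 2 * np.pi * 0.05
a = [2 * np.cos(omega), -1.0]
reference = np.cos(omega * np.arange(200))
frame = lpc_synthesize(a, reference[:2], 200)
```

In this toy case the regenerated frame matches the reference cosine; for real speech the recursion alone decays or drifts, which is why practical LPC synthesizers also drive the filter with an excitation signal.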
3
16.2 Synthesis based on formants (1)
The transfer characteristic of a formant filter is
$y(n) = a\,x(n) - b\,y(n-1) - c\,y(n-2)$
where $a = 1 + b + c$, $b = -2\exp(-\pi B T_s)\cos(2\pi F T_s)$, $c = \exp(-2\pi B T_s)$.
$B$ is the bandwidth, $F$ is the resonance frequency of the filter, and $T_s$ is the sampling period (the reciprocal of the sampling frequency).
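A minimal sketch of this second-order resonator, using the coefficient formulas above. The function names and the example values (F = 500 Hz, B = 50 Hz, 8 kHz sampling rate) are assumptions for illustration only.

```python
import numpy as np

def formant_coefficients(F, B, fs):
    """Coefficients a, b, c of y(n) = a x(n) - b y(n-1) - c y(n-2)."""
    Ts = 1.0 / fs                                   # sampling period
    b = -2.0 * np.exp(-np.pi * B * Ts) * np.cos(2.0 * np.pi * F * Ts)
    c = np.exp(-2.0 * np.pi * B * Ts)
    a = 1.0 + b + c                                 # makes the gain at 0 Hz equal to 1
    return a, b, c

def formant_filter(x, F, B, fs):
    """Run the difference equation sample by sample."""
    a, b, c = formant_coefficients(F, B, fs)
    y = np.zeros(len(x))
    y1 = y2 = 0.0                                   # y(n-1), y(n-2)
    for n, xn in enumerate(x):
        yn = a * xn - b * y1 - c * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

fs = 8000.0
step = formant_filter(np.ones(4000), F=500.0, B=50.0, fs=fs)   # step response
```

Because $a = 1 + b + c$, the step response settles at 1, and the magnitude response has its peak close to $F$.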
4
Synthesis based on formants (2)
Placing several such filters across the formant range, with F1, F2, F3, … as their resonance frequencies, makes the whole system approximate the transfer characteristic of the vocal tract.
The formant filters can be connected in cascade (series) or in parallel.
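The two connection schemes can be compared through their frequency responses: a cascade multiplies the individual responses, a parallel bank adds them with per-branch gains. The sketch below assumes the resonator form $y(n) = a\,x(n) - b\,y(n-1) - c\,y(n-2)$ from the previous slide; the formant frequencies, bandwidths, and branch gains are hypothetical values, not data from the presentation.

```python
import numpy as np

def resonator_response(F, B, fs, freqs):
    """Frequency response of one formant resonator evaluated at freqs (Hz)."""
    Ts = 1.0 / fs
    b = -2.0 * np.exp(-np.pi * B * Ts) * np.cos(2.0 * np.pi * F * Ts)
    c = np.exp(-2.0 * np.pi * B * Ts)
    a = 1.0 + b + c
    z1 = np.exp(-2j * np.pi * freqs * Ts)           # z^-1 on the unit circle
    return a / (1.0 + b * z1 + c * z1 ** 2)

fs = 8000.0
freqs = np.arange(0.0, 4000.0, 5.0)                 # 5 Hz grid
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]   # hypothetical (F, B)
responses = [resonator_response(F, B, fs, freqs) for F, B in formants]

cascade = np.prod(responses, axis=0)                # series connection: product
gains = [1.0, 0.5, 0.25]                            # hypothetical branch gains
parallel = sum(g * H for g, H in zip(gains, responses))   # parallel: weighted sum
```

The cascade needs no per-formant gains (relative peak heights follow from the resonators themselves), whereas the parallel form exposes each formant amplitude as a separate control.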
5
16.3 Synthesis based on homomorphic processing (1)
After homomorphic processing, $x(n) = e(n) + v(n)$.
For voiced speech, $e(n)$ is a periodic sequence. Suppose the period is $N$:
$e(n) = \sum_{r=0}^{R} \delta(n - rN)$
so $e(n)$ is nonzero only at multiples of $N$. It is therefore easy to separate $e(n)$ from $v(n)$ and to restore $e(n)$.
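The additive relation can be checked numerically with the real cepstrum, which is one common form of homomorphic processing. The sketch below is an assumption-laden toy: the excitation is an ideal impulse train, the vocal tract is a single hypothetical two-pole resonance, and the two are combined by circular convolution so that the identity holds exactly on the FFT grid.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)))).real

nfft, N = 1024, 80                       # frame length and pitch period (samples)
excitation = np.zeros(nfft)
excitation[::N] = 1.0                    # impulses at n = rN

n = np.arange(nfft)
r, theta = 0.97, 2 * np.pi * 0.1         # hypothetical pole radius and angle
vocal_tract = r ** n * np.sin((n + 1) * theta) / np.sin(theta)   # two-pole impulse response

# Circular convolution of excitation and vocal-tract response.
x = np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(vocal_tract)).real

c_x = real_cepstrum(x)
c_e = real_cepstrum(excitation)
c_v = real_cepstrum(vocal_tract)

# The vocal-tract part is concentrated at low quefrency, so the excitation
# peak is searched above an assumed cutoff of 40 samples.
cutoff, upper = 40, 200
pitch_estimate = cutoff + int(np.argmax(c_x[cutoff:upper]))
```

Here `c_x` equals `c_e + c_v` up to rounding, and the strongest high-quefrency peak sits at the pitch period, which is what makes the separation straightforward.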
6
16.4 PSOLA (Pitch Synchronous Overlap-Add) Algorithm for Synthesis (1)
This algorithm was proposed by F. Charpentier and E. Moulines at the end of the 1980s. Its advantages are relatively low computational complexity and better clarity and naturalness of the synthesized speech. In particular, TD-PSOLA (time-domain PSOLA) can meet real-time requirements.
The principle of PSOLA
7
PSOLA Algorithm for Synthesis (2)
The algorithm originates from adding up signals reconstructed from the short-time Fourier transform. The short-time Fourier transform of $x(n)$ is
$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}$
For every $n$ this is a continuous function of frequency, so the representation is redundant. We can therefore keep only one spectrum every $R$ samples:
$Y_r(e^{j\omega}) = X_n(e^{j\omega})\big|_{n=rR}$
Its inverse transform is
8
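PSOLA Algorithm for Synthesis (3)
$y_r(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y_r(e^{j\omega})\, e^{j\omega m}\, d\omega = x(m)\, w(rR-m)$
Adding the $y_r(m)$ over all $r$ gives
$y(m) = \sum_{r=-\infty}^{\infty} y_r(m) = \sum_{r=-\infty}^{\infty} x(m)\, w(rR-m) = x(m) \sum_{r=-\infty}^{\infty} w(rR-m)$
It can be shown that when $R \le N/4$, where $N$ is the window length,
$\sum_{r} w(rR-m) \approx W(e^{j0})/R$, so $y(n) \approx x(n)\, W(e^{j0})/R$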
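The constant-sum property can be checked numerically. The sketch below assumes a periodic Hamming window of length N = 256 and a hop of R = N/4; these values, and the number of overlapped windows, are illustrative choices rather than parameters from the slides.

```python
import numpy as np

N = 256                      # window length (assumed)
R = N // 4                   # hop size, R = N/4
n = np.arange(N)
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / N)    # periodic Hamming window

# Overlap-add 21 shifted copies of the window itself.
total = np.zeros(N + 20 * R)
for r in range(21):
    total[r * R:r * R + N] += w

interior = total[N:-N]       # region covered by a full set of overlapping windows
expected = w.sum() / R       # W(e^{j0}) / R, since W(e^{j0}) is the sum of w
```

Away from the edges, `interior` is flat and equal to `expected`, so overlap-adding the windowed segments returns $x(n)$ scaled by that constant.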
9
PSOLA Algorithm for Synthesis (4)
So $y(n)$ differs from $x(n)$ only by a constant factor. If a Hanning window of length $N$ is used, an exact relation can be derived:
$\sum_{r=-\infty}^{\infty} w(rN/2 - m) \equiv 1$ for any $m$
If $x(n)$ is voiced speech with pitch period $N_p$, we can use a Hanning window to cut out segments two periods long ($2N_p$) and overlap-add them with a delay of $N_p$. Under the ideal periodic condition, the original signal is restored:
$x(n) = \sum_{r} w(rN_p - n)\, x(n)$
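A small numerical check of this reconstruction, assuming a periodic Hanning window of length $2N_p$ and a synthetic "voiced" signal with period $N_p = 40$ samples (both are illustrative assumptions):

```python
import numpy as np

Np = 40                                              # assumed pitch period in samples
n = np.arange(2 * Np)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / (2 * Np))     # periodic Hanning, length 2*Np

t = np.arange(20 * Np)
x = np.sin(2 * np.pi * t / Np) + 0.3 * np.sin(4 * np.pi * t / Np + 0.5)

# Cut two-period segments and overlap-add them with a delay of Np.
y = np.zeros_like(x)
for r in range(len(x) // Np - 1):
    seg = slice(r * Np, r * Np + 2 * Np)
    y[seg] += w * x[seg]
```

Except for the first and last $N_p$ samples, where only one window contributes, `y` equals `x`, because two Hanning windows shifted by half their length sum to one.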
10
PSOLA Algorithm for Synthesis (5)
In practice the ideal periodic condition does not hold, so the reconstruction condition is not completely satisfied. Moreover, we need to change the pitch, duration and intensity, so we do not want to reconstruct the original signal exactly. Instead, PSOLA minimizes the mean-square spectral distance between the original and the synthesized signals.
11
PSOLA Algorithm for Synthesis (6)
$D[x(n), y(n)] = \int_{-\pi}^{\pi} \left| X_{t_m}(e^{j\omega}) - Y_{t_q}(e^{j\omega}) \right|^2 d\omega$
where $t_m$ and $t_q$ are the pitch marks of $x(n)$ and $y(n)$, respectively.
The procedure for PSOLA:
1. Pitch-synchronous analysis: mark the pitch as accurately as possible;
2. Time-scale change: for a given pitch-adjustment parameter β and time-adjustment parameter γ, determine the mapping between the original pitch-mark sequence and the synthesized pitch-mark sequence;
12
PSOLA Algorithm for Synthesis (7)
3. Modify the analyzed short-time signals to create the synthesis short-time signals (TD-PSOLA only applies a delay; FD-PSOLA also adjusts the signal in the frequency domain);
4. Pitch-synchronous overlap-add processing to create the final synthesized speech signal.
13
PSOLA Algorithm for Synthesis (8)
Pitch-synchronous analysis: for unvoiced speech the marks are placed at a fixed period; for voiced segments the marks are set at the actual pitch positions. This gives a series of pitch marks $\{t_m,\ m = 1, 2, \ldots, M\}$.
Multiplying $x(n)$ by a series of window functions centered at these marks yields a series of short-time signals $x_m(n)$:
14
PSOLA Algorithm for Synthesis (9)
$x_m(n) = w_m(t_m - n)\, x(n)$
These $x_m(n)$ are an intermediate representation of the waveform. $w_m$ is a Hanning window. The window length is larger than one pitch period, and the window is centered at the pitch mark, so adjacent frames partly overlap.
15
PSOLA Algorithm for Synthesis (10)
Time-scale change
To perform prosodic modification, we must determine the new pitch-mark positions $t_q$ ($q = 1 \sim Q$) on the synthesis axis and the mapping $t_m \to t_q$. The duration-adjustment function γ(n) and the pitch-adjustment function β(n) are the two parameters that determine the new marks and the mapping. They are changed together: a change of pitch changes the pitch period, so the duration must be adjusted correspondingly to keep the original duration. Both modifications can also be done in one step.
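One simple way to build the synthesis marks and the mapping is sketched below. It assumes constant factors β (pitch scale) and γ (duration scale) instead of functions β(n) and γ(n), and a nearest-mark mapping rule; the function name `synthesis_marks` and this particular rule are assumptions for illustration, not the scheme prescribed by the slides.

```python
import numpy as np

def synthesis_marks(t_m, beta=1.0, gamma=1.0):
    """Return synthesis marks t_q and, for each, the index of the mapped analysis mark."""
    t_m = np.asarray(t_m, dtype=float)
    periods = np.diff(t_m)                               # local analysis pitch periods
    t_end = t_m[0] + gamma * (t_m[-1] - t_m[0])          # stretched end time
    t_q, mapping = [], []
    t = t_m[0]
    while t <= t_end + 1e-9:
        t_virtual = t_m[0] + (t - t_m[0]) / gamma        # position on the analysis axis
        m = int(np.argmin(np.abs(t_m - t_virtual)))      # nearest analysis mark
        t_q.append(t)
        mapping.append(m)
        period = periods[min(m, len(periods) - 1)]
        t += period / beta                               # new pitch period
    return np.array(t_q), np.array(mapping)

t_m = np.arange(0, 1000, 100)                            # hypothetical marks, period 100
same_q, same_map = synthesis_marks(t_m)                        # no change
high_q, high_map = synthesis_marks(t_m, beta=2.0)              # pitch doubled
slow_q, slow_map = synthesis_marks(t_m, gamma=2.0)             # duration doubled
```

With β = 2 the marks are twice as dense over the same time span, so analysis segments are reused; with γ = 2 the spacing is unchanged but the marks cover twice the duration, so each analysis segment is repeated about twice.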
16
PSOLA Algorithm for Synthesis (11)
$x_m(n)$ is changed into $x_q(n)$ by modification, and the $x_q(n)$ are then synthesized according to the new marks. This involves three steps: changing the number of short-time signals, changing the delay of the short-time signals, and changing each short-time signal itself.
For TD-PSOLA, each synthesis short-time signal is only a shifted copy of an analysis short-time signal. First select the analysis signals to use; with the delay $\delta_q = t_q - t_m$,
$x_q(n) = x_m(n - \delta_q) = x_m(n + t_m - t_q)$
so that the segment centered at $t_m$ is moved to $t_q$.
For FD-PSOLA, besides the above processing, $x_m(n - \delta_q)$ must also be transformed in the frequency domain.
17
PSOLA Algorithm for Synthesis (12)
The overlap-add
There are several ways to add. One is
$x(n) = \dfrac{\sum_q \alpha_q\, x_q(n)\, w_q(t_q - n)}{\sum_q w_q^2(t_q - n)}$
where the $\alpha_q$ are normalization factors and $w_q$ is the synthesis window. A simpler way is
$x(n) = \dfrac{\sum_q \alpha_q\, x_q(n)}{\sum_q w_q(t_q - n)}$
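The whole TD-PSOLA chain (windowing at analysis marks, moving each segment to its synthesis mark, overlap-add with the simpler normalization and $\alpha_q = 1$) can be sketched as follows. The function name `td_psola`, the Hanning window of half-length `half`, and the test signal are assumptions; the marks and the mapping are supplied by the caller.

```python
import numpy as np

def td_psola(x, t_m, t_q, mapping, half):
    """Overlap-add Hanning-windowed analysis segments at the synthesis marks."""
    k = np.arange(-half, half)                       # sample offsets around a mark
    w = 0.5 + 0.5 * np.cos(np.pi * k / half)         # Hanning window centered on the mark
    length = int(t_q[-1]) + half
    num = np.zeros(length)                           # sum of shifted segments
    den = np.zeros(length)                           # sum of shifted windows
    xp = np.pad(x, half)                             # zero-pad so edge segments exist
    for tq, m in zip(t_q, mapping):
        tm = int(t_m[m])
        x_m = w * xp[tm:tm + 2 * half]               # windowed segment centered at t_m
        idx = int(tq) + k                            # where the copy lands
        ok = (idx >= 0) & (idx < length)
        num[idx[ok]] += x_m[ok]
        den[idx[ok]] += w[ok]
    return num / np.maximum(den, 1e-8)               # guard against division by zero

Np = 50                                              # assumed pitch period
n = np.arange(1000)
x = np.sin(2 * np.pi * n / Np) + 0.5 * np.sin(4 * np.pi * n / Np)
t_m = np.arange(Np, 1000 - Np + 1, Np)               # analysis marks every period

# Unmodified marks: the signal is reproduced between the first and last mark.
y_same = td_psola(x, t_m, t_m, np.arange(len(t_m)), half=Np)

# Marks twice as dense: pitch period halves while the duration stays the same.
t_q_up = np.arange(Np, 1000 - Np + 1, Np // 2)
mapping_up = np.abs(t_m[None, :] - t_q_up[:, None]).argmin(axis=1)
y_up = td_psola(x, t_m, t_q_up, mapping_up, half=Np)
```

Dividing by the summed windows keeps the amplitude stable when the mark density changes; the squared-window variant differs only in the weights applied before and after the sum.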
18
16.5 Synthesis based on Sin Models (1)
This technique starts from the spectral decomposition of the speech signal, which yields a set of frequencies, amplitudes and phases. By matching the frequency parameters and adjusting the amplitudes and phases, a new speech signal can be synthesized by adding the sine waves back together.
Sin Model of Speech for Synthesis by Analysis
The generation of speech can be seen as the result of a glottal excitation passing through a linear time-varying system:
$s(t) = \int_0^t h(t-\tau, t)\, e(\tau)\, d\tau$
19
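Synthesis based on Sin Models (2)
$e(t) = \sum_{l=1}^{L} a_l(t) \cos[\Omega_l(t)]$
$\Omega_l(t) = \int_0^t \omega_l(\sigma)\, d\sigma + \varphi_l$
$s(t) = \sum_{l=1}^{L} A_l(t) \cos[\theta_l(t)]$
The transfer function $H(\omega, t)$ of the vocal tract is the Fourier transform of $h(t-\tau, t)$:
$H(\omega, t) = M(\omega, t) \exp(j\psi(\omega, t))$
$A_l(t) = a_l(t)\, M_l(t)$, $\theta_l(t) = \Omega_l(t) + \psi_l(t)$
where $M_l(t) = M(\omega_l(t), t)$ and $\psi_l(t) = \psi(\omega_l(t), t)$.
Speech Synthesis by Analysis Based on Sin Models
1. Estimation of the frequency, amplitude and phase parameters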
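In discrete time, the synthesis equation $s = \sum_l A_l \cos\theta_l$ becomes a sum over per-sample amplitude and frequency tracks, with the phase obtained by accumulating the frequency. The sketch below assumes the tracks are already given sample by sample and folds the vocal-tract phase $\psi_l$ into the initial phase; the function name and the example values are hypothetical.

```python
import numpy as np

def sinusoidal_synthesis(amps, omegas, phis):
    """amps, omegas: arrays of shape (L, T); phis: initial phases of shape (L,)."""
    amps = np.asarray(amps, dtype=float)
    omegas = np.asarray(omegas, dtype=float)       # radians per sample
    phis = np.asarray(phis, dtype=float)
    # theta_l(n) = phi_l + sum_{k=1..n} omega_l(k), so theta_l(0) = phi_l
    theta = phis[:, None] + np.cumsum(omegas, axis=1) - omegas[:, :1]
    return np.sum(amps * np.cos(theta), axis=0)

T = 500
n = np.arange(T)
amps = np.array([[1.0], [0.5]]) * np.ones((2, T))       # two constant-amplitude partials
omegas = np.array([[0.1], [0.25]]) * np.ones((2, T))    # constant frequencies
phis = np.array([0.0, 1.0])
s = sinusoidal_synthesis(amps, omegas, phis)
```

With constant tracks this reduces to a plain sum of cosines; time-varying tracks give the smooth frequency and amplitude evolution the model is designed for.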
20
Synthesis based on Sin Models (3)
The conclusion is that the frequencies of the synthesized speech signal correspond to the frequencies at the peaks of the short-time Fourier transform (DFT) of that frame, and the amplitudes and phases are those at these frequencies.
In practice, we estimate the frequency, amplitude and phase parameters by peak extraction from a series of windowed short-time speech signals. For good performance, the window length should be larger than two current pitch periods. The window used is a Hamming window of length 256 with 0% to 50% overlap.
21
Synthesis based on Sin Models (4)
After a 512-point FFT, the spectrum is obtained. By peak extraction, the frequencies $\omega_l$, amplitudes $A_l$ and phases $\theta_l$ are obtained, $l = 1 \sim L$; $L$ is generally 40 to 50.
Frequency matching: adjacent frames need frequency matching to facilitate the interpolation. After matching, the frequency-matching locus (track) is obtained.
Interpolation of amplitude and phase between two frames
Experiment results.
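The peak-extraction step can be sketched with a 256-sample Hamming-windowed frame and a 512-point FFT, as stated on the slides. The local-maximum rule, the amplitude scaling by the window sum, and the test frame with two sinusoids are assumptions made for this illustration.

```python
import numpy as np

def pick_peaks(frame, nfft=512, max_peaks=50):
    """Return frequencies (rad/sample), amplitudes and phases of spectral peaks."""
    w = np.hamming(len(frame))
    X = np.fft.rfft(frame * w, nfft)                 # zero-padded to nfft points
    mag = np.abs(X)
    k = np.arange(1, len(mag) - 1)
    is_peak = (mag[k] > mag[k - 1]) & (mag[k] >= mag[k + 1])   # local maxima
    peaks = k[is_peak]
    strongest = peaks[np.argsort(mag[peaks])[::-1][:max_peaks]]
    peaks = np.sort(strongest)                       # keep the L largest, in frequency order
    omega = 2 * np.pi * peaks / nfft
    amp = 2 * mag[peaks] / w.sum()                   # undo window gain and the 1/2 of a real cosine
    phase = np.angle(X[peaks])                       # phase referred to the frame start
    return omega, amp, phase

# Hypothetical frame: two sinusoids placed exactly on FFT bins 40 and 100.
n = np.arange(256)
frame = (1.0 * np.cos(2 * np.pi * 40 / 512 * n + 0.3)
         + 0.5 * np.cos(2 * np.pi * 100 / 512 * n - 1.0))
omega, amp, phase = pick_peaks(frame, max_peaks=2)
```

For components that fall between bins, the estimates are only as fine as the bin spacing; interpolating around each peak is a common refinement but is not shown here.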
22
Synthesis based on Sin Models (1)
24
Synthesis based on Sin Models (2)