Speech Processing: Homomorphic Signal Processing
Veton Këpuska, 5 February 2016


Outline
• Principles of homomorphic signal processing
• Details of homomorphic processing
• Variants of homomorphic processing
• Application of homomorphic systems to speech analysis and synthesis

Principles of Homomorphic Processing
• Superposition property of a linear system $L$: for inputs $x_1[n]$, $x_2[n]$ and scalars $a_1$, $a_2$,
  $L(a_1 x_1[n] + a_2 x_2[n]) = a_1 L(x_1[n]) + a_2 L(x_2[n])$

Principles of Homomorphic Processing
• Example 6.1: If two signals occupy non-overlapping frequency bands, they are separable by linear filtering.
  $x[n] = x_1[n] + x_2[n]$
  $X_1(\omega) = \mathcal{F}\{x_1[n]\}$, with $X_1(\omega)$ confined to $\omega \in [0, \pi/2]$
  $X_2(\omega) = \mathcal{F}\{x_2[n]\}$, with $X_2(\omega)$ confined to $\omega \in [\pi/2, \pi]$
  $y[n] = h[n] * (x_1[n] + x_2[n]) = h[n] * x_1[n] + h[n] * x_2[n]$
  With $H(\omega) = 0$ for $\omega \in [0, \pi/2]$ and $H(\omega) = 1$ for $\omega \in [\pi/2, \pi]$: $y[n] = h[n] * x_2[n] = x_2[n]$
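Example 6.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the two cosines, the length N = 64, and the use of an ideal high-pass mask on the DFT grid (which implements circular rather than linear convolution) are choices made here, not part of the slides.

```python
import numpy as np

N = 64                                   # DFT length (assumed for this sketch)
n = np.arange(N)
x1 = np.cos(0.25 * np.pi * n)            # low band: inside [0, pi/2]
x2 = np.cos(0.75 * np.pi * n)            # high band: inside [pi/2, pi]
x = x1 + x2

# Ideal high-pass response sampled on the DFT grid: 1 where |omega| >= pi/2.
omega = 2 * np.pi * np.fft.fftfreq(N)    # frequencies in [-pi, pi)
H = (np.abs(omega) >= np.pi / 2).astype(float)

# Multiplying DFTs implements circular convolution of h[n] with x[n].
y = np.fft.ifft(H * np.fft.fft(x)).real

print(np.max(np.abs(y - x2)))            # separation error, expected to be tiny
```

Because both cosines sit exactly on DFT bins, the mask removes $x_1[n]$ and leaves $x_2[n]$ untouched; with frequencies off the grid, leakage would make the separation only approximate.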

Principles of Homomorphic Processing
• Generalized superposition: a concept that supports the separation of nonlinearly combined signals. It leads to the notion of generalized linear filtering.
• Properties:
  - H(x1[n] □ x2[n]) = H(x1[n]) ○ H(x2[n])
  - H(c : x[n]) = c ◈ H(x[n])
• Systems that satisfy these two properties are referred to as homomorphic systems and are said to satisfy a generalized principle of superposition.
[Figure: system H(·) with input x[n] and output y[n]; input rules □ and ":", output rules ○ and ◈]

Principles of Homomorphic Processing
• The importance of homomorphic systems for speech processing lies in their ability to transform nonlinearly combined signals into additively combined signals, so that linear filtering can be performed on them.
• A homomorphic system H can be expressed as a cascade of three homomorphic sub-systems, referred to as the canonic representation.
[Figure: x[n] → D_□ (I) → L (II) → D_○^{-1} (III) → y[n]]

Canonic Representation of a Homomorphic System
  i. The characteristic system D_□: transforms □ into addition (+).
  ii. The linear system L: transforms addition into addition.
  iii. The inverse system D_○^{-1}: transforms addition into ○.
[Figure: x[n] → D_□ (I) → L (II) → D_○^{-1} (III) → y[n]]

Homomorphic Systems
• Let the goal be the removal of an undesired component of the signal (e.g., noise):

  Type of combination rule        | System                | Operation
  Signal and additive noise       | Linear system         | Linear filtering
  Signal and multiplicative noise | Multiplicative system | Multiplicative filtering
  Signal and convolutional noise  | Convolutional system  | Convolutional filtering

Multiplicative Homomorphic Systems
• Consider the homomorphic multiplicative system M[·], which maps products at its input to products at its output.
• Use the characteristic system D_● to convert multiplication into addition.
• Use the inverse system D_●^{-1} to convert addition back into multiplication.
• Which operation transforms multiplication into addition?
[Figure: x[n] → D_● (I) → L (II) → D_●^{-1} (III) → y[n]]

Multiplicative Homomorphic Systems
• If $x[n] = x_1[n] \cdot x_2[n]$, with $x_1[n] > 0$ and $x_2[n] > 0$ for all n,
• then $\log(x_1[n] \cdot x_2[n]) = \log(x_1[n]) + \log(x_2[n])$.
• However, x[n] may not always be positive.
• Generalization to complex signals: $x[n] = |x[n]|\, e^{j\arg(x[n])}$, which requires the definition of a complex log operator.

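Multiplicative Homomorphic Systems
• An implementation of a multiplicative homomorphic system: complex log (I) → linear system (II) → complex exp (III).
• Definition:
  - Complex log: $\log(x[n]) = \log|x[n]| + j\arg(x[n])$
  - Complex exp (the inverse operation): $\exp(\log|x[n]| + j\arg(x[n])) = |x[n]|\, e^{j\arg(x[n])} = x[n]$
[Figure: x[n] → complex log → linear system → complex exp → y[n]]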

Homomorphic Systems for Convolution
• Consider the homomorphic system for convolution C[·], which maps convolutions at its input to convolutions at its output.
• Use the characteristic system $D_*$ to convert convolution (∗) into addition.
• Use the inverse system $D_*^{-1}$ to convert addition back into convolution.
• How can convolution be transformed into addition?
[Figure: x[n] → $D_*$ (I) → L (II) → $D_*^{-1}$ (III) → y[n]]

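Homomorphic Systems for Convolution
• Let $x[n] = x_1[n] * x_2[n]$.
• Characteristic system $D_*$ (I): x[n] → $\mathcal{Z}[\cdot]$ (∗ becomes ·) → log[·] (· becomes +) → $\mathcal{Z}^{-1}[\cdot]$ (+ stays +) → $\hat{x}[n]$, a sequence in the "time" domain.
• Inverse operation $D_*^{-1}$ (III): $\hat{y}[n]$ → $\mathcal{Z}[\cdot]$ (+ stays +) → exp[·] (+ becomes ·) → $\mathcal{Z}^{-1}[\cdot]$ (· becomes ∗) → y[n].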

Homomorphic Systems for Convolution
• For $x[n] = x_1[n] * x_2[n]$:
  1. $X(z) = X_1(z) X_2(z)$
  2. $\log X(z) = \log[X_1(z) X_2(z)] = \log X_1(z) + \log X_2(z)$
• Step 2 uses the complex logarithm, which requires special handling because:
  - $X(z)$ is not in general real and positive, so the ordinary real logarithm does not apply.
  - For complex $X(z)$ the phase is not uniquely defined (it is known only up to a multiple of $2\pi$).
  - $X(z)$ has to be defined on the unit circle (e.g., the z-transform of a stable sequence).
• In practice, operate on the unit circle $z = e^{j\omega}$, i.e., use the Fourier transform $X(\omega)$.

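Homomorphic Systems for Convolution
• Two cases are possible in computing the cepstrum:
  1. Complex cepstrum (CC): $\hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left[\log|X(\omega)| + j\angle X(\omega)\right] e^{j\omega n}\, d\omega$
  2. Real cepstrum (RC): $c[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(\omega)|\, e^{j\omega n}\, d\omega$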

Homomorphic Systems for Convolution
• Example 6.3: Consider a sequence x[n] consisting of a system impulse response h[n] convolved with an impulse train p[n]: $x[n] = h[n] * p[n]$.
• The goal is to estimate h[n]. First form the canonic representation for convolution.
• If $D_*$ is such that $\hat{p}[n]$ remains a train of pulses and $\hat{h}[n]$ falls between the impulses, then separation is possible.
[Figure: p[n] → h[n] → x[n]]

Example 6.3 (cont.)
• Let L denote such an operation (i.e., a rectangular window that separates $\hat{h}[n]$ from $\hat{p}[n]$).

Example 6.4
• a, b real and positive ⇒ $\log(ab) = \log(a) + \log(b)$.
• a > 0 and b < 0 ⇒ $\log(ab) = \log(a|b|e^{jk\pi}) = \log(a) + \log(|b|) + jk\pi$ for any odd integer k ($k = \pm 1, \pm 3, \pm 5, \ldots$), so $\log(ab)$ is ambiguous.
• This example indicates that special consideration must be given to the definition of the logarithm operator for complex X(z) in order to make the logarithm of a product equal to the sum of the logarithms.
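The ambiguity can be made concrete with the principal-value logarithm from Python's standard library. The sketch below is a variation on Example 6.4 chosen for illustration (both factors are negative, which is an assumption made here so that the principal values visibly disagree).

```python
import cmath
import math

a, b = -2.0, -3.0

log_of_product = cmath.log(a * b)            # principal value of log(6): purely real
sum_of_logs = cmath.log(a) + cmath.log(b)    # (log 2 + j*pi) + (log 3 + j*pi)

# The two results differ by a multiple of 2*pi in the imaginary (phase) part.
difference = sum_of_logs - log_of_product
print(difference)
```

The magnitudes add as expected, but the phases add to $2\pi$ while the principal value of the product's phase is 0, which is exactly the kind of mismatch the phase-unwrapping methods later in the deck are meant to resolve.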

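Homomorphic Systems for Convolution: Complex Logarithm
• Suppose that X(z) is evaluated on the unit circle ($z = e^{j\omega}$).
• Let $x[n] = x_1[n] * x_2[n]$ ⇒ $X(\omega) = X_1(\omega) X_2(\omega)$.
• Consider the complex log of $X(\omega)$: $\log X(\omega) = \log|X(\omega)| + j\angle X(\omega)$
• Since $X(\omega) = X_1(\omega) X_2(\omega)$: $\log X(\omega) = \log|X_1(\omega)| + \log|X_2(\omega)| + j[\angle X_1(\omega) + \angle X_2(\omega)]$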

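Homomorphic Systems for Convolution: Complex Logarithm
• The previous expression assumed: $\log|X_1(\omega) X_2(\omega)| = \log|X_1(\omega)| + \log|X_2(\omega)|$
• and also: $\angle[X_1(\omega) X_2(\omega)] = \angle X_1(\omega) + \angle X_2(\omega)$
• The phase expression generally does not hold because of the ambiguity in the definition of phase: $\angle X(\omega) = \mathrm{PV}[\angle X(\omega)] + 2\pi k$, for any integer k.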

Homomorphic Systems for Convolution: Complex Logarithm
• Note that:
  - PV denotes the principal value of the phase, which falls in the interval $[-\pi, \pi]$.
  - An arbitrary multiple of $2\pi$ can be added to the principal phase value.
  - Thus the additive property generally does not hold.
• How to impose uniqueness?
  1. Force continuity of the phase: select k such that $\angle X(\omega) = \mathrm{PV}[\angle X(\omega)] + 2\pi k$ is a continuous function (Figure 6.5, next slide).
  2. Phase derivative approach: it can be shown that the unwrapped phase is the integral of a uniquely defined phase derivative, $\angle X(\omega) = \int_0^{\omega} \dot{\angle X}(\beta)\, d\beta$.

Fourier Transform Phase Continuity
[Figure 6.5]

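Homomorphic Systems for Convolution
• Relationship of the complex cepstrum $\hat{x}[n]$ to the real cepstrum c[n]. If x[n] is real, then:
  - $|X(\omega)|$ is real and even, and thus $\log|X(\omega)|$ is real and even.
  - $\angle X(\omega)$ is odd, and hence
    $\hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(\omega)|\, e^{j\omega n}\, d\omega + \frac{j}{2\pi}\int_{-\pi}^{\pi} \angle X(\omega)\, e^{j\omega n}\, d\omega$
    is real; $\hat{x}[n]$ is referred to as the complex cepstrum.
• The even component of the complex cepstrum, $c[n] = (\hat{x}[n] + \hat{x}[-n])/2$, is referred to as the real cepstrum.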

Complex Cepstrum of Speech-Like Sequences
• Sequences with rational z-transforms. The general form of this class of sequences is:
  $X(z) = A z^{-r}\, \frac{\prod_{k=1}^{M_i}(1 - a_k z^{-1}) \prod_{k=1}^{M_o}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1}) \prod_{k=1}^{N_o}(1 - d_k z)}$
• $M_i$, $N_i$: the numbers of zeros and poles inside the unit circle.
• $M_o$, $N_o$: the numbers of zeros and poles outside the unit circle.
• $|a_k|$, $|b_k|$, $|c_k|$, $|d_k|$ are all < 1, so there are no singularities on the unit circle.
• A > 0.

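Complex Cepstrum of Speech-Like Sequences
• Applying the complex logarithm gives:
  $\hat{X}(z) = \log A + \log z^{-r} + \sum_{k=1}^{M_i}\log(1 - a_k z^{-1}) + \sum_{k=1}^{M_o}\log(1 - b_k z) - \sum_{k=1}^{N_i}\log(1 - c_k z^{-1}) - \sum_{k=1}^{N_o}\log(1 - d_k z)$
• $\hat{X}(z)$ is the z-transform of the sequence $\hat{x}[n]$.
• The inverse z-transform should be absolutely summable ⇒ the ROC of $\hat{X}(z)$ must include the unit circle, |z| = 1.
• This condition is equivalent to requiring that every constituent term of $\hat{X}(z)$ has an ROC that includes the unit circle, |z| = 1.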

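Complex Cepstrum of Speech-Like Sequences
• To obtain the ROC for terms of the form $\log(1 - \alpha z^{-1})$ and $\log(1 - \beta z)$, they are expressed as power series:
  $\log(1 - \alpha z^{-1}) = -\sum_{n=1}^{\infty} \frac{\alpha^n}{n} z^{-n}$, for $|z| > |\alpha|$
  $\log(1 - \beta z) = -\sum_{n=1}^{\infty} \frac{\beta^n}{n} z^{n}$, for $|z| < |\beta|^{-1}$
[Figure: z-plane ROC for $\log(1 - \alpha z^{-1})$ (outside the circle of radius α) and for $\log(1 - \beta z)$ (inside the circle of radius 1/β)]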

Complex Cepstrum of Speech-Like Sequences
• The ROC of $\hat{X}(z)$ is therefore an annulus bounded by the poles and zeros of X(z) closest to the unit circle.
[Figure: z-plane ROC for a typical rational X(z)]

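Complex Cepstrum of Speech-Like Sequences
• The complex cepstrum associated with a rational X(z) (excluding the shift term $z^{-r}$) can therefore be expressed as:
  $\hat{x}[n] = \log A$, for n = 0
  $\hat{x}[n] = -\sum_{k=1}^{M_i} \frac{a_k^n}{n} + \sum_{k=1}^{N_i} \frac{c_k^n}{n}$, for n > 0
  $\hat{x}[n] = \sum_{k=1}^{M_o} \frac{b_k^{-n}}{n} - \sum_{k=1}^{N_o} \frac{d_k^{-n}}{n}$, for n < 0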
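The closed form follows term by term from the power series of $\log(1 - \alpha z^{-1})$ and $\log(1 - \beta z)$, and it can be checked numerically. The sketch below is a hedged illustration: the values of `a`, `b`, `c` and the DFT length are hypothetical, and the principal-value log is used without unwrapping only because, for these particular values, the total phase stays inside $(-\pi, \pi)$.

```python
import numpy as np

N = 512                                   # DFT length (assumed; large to limit aliasing)
w = 2 * np.pi * np.arange(N) / N
a, b, c = 0.5, 0.4, 0.6                   # hypothetical parameters, all less than 1

# X(z) = (1 - a z^-1)(1 - b z) / (1 - c z^-1), evaluated on the unit circle (A = 1, r = 0)
X = (1 - a * np.exp(-1j * w)) * (1 - b * np.exp(1j * w)) / (1 - c * np.exp(-1j * w))

# For these values the phase never reaches +/-pi, so np.log is already continuous.
x_hat = np.fft.ifft(np.log(X)).real

m = np.arange(1, 6)
predicted_pos = -(a ** m) / m + (c ** m) / m    # n = 1..5 (zero inside, pole inside)
predicted_neg = -(b ** m) / m                   # n = -1..-5, i.e. b^{-n}/n (zero outside)
numeric_pos = x_hat[m]
numeric_neg = x_hat[N - m]                      # negative quefrencies wrap to the end
```

The zero outside the unit circle shows up only at negative quefrencies, while the zero and pole inside the unit circle show up only at positive quefrencies, as the formula states.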

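Example 6.5
• Let: $X(z) = \frac{z^{-r}(1 - a z^{-1})(1 - b z)}{1 - c z^{-1}}$, where a, b, c are real and less than 1.
• The ROC of X(z) includes the unit circle, so x[n] is stable.
• The delay $z^{-r}$ corresponds to a shift of the sequence.
• Thus the complex cepstrum is given by:
  $\hat{x}[n] = -\frac{a^n}{n}u[n-1] + \frac{c^n}{n}u[n-1] + \frac{b^{-n}}{n}u[-n-1] + \mathcal{Z}^{-1}[\log z^{-r}]$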

Example 6.5 (cont.)
• The inverse z-transform of the shift term is given by: $\mathcal{Z}^{-1}[\log z^{-r}] = -r\,\frac{\cos(\pi n)}{n}$ for $n \neq 0$, and 0 for n = 0.
• The contribution of the $z^{-r}$ term is significant.
• On the unit circle, $z^{-r} = e^{-j\omega r} = 1\angle(-\omega r)$ contributes a linear ramp to the phase. For a large shift r it dominates the phase representation and gives a large discontinuity at $\omega = \pi$ and $\omega = -\pi$.

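Complex Cepstrum of Speech-Like Sequences
• Relation between the complex cepstrum and the real cepstrum for x[n] with a rational z-transform that is minimum phase:
  $\hat{x}[n] = 0$ for n < 0, $\hat{x}[0] = c[0]$, and $\hat{x}[n] = 2c[n]$ for n > 0
• The complex cepstrum of a minimum-phase sequence with a rational z-transform is right-sided:
  $\hat{x}[n] = \log(A)\,\delta[n] - \sum_{k=1}^{M_i} \frac{a_k^n}{n}u[n-1] + \sum_{k=1}^{N_i} \frac{c_k^n}{n}u[n-1]$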
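The minimum-phase relation means a minimum-phase sequence can be recovered from its real cepstrum alone, with no phase unwrapping. The sketch below illustrates this under stated assumptions: the three-tap sequence and the DFT length are hypothetical, and the weight of 1 on the Nyquist bin is a common convention for even-length DFTs rather than something taken from the slides.

```python
import numpy as np

N = 256                                   # DFT length (assumed)
x = np.array([1.0, -0.9, 0.2])            # (1 - 0.5 z^-1)(1 - 0.4 z^-1): minimum phase
X = np.fft.fft(x, N)

c = np.fft.ifft(np.log(np.abs(X))).real   # real cepstrum: uses the magnitude only

# x_hat[n] = c[0] for n = 0, 2*c[n] for n > 0, and 0 for n < 0
fold = np.zeros(N)
fold[0] = 1.0
fold[1:N // 2] = 2.0
fold[N // 2] = 1.0                        # Nyquist bin of the even-length DFT
x_hat = fold * c

# Inverse characteristic system: DFT -> exp -> inverse DFT
x_rec = np.fft.ifft(np.exp(np.fft.fft(x_hat))).real
```

Here `x_hat[1]` equals $-(0.5 + 0.4)$, in line with the right-sided formula for zeros inside the unit circle, and `x_rec` reproduces the original three taps.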

Impulse Train Convolved with Rational z-Transform Sequences
• The second class of sequences of interest in the speech context is the train of uniformly spaced unit samples with varying weights, and its interaction with the system: $x[n] = h[n] * p[n]$.
[Figure: p[n] → h[n] → x[n]]

Impulse Train Convolved with Rational z-Transform Sequences
• If p[n] is minimum phase, i.e., $P(z) = \prod_r \left(1 - a_r (z^N)^{-1}\right)$ with $|a_r (z^N)^{-1}| < 1$ on the unit circle (zeros inside the unit circle), then log[P(z)] can be expressed as:
  $\log P(z) = \sum_r \log(1 - a_r z^{-N}) = -\sum_r \sum_{k=1}^{\infty} \frac{a_r^k}{k}\, z^{-kN}$
• Thus $\hat{p}[n]$ is an infinite right-sided sequence of impulses spaced N samples apart.
• Note that in general, for non-minimum-phase sequences, the complex cepstrum is two-sided with uniformly spaced impulses.
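A two-pulse train is the simplest case in which to see the uniformly spaced cepstral impulses. In the sketch below, the spacing `Np`, the weight `beta`, and the DFT length are hypothetical; in the notation above this corresponds to a single factor with $a_r = -\beta$.

```python
import numpy as np

M = 1024          # DFT length (assumed)
Np = 64           # impulse spacing N from the slide (renamed to avoid clashing with M)
beta = 0.5        # hypothetical weight, |beta| < 1 so that p[n] is minimum phase

p = np.zeros(M)
p[0] = 1.0
p[Np] = beta      # p[n] = delta[n] + beta * delta[n - Np]

# Re{P} > 0 on the unit circle for |beta| < 1, so no phase unwrapping is needed here.
p_hat = np.fft.ifft(np.log(np.fft.fft(p))).real

k = np.arange(1, 5)
predicted = (-1.0) ** (k + 1) * beta ** k / k      # coefficients of log(1 + beta z^-Np)
at_multiples = p_hat[k * Np]

# Every cepstral sample that is not at a multiple of Np should vanish.
off_grid = np.delete(p_hat, np.arange(0, M, Np))
```

The cepstrum is nonzero only at multiples of the spacing, and its weights decay as $\beta^k/k$, so a higher-pitched (smaller spacing) train places its cepstral impulses closer to the origin.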

Example 6.6
• Consider a sequence $x[n] = h[n] * p[n]$, where the z-transform of h[n] is given by: [equation not preserved in this transcript]
• b, b* and c, c* are complex-conjugate pairs.
• Consider p[n] to be a train of periodic pulses: $p[n] = \sum_{k=0}^{\infty} \beta^k \delta[n - kN]$
[Figure: z-plane pole-zero plot; p[n] → h[n] → x[n]]

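Example 6.6 (cont.)
• If β is real and |β| < 1, then p[n] is a train of decaying exponentials.
• The z-transform of p[n] is given by: $P(z) = \sum_{k=0}^{\infty} \beta^k z^{-kN} = \frac{1}{1 - \beta z^{-N}}$
• Then, as derived earlier: $\hat{p}[n] = \sum_{k=1}^{\infty} \frac{\beta^k}{k}\, \delta[n - kN]$
[Figure: p[n] versus n]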

Example 6.6 (cont.)
[Figure: contributions of h[n] and p[n]]

Homomorphic Filtering
• In the cepstral domain:
  - Pseudo-time → quefrency
  - Low quefrency → slowly varying components
  - High quefrency → fast varying components
• Removal of unwanted components (i.e., filtering) can be attempted in the cepstral domain, on the signal $\hat{x}[n]$, in which case the filtering is referred to as liftering.
• When the complex cepstrum of h[n] resides in a quefrency interval shorter than a pitch period, the two components can be separated from each other.

Homomorphic Filtering
• If $\log[X(\omega)]$ is viewed as a "time signal" consisting of low-frequency and high-frequency contributions, then these contributions can be separated with a low-pass or high-pass filter.
• One implementation of the low-pass filter:
[Figure: $x[n] = h[n] * p[n]$ → $D_*$ → multiplication by l[n] → $D_*^{-1}$ → y[n]]

Homomorphic Filtering
• Alternate view of the "liftering" operation: a filtering operation $L(\omega)$ applied in the log-spectral domain.
• Time and frequency domains are interchanged by viewing the frequency-domain signal $\log[X(\omega)]$ as a time signal to be filtered. ⇒
  - The "cepstrum" can be thought of as the spectrum of $\log[X(\omega)]$.
  - The time axis of $\hat{x}[n]$ is referred to as "quefrency".
  - The filter l[n] is referred to as the "lifter".
[Figure: $x[n] = h[n] * p[n]$ → $\mathcal{F}$ → log → $\hat{X}(\omega)$ → ($\mathcal{F}^{-1}$ → l[n] → $\mathcal{F}$, equivalent to $L(\omega)$) → $\hat{Y}(\omega)$ → exp → $\mathcal{F}^{-1}$ → y[n]]

Homomorphic Filtering
• The three elements inside the dotted lines of the previous figure can be replaced by $L(\omega)$, which can be viewed as a smoothing function.
[Figure: $x[n] = h[n] * p[n]$ → $\mathcal{F}$ → log → $\hat{X}(\omega)$ → $L(\omega)$ → $\hat{Y}(\omega)$ → exp → $\mathcal{F}^{-1}$ → y[n]]

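Practical Implementation Issues
• Use the FFT and IFFT for the Fourier transforms.
  - $X(\omega)$ is computed on the DFT grid by: $X(k) = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}$
  - $\log X(k)$ is computed as: $\hat{X}(k) = \log|X(k)| + j\angle X(k)$
  - and for $\hat{x}[n]$ use: $\hat{x}_N[n] = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}(k)\, e^{j\frac{2\pi}{N}kn}$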

Practical Implementation Issues
  1. The cepstrum $\hat{x}[n]$ is infinitely long, so $\hat{x}_N[n]$ is an aliased version of $\hat{x}[n]$. That is: $\hat{x}_N[n] = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN]$. It is therefore necessary to use as large an N as possible.
  2. The phase component $j\angle X(k)$ must be properly unwrapped to ensure phase continuity. The goal is to determine r[k] so that $\angle X(k) = \mathrm{PV}[\angle X(k)] + 2\pi r[k]$ is continuous.
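The effect of the DFT length on cepstral aliasing can be seen with a one-zero sequence whose exact cepstrum is known. The sketch below is an illustration under assumptions made here: the helper `discrete_complex_cepstrum`, the value `a = 0.9`, and the two DFT lengths are hypothetical choices.

```python
import numpy as np

def discrete_complex_cepstrum(a, N):
    """N-point complex cepstrum of x[n] = delta[n] - a*delta[n-1], |a| < 1."""
    w = 2 * np.pi * np.arange(N) / N
    X = 1 - a * np.exp(-1j * w)           # Re{X} > 0: the principal log is continuous
    return np.fft.ifft(np.log(X)).real

a = 0.9
true_value = -a                            # exact cepstrum at n = 1: -a^1 / 1

err_small_N = abs(discrete_complex_cepstrum(a, 8)[1] - true_value)
err_large_N = abs(discrete_complex_cepstrum(a, 256)[1] - true_value)

# Aliasing sum for N = 8: x_hat_N[1] = sum over r >= 0 of x_hat[1 + 8r]
r = np.arange(0, 400)
aliased_sum = np.sum(-(a ** (1 + 8 * r)) / (1 + 8 * r))

print(err_small_N, err_large_N)
```

A zero close to the unit circle makes the cepstrum decay slowly, so the short DFT picks up visible wrapped-around terms, while the long DFT does not.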

Modulo 2π Phase Unwrapper
• The goal is to determine r[k] so that $\angle X(k)$ is continuous.
[Figure: phase representation in the discrete complex spectrum; principal value $\mathrm{PV}[\angle X(\omega)]$ between $-\pi$ and $\pi$, sampled as $\mathrm{PV}[\angle X(k)]$ at a spacing of $2\pi/N$]

Modulo 2π Phase Unwrapper
• Algorithm (ε is a tolerance):
    If PV[∠X(k)] - PV[∠X(k-1)] > 2π - ε
        r[k] = r[k-1] - 1        # subtract 2π
    Else if PV[∠X(k)] - PV[∠X(k-1)] < -(2π - ε)
        r[k] = r[k-1] + 1        # add 2π
    Else
        r[k] = r[k-1]            # do not change
    End
• Note: Even with a fine frequency grid of spacing 2π/N (determined by N), the phase may change too rapidly between adjacent samples for this test to be reliable (e.g., when poles and zeros lie close together).
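The algorithm above translates almost line for line into code. The sketch below is a minimal version under assumptions made here: the function name, the default tolerance `eps = 1.0`, and the linear-phase test signal are hypothetical, and the tolerance must exceed the largest true phase change between adjacent samples for the jump test to work.

```python
import numpy as np

def unwrap_modulo_2pi(pv, eps=1.0):
    """Add 2*pi*r[k] to principal-value phase samples to remove jumps of about 2*pi."""
    r = np.zeros(len(pv), dtype=int)
    for k in range(1, len(pv)):
        jump = pv[k] - pv[k - 1]
        if jump > 2 * np.pi - eps:         # wrapped upward: subtract 2*pi
            r[k] = r[k - 1] - 1
        elif jump < -(2 * np.pi - eps):    # wrapped downward: add 2*pi
            r[k] = r[k - 1] + 1
        else:                              # no wrap detected
            r[k] = r[k - 1]
    return pv + 2 * np.pi * r

k = np.arange(100)
true_phase = -0.3 * k                      # linear phase, e.g. from a pure delay
pv = np.angle(np.exp(1j * true_phase))     # principal value in (-pi, pi]
unwrapped = unwrap_modulo_2pi(pv)
```

For this smooth test phase the result agrees with NumPy's own `np.unwrap`, which uses a related jump test with a threshold of π by default.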

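Phase Derivative-Based Phase Unwrapper
• The phase derivative is uniquely defined by: $\dot{\angle X}(\omega) = \frac{X_r(\omega)\dot{X}_i(\omega) - X_i(\omega)\dot{X}_r(\omega)}{|X(\omega)|^2}$, where $X_r$ and $X_i$ are the real and imaginary parts of $X(\omega)$ and the dot denotes differentiation with respect to ω.
• Then: $\angle X(\omega) = \int_0^{\omega} \dot{\angle X}(\beta)\, d\beta$, with $\angle X(0) = 0$.
• However, since only X(k) is available, the unwrapped phase must be estimated from discrete values.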

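Phase Derivative-Based Phase Unwrapper
• Restate the problem: $\angle X(\omega_k) = \mathrm{PV}[\angle X(\omega_k)] + 2\pi q(\omega_k)$
• where $q(\omega_k)$ is an integer-valued function.
• Assuming that the phase has been correctly unwrapped up to $\omega_{k-1}$, with the value $\theta(\omega_{k-1})$, then: $\theta(\omega_k) = \theta(\omega_{k-1}) + \int_{\omega_{k-1}}^{\omega_k} \dot{\angle X}(\omega)\, d\omega$
• An approximation (trapezoidal integration, with $\Delta\omega = \omega_k - \omega_{k-1}$): $\tilde{\theta}(\omega_k) = \theta(\omega_{k-1}) + \frac{\Delta\omega}{2}\left[\dot{\angle X}(\omega_{k-1}) + \dot{\angle X}(\omega_k)\right]$
• Select the value of $q(\omega_k)$ such that $E[k] = \left|\tilde{\theta}(\omega_k) - \left(\mathrm{PV}[\angle X(\omega_k)] + 2\pi q(\omega_k)\right)\right|$ is minimized over $q(\omega_k)$.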

Example
[Figure]

Short-Time Homomorphic Analysis of Periodic Sequences
• Recall the source-system model of speech production: $x[n] = h[n] * p[n]$.
• For voiced speech p[n] is quasi-periodic: $p[n] = \sum_{k} \delta[n - kP]$, where P is the pitch period.
• For unvoiced speech p[n] is noise-like.
• In practice a periodic waveform is windowed by a finite-length sequence w[n]: $s[n] = w[n]x[n] = w[n](p[n] * h[n])$
• Approximation to s[n]: $s[n] \approx (w[n]p[n]) * h[n]$
[Figure: p[n] → h[n] → x[n]]

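Short-Time Homomorphic Analysis of Periodic Sequences
• If w[n] is smooth relative to h[n], and P is large enough that the replicas h[n - kP] do not substantially overlap, then: $w[n]\sum_k h[n - kP] \approx \sum_k w[kP]\, h[n - kP] = (w[n]p[n]) * h[n]$
• Then the cepstrum of s[n] is: $\hat{s}[n] \approx \hat{p}_w[n] + \hat{h}[n]$, where $\hat{p}_w[n]$ is the complex cepstrum of w[n]p[n].
• It can be shown that: $\hat{s}[n] = \hat{p}_w[n] + D[n]\sum_{k} \hat{h}[n - kP]$, where D[n] is a weighting function that depends on w[n].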

Short-Time Homomorphic Analysis of Periodic Sequences: Cepstral Domain (Quefrency) Perspective
• Under what conditions can we perform deconvolution?
• Let x[n] be a voiced speech signal produced by an infinite train of periodic impulses: $x[n] = \sum_{k} h[n - kP]$
• Thus the only samples of $X(\omega)$ and $\log[X(\omega)]$ are defined at multiples of the fundamental frequency $\omega_o = 2\pi/P$, i.e., at $\omega_k = (2\pi/P)k$:
  $X(\omega_k) = P(\omega_k) H(\omega_k)$
  $\log[X(\omega_k)] = \log[P(\omega_k)] + \log[H(\omega_k)]$

Cepstral Domain (Quefrency) Perspective
• In the cepstral domain, the contribution of h[n] appears as a set of replicas of $\hat{h}[n]$, one at every kP.
• Thus aliasing is an issue and needs to be handled properly. That is, can this aliasing be prevented, or at least minimized?
• Consider: $s[n] = w[n]x[n] = w[n](p[n] * h[n])$

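Cepstral Domain (Quefrency) Perspective
• Rewrite s[n] as: $s[n] = (p[n]w[n]) * g[n]$, where $g[n] \approx h[n]$.
• Then $S(\omega)$ can be written in two ways, one for each form of s[n]. [equations not preserved in this transcript]
• Taking the log of both expressions and solving for $\log[G(\omega)]$ gives equation (1). [equation not preserved in this transcript]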

Cepstral Domain (Quefrency) Perspective
• To simplify, assume that $W(\omega)$ consists only of the main lobe of a rectangular window. [equation not preserved in this transcript]
• Here $\omega_o = 2\pi/P$.

Cepstral Domain (Quefrency) Perspective
• Thus the second log term becomes zero, which gives equation (2). [equation not preserved in this transcript]

Cepstral Domain (Quefrency) Perspective
• From (1) and (2) we can write an expression for $\hat{s}[n]$ as a function of quefrency, where $\hat{p}_w[n]$ is the complex cepstrum of p[n]w[n]. [equation not preserved in this transcript]

Cepstral Domain (Quefrency) Perspective
• The last equation is a special case of the earlier expression involving the weighting function D[n], with D[n] = w[n].
• As with the purely convolutional model, the contributions of the windowed pulse train and the impulse response are additively combined, so deconvolution is possible.
• Now, however, the impulse response contribution is repeated at the pitch period rate. This aliasing:
  - depends on the pitch, and
  - is different from the aliasing due to an insufficient DFT length (see Section 6.4.4).

Cepstral Domain (Quefrency) Perspective
• Conditions under which $s[n] \approx (w[n]p[n]) * h[n]$:
  1. The time-domain window w[n] should be long enough that D[n] is smooth for |n| < P, over the extent of $\hat{h}[n]$.
  2. w[n] should be short enough to reduce the contribution of the replicas of $\hat{h}[n]$. In practice w[n] is a Hamming window 2 to 3 pitch periods long.
  3. w[n] should be centered at the time origin, n = 0, aligned with h[n].
• Under those conditions, for a low-time lifter (a filter in the cepstral domain) l[n] of extent |n| < P/2: $\hat{s}[n] \approx \hat{p}_w[n] + \hat{h}[n]$ for |n| < P/2.
• That is, the complex cepstrum is close to that derived from the conventional model.
• Note that with high-pitched speakers there is a stronger presence of $\hat{p}[n]$ close to the origin (as noted earlier), as well as more aliasing of the replicas of $\hat{h}[n]$.

Frequency Domain Perspective
• Let $x[n] = h[n] * p[n]$, where p[n] is the periodic impulse train with period P.
• Then: $X(\omega_k) = P(\omega_k) H(\omega_k)$, where $X(\omega_k)$ represents a line spectrum at $\omega_k = (2\pi/P)k$.
• Question: for which window properties is the convolutional model $(w[n]p[n]) * h[n]$ close to the actual windowed signal $s[n] = w[n]x[n] = w[n](p[n] * h[n])$?

Frequency Domain Perspective
• Define an error measure $E(\omega)$ that reflects the degradation in the frequency domain, and minimize a spectral distance based on it. [equations not preserved in this transcript]
• It was found empirically that for a Hamming window this spectral distance measure is minimized for window lengths of roughly 2 to 3 pitch periods.
• An implication of this result is that the length of the analysis window should be adapted to the pitch period, to make the windowed waveform as close as possible (in the sense described above) to the desired convolutional model.

Short-Time Speech Analysis
• Complex cepstrum of voiced speech. Recall: $H(z) = A\,G(z)\,V(z)\,R_L(z)$, where A is the gain, G(z) the glottal model, V(z) the vocal tract model, and $R_L(z)$ the lip radiation model.
• The output speech is then: $x[n] = p[n] * h[n]$

Complex Cepstrum of Voiced Speech
• General form for a stable V(z): zeros inside and outside the unit circle, poles inside the unit circle. [equation not preserved in this transcript]
• The goal is to separate h[n] from p[n]. Let $s[n] = w[n](p[n] * h[n])$ be approximately equal to $\tilde{x}[n] = (w[n]p[n]) * h[n]$.

Complex Cepstrum of Voiced Speech
• Recall that $\tilde{x}[n] \approx s[n]$ if the window is 2 to 3 pitch periods long and its center is aligned with h[n].
• Using a DFT of order N, $\hat{s}_N[n]$ denotes the discrete complex cepstrum. [equation not preserved in this transcript]
• For a typical speaker the duration of the short-time window lies in the range of 20 ms to 40 ms.
• Assume that:
  - the source and system components lie roughly in separate quefrency regions,
  - there is negligible aliasing of the replicas of $\hat{h}[n]$,
  - most of $\hat{h}[n]$ occurs within P/2 of the origin,
  - the distortion function D[n] is smooth in the same range, |n| < P/2.
• Then a cepstral lifter function can be applied to separate the components.

Complex Cepstrum of Voiced Speech
• A low-quefrency lifter (l[n] = 1 for |n| < P/2 and 0 otherwise) is used to separate $\hat{h}[n]$ from $\hat{p}[n]$.
• Similarly, a high-quefrency lifter can be used to recover the input pulse train (pitch estimation).
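Low-quefrency liftering can be sketched on a synthetic signal whose components are known. Everything specific in the block below is an assumption made for illustration: a single-pole response with pole `a`, a two-pulse "train" with weight `beta`, a period of 64 samples, and no analysis window (so the convolutional model holds exactly and no phase unwrapping is needed for these values).

```python
import numpy as np

M = 1024                    # DFT length (assumed)
P = 64                      # pitch period in samples (hypothetical)
a, beta = 0.8, 0.5          # hypothetical pole and second-pulse weight

h = a ** np.arange(200)                 # minimum-phase impulse response (truncated)
p = np.zeros(P + 1)
p[0], p[P] = 1.0, beta                  # delta[n] + beta * delta[n - P]
x = np.convolve(h, p)

# Both factors have positive real part on the unit circle, so the principal-value
# log equals the unwrapped log in this particular example.
x_hat = np.fft.ifft(np.log(np.fft.fft(x, M))).real

# Low-quefrency lifter: keep |n| < P/2 (negative quefrencies wrap to the end).
lifter = np.zeros(M)
lifter[:P // 2] = 1.0
lifter[M - P // 2 + 1:] = 1.0
h_hat_est = lifter * x_hat

# Inverse characteristic system applied to the liftered cepstrum.
h_est = np.fft.ifft(np.exp(np.fft.fft(h_hat_est))).real
```

The estimate `h_est` tracks `h` closely, and the second pulse at n = P (amplitude about `beta` in `x`) is removed. With a real windowed speech frame the result would only be approximate, for the reasons listed on the preceding slides.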

Example 6.11
• Voiced female speech with a pitch period of 5 ms.
• Sampling rate $f_s$ = 10 kHz, so the pitch period is P = 50 samples.
• Hamming window of 15 ms (150 samples, i.e., 3 pitch periods).
• A 1024-point FFT/IFFT is used to obtain the discrete complex cepstrum.
• The window is centered on h[n] (more about that later).

Example 6.11 (cont.)
[Figures: results for Example 6.11, with panels labeled "Maximum Phase" and "Minimum Phase"]

Complex Cepstrum of Unvoiced Speech
• Recall the transfer function model for unvoiced speech: $H(z) = A\,V(z)\,R(z)$
• In contrast to the voiced case, there is no glottal volume velocity contribution.
• Resulting speech waveform in the time domain, where u[n] is white noise: $x[n] = u[n] * h[n] = u[n] * v[n] * r[n]$
• Resulting signal after applying the short-time analysis window: $s[n] = w[n](u[n] * h[n])$

Complex Cepstrum of Unvoiced Speech
• As in the arguments applied to voiced speech:
  - the duration of the analysis window w[n] is selected so that the formants of the unvoiced speech power spectral density are not significantly broadened, and
  - w[n] is sufficiently smooth to be nearly constant over h[n].
• The following can then be assumed: $s[n] \approx (w[n]u[n]) * h[n]$
• Define the windowed white noise as $q[n] = u[n]w[n]$ and compute the discrete complex cepstrum with an N-point DFT.

Complex Cepstrum of Unvoiced Speech
• $\hat{q}_N[n]$, the discrete complex cepstrum of the noise source, covers all quefrencies, and thus separation is not possible.
• Phase unwrapping of noisy signals is very unreliable.
• The real cepstrum is adequate for unvoiced speech (phase information is not important in this case), resulting in minimum-phase versions of h[n].
• The deconvolved excitation may contain interesting fine source structure for some classes of sounds, e.g., voiced fricatives.

Analysis/Synthesis Structure
• In the speech analysis stage, the underlying parameters of the speech model are estimated.
• In the speech synthesis stage, the waveform is reconstructed from the model parameters.
• Liftering the low-quefrency region of the cepstrum ⇒ provides an estimate of the system impulse response.
• Liftering the high-quefrency region of the cepstrum ⇒ provides an estimate of the source excitation signal.
• The estimate of the source signal is inverted with the homomorphic system to obtain the excitation function.
• Convolution of the two resulting component estimates yields the original short-time segment exactly.

Analysis/Synthesis Structure
• With an overlap-add reconstruction from the short-time segments, the entire waveform is recovered.
• The homomorphic system performs the transformation with no information reduction.
• This process is analogous to reconstructing the waveform, in linear prediction analysis/synthesis, from the convolution of the all-pole filter and the output of its inverse filter.
• In speech coding and speech modification applications a more efficient representation is desired.
• The complex or real cepstrum provides an approach to such a representation, because pitch and voicing can be estimated from the peak (or lack of a peak) in the high-quefrency region of the cepstrum.
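Pitch estimation from the high-quefrency peak can be sketched on a synthetic voiced frame. The sampling rate, pitch period, window length, and FFT size follow Example 6.11; the damped-cosine vocal-tract response, the 2 ms to 8 ms search range, and the small constant added before the log are assumptions made for this illustration.

```python
import numpy as np

fs = 10_000                       # sampling rate in Hz (as in Example 6.11)
P = 50                            # pitch period: 5 ms at 10 kHz
L = 150                           # 15 ms Hamming window = 3 pitch periods
N = 1024                          # FFT length

# Hypothetical vocal-tract response: one damped resonance (an assumption of this sketch).
n = np.arange(L)
h = (0.9 ** n) * np.cos(0.2 * np.pi * n)

p = np.zeros(L)
p[::P] = 1.0                      # impulse train with period P
x = np.convolve(p, h)[:L]
s = np.hamming(L) * x             # short-time segment s[n] = w[n] x[n]

# Real cepstrum; the small constant guards against log(0).
c = np.fft.ifft(np.log(np.abs(np.fft.fft(s, N)) + 1e-12)).real

# Search only the high-quefrency region (about 2 ms to 8 ms here).
lo, hi = int(0.002 * fs), int(0.008 * fs)
period_estimate = lo + int(np.argmax(c[lo:hi]))
print(period_estimate, "samples, i.e.", 1000 * period_estimate / fs, "ms")
```

For an unvoiced (noise-like) frame no dominant peak would be expected in this region, which is the basis for using the peak height as a voicing indicator.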

Zero and Minimum-Phase Synthesis
• Assuming that we have a succinct and accurate characterization of the speech production source (as with linear prediction-based analysis/synthesis), we are able to synthesize an estimate of the speech waveform.
• This synthesis can be performed with any one of several possible phase functions:
  - zero-phase,
  - minimum-phase,
  - maximum-phase,
  - mixed-phase.

Zero and Minimum-Phase Synthesis
• General framework for homomorphic analysis/synthesis:
[Figure: block diagram using a 1024-point real cepstrum and a lifter cutoff of P/2; the analysis window length in ms is not preserved in this transcript]

Mixed-Phase Synthesis
• Example 6.13

Contrasting Linear Prediction and Homomorphic Filtering
• Homomorphic filtering is viewed as an alternative to linear prediction.

  Linear Prediction                                  | Homomorphic Filtering
  Parametric                                         | Non-parametric
  Sharp, smooth resonances                           | Wider, spurious resonances
  All-pole representation                            | Poles and zeros can be represented
  Minimum-phase response estimate only               | Minimum-phase as well as mixed-phase estimate, if the complex cepstrum is used
  Synthesized speech "crisper" but more "mechanical" | Synthesized speech more "natural" but "muffled"

Contrasting Linear Prediction and Homomorphic Filtering
• Similar problems arise with both methods:

  Linear Prediction                                                              | Homomorphic Filtering
  Increased speech distortion with increasing pitch                              | Aliasing of the vocal tract impulse response at the pitch period repetition rate
  Windowing results in the prediction of nonzero values of the waveform from zeros outside the window | Windowing a periodic waveform distorts the convolutional model
  The number of poles must be chosen                                             | The length of the low-quefrency lifter must be chosen

• For both methods, the best window and order selection is often a function of the pitch of the speaker.

Homomorphic Prediction
• A number of speech analysis methods rely on combining homomorphic filtering with linear prediction; they are referred to collectively as homomorphic prediction.
• Two primary advantages of combining the methods:
  1. By reducing the effects of waveform periodicity, an all-pole estimate suffers less from the effect of high-pitch aliasing.
  2. By removing ambiguity in waveform alignment, zero estimation can be performed without the requirement of pitch-synchronous analysis.

Homomorphic Prediction
• Waveform periodicity: recall that for a waveform consisting of the convolution of a short-time impulse train and an impulse response, $x[n] = p[n] * h[n]$,
• the autocorrelation function is given by the convolution of the autocorrelation function of the impulse response with that of the impulse train: $r_x[\tau] = r_h[\tau] * r_p[\tau]$
• Thus, as the spacing between impulses (the pitch period) decreases, the autocorrelation function of the impulse response suffers from increasing distortion.

Homomorphic Prediction

Thus, if the spectral magnitude of $h[n]$ can be estimated accurately, linear prediction analysis can be performed with an estimate of $r_h[\tau]$ that is free of the waveform periodicity. This leads to the following idea:

1. Use homomorphic filtering to deconvolve an estimate of $h[n]$ by low-pass liftering the real or complex cepstrum of $x[n]$.
2. Apply the autocorrelation method of linear prediction analysis to the resulting impulse response estimate to obtain the model parameters.
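The two steps above can be sketched in Python with NumPy. This is a minimal illustration, not the slides' implementation: the function names, the rectangular lifter, the DFT length, and the test signal (a second-order resonator excited by two impulses) are all assumptions chosen for the example. The predictor convention assumed here is $x[n] \approx \sum_k a_k x[n-k]$.

```python
import numpy as np

def levinson_durbin(r, order):
    """Autocorrelation method: solve the normal equations recursively.
    Returns predictor coefficients a[1..order] and the final prediction error."""
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:], err

def homomorphic_prediction(x, order, lifter_len, nfft=4096):
    """Step 1: low-pass lifter the real cepstrum of x.
    Step 2: linear prediction on the autocorrelation of the liftered spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x, nfft)) + 1e-12)
    c = np.fft.ifft(log_mag).real                 # real cepstrum
    lifter = np.zeros(nfft)
    lifter[:lifter_len] = 1.0                     # low quefrencies, n >= 0
    lifter[nfft - lifter_len + 1:] = 1.0          # mirrored negative quefrencies
    log_mag_h = np.fft.fft(c * lifter).real       # smoothed log-magnitude
    r_h = np.fft.ifft(np.exp(2.0 * log_mag_h)).real  # autocorrelation estimate
    return levinson_durbin(r_h, order)

def example_signal(radius=0.9, theta=np.pi / 4, alpha=0.5, spacing=100, length=500):
    """Hypothetical test signal: all-pole h[n] excited by two impulses."""
    a_true = np.array([2 * radius * np.cos(theta), -radius ** 2])
    h = np.zeros(length)
    for n in range(length):
        h[n] = 1.0 if n == 0 else 0.0
        if n >= 1:
            h[n] += a_true[0] * h[n - 1]
        if n >= 2:
            h[n] += a_true[1] * h[n - 2]
    x = np.zeros(length + spacing)
    x[:length] += h
    x[spacing:] += alpha * h
    return x, a_true

x, a_true = example_signal()
a_est, err = homomorphic_prediction(x, order=2, lifter_len=60)
print("true:", a_true, "estimated:", a_est)
```

The lifter length is kept below the impulse spacing so that the cepstral peaks of the excitation fall outside the lifter. A rectangular lifter truncates the cepstrum of $h[n]$, so the estimate is close to, but not exactly equal to, the true coefficients.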

Example 6.14

- Suppose $h[n]$ is a minimum-phase all-pole sequence of order $p$. Consider a waveform $x[n]$ constructed by convolving $h[n]$ with a sequence $p[n]$, where
  $p[n] = \delta[n] + \alpha\,\delta[n-N]$, with $|\alpha| < 1$.
- The complex cepstrum of $x[n]$ is given by
  $\hat{x}[n] = \hat{p}[n] + \hat{h}[n]$,
  where $\hat{p}[n]$ and $\hat{h}[n]$ are the complex cepstra of $p[n]$ and $h[n]$, respectively.
- The autocorrelation function is given by
  $r_x[\tau] = (1+\alpha^2)\, r_h[\tau] + \alpha\, r_h[\tau-N] + \alpha\, r_h[\tau+N]$.
- $r_x[\tau]$ is $r_h[\tau]$ distorted by its neighboring terms centered at $\tau = +N$ and $\tau = -N$.
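The autocorrelation identity in this example can be checked numerically. The sketch below assumes NumPy and uses a hypothetical decaying sequence for $h[n]$; the function name and parameter values are illustrative only.

```python
import numpy as np

def two_impulse_autocorr(h, alpha, N):
    """Compare the autocorrelation of x[n] = h[n] + alpha*h[n-N]
    with (1 + alpha^2) r_h[tau] + alpha r_h[tau-N] + alpha r_h[tau+N]."""
    M = len(h) + N
    x = np.zeros(M)
    x[:len(h)] += h
    x[N:] += alpha * h
    h_padded = np.concatenate([h, np.zeros(N)])      # same length as x, same lag axis
    r_x = np.correlate(x, x, mode="full")            # lag 0 sits at index M - 1
    r_h = np.correlate(h_padded, h_padded, mode="full")
    # r_h is zero for |tau| >= len(h), so a circular shift by N only wraps zeros
    predicted = ((1 + alpha ** 2) * r_h
                 + alpha * np.roll(r_h, N)
                 + alpha * np.roll(r_h, -N))
    return r_x, predicted, r_h

h = 0.9 ** np.arange(50)          # hypothetical impulse response
r_x, predicted, r_h = two_impulse_autocorr(h, alpha=0.5, N=20)
print(np.allclose(r_x, predicted))
```

With a spacing of $N = 20$ and a slowly decaying $h[n]$, the shifted replicas contribute noticeably at small lags, which is the distortion the example describes.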

Homomorphic Prediction

Important points of the previous example:

- The first $p$ coefficients of the real cepstrum of $x[n]$ are undistorted (if a long-enough DFT length is used in the computation).
- The first $p$ coefficients of the autocorrelation function $r_x[\tau]$ of the waveform are distorted by aliasing of autocorrelation replicas (regardless of the DFT length).
- A cepstral low-pass lifter of duration less than p extracts a smoothed and unaliased version of the spectrum.
- When $h[n]$ is all-pole, the linear prediction coefficients can alternatively be obtained exactly through the recursive relation between the real cepstrum and the predictor coefficients of the all-pole model (Exercise 6.13).
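The recursion mentioned in the last point can be sketched as follows. The assumptions are: the all-pole model is written as $H(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})$, the input is the complex cepstrum $\hat{h}[n]$ of a minimum-phase sequence (for $n > 0$ this is twice the real cepstrum), and the function name is hypothetical. Under these assumptions, $a_n = \hat{h}[n] - \sum_{m=1}^{n-1} \frac{m}{n}\, \hat{h}[m]\, a_{n-m}$ for $1 \le n \le p$.

```python
import math

def cepstrum_to_predictor(h_hat, order):
    """Recover predictor coefficients a[1..order] from the complex cepstrum
    h_hat[0..order] of a minimum-phase all-pole sequence.
    h_hat[0] (the log gain) is not used."""
    a = [0.0] * (order + 1)
    for n in range(1, order + 1):
        acc = sum((m / n) * h_hat[m] * a[n - m] for m in range(1, n))
        a[n] = h_hat[n] - acc
    return a[1:]

# Hypothetical second-order example: poles at radius * exp(+/- j*theta),
# whose complex cepstrum is 2 * radius**n * cos(n*theta) / n for n > 0.
radius, theta = 0.9, math.pi / 3
h_hat = [0.0] + [2 * radius ** n * math.cos(n * theta) / n for n in range(1, 3)]
print(cepstrum_to_predictor(h_hat, order=2))
```

For this pole pair the recursion should reproduce $a_1 = 2\,\text{radius}\cos\theta$ and $a_2 = -\text{radius}^2$ up to rounding.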

Homomorphic Prediction

Zero estimation: consider a transfer function $H(z)$ with both poles and zeros (equation not shown). Also consider a sequence $x[n] = h[n] * p[n]$, where $p[n]$ is a periodic impulse train. Suppose that:

- an estimate of $h[n]$ is obtained through homomorphic filtering of $x[n]$,
- the number of poles and zeros is known, and
- the linear-phase component $z^{-r}$ has been removed.

Then the poles of $h[n]$ can be estimated using the covariance method of linear prediction. Other methods (e.g., Shanks' method, described in Chapter 5) can be used to estimate the zeros.

Summary

- This chapter focused on homomorphic filtering, with application to deconvolution: the separation of a source from a system.
- The methodology presented is general and is not limited to deconvolving the vocal tract from the glottal source.
- Example applications:
  - Control of the dynamic range of multiplicatively combined signals (Exercise 6.19).
  - Recovery of speech from degraded recordings. Old acoustic recordings suffer from convolutional distortion imparted by an acoustic horn, which can be approximated by a linear resonant filter. See Exercise 6.20 for details.
  - In image processing, homomorphic filtering can be used for contrast enhancement (see Oppenheim and Schafer, “Digital Signal Processing”, p. 487, Prentice Hall, 1975).

Summary

- Homomorphic processing is applied in the phase vocoder and in sinewave analysis/synthesis.
- It has also been found useful in speech coding (Chapter 12) and speaker recognition (Chapter 14).
- It is also the basis for the mel-cepstrum: the Fourier transform of a constant-Q filtered log-spectrum. The mel-cepstrum is hypothesized to approximate signal processing in the early stages of human auditory perception.
- Homomorphic filtering applied along the temporal trajectories of the mel-cepstral coefficients can be used to remove convolutional channel distortions even when the cepstrum of these distortions overlaps the cepstrum of speech (Chapter 13):
  - Cepstral Mean Subtraction, and
  - RASTA processing.
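As a small illustration of the last point, cepstral mean subtraction can be sketched as below. The sketch assumes NumPy, a matrix of cepstral vectors with one row per frame, and a time-invariant channel, which appears as a constant additive offset in the cepstral domain; the function name and the random data are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """cepstra has shape (num_frames, num_coeffs).
    Subtract the mean of each coefficient's temporal trajectory."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))       # stand-in for cepstral frames of speech
channel = rng.normal(size=(1, 13))       # fixed channel: constant cepstral offset
distorted = clean + channel              # convolution becomes addition in the cepstrum

same = np.allclose(cepstral_mean_subtraction(distorted),
                   cepstral_mean_subtraction(clean))
print(same)
```

Subtracting the temporal mean removes the constant channel term, so the processed features are the same with or without the channel. It also removes the long-term mean of the speech cepstrum itself.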
