1 Analysis and Synthesis of Pathological Vowels Prospectus Brian C. Gabelman 6/13/2003.

Slides:

Advertisements

Similar presentations

Acoustic/Prosodic Features

Advertisements

Digital Signal Processing

ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: The Linear Prediction Model The Autocorrelation Method Levinson and Durbin.

Analysis and Digital Implementation of the Talk Box Effect Yuan Chen Advisor: Professor Paul Cuff.

Voiceprint System Development Design, implement, test unique voiceprint biometric system Research Day Presentation, May 3 rd 2013 Rahul Raj (Team Lead),

Liner Predictive Pitch Synchronization Voiced speech detection, analysis and synthesis Jim Bryan Florida Institute of Technology ECE5525 Final Project.

Overview of Real-Time Pitch Tracking Approaches Music information retrieval seminar McGill University Francois Thibault.

Page 0 of 34 MBE Vocoder. Page 1 of 34 Outline Introduction to vocoders MBE vocoder –MBE Parameters –Parameter estimation –Analysis and synthesis algorithm.

Itay Ben-Lulu & Uri Goldfeld Instructor : Dr. Yizhar Lavner Spring /9/2004.

Automatic Lip- Synchronization Using Linear Prediction of Speech Christopher Kohnert SK Semwal University of Colorado, Colorado Springs.

Xkl: A Tool For Speech Analysis Eric Truslow Adviser: Helen Hanson.

Analysis and Synthesis of Shouted Speech Tuomo Raitio Jouni Pohjalainen Manu Airaksinen Paavo Alku Antti Suni Martti Vainio.

6/3/20151 Voice Transformation : Speech Morphing Gidon Porat and Yizhar Lavner SIPL – Technion IIT December

280 SYSTEM IDENTIFICATION The System Identification Problem is to estimate a model of a system based on input-output data. Basic Configuration continuous.

Communications & Multimedia Signal Processing Report of Work on Formant Tracking LP Models and Plans on Integration with Harmonic Plus Noise Model Qin.

Design of a Control Workstation for Controller Algorithm Testing Aaron Mahaffey Dave Tastsides Dr. Dempsey.

Voice Transformations Challenges: Signal processing techniques have advanced faster than our understanding of the physics Examples: – Rate of articulation.

Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition Nengheng Zheng Supervised under Professor P.C. Ching Nov. 26,

A PRESENTATION BY SHAMALEE DESHPANDE

Representing Acoustic Information

IIT Bombay ICA 2004, Kyoto, Japan, April 4 - 9, 2004   Introdn HNM Methodology Results Conclusions IntrodnHNM MethodologyResults.

Technion – Israel Institute of Technology Department of Electrical Engineering Winter 2009 Instructor Amit Berman Students Evgeny Hahamovich Yaakov Aharon.

Topics covered in this chapter

Synthesis advanced techniques. Other modules Synthesis would be fairly dull if we were limited to mixing together and filtering a few standard waveforms.

1 CS 551/651: Structure of Spoken Language Lecture 8: Mathematical Descriptions of the Speech Signal John-Paul Hosom Fall 2008.

Dual-Channel FFT Analysis: A Presentation Prepared for Syn-Aud-Con: Test and Measurement Seminars Louisville, KY Aug , 2002.

Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Bjőrn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto.

Page 0 of 23 MELP Vocoders Nima Moghadam SN#: Saeed Nari SN#: Supervisor Dr. Saameti April 2005 Sharif University of Technology.

Chapter 16 Speech Synthesis Algorithms 16.1 Synthesis based on LPC 16.2 Synthesis based on formants 16.3 Synthesis based on homomorphic processing 16.4.

93 SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES.

1 Linear Prediction. 2 Linear Prediction (Introduction) : The object of linear prediction is to estimate the output sequence from a linear combination.

1 PATTERN COMPARISON TECHNIQUES Test Pattern:Reference Pattern:

1 Linear Prediction. Outline Windowing LPC Introduction to Vocoders Excitation modeling  Pitch Detection.

♥♥♥♥ 1. Intro. 2. VTS Var.. 3. Method 4. Results 5. Concl. ♠♠ ◄◄ ►► 1/181. Intro.2. VTS Var..3. Method4. Results5. Concl ♠♠◄◄►► IIT Bombay NCC 2011 : 17.

Jitter Experiment Final presentation Performed by Greenberg Oleg Hahamovich Evgeny Spring 2008 Supervised by Mony Orbah.

Speech Signal Representations I Seminar Speech Recognition 2002 F.R. Verhage.

Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )

Structure of Spoken Language

SPECTRUM ANALYZER 9 kHz GHz

ECE 5525 Osama Saraireh Fall 2005 Dr. Veton Kepuska

119 PLANT RESPONSES A/D CONTROL ALGORITHM D/A CM D IN INSIDE COMPUTER + - SENSORS A/D ERR SENSOR FEEDBACKS Figure 4.1. Real-time digital control loop.

VOCODERS. Vocoders Speech Coding Systems Implemented in the transmitter for analysis of the voice signal Complex than waveform coders High economy in.

Frequency Modulation ECE 4710: Lecture #21 Overview:

Performance Comparison of Speaker and Emotion Recognition

EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 27,

Vocal Tract & Lip Shape Estimation By MS Shah & Vikash Sethia Supervisor: Prof. PC Pandey EE Dept, IIT Bombay AIM-2003, EE Dept, IIT Bombay, 27 th June,

More On Linear Predictive Analysis

Automatic Equalization for Live Venue Sound Systems Damien Dooley, Final Year ECE Progress To Date, Monday 21 st January 2008.

Chapter 20 Speech Encoding by Parameters 20.1 Linear Predictive Coding (LPC) 20.2 Linear Predictive Vocoder 20.3 Code Excited Linear Prediction (CELP)

A. R. Jayan, P. C. Pandey, EE Dept., IIT Bombay 1 Abstract Perception of speech under adverse listening conditions may be improved by processing it to.

By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate.

FREQUENCY (HZ) LOG10 MAGNITUDE Figure 2.30a. PSD of original voice

EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 20,

Linear Prediction.

1 Speech Compression (after first coding) By Allam Mousa Department of Telecommunication Engineering An Najah University SP_3_Compression.

Adv DSP Spring-2015 Lecture#11 Spectrum Estimation Parametric Methods.

146 Figure 5.1. Measured synthetic fundamental period combines portions from two separately synthesized pulses; time between minima is measured: the last.

PATTERN COMPARISON TECHNIQUES

Linear Prediction.

1 Vocoders. 2 The Channel Vocoder (analyzer) : The channel vocoder employs a bank of bandpass filters,  Each having a bandwidth between 100 HZ and 300.

Linear Predictive Coding Methods

Measured Period VOICE SIGNAL

ESTIMATED INVERSE SYSTEM

FUNDAMENTAL PERIOD TRACK FEATURES

Digital Systems: Hardware Organization and Design

Linear Prediction.

Speech Processing Final Project

Presentation transcript:

1 Analysis and Synthesis of Pathological Vowels Prospectus Brian C. Gabelman 6/13/2003

2 OVERVIEW OF PRESENTATION I. Background II. Analysis of pathological voices III. Synthesis of pathological voices IV. Summary

3 What is this project? BACKGROUND Methods for modeling, analysis, and synthesis of pathological vowels incorporating novel approaches in: - System identification - Parameterization of non-periodic components (AM, FM, and noise) - Synthesizer designs for realtime and offline use What is a pathological vowel? May be caused by physical or neural problems. Characterized by substantial and complex NONPERIODIC signal components.

4 Why do it? - Create a non-subjective basis to compare pathological voices for: 1. Improved diagnosis 2. Tracking changes in a patient’s voice - Generate voice samples with known levels of variations (noise, roughness, etc.) for: 1. Evaluation of model parameters 2. Evaluation of listener variability 3. Evaluation of importance of levels of pathological features. BACKGROUND What has been done before? - A well-established theory exists for NORMAL voices - Recent studies of pathological voices employ perturbation of normal features plus additive noise. - ES (external source) stimulation of the vocal tract to analyze formants (vowels) since 1942 for NORMAL voices.

5 For the theoretical/analytical aspect of the project, an expression of the hypothesis of the dissertation in one sentence is: “By means of FM and AM demodulation techniques, estimation of nonperiodic features of pathological vowels may be improved.” BACKGROUND

6 GLOTTAL SOURCE WAVEFORM (time domain) VOCAL TRACT FREQ. RESP. (freq domain) RESULTING VOICE SIG. (time domain) g(t) G(s) f(t) F(s) v(t) V(s) g(t) conv. f(t) = v(t) (time domain) G(s) x F(s) = V(s) (freq domain) ANALYSIS SOURCE - FILTER MODEL OF SPEECH

7 Steps for analysis: PERIODIC ANALYSIS: 1. FORMANT DETERMINATION Uses LP (linear prediction) to model vocal tract as cascaded 2nd order digital resonators. External source testing is shown to augment or replace LP for pathological vowels (Inv). 2. SOURCE MODELING Uses inverse filtering and least squares optimization to fit source waveform to a standard model (LF). NONPERIODIC ANALYSIS: 3. ANALYSIS OF PITCH VARIATION Uses high resolution pitch tracking to measure detailed nonperiodic frequency variation. Variations are segmented into low and high frequeny FM with Gaussian form. ANALYSIS

8 4. FM DEMODULATION Pitch variations are removed from original voice to achieve accurate noise estimation (step 7) (Inv.) 5. ANALYSIS OF POWER VARIATION Uses power tracking to measure detailed nonperiodic loudness variations. Variations are segmented into low and high frequency AM with Gaussian form. 6. AM DEMODULATION Power variations are removed from original voice to achieve accurate noise estimation (step 7) (Inv.) 7. ASPIRATION NOISE Frequency domain methods are used to separate aspiration noise component. The noise is spectrally modeled.[Gus de Krom] ANALYSIS

9 COLLECTION OF VOICE SAMPLES VOICE ANALYSIS PARAMETERS (PITCH, NSR, FORMANTS,..) VALIDATION PERCEPTUAL COMPARISONS + - VOICE SYNTHESIS ANALYSIS BY SYNTHESIS

10 ANALYSIS ANALYSIS/SYNTHESIS MODEL OVERVIEW SOURCE WAVEFORM FM MODULATION AM MODULATION SPECTRAL SHAPING GAUSSIAN NOISE VOCAL TRACT OUTPUT VOICE PERIODIC NONPERIODIC + + SYNTHESIS ANALYSIS

11 MIC SIGNAL LPC FORMANT ANALYSIS / MANUAL OPS JITTER ESTIMATE. TREMOR TIME HIST INVERSE FILTERING RAW FLOW DERIVATIVE SRC NOISE SPECTRUM RESAMPLE TO REMOVE TREMOR SYNTHESIZER FORMANTS FITTED LF SOURCE PULSE LEAST SQR LF FIT PITCH TRACKER CEPSTRAL NOISE ANALYSIS NSR ESTIMATE CONST. PITCH VOICE PITCH TIME HIST SHIMMER ESTIMATE VOLUME TIME HIST POWER TRACKER POWER TIME HIST RESAMPLE TO REMOVE TREMOR CONST. POWER VOICE MIKE SIGNAL ANALYSIS OVERVIEW OF PROJECT OPERATONS

12 UNKOWN SYSTEM PREDICTOR u(n)s(n) s’(n) + - e(n) ESTIMATED INVERSE SYSTEM H(z) A(z) G/H(z) LINEAR PREDICTION (SOURCE) (VOCAL TRACT) Estimates the vocal tract as an all-pole filter by minimizing the error between actual and model-predicted signals. (MODEL) (ERROR) PERIODIC ANALYSIS

13 IDEALIZED LP RESULT Requires a priori knowledge of system. More difficult for pathological vowels. PERIODIC ANALYSIS

14 SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIESVOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIESVOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIESVOICE SPECTRUM CASE 0: NORMAL SOURCE AND NORMAL VOCAL TRACT CASE 1: “BREATHY” SOURCE AND NORMAL VOCAL TRACT CASE 2: NORMAL SOURCE AND ABNORMAL VOCAL TRACT SOURCE-FILTER AMBIGUITY Source & filter are mixed in final voice. Unique LP solution may be difficult. PERIODIC ANALYSIS

T (SECONDS) FLOW U(t) & DERIVATIVE E(t) E1 (FIRST SEG.)E2 (SECOND SEG.) U(t) 30*E(t) tp (te,Ee) tc (t2,Ee/2) (tpk,Epk) FITTING RAW SOURCE TO LF Having established the inverse filtered source, it is fit to the LF model [Qi] PERIODIC ANALYSIS

16 HIGH RESOLUTION PITCH TRACKING Nonperiodic analysis begins with interpolating pitch tracking. NON-PERIODIC ANALYSIS FREQ (HZ) TIME (SEC) VOICE SIGNAL PITCH TRACK

17 FM DEVIATION SEGREATED TO LOW AND HIGH FREQUENCY The pitch track is low/hi pass filtered to yield tremor and HFPV (High Frequency Pitch Variation). NON-PERIODIC ANALYSIS

DELT FREQ (%) TOTSKIP=0 FCUT=10 TOTMAN=0 #OCCURRANCES GAUSSIAN HFPV Successful pitch tracking yields a Gaussian distribution in HPFV. The standard deviation is a convenient measure of HFPV. NON-PERIODIC ANALYSIS

19 FM DEMODULATION The pitch track may be used to demodulate the original voice to obtain a version with almost no pitch variation; re-tracking verifies constant pitch (<0.1%). NON-PERIODIC ANALYSIS DELTA FREQ PERCENT TIME (SEC)

x SAMPLE # POWER POWER TRACKING Analogously to pitch tracking, voice power is tracked. NON-PERIODIC ANALYSIS POWER TRACK

21 POWER SEGREGATED TO LOW AND HIGH FREQUENCY The power track is low/hi pass filtered to yield low frequency power variations and high frequency shimmer. NON-PERIODIC ANALYSIS

DELT PWR (%) #OCCURRANCES SHIM% HIST. STAND DEV = 4.823% GAUSSIAN POWER VARIATIONS Shimmer also displays Gaussian variations. NON-PERIODIC ANALYSIS

23 AM DEMODULATION The power track may be used to demodulate the original voice to obtain a version with almost no variation in strength; re-tracking verifies constant power. NON-PERIODIC ANALYSIS

24 ASPIRATION NOISE ANALYSIS Aspiration noise is segregated via spectral techniques. Peaks in the FFT of the log of the FFT (cepstrum) represent periodic energy, and are filtered out with a comb filter (lifter). Results are used to calculate noise-to-signal ratio (NSR). NON-PERIODIC ANALYSIS

25 FM DEMODULATION IMPROVES NSR ACCURACY Using FM demodulation improves resolution of spectral peaks of periodic components, thus allowing longer FFT windows and more accurate NSR determination. NON-PERIODIC ANALYSIS FREQUENCY (HZ) LOG10 MAGNITUDE

26 CHANGES IN NSR AFTER FM AND AM DEMODULATION FM demodulation reduces NSR measures by up to 20 dB, yielding results closer to perceived levels. AM demodulation has much less effect. NON-PERIODIC ANALYSIS ORIG ALL FM REMOVED TREMOR REMOVED

27 PC #1: STIMULUS GEN. STIMULUS SIGNAL D/A AMPLIFIER XDUCER /a/ ACOUSTIC CONDUIT PC #2: DAQ SIGNAL CONDITIONER A/D MEM. MICROPHONE CH 1 CH 2 SIGNAL CONDITIONER VOCAL TRACT TUBE OF LENGTH L WITH CLOSED END. FORMANTS AT F = C/4L, 3C/4L, 5C/4L,... EXTERNAL (ES) SOURCE ANALYSIS Source-filter ambiguity may be resolved by augmenting the glottal source with an known external stimulus. [Epps]. PERIODIC ANALYSIS

28 ES VERIFICATION A simple plastic tube model verified the ES experimental setup. Resonances occur at expected frequencies. PERIODIC ANALYSIS

29 ES: NORMAL /a/ LP & FFT analysis show consistent results with ES analysis for a normal vowel. PERIODIC ANALYSIS

30 ES: SIMULATED BREATHY /a/ LP & FFT analysis show poor resolution for F3 and F4 for a breathy /a/, while the ES resolution for F3 and F4 remains good. PERIODIC ANALYSIS

31 SYNTHESIS SYNTHESIS OF PATHOLOGICAL VOWELS Synthesis is a critical step in the study of pathological vowels. It provides evidence of the success of analysis and modeling steps via immediate comparisons of original and synthetic voice. Two synthesizers were implemented: 1. A realtime hardware-based synthesizer capable of providing instant response to changes in model parameters. 2. A software synthesizer implemented in MATLAB with extended features, convenient graphical interface, and ease of modification.

32 LOW PASS FILT. CURRENT REALTIME SYNTHESIZER FUNCTIONAL OVERVIEW D/A AGC 4 ZEROES POLES GAH ASP NOISE OUTPUT SIGNAL FILE RECORD WAV STORE/RECALL CONTROL PARMS PARAMETER TIME VARIATION SOURCE SELECTION SOURCE CONTROLS JITTERSHIMMERDIPLOPHONIA CLK X86 INTERUPT AMP IMPL KGLOT 88 LF1 LF2 ARB ARBITRARY SOURCE FILE SPKR + SYNTHESIS REALTIME SYNTHESIZER - Implemented in native X86 assembly language - Executes all code within 100us cycle - Overrides PC OS to achieve determinancy - Employs dedicated clock, I/O, and control hardware implemented in a wire-wrap PCB adapter card

33 SYNTHESIS SYNTHESIS VALIDATION The current model, analysis tools, and synthesizers yield a high level of fidelity in generation of synthetic pathological vowels. The system is currently employed at the UCLA Voicelab for NIH funded perceptual studies. In order to objectively validate the analysis/synthesis process, the loop is closed by re-analyzing the synthetic time series to confirm parameter values. Re-analysis also provides opportunity to observe interactions of nonperiodic components.

34 SYNTHESIS SYNTHESIS VALIDATION Re-measured synthetic aspiration noise level agrees with level set in synthesizer.

35 SYNTHESIS SYNTHESIS VALIDATION Aspiration noise adds about %0.2 to HFPV measurements JITTER LEVEL SET IN SYNTHESIZER (%) JITTER MEASURED IN SYNTHETIC (%) A.N. SET NSR IN SYNTHESIZER (dB) MEASURED NSR IN SYNTH (dB) HFPV adds about 4dB to NSR measurements.

36 SYNTHESIS SYNTHESIS VALIDATION Subjective analysis by synthesis experiments demonstrate the success of AM and FM demodulation in achieving accurate modeling of nonperiodic features. Listeners adjust synthetic aspiration noise to match original. Match improves with demodulation SABS ASPIRATION NOISE (DB) CEPSTRAL NSR (DB) PEARSON = 0.51 ORIGINAL VOICE (NO DEMOD)

37 SYNTHESIS SYNTHESIS VALIDATION

38 SUMMARY CONCLUSION This study has achieved improved automatic, objective analysis and synthesis of speech within the specialization of pathological vowels. Specific accomplishments include: - A unique, symmetric model for nonperiodic components as AM, FM and spectrally-shaped aspiration noise - Improved accuracy of noise analysis via AM & FM demodulation - Application of ES formant identification for pathological vowels. - Implementation of realtime and offline specialized high fidelity vowel synthesizers