AMSP: Advanced Methods for Speech Processing. An Expression of Interest to set up a Network of Excellence in FP6, prepared by members of COST-277 and colleagues.

Presentation transcript:

AMSP: Advanced Methods for Speech Processing
An Expression of Interest to set up a Network of Excellence in FP6
Prepared by members of COST-277 and colleagues
Submitted by Marcos FAUNDEZ-ZANUY
Presented here by Gérard CHOLLET, GET-ENST/CNRS-LTCI

Outline
- Rationale of the proposition
- Objectives
- Approaches
- Modeling
- Recognition by synthesis
- Robustness to environmental conditions
- Evaluation paradigm
- Excellence
- Integration and structuring effect

Rationale for the NoE-AMSP
- The areas of Automatic Speech Processing (recognition, synthesis, coding, language identification, speaker verification) should be better integrated
- Better models of Speech Production and Perception
- Investigate Nonlinear Speech Processing
- Understanding, Semantic interpretation

Integrated platform for Automatic Speech Processing

Levels of representations

Features of Speech Models
Speech models should:
- Reflect auditory properties of human perception
- Explain articulatory movements
- Surpass the limitations of the source-filter model
- Capture the dynamics of speech
- Be capable of natural-sounding speech restitution (resynthesis)
- Be discriminant for segmental information
- Be robust to noise and channel distortions
- Be adaptable to new speakers and new environments

Time-Frequency distributions
- Short-Time Fourier Transform
- Non-linear frequency scales (PLP, WLP), mel-cepstrum
- Wavelets, FAMlets
- Bilinear distributions (Wigner-Ville, Choi-Williams, ...)
- Instantaneous frequency, Teager operator
- Time-dependent representations (parametric and non-parametric)
- Vector quantisation
- Matrix quantisation, non-linear prediction
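The Teager operator listed above has a simple discrete form, usually written psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The minimal Python sketch below (the name `teager_energy` and the test tone are our own, not from the proposal) also checks the classical property that, for a pure tone A cos(Omega n + phi), the output is the constant A^2 sin^2(Omega). This is why the operator can track amplitude and instantaneous frequency jointly.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples lack a two-sided neighbourhood, so the
    output has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: amplitude 2, normalised frequency 0.3 rad/sample.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
expected = A ** 2 * math.sin(omega) ** 2  # constant for a pure tone
print(max(abs(p - expected) for p in psi))
```

The printed deviation is at rounding-error level. On real speech the output varies from sample to sample, following amplitude and frequency modulations.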

Time-dependent Spectral Models
- Temporal Decomposition (B. Atal, 1983)
- Vector autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier)
- Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
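A segmental parameterisation represents each parameter trajectory within a segment as a combination of time functions. The sketch below assumes the simplest choice: a low-order polynomial in normalised time, fitted by least squares. The function names and the normalisation of t to [0, 1] are our assumptions, not details taken from the cited work. With this representation, a segment of N frames and d parameters is summarised by (K+1)·d coefficients instead of N·d values.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(track, degree):
    """Least-squares coefficients c0..cK of c0 + c1*t + ... + cK*t**K for one parameter track.

    Time is normalised to t in [0, 1] over the segment; the track needs at
    least two frames and more frames than the polynomial degree.
    """
    n = len(track)
    ts = [i / (n - 1) for i in range(n)]
    k = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix of ts.
    vtv = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    vty = [sum(t ** i * y for t, y in zip(ts, track)) for i in range(k)]
    return solve(vtv, vty)


def evaluate(coeffs, t):
    """Evaluate the fitted polynomial at normalised time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


# Hypothetical 10-frame track of one spectral parameter (exactly quadratic here).
track = [1.0 + 2.0 * t - 0.5 * t * t for t in (i / 9 for i in range(10))]
coeffs = fit_polynomial(track, 2)
print([round(c, 6) for c in coeffs])
```

Because the toy track is exactly quadratic, the fit recovers its generating coefficients. On real trajectories, the degree trades compactness against reconstruction error.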

Modeling of segmental units
- Hidden Markov Models
- Markov Fields
- Bayesian Networks, Graphical Models
OR
- Production models
- Synthesis (concatenative or rule-based) with voice transformation
AND / OR
- Non-linear predictor
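With the HMM option, decoding a segment amounts to finding the most likely state sequence, which is done with the Viterbi algorithm. The following is a minimal log-domain sketch for a discrete-observation HMM. The two-state silence/speech model and all of its probabilities are hypothetical values, chosen only to exercise the code.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence of a discrete HMM and its log probability."""
    delta = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            delta[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o])
            bp[s] = best
        backpointers.append(bp)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state model: silence vs. speech, observed as low/high frame energy.
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3}, "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1}, "speech": {"low": 0.2, "high": 0.8}}
path, logp = viterbi(["low", "low", "high", "high", "low"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

Working in the log domain avoids the numerical underflow that products of many small probabilities would cause on utterance-length inputs.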

Expected achievements in Speech Coding and Synthesis
- Modeling the non-linearities in speech production and perception will lead to more accurate and/or more compact parametric representations.
- Integrate segmental recognition and synthesis techniques in the coding loop to achieve bit rates as low as a few hundred bps with natural quality.
- Develop voice transformation techniques in order to:
  - adapt segmental coders to new speakers,
  - modify the characteristics of synthetic voices.
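The "few hundred bps" target can be sanity-checked with simple arithmetic. For each recognised unit, a segmental coder transmits an index into the unit inventory plus some side information. The sketch below assumes, purely for illustration, 16 units per second, a 64-unit inventory, 4 bits for duration and 5 bits for prosody. None of these figures comes from the proposal.

```python
import math


def segmental_bitrate(units_per_second, inventory_size, duration_bits, prosody_bits):
    """Bit rate (bps) of a coder that sends one inventory index plus side information per unit."""
    index_bits = math.ceil(math.log2(inventory_size))  # bits to address any unit
    return units_per_second * (index_bits + duration_bits + prosody_bits)


# Hypothetical operating point: 16 units/s, 64-unit inventory, 4 duration bits, 5 prosody bits.
print(segmental_bitrate(16, 64, 4, 5))
```

Under these assumptions the rate is 16 × (6 + 4 + 5) = 240 bps, an order of magnitude below the 2400 bps of the LPC-10 vocoder. Naturalness then depends on how well the decoder's synthesis and voice transformation restore the speaker.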

Expected achievements in Speech Synthesis
- Self-excited nonlinear feedback oscillators will allow synthetic voices to match human voices more closely.
- Current concatenative techniques should be supplemented (or replaced) by (nonlinear) model-based generative techniques to improve quality, naturalness, flexibility, training and adaptation.
- Model-based voice mimicry controlled by textual, phonetic and/or parametric input should improve not only synthesis but also coding, recognition and speaker characterisation.

Automatic Speech Recognition
- Limitations of the HMM and hybrid HMM-ANN approaches
- Keyword spotting (detection with SVMs), noise robustness, adaptation
- Large Vocabulary Speech Recognition (SIROCCO)
- Markov Random Fields, Bayesian Networks and Graphical Models

Markov Random Fields, Bayesian Networks and Graphical Models
- Speech modelling with a state-constrained Markov Random Field over frequency bands (Guillaume Gravier and Marc Sigelle)
- A comparative framework to study MRFs, Bayesian Networks and Graphical Models

Recognition by Synthesis
- If we could drive a synthesizer with meaningful units (phone sequences, words, ...) to produce a speech signal that mimics the one to be recognized, the units driving it may come close to a transcription.
- Analysis by synthesis (which is in fact modeling) is a powerful tool in recognition and coding.
- A trivial implementation is indexing a labelled speech memory.
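A minimal reading of "indexing a labelled speech memory" is nearest-neighbour search. The incoming utterance is compared with every stored, labelled exemplar, and the label of the closest one is returned. The sketch below assumes dynamic time warping (DTW) with a Euclidean frame distance as the comparison. The toy one-dimensional "yes"/"no" templates are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i-1 of a and frame j-1 of b.
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognise_by_memory(query, memory):
    """Return the label of the closest stored exemplar (1-nearest neighbour under DTW)."""
    return min(memory, key=lambda item: dtw_distance(query, item[1]))[0]


# Hypothetical labelled memory of one-dimensional feature sequences.
memory = [("yes", [[0.0], [1.0], [2.0], [1.0]]), ("no", [[2.0], [2.0], [0.0], [0.0]])]
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a time-stretched "yes"
print(recognise_by_memory(query, memory))
```

The query is a time-stretched copy of the "yes" template, so DTW aligns the two at zero cost. A realistic memory would store segment sequences rather than whole words.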

ALISP: Automatic Language Independent Speech Processing
- Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.
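Discovering units automatically requires, at some stage, a way to group similar stretches of speech without transcriptions. As an illustration only, the sketch below shows plain k-means vector quantisation of feature vectors, seeded with the first k vectors. It is one possible clustering step, not a description of the full ALISP procedure. The helper `collapse_runs` and the toy data are hypothetical.

```python
def kmeans(vectors, k, iterations=20):
    """Plain k-means vector quantisation; the first k vectors seed the codebook."""
    codebook = [list(v) for v in vectors[:k]]

    def nearest(v):
        # Index of the codeword with the smallest squared Euclidean distance to v.
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, codebook[c])))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v)].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster empties
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(v) for v in vectors]
    return codebook, labels


def collapse_runs(labels):
    """Merge consecutive identical labels into one unit (a crude segmentation)."""
    return [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]


# Hypothetical 2-D feature vectors forming two well-separated groups.
vectors = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.2], [9.8, 10.1], [0.2, 0.1], [10.2, 9.9]]
codebook, labels = kmeans(vectors, 2)
print(labels, collapse_runs([0, 0, 1, 1, 1, 0]))
```

Applied to a frame sequence, the codeword labels followed by `collapse_runs` give a sequence of candidate units that later stages could refine.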

The robustness issue: mismatch between training and testing conditions
- Higher-order statistics are less sensitive to environmental and transmission noise than the autocorrelation
- Cepstral mean subtraction (CMS), RASTA filtering
- Independent Component Analysis
- From speaker-independent to speaker-dependent recognition (personalisation)
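Cepstral mean subtraction works because a fixed linear channel multiplies the short-time spectrum. That multiplication becomes an additive constant in the log spectrum, and hence in the cepstrum, so removing the utterance mean cancels it. This holds approximately when the channel impulse response is short compared with the analysis window. The minimal sketch below assumes cepstral frames stored as lists of floats and a channel that stays constant over the utterance. The frame values and channel offset are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean of each cepstral coefficient from every frame."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


# Hypothetical clean cepstra (3 frames x 2 coefficients) and a fixed channel offset.
clean = [[1.0, 0.5], [2.0, -0.5], [0.0, 1.5]]
channel = [0.7, -0.3]
observed = [[c + h for c, h in zip(frame, channel)] for frame in clean]
print(cepstral_mean_subtraction(observed))
```

After CMS, the clean and channel-distorted utterances give the same normalised frames. The trade-off is that any genuinely stationary speech information in the mean is removed as well.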

Expected achievements in Automatic Speech Recognition
- Dynamic nonlinear models should allow feature extraction and classification to be merged under a common paradigm.
- Such models should be more robust to noise, channel distortions and missing data (transmission errors and packet losses).
- Indexing a speech memory may help in the verification of hypotheses (a technique shared with very low bit rate coders).
- Statistical language models should be supplemented with adapted semantic information (conceptual graphs).

Voice technology in Majordome
- Server-side background tasks: continuous speech recognition applied to voice messages upon reception
  - Detection of the sender's name and subject
- User interaction:
  - Speaker identification and verification
  - Speech recognition (receiving user commands through voice interaction)
  - Text-to-speech synthesis (reading text summaries, e-mails or faxes)
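Speaker verification in such a system reduces to a hypothesis test: does the claimed speaker's model explain the incoming frames better than a background model? The minimal sketch below assumes one diagonal Gaussian per model and a hypothetical decision threshold of 0. Practical systems typically use Gaussian mixtures instead. The model parameters are illustrative only and are not taken from Majordome.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def verify(frames, speaker, background, threshold=0.0):
    """Accept the claimed identity if the average frame log-likelihood ratio exceeds threshold."""
    llr = sum(log_gauss(f, *speaker) - log_gauss(f, *background) for f in frames) / len(frames)
    return llr > threshold, llr


# Hypothetical (mean, variance) models for the claimed speaker and the background population.
speaker = ([1.0, 1.0], [0.5, 0.5])
background = ([0.0, 0.0], [2.0, 2.0])
print(verify([[1.0, 1.0], [0.9, 1.1]], speaker, background))
print(verify([[-2.0, -2.0]], speaker, background))
```

Averaging over frames keeps the score comparable across utterances of different lengths. The threshold sets the trade-off between false acceptances and false rejections.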

Collaboration with COST-278
- COST-278 (Vocal Dialogue) is a continuation of COST-249
- High interest in robust speech recognition, word spotting, speech to actions, speaker adaptation, ...
- Some members contribute to the Eureka MAJORDOME project
- Could be the seed for a Network of Excellence in FP6

Evaluation paradigm
- DARPA, NIST
- Could we organize evaluation campaigns in Europe?
- The 6th Framework Programme of the EU is trying to promote Networks of Excellence.
- How should excellence be evaluated?
- Should financial support be correlated with evaluation results?
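Evaluation campaigns of the DARPA/NIST kind compare systems on shared data using agreed metrics. For detection tasks such as speaker verification, one widely reported summary is the equal error rate (EER): the operating point where the miss rate equals the false-alarm rate. The minimal sketch below assumes that higher scores mean "more likely target" and approximates the EER by sweeping the threshold over the observed scores. The scores are hypothetical.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the decision threshold over all observed scores."""
    best = None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)  # targets rejected
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)  # impostors accepted
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            best = (gap, (miss + fa) / 2)
    return best[1]


# Hypothetical detection scores from one system on a shared test set.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.1, 0.2, 0.3, 0.75]
print(equal_error_rate(targets, impostors))
```

A single number like this makes systems easy to rank, but it hides behaviour at other operating points. Campaigns therefore usually also report full error trade-off curves or application-weighted costs.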