

ASAT Project

Two main research thrusts
– Feature extraction
– Evidence combiner

Feature extraction
– The classical distinctive features are well explored, but not solved.
– Many other waveform features and events can be extracted, reflecting time properties, spectral properties, various vocal tract model parameters, glottal features, prosodic events, and combinations thereof.
– Features may at first glance have little relevance to articulatory gestures (modulation products, etc.).
– Successful feature sets can then be subject to perceptual interpretation.
– This approach was successfully implemented in a thesis by Necioglu for speaker characterization.

ASAT Project

Feature extraction (cont’d)
– Statistical characterizations that extract recurrent patterns can be the basis for such features.
– One example useful for ultra-low-bit-rate coding: ergodic HMMs that are not phonetically based but are useful for pattern extraction.
– Take advantage of segmentation event detectors used in the latest speech coders (despite dogma, the problems of ASR and speech coding cannot be completely orthogonal!).
– Robust feature extraction should include confidence measures.
– First steps: build a toolbox of feature extraction modules.
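One way to picture a toolbox of feature extraction modules that carry confidence measures is a shared interface in which every module returns a value together with a confidence in [0, 1]. The sketch below is illustrative only: the names `FeatureFrame`, `TOOLBOX`, `run_toolbox`, the two example features, the assumed noise floor, and both confidence heuristics are hypothetical and are not taken from the ASAT project.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class FeatureFrame:
    value: float       # raw feature value for one frame
    confidence: float  # in [0, 1]; how far the value should be trusted


NOISE_FLOOR_DB = -50.0  # assumed noise floor, chosen only for illustration


def log_energy(frame: Sequence[float]) -> FeatureFrame:
    """Frame log energy in dB, with a confidence that grows above the noise floor."""
    energy = sum(x * x for x in frame) / max(len(frame), 1)
    log_e = 10.0 * math.log10(energy + 1e-12)
    # Hypothetical heuristic: logistic function of the margin over the noise floor.
    confidence = 1.0 / (1.0 + math.exp(-(log_e - NOISE_FLOOR_DB) / 6.0))
    return FeatureFrame(log_e, confidence)


def zero_crossing_rate(frame: Sequence[float]) -> FeatureFrame:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    rate = crossings / max(len(frame) - 1, 1)
    # Hypothetical heuristic: longer frames give a more reliable estimate.
    confidence = 1.0 - math.exp(-len(frame) / 64.0)
    return FeatureFrame(rate, confidence)


# The toolbox is a registry of modules that share one signature.
TOOLBOX: Dict[str, Callable[[Sequence[float]], FeatureFrame]] = {
    "log_energy": log_energy,
    "zero_crossing_rate": zero_crossing_rate,
}


def run_toolbox(signal: Sequence[float], frame_len: int, hop: int) -> Dict[str, List[FeatureFrame]]:
    """Apply every registered module to each full frame of the signal."""
    out: Dict[str, List[FeatureFrame]] = {name: [] for name in TOOLBOX}
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        for name, module in TOOLBOX.items():
            out[name].append(module(frame))
    return out
```

Because every module returns the same `(value, confidence)` pair, a downstream combiner can weight or normalize each feature without knowing how it was computed.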

ASAT Project

Evidence Combining / Fusion
– Events will never be perfectly detected.
– Phonetic/sub-word features are never going to be perfectly extracted.
– Features can be fuzzy (e.g., nasalization has degrees).
– Reliability is affected by speaking style, the channel, and the length of the event.
– “Error bars” can be extremely wide.
– Common framework: seek to represent confidence measures as probabilities for straightforward combinations. Do not apply thresholding.
– This will require each detected event and each high-order feature to have its own non-linear normalization, trained before the overall combination.

ASAT Project

Evidence Combining / Fusion (cont’d)
– Some level of brute force will be required to estimate these normalizations for new contributors.
– Will begin with simple detectors to verify the approach.
– Will study alternate approaches as reported.
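The fusion framework described on these slides (per-detector non-linear normalization into probabilities, then combination without thresholding) could be sketched as follows. This is an assumption-laden illustration, not the project's method: the normalization is taken to be a two-parameter sigmoid fitted by gradient descent (a Platt-style calibration), and the combination assumes the detectors are conditionally independent given the event, which lets their log-odds be summed. The names `fit_sigmoid` and `fuse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid(scores: Sequence[float], labels: Sequence[int],
                lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) to labelled detector scores by minimizing log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def logit(p: float) -> float:
    """Log-odds, clipped so that probabilities of exactly 0 or 1 stay finite."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def fuse(probabilities: List[float], prior: float = 0.5) -> float:
    """Combine calibrated posteriors, assuming conditional independence.

    Each detector contributes its log-odds relative to the prior; no
    detector output is thresholded before the combination.
    """
    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in probabilities)
    return sigmoid(total)
```

In use, each new detector would get its own `(a, b)` pair from held-out labelled scores, which is one reading of the "brute force" estimation step; two agreeing detectors then reinforce each other in `fuse`, while disagreeing ones pull the result back toward the prior.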