Automatic Speech Attribute Transcription (ASAT) Project Period: 10/01/04 – 9/30/08 The ASAT Team –Mark Clements –Sorin Dusan.

Presentation transcript:

Automatic Speech Attribute Transcription (ASAT)
Project Period: 10/01/04 – 9/30/08
The ASAT Team
  – Mark Clements
  – Sorin Dusan
  – Eric Fosler-Lussier
  – Keith Johnson
  – Fred Juang
  – Larry Rabiner
  – Chin Lee (Coordinator, NSF HLC Program Director:

ASAT Paradigm and SoW
Overall System Prototypes and Common Platform

1. Bank of Speech Attribute Detectors
Each detected attribute is represented by a time series (event)
  – An example: frame-based detector (0-1 simulating posterior probability)
ANN-based Attribute Detectors
  – An example: nasal and stop detectors
Sound-specific parameters and feature detectors
  – An example: “VOT” for V/UV stop discrimination
Biologically-motivated processors and detectors
  – Analog detectors, short-term and long-term detectors
Perceptually-motivated processors and detectors
  – Converting speech into neural activity level functions
Others?
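A frame-based detector of the kind mentioned above can be sketched as follows. This is only an illustration, not the ASAT implementation: the feature (mean frame energy), the logistic mapping, and the names `frame_energies`, `detect_attribute`, `weight`, and `bias` are all assumptions chosen to show how a detector turns a signal into a 0-1 time series.

```python
import math


def frame_energies(samples, frame_len=4, hop=2):
    """Split a signal into overlapping frames and return the mean energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def sigmoid(z):
    """Logistic function, used here to squash a score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def detect_attribute(features, weight, bias):
    """Map each frame feature to a 0-1 score that simulates a posterior probability.

    weight and bias are hypothetical parameters; a real detector would learn them.
    """
    return [sigmoid(weight * f + bias) for f in features]


# Toy signal: near-silence, a loud burst, then near-silence again.
signal = [0.0, 0.01, -0.02, 0.01, 0.9, -0.8, 0.85, -0.9, 0.02, -0.01, 0.0, 0.01]
energies = frame_energies(signal)
scores = detect_attribute(energies, weight=20.0, bias=-5.0)

for index, score in enumerate(scores):
    print(f"frame {index}: score {score:.3f}")
```

The output is one score per frame, so the detected attribute is itself a time series that later stages can merge or verify.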

An Example: More Visible than Spectrogram?
[Figure: Nasal, Stop, and Vowel detector outputs over the segments j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing]
Early acoustic to linguistic mapping!

2. Event Merger
Merge multiple time series into another time series
  – Maintaining the same detector output characteristics
Combine temporal events
  – An example: combining phones into words (word detectors)
Combine spatial events
  – An example: combining vowel and nasal features into nasalized vowels
Extreme: Build a 20K-word recognizer by implementing 20K keyword detectors
Others: OOV, partial recognition
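The two kinds of merging can be illustrated with a minimal sketch. The slides do not say how events are combined, so the rules used here are assumptions: a frame-wise product for spatial merging, and "best recent score of the first event times the current score of the second" for temporal merging. The function names and the `max_gap` parameter are hypothetical. Both rules keep the output in the 0-1 range, so the merged series has the same characteristics as a detector output.

```python
def merge_spatial(series_a, series_b):
    """Combine two attributes that must be active in the same frame (frame-wise product)."""
    if len(series_a) != len(series_b):
        raise ValueError("time series must have the same number of frames")
    return [a * b for a, b in zip(series_a, series_b)]


def merge_temporal(first, second, max_gap=2):
    """Score 'first followed by second' at each frame.

    For frame t, take the best score of `first` in the previous `max_gap`
    frames and multiply it by the score of `second` at t.
    """
    merged = []
    for t in range(len(second)):
        window = first[max(0, t - max_gap):t]
        merged.append(second[t] * max(window) if window else 0.0)
    return merged


# Toy detector outputs, one score per frame.
vowel = [0.1, 0.8, 0.9, 0.9, 0.2]
nasal = [0.0, 0.1, 0.7, 0.8, 0.1]
stop = [0.9, 0.8, 0.1, 0.0, 0.0]

nasalized_vowel = merge_spatial(vowel, nasal)
stop_then_vowel = merge_temporal(stop, vowel)

print([round(s, 2) for s in nasalized_vowel])
print([round(s, 2) for s in stop_then_vowel])
```

Because each merged series is again a 0-1 time series, merged events can be fed back into further mergers, which is what makes building word detectors from phone detectors possible in principle.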

3. Evidence Verifier
Provide confidence measures for events and evidence
  – Utterance verification algorithms can be used
Output recognized evidence (words and others)
  – Hypothesis testing is needed at every stage
Prune event and evidence lattices
  – Pruning threshold decisions
Minimum verification error (MVE) verifiers
Many new theories can be developed
Others?
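Hypothesis testing with a pruning threshold can be sketched as a log-likelihood ratio test. This is a generic illustration under stated assumptions, not the MVE verifier named on the slide: the likelihood values are made up, the names `log_likelihood_ratio` and `verify` are hypothetical, and the confidence is simply a logistic function of the ratio.

```python
import math


def log_likelihood_ratio(target_likelihood, alternative_likelihood):
    """Log ratio of the likelihood under the target model to that under the alternative."""
    return math.log(target_likelihood) - math.log(alternative_likelihood)


def verify(events, threshold):
    """Accept events whose log-likelihood ratio reaches the threshold.

    events: list of (label, target_likelihood, alternative_likelihood).
    Returns (label, confidence) pairs, where confidence is a 0-1 value
    obtained by passing the ratio through a logistic function.
    """
    accepted = []
    for label, p_target, p_alternative in events:
        llr = log_likelihood_ratio(p_target, p_alternative)
        if llr >= threshold:
            confidence = 1.0 / (1.0 + math.exp(-llr))
            accepted.append((label, confidence))
    return accepted


# Hypothetical candidate events with made-up likelihoods.
candidates = [("one", 0.08, 0.01), ("won", 0.02, 0.03), ("nine", 0.05, 0.04)]

strict = verify(candidates, threshold=0.5)
lenient = verify(candidates, threshold=0.0)

print(strict)
print(lenient)
```

Raising the threshold prunes more candidates and lowers false acceptances at the cost of more false rejections, which is the trade-off behind the pruning threshold decisions listed above.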

Word and Phone Verifiers (/w/+/ʌ/+/n/ = “one”)
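One simple way to turn phone-level verification scores into a word-level score is a geometric mean, sketched below. The slides do not specify the combination rule, so this choice, the function name `word_confidence`, and the phone scores are all assumptions; the phones of "one" are written with the ARPAbet labels w, ah, n.

```python
import math


def word_confidence(phone_scores):
    """Geometric mean of phone confidence scores, each expected in [0, 1]."""
    if not phone_scores:
        raise ValueError("at least one phone score is required")
    if any(score <= 0.0 for score in phone_scores):
        # A phone with zero confidence vetoes the whole word.
        return 0.0
    log_sum = sum(math.log(score) for score in phone_scores)
    return math.exp(log_sum / len(phone_scores))


# Hypothetical phone verifier outputs for the word "one".
phones = [("w", 0.9), ("ah", 0.8), ("n", 0.95)]
confidence = word_confidence([score for _, score in phones])
print(f"word 'one': confidence {confidence:.3f}")
```

The geometric mean normalizes for word length, so short and long words can be compared against the same threshold, and a single badly matched phone pulls the word score down sharply.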

4. Knowledge Sources: Definition & Evaluation
Explore large body of speech science literature
Define training, evaluation and testing databases
Develop Objective Evaluation Methodology
  – Defining detectors, mergers, verifiers, recognizers
  – Defining/collecting evaluation data for all
Document all pieces on the web

5. Prototype ASR Systems and Platform
Continuous Phone Recognition: TIMIT?
Continuous Speech Recognition
  – Connected digit recognition
  – Wall Street Journal
  – Switchboard?
Establishment of a collaborative platform
  – Implementing divide-’n’-conquer strategy
  – Developing a user community

Summary
ASAT Goal: Go beyond state-of-the-art
ASAT Spirit: Work for team excellence
ASAT team member responsibilities
  – MAC: Event Fusion
  – SD: Perception-based processing
  – EF: Knowledge Integration (Event Merger)
  – KJ: Acoustic Phonetics
  – BHJ: Evidence Verifier
  – LRR: Attribute Detector
  – CHL: Overall