To an Automatic Speech Recognition System? Jiang Wu Electrical Engineering Department.

Slides:



Advertisements
Similar presentations
Speech Synthesis Markup Language V1.0 (SSML) W3C Recommendation on September 7, 2004 SSML is an XML application designed to control aspects of synthesized.
Advertisements

1 A Spectral-Temporal Method for Pitch Tracking Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu* Department of Electrical and Computer Engineering Old.
MIMICKING THE HUMAN EAR Philipos Loizou (author) Oliver Johnson (me)
Jessica E. Huber Ph.D. in Speech Science from University at Buffalo MA in Speech-Language Pathology, Certified Speech- Language Pathologist Assistant Professor,
Mandarin Chinese Speech Recognition. Mandarin Chinese Tonal language (inflection matters!) Tonal language (inflection matters!) 1 st tone – High, constant.
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition John Hershey, Trausti Kristjansson, Zhengyou Zhang, Alex.
2 Personal Introduction previousnexthome end Academic Experience ( ) Bachelor and Master Degree on Electrical Engineering, Zhejiang University,
Digital audio and computer music COS 116: 2/26/2008.
Digital Signal Processing A Merger of Mathematics and Machines 2002 Summer Youth Program Electrical and Computer Engineering Michigan Technological University.
Fig. 2 – Test results Personal Memory Assistant Facial Recognition System The facial identification system is divided into the following two components:
Why is ASR Hard? Natural speech is continuous
Digital audio and computer music COS 116, Spring 2012 Guest lecture: Rebecca Fiebrink.
Dragon Naturally Speaking Tutorial What is Dragon Naturally Speaking? Dragon is a dictation software, students can dictate a paper rather than type it.
Representing Acoustic Information
EE513 Audio Signals and Systems Statistical Pattern Classification Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Project 1 : Eigen-Faces Applied to Speech Style Classification Brad Keserich, Senior, Computer Engineering College of Engineering and Applied Science;
Audio Scene Analysis and Music Cognitive Elements of Music Listening
Supervisor: Dr. Eddie Jones Electronic Engineering Department Final Year Project 2008/09 Development of a Speaker Recognition/Verification System for Security.
Input Devices.  Identify audio and video input devices  List the function of the respective devices.
Speech Perception1 Fricatives and Affricates We will be looking at acoustic cues in terms of … –Manner –Place –voicing.
- Pronunciation - Intonation.  How many different tones does Mandarin have?  4  What effect do these different tones have on the language?  Each tone.
(CMSC5720-1) MSC projects by Prof K.H. Wong (21 July2014) (shb907) MSC projects supervised by Prof.
World Languages Mandarin English Challenges in Mandarin Speech Recognition  Highly developed language model is required due to highly contextual nature.
Temple University QUALITY ASSESSMENT OF SEARCH TERMS IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone, PhD Department of Electrical and Computer.
Overview of Part I, CMSC5707 Advanced Topics in Artificial Intelligence KH Wong (6 weeks) Audio signal processing – Signals in time & frequency domains.
LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.
ICASSP Speech Discrimination Based on Multiscale Spectro–Temporal Modulations Nima Mesgarani, Shihab Shamma, University of Maryland Malcolm Slaney.
Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.
Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )
Digital audio and computer music COS 116, Spring 2010 Adam Finkelstein Slides and demo thanks to Rebecca Fiebrink.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
A NEW FEATURE EXTRACTION MOTIVATED BY HUMAN EAR Amin Fazel Sharif University of Technology Hossein Sameti, S. K. Ghiathi February 2005.
Ann Morrison, Ph.D..  effects/59/mp3/607974_SOUNDDOGS__su.m p3 effects/59/mp3/607974_SOUNDDOGS__su.m.
Artificial Intelligence 2004 Speech & Natural Language Processing Speech Recognition acoustic signal as input conversion into written words Natural.
Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio
It sure is smart but can it swing? (Digital audio and computer music)
GUIDED BY T.JAYASANKAR, ASST.PROFESSOR OF ECE, ANNA UNIVERSITY OF TIRUCHIRAPPALLI. PRESENTED BY C.SENTHILKUMAR, REG.NO: , M.E(MBCBS),COM SYSTEM,VI.
Speech Communication Lab, State University of New York at Binghamton Dimensionality Reduction Methods for HMM Phonetic Recognition Hongbing Hu, Stephen.
Higher Vision, language and movement. Strong AI Is the belief that AI will eventually lead to the development of an autonomous intelligent machine. Some.
Basic structure of sphinx 4
0 / 27 John-Paul Hosom 1 Alexander Kain Brian O. Bush Towards the Recovery of Targets from Coarticulated Speech for Automatic Speech Recognition Center.
Introduction to psycho-acoustics: Some basic auditory attributes For audio demonstrations, click on any loudspeaker icons you see....
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
Lecture 1 Phonetics – the study of speech sounds
For Audio Recording Copyright © Texas Education Agency, All rights reserved.
Glencoe Introduction to Multimedia Chapter 8 Audio 1 Section 8.1 Audio in Multimedia Audio plays many roles in multimedia. Effective use in multimedia.
1 Electrical and Computer Engineering Binghamton University, State University of New York Electrical and Computer Engineering Binghamton University, State.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
2014 Development of a Text-to-Speech Synthesis System for Yorùbá Language Olúòkun Adédayọ̀ Tolulope Department of Computer Science.
Using Speech Recognition to Predict VoIP Quality
SPEECH TECHNOLOGY An Overview Gopala Krishna. A
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
ARTIFICIAL NEURAL NETWORKS
Spoken Digit Recognition
Course Projects Speech Recognition Spring 1386
Computer, Communication and Information Sciences
Term Project Presentation By: Keerthi C Nagaraj Dated: 30th April 2003
What is Pattern Recognition?
Creating Transcripts of Your Narrated PowerPoints Richard Oliver Department of Information Systems 2018 Quality in Online Education Conference.
Voice source characterisation
EE513 Audio Signals and Systems
AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION
Human Speech Communication
Introduction.
Presented by Chen-Wei Liu
Music Signal Processing
Presentation transcript:

to an Automatic Speech Recognition System? Jiang Wu Electrical Engineering Department

 Say if you are only allowed to use 39 values to represent a speech seg of 1 sec long…  What are good features? ◦ Discriminative ◦ “Curse of Dimensionality”

 For each frame of pre- recorded speech, we try to extract the feature as to compress its spectrum.

 Most recent research has shown that spectral trajectories, over time, also play an important role in ASR.  Thus, we also want to let computers see what happens over time, about the center of each static feature.

 Pitch Contour : ◦ Strongly related to “tones” ◦ Very popular feature type for tonal languages: Mandarin, Cantonese, Some of Korean dialects, etc.  “Perceptual Features: ◦ Analyze speech signal as to how human’s auditory system “perceptually” process sound ◦ Frequency resolution and time resolution both depend on frequency and time..

 Dr. Montri Karnjanadecha(Force Alignment)  Chandra Vootkuri (Ph.D.)(Landmark Theory)  Brian Wong (M.S.)(Freq. Non-linearity)  Andrew Hwang(M.S.)(Feature Transform)  Project 1: To create an open source multi- language audio database for spoken language processing applications.  Project 2: To understand tonal languages.

◦ Signal processing (A/D) ◦ Probability theory, pattern recognition and machine learning ◦ Understanding of human auditory system/ linguistic/musicality will be a bonus!

◦ Pronunciation therapy ◦ Singing voice processing + ◦ Hearing aids Tons of interesting applications!!

◦ Not just the Microsoft speech-text software on your PC..

Questions?