ECE5526 HW#1 By Clay McCreary.

Problem 1
See the following slides for the plots.
Words from lecture slides:
Pg 18: Messages
Pg 27: Test shot
Pg 31: Fisherman
Pg 36: Numerals
Pg 40: Summit

Problem 2

Problem 2 Example

Problem #3
Used the Pitch.m code to integrate pitch information into specgram_nist.m.
Included pitch information for every phoneme.
Pitch.m does not seem to perform well when there is a period of silence during the phoneme sample period (i.e., an unvoiced plosive).
Improving the capabilities of Pitch.m is beyond the scope of this project, but it will need to be addressed.
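The failure on silent or unvoiced stretches arises when an estimator reports a period where no periodic signal exists. The sketch below is a generic autocorrelation pitch estimator, not the algorithm inside Pitch.m, which is not reproduced here. It adds two gates so that silent or unvoiced frames return None instead of a spurious pitch value: an energy floor for silence and a normalized-peak threshold for voicing. The function name, the search range (fmin, fmax), and both thresholds are illustrative assumptions.

```python
import math


def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0,
                   energy_floor=1e-4, voicing_threshold=0.3):
    """Return an F0 estimate in Hz for one frame, or None if silent/unvoiced.

    Generic autocorrelation sketch (hypothetical; not the Pitch.m algorithm).
    frame -- list of samples, fs -- sampling rate in Hz.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    r0 = sum(s * s for s in x)             # zero-lag autocorrelation (energy)

    # Silence gate: too little energy to estimate a period reliably.
    if r0 / n < energy_floor:
        return None

    # Only search lags that correspond to plausible speech F0 values.
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r

    # Voicing gate: the peak, relative to frame energy, must be strong.
    if best_lag is None or best_r / r0 < voicing_threshold:
        return None
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
    print(estimate_pitch(tone, fs))          # voiced frame: about 200 Hz
    print(estimate_pitch([0.0] * 400, fs))   # silent frame: None
```

Returning None rather than zero lets downstream code skip silent frames when averaging pitch over a phoneme, instead of dragging the average down.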

Problem #3 MATLAB Code

Problem #3 Example

Problem #3 Hypothesis
According to “Prosodic Modeling for Improved Speech Recognition and Understanding” by Wang, there are three elements to prosodic modeling:
Pitch
Duration
Energy
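As a rough illustration of the three elements, the sketch below summarizes one segment (a phoneme or a word) by its duration, its RMS energy in dB, and its mean pitch over voiced frames. The function name, the dictionary keys, and the choice of RMS-in-dB for energy are assumptions made for illustration. The per-frame pitch track is assumed to come from some pitch estimator that marks unvoiced frames as None.

```python
import math
from statistics import mean


def prosodic_features(segment, fs, pitch_track):
    """Summarize a segment by duration, energy, and pitch (hypothetical helper).

    segment     -- non-empty list of audio samples for the segment
    fs          -- sampling rate in Hz
    pitch_track -- per-frame F0 values in Hz, None where a frame is unvoiced
    """
    # Duration: number of samples divided by the sampling rate.
    duration_s = len(segment) / fs

    # Energy: RMS amplitude expressed in dB (relative to full scale 1.0).
    rms = math.sqrt(sum(s * s for s in segment) / len(segment))
    energy_db = 20 * math.log10(rms) if rms > 0 else float("-inf")

    # Pitch: average only over voiced frames, so silence cannot bias it.
    voiced = [f for f in pitch_track if f is not None]
    mean_f0 = mean(voiced) if voiced else None

    return {"duration_s": duration_s,
            "energy_db": energy_db,
            "mean_f0_hz": mean_f0}


if __name__ == "__main__":
    seg = [0.5, -0.5] * 800              # 0.1 s of a square wave at 16 kHz
    print(prosodic_features(seg, 16000, [None, 100.0, 120.0]))
```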

Problem #3 Hypothesis cont.
The pitch information incorporated into specgram_nist.m adds some prosodic context to the decision-making algorithm. However, pitch by itself is susceptible to errors and is considered noisy and unreliable. To enhance the decision-making capability of the wake-up-word (WUW) recognizer, duration and energy should also be considered.
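One simple way to let duration and energy compensate for unreliable pitch is to z-normalize each cue against reference statistics and combine them in a weighted sum. The sketch below does this and simply skips any cue whose value is None, so a missing pitch estimate leaves the score to rest on the other two cues. The function name, the weights, and the idea of a linear combination are all assumptions; the slides do not specify how the cues would be fused.

```python
def prosodic_score(features, stats, weights):
    """Weighted sum of z-normalized prosodic cues (hypothetical fusion rule).

    features -- dict with keys such as 'pitch', 'duration', 'energy'
    stats    -- dict mapping each key to (mean, std) from reference data
    weights  -- dict mapping each key to a weight; a negative weight rewards
                values below the mean (e.g., a shorter duration)
    Cues whose value is None (e.g., an unreliable pitch estimate) are skipped.
    """
    score = 0.0
    for key, w in weights.items():
        value = features.get(key)
        if value is None:
            continue                      # fall back on the remaining cues
        mu, sigma = stats[key]
        score += w * (value - mu) / sigma
    return score


if __name__ == "__main__":
    stats = {"pitch": (150.0, 30.0), "duration": (0.5, 0.1),
             "energy": (-20.0, 5.0)}
    weights = {"pitch": 1.0, "duration": -1.0, "energy": 1.0}
    print(prosodic_score({"pitch": 180.0, "duration": 0.4, "energy": -15.0},
                         stats, weights))      # about 3.0
    print(prosodic_score({"pitch": None, "duration": 0.4, "energy": -15.0},
                         stats, weights))      # about 2.0
```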

Problem #3 Hypothesis cont.
From my experience, I suggest that a typical WUW would have the following characteristics:
Preceded by a short pause
Higher pitch on the first syllable
Short duration
High energy
I hypothesize that incorporating these factors into the WUW recognizer would improve its performance.
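The four hypothesized characteristics can be expressed as simple rule checks. The sketch below tests each trait separately and reports an overall flag. All thresholds (min_pause_s, max_duration_s, min_energy_db) are placeholder values chosen for illustration, not tuned or measured results. "Higher pitch on the first syllable" is interpreted here, as an assumption, to mean that the first syllable's mean F0 exceeds that of every later voiced syllable.

```python
def looks_like_wuw(pause_before_s, syllable_f0_hz, duration_s, energy_db,
                   min_pause_s=0.2, max_duration_s=0.8, min_energy_db=-25.0):
    """Check the four hypothesized WUW prosodic traits (hypothetical rules).

    pause_before_s -- length of the silence preceding the word, in seconds
    syllable_f0_hz -- mean F0 of each syllable in order (None if unvoiced)
    duration_s     -- word duration in seconds
    energy_db      -- word energy in dB
    Returns a dict with one boolean per trait plus an overall 'all' flag.
    """
    first = syllable_f0_hz[0] if syllable_f0_hz else None
    rest = [f for f in syllable_f0_hz[1:] if f is not None]
    checks = {
        "preceded_by_pause": pause_before_s >= min_pause_s,
        "first_syllable_higher": (first is not None and bool(rest)
                                  and first > max(rest)),
        "short_duration": duration_s <= max_duration_s,
        "high_energy": energy_db >= min_energy_db,
    }
    checks["all"] = all(checks.values())
    return checks


if __name__ == "__main__":
    print(looks_like_wuw(0.3, [210.0, 160.0], 0.6, -18.0))
    print(looks_like_wuw(0.05, [150.0, 190.0], 1.2, -35.0))
```

Hard thresholds keep the rules easy to inspect. In practice, each trait could instead feed a score, as in a weighted combination, so that one borderline cue does not reject an otherwise good WUW candidate.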