Speech Processing AEGIS RET All-Hands Meeting


Speech Processing
Applications of Images and Signals in High Schools
AEGIS RET All-Hands Meeting, University of Central Florida, July 20, 2012

Contributors
- Dr. Veton Këpuska, Faculty Mentor, FIT, vkepuska@fit.edu
- Jacob Zurasky, Graduate Student Mentor, FIT, jzuraksy@my.fit.edu
- Becky Dowell, RET Teacher, BPS Titusville High, dowell.jeanie@brevardschools.org

Speech Processing Project
- Speech recognition requires speech to first be characterized by a set of “features”.
- Features are used to determine what words are spoken.
- Our project implements the feature extraction stage of a speech processing application.

Applications
- Call center speech recognition
- Speech-to-text applications
  - Dictation software
  - Visual voice mail
- Hands-free user interface
  - Siri http://www.apple.com/iphone/features/siri.html
  - OnStar
  - XBOX Kinect
- Medical applications
  - Parkinson’s Voice Initiative

Difficulties
- Differences in speakers
  - Dialects/accents
  - Male/female
- Continuous speech (word boundaries)
- Noise
  - Background
  - Other speakers

Speech Recognition
- Pipeline: Speech -> Front End (pre-processing) -> Features -> Back End (recognition) -> Recognized speech
- Front end: reduces the amount of data for the back end, but keeps enough data to accurately describe the signal. The input is a large amount of data (e.g., 256 samples); the output is a feature vector of reduced size (e.g., 13 features).
- Back end: statistical models are used to classify feature vectors as a certain sound in speech.

Front-End Processing of Speech Recognizer
- Pre-emphasis: high-pass filter to compensate for the higher-frequency roll-off in human speech
- Window: separate the speech signal into frames, then apply a window to smooth the edges of each framed segment
- FFT: transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
- Mel-scale: convert the linear frequency scale (Hz) to the mel scale, a perceptual scale that is roughly logarithmic at higher frequencies
- log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
- IFFT: inverse FFT to transform to the cepstral domain; the result is the set of “features”
- Cepstral domain: the spectrum of the frequency spectrum (the Fourier transform of the log of the spectrum), which captures its rate of change
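The stages above can be sketched in MATLAB for a single frame. This is a minimal illustration, not the SASE Lab code: the sampling rate, the pre-emphasis coefficient 0.97, the 26 mel filters, the synthetic input frame, and all variable names are assumptions. The last step uses a DCT-II in place of the IFFT named on the slide; this is a common substitute because the log spectrum is real and symmetric, so the inverse transform reduces to a cosine transform.

```matlab
% Sketch of the front-end stages on one 256-sample frame (all values assumed).
fs = 8000;                          % sampling rate in Hz (assumed)
N  = 256;                           % frame length in samples
n  = (0:N-1)';
frame = sin(2*pi*300*n/fs) + 0.5*sin(2*pi*1200*n/fs);   % stand-in for speech

% 1. Pre-emphasis: first-order high-pass, y[n] = x[n] - 0.97*x[n-1]
pre = filter([1 -0.97], 1, frame);

% 2. Window: Hamming window computed directly (no toolbox required)
w = 0.54 - 0.46*cos(2*pi*n/(N-1));
windowed = pre .* w;

% 3. FFT: magnitude spectrum, keeping bins 0..N/2
spec = abs(fft(windowed));
spec = spec(1:N/2+1);

% 4. Mel scale: triangular filters spaced evenly on the mel axis
numFilt = 26;
hz2mel  = @(f) 2595*log10(1 + f/700);
mel2hz  = @(m) 700*(10.^(m/2595) - 1);
edgesHz = mel2hz(linspace(0, hz2mel(fs/2), numFilt+2));
binHz   = (0:N/2)' * fs / N;        % frequency of each FFT bin
melEnergy = zeros(numFilt, 1);
for k = 1:numFilt
    lo = edgesHz(k); mid = edgesHz(k+1); hi = edgesHz(k+2);
    rising  = (binHz - lo) / (mid - lo);
    falling = (hi - binHz) / (hi - mid);
    tri = max(0, min(rising, falling));     % triangle peaking at mid
    melEnergy(k) = sum(tri .* spec);
end

% 5. Log of the filter-bank magnitudes
logMel = log(melEnergy + eps);

% 6. Cepstral domain: DCT-II written out explicitly, keep 13 coefficients
numCep = 13;
q = (0:numCep-1)';                  % cepstral index (column)
m = (0:numFilt-1);                  % filter index (row)
dctBasis = cos(pi * q * (m + 0.5) / numFilt);   % numCep x numFilt
features = dctBasis * logMel;       % 13 x 1 feature vector
```

Running the sketch turns 256 samples into a 13-element vector, matching the "256 samples -> 13 features" reduction described for the front end.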

Speech Analysis and Sound Effects (SASE) Project
- Implements front-end pre-processing (feature extraction)
- Graphical User Interface (GUI)
- Speech input
  - Record and save audio
  - Read sound file (*.wav, *.ulaw, *.au)
- Graphs the entire audio signal
- Processes a user-selected speech frame and displays graphs of the output for each stage
- Displays a spectrogram of the entire signal and of a user-selected 3-second segment
- Modifies speech with user-configurable audio effects
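A spectrogram like the one SASE Lab displays can be approximated by framing the signal, windowing each frame, and stacking the FFT magnitudes as columns. The sketch below is only an illustration under assumed values: the chirp test signal, the 256-sample frame, the 128-sample hop, and the variable names are not taken from the project.

```matlab
% Spectrogram sketch: framed, windowed FFT magnitudes (all values assumed).
fs = 8000;                               % sampling rate in Hz (assumed)
t  = (0:fs-1)'/fs;                       % one second of samples
x  = sin(2*pi*(200*t + 400*t.^2));       % chirp sweeping from 200 Hz to 1000 Hz

frameLen = 256;                          % samples per frame
hop      = 128;                          % samples between frame starts
w = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));   % Hamming window

numFrames = floor((length(x) - frameLen)/hop) + 1;
S = zeros(frameLen/2 + 1, numFrames);    % rows: frequency, columns: time
for i = 1:numFrames
    seg = x((i-1)*hop + (1:frameLen)');  % current frame
    X   = fft(seg .* w);
    S(:, i) = abs(X(1:frameLen/2 + 1));  % keep bins 0..frameLen/2
end

tAxis = ((0:numFrames-1)*hop + frameLen/2) / fs;   % frame centers in seconds
fAxis = (0:frameLen/2) * fs / frameLen;            % bin frequencies in Hz
imagesc(tAxis, fAxis, 20*log10(S + eps));
axis xy;
xlabel('Time (s)'); ylabel('Frequency (Hz)');
title('Spectrogram sketch (magnitude in dB)');
```

With the chirp input, the bright ridge in the plot should rise from left to right, which makes it easy to check that the time and frequency axes are oriented correctly.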

MATLAB Code
- Graphical User Interface (GUI)
  - GUIDE (GUI Development Environment)
  - Callback functions
- Front-end speech processing
  - Modular functions for reusability
  - Graphs of output for each stage
- Sound effects
  - Echo, Reverb, Flange, Chorus, Vibrato, Tremolo, Voice Changer

GUI Components (screenshot with labeled parts: Buttons, Plotting Axes)

SASE Lab Demo
- Record, play, and save audio to file; open existing audio files
- Select and process a speech frame; display graphs of the stages of front-end processing
- Display the spectrogram for the entire speech signal or a user-selectable 3-second sample
- Play speech: all of it or the selected 3-second sample
- Apply sound effects; show user-configurable parameters
- Graph the spectrogram and speech processing on sound effects

SASE Lab

Applications of Signal Processing in High Schools
- Convey the relevance and importance of math to high school students
- Bring knowledge of technological innovation and academic research into high school classrooms
- Provide opportunity for students to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in the field of Signal Processing
- Encourage students to pursue higher education and careers in STEM fields
(Source: AEGIS website)

Unit Plan: Speech Processing
- Collection of lesson plans that introduce high school students to the fundamentals of speech and sound processing
- Cohesive unit of four lessons
  - The Sound of a Sine Wave
  - Frequency Analysis
  - Sound Effects
  - SASE Lab
- Hands-on lessons
- Teacher notes
- MATLAB projects

Unit Plan: Speech Processing
- Connections to Pre-Calculus course
  - Mathematical modeling
  - Trigonometric functions
  - Complex numbers in rectangular and polar form
  - Function operations
  - Logarithmic functions
  - Sequences and series
- NGSSS and Common Core Mathematics Standards

Unit Introduction Students research, explore, and discuss current applications of speech and audio processing

Lesson 1: The Sound of a Sine Wave
- Modeling sound as a sinusoidal function
- Continuous vs. discrete functions
- Frequency of a sine wave
- Composite signals
- Connections to real-world applications: synthesis of digital speech and music

Lesson 1: The Sound of a Sine Wave
Student MATLAB Project
- Create discrete sine waves with given frequencies
- Create a composite signal of the sine waves
- Plot graphs and play sounds of the sine waves
- Analyze the effect of frequency and amplitude on the graphs and the sounds of the sine functions

Lesson 1: The Sound of a Sine Wave
    % plays C4, C5, C6 - frequencies double between octaves
    % sine_sound_sample(8000, 261.626, 523.251, 1046.500, 1);
Student analysis
- How does the graph change as frequency increases?
- How does the sound change as frequency increases?
- Use a geometric sequence to find the frequencies of musical notes
- Use a logarithmic scale to represent the frequencies of piano notes
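The project steps can be sketched as follows. This is not the lesson's sine_sound_sample function, whose internals are not shown in the slides; it is a minimal stand-in that assumes a one-second duration and equal amplitudes, and every variable name is hypothetical. It also shows the geometric sequence from the analysis questions: in equal temperament each semitone multiplies the frequency by 2^(1/12), so twelve steps double it.

```matlab
% Discrete sine waves for C4, C5, C6 and their composite (values assumed).
fs  = 8000;                          % sampling rate in Hz
dur = 1;                             % duration in seconds (assumed)
t   = (0:fs*dur-1)'/fs;              % discrete time axis, one column

f = [261.626 523.251 1046.500];      % C4, C5, C6 from the slide
waves = sin(2*pi*t*f);               % one column per note (t is Nx1, f is 1x3)
composite = sum(waves, 2) / numel(f);    % average keeps samples within [-1, 1]

% Geometric sequence: f_k = f_0 * 2^(k/12) for k semitones above C4
semitones = 0:12;
noteFreqs = f(1) * 2.^(semitones/12);    % C4 up to C5

% Plot the first 200 samples of C4 and of the composite signal
plot(t(1:200), waves(1:200, 1), t(1:200), composite(1:200));
legend('C4', 'C4 + C5 + C6');
xlabel('Time (s)'); ylabel('Amplitude');

% sound(composite, fs);              % uncomment to listen (needs audio output)
```

The last entry of noteFreqs lands on C5 at about 523.25 Hz, twice the C4 frequency, which ties the "frequencies double between octaves" comment to the geometric sequence.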

Lesson 1: The Sound of a Sine Wave
Project Extension: Music Notes
    % twinkle twinkle little star
    % music = 'C4Q C4Q G4Q G4Q A4Q A4Q G4H ';
    % super mario bros
    % music = 'FS4+EN5,Q E4,Q E4,Q RR,Q E4,Q RR,Q C4,Q E4,Q RR,Q G4,Q';

Lesson 1: The Sound of a Sine Wave
Project Extension: Vowel Sounds
- Vowel sounds are characterized by their lowest three formants (the three main frequencies)
- F1, F2, F3 are the formant frequencies
- W1, W2, W3 are the weightings (amplitudes) for each formant
- Duration is the length of time
- In isolation, vowels can be difficult to distinguish without the rest of the word; this is a simple implementation
aa, as in "Bob":
    aa_m = struct('F1', 750, 'F2', 1150, 'F3', 2400, 'Duration', 215, 'W1', 1, 'W2', 1, 'W3', 1);
iy, as in "Beat":
    iy_m = struct('F1', 340, 'F2', 2250, 'F3', 3000, 'Duration', 196, 'W1', 1, 'W2', 30, 'W3', 30);
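One simple way to turn such a struct into sound is a weighted sum of three sine waves, one per formant. The slides do not show how the project uses these fields, so the sketch below rests on assumptions: Duration is taken to be in milliseconds, the weights are applied as amplitude multipliers, and the sampling rate is 8000 Hz.

```matlab
% Vowel sketch: weighted sum of three sinusoids at the formant frequencies.
% Assumptions: Duration is in milliseconds, W1..W3 scale amplitude directly.
fs = 8000;                                   % sampling rate in Hz (assumed)
aa_m = struct('F1', 750, 'F2', 1150, 'F3', 2400, 'Duration', 215, ...
              'W1', 1, 'W2', 1, 'W3', 1);    % values copied from the slide

numSamples = round(aa_m.Duration/1000 * fs); % 215 ms at 8000 Hz = 1720 samples
t = (0:numSamples-1)'/fs;

vowel = aa_m.W1 * sin(2*pi*aa_m.F1*t) ...
      + aa_m.W2 * sin(2*pi*aa_m.F2*t) ...
      + aa_m.W3 * sin(2*pi*aa_m.F3*t);
vowel = vowel / max(abs(vowel));             % normalize peak amplitude to 1

% sound(vowel, fs);                          % uncomment to listen
```

Swapping in iy_m changes only the struct, which is one reason to keep the formant parameters separate from the synthesis code.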

Lesson 2: Frequency Analysis
- Use of the Fourier transform to transform functions from the time domain to the frequency domain
- Modeling harmonic signals as a series of sinusoids
- Sine wave decomposition
- Fourier transform
- Euler’s formula
- Frequency spectrum
- Connections to real-world applications: speech processing and recognition

Lesson 2: Frequency Analysis
Student MATLAB Project
- Create a composite signal as the sum of harmonic sine waves
- Plot graphs and play sounds of the sine waves
- Compute the FFT of the composite signal
- Plot and analyze the frequency spectrum

Lesson 2: Frequency Analysis
    % create five harmonic signals with fundamental frequency 262
    % square_wave(8000, 262, 1, 1024);
Student analysis
- Write the composite signal in sigma notation
- How does increasing the number of harmonic sine waves affect the composite signal?
- Use the data cursor to read the frequencies in the frequency spectrum and compare them with the original signal
- How does increasing or decreasing the FFT size affect the frequency spectrum?
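A sketch of this project follows. It does not reproduce the lesson's square_wave function; it assumes the composite is the truncated Fourier series of a square wave, that is, the sum over k = 1..5 of sin(2*pi*(2k-1)*f0*t)/(2k-1), which uses only odd harmonics with amplitudes 1/(2k-1). The variable names are hypothetical.

```matlab
% Composite of five odd harmonics and its frequency spectrum (assumed form).
fs      = 8000;                      % sampling rate in Hz
f0      = 262;                       % fundamental frequency in Hz
nfft    = 1024;                      % FFT size
numHarm = 5;                         % number of harmonic sine waves

t = (0:fs-1)'/fs;                    % one second of samples
x = zeros(size(t));
for k = 1:numHarm
    h = 2*k - 1;                     % odd harmonic numbers 1, 3, 5, 7, 9
    x = x + (1/h) * sin(2*pi*h*f0*t);
end

X = abs(fft(x(1:nfft), nfft));       % magnitude spectrum of the first nfft samples
X = X(1:nfft/2 + 1);                 % keep bins 0..nfft/2
fAxis = (0:nfft/2)' * fs / nfft;     % bin frequencies in Hz

[~, idx] = max(X);
peakHz = fAxis(idx);                 % strongest component, near f0

plot(fAxis, X);
xlabel('Frequency (Hz)'); ylabel('Magnitude');
title('Spectrum of the composite signal');
```

The bin spacing is fs/nfft, about 7.8 Hz here, so the peaks read with the data cursor fall on the nearest bin rather than exactly on 262 Hz and its multiples. Increasing nfft narrows that spacing, which is the effect the last analysis question asks about.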

Lesson 3: Sound Effects
- Time-delay based sound effects
- Discrete functions
- Time-delay functions
- Function operations
- Connections to real-world applications: digital music effects and speech sound effects

Lesson 3: Sound Effects
Student MATLAB Project
- Read a *.wav file
- Use a delay function to modify the signal with an echo sound effect
- Plot graphs and play sounds of the signals
- Analyze the effect of changing parameters on the graphs and the sounds of the functions

Lesson 3: Sound Effects
- Delay time of the echo depends on the distance to the reflection surface
- Volume of the echo depends on the reflection surface
- Reflection coefficient α

Lesson 3: Sound Effects
Block diagram of echo effect
- Output signal = input signal + reflection coefficient * delayed version of input signal
- y[n] = x[n] + α*x[n-D]
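The echo equation can be sketched directly with array concatenation. This is an illustration rather than the lesson's echo_effect function: it assumes the delay D is the round-trip travel time to a surface 50 m away at a speed of sound of 343 m/s, and it uses a decaying 440 Hz tone instead of a recorded *.wav file. All names are hypothetical.

```matlab
% Echo sketch: y[n] = x[n] + alpha * x[n - D]  (all values assumed).
fs       = 8000;                     % sampling rate in Hz
alpha    = 0.5;                      % reflection coefficient
distance = 50;                       % meters to the reflecting surface
c        = 343;                      % speed of sound in m/s (assumed, room-temperature air)

% Round-trip delay converted to a whole number of samples
D = round(2*distance/c * fs);

% Stand-in input: a quarter-second decaying tone instead of a *.wav file
t = (0:fs/4 - 1)'/fs;
x = sin(2*pi*440*t) .* exp(-8*t);

xPad    = [x; zeros(D, 1)];          % x[n], padded so the echo tail fits
delayed = [zeros(D, 1); x];          % x[n - D]: input shifted by D samples
y = xPad + alpha * delayed;          % direct sound plus attenuated echo

plot((0:length(y)-1)/fs, y);
xlabel('Time (s)'); ylabel('Amplitude');

% sound(y, fs);                      % uncomment to listen
```

Because D here is longer than the input, the direct sound and the echo do not overlap, so the plot shows two separate bursts with the second at half the amplitude. A shorter distance makes them overlap and sum sample by sample.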

Lesson 3: Sound Effects
    % echo at 50 m with reflection coefficient = 0.5
    % echo_effect('becky.wav', 50, 0.5);
Student analysis
- Write the composite signal in sigma notation
- How does increasing the number of harmonic sine waves affect the composite signal?
- Use the data cursor to read the frequencies in the frequency spectrum and compare them with the original signal
- How does increasing or decreasing the FFT size affect the frequency spectrum?

Lesson 4: SASE Lab
- Guided inquiry of the SASE Lab program
- Experiment with different sound inputs
- Analyze the spectrogram
- Make connections to previous lessons

Unit Conclusion Students summarize and reflect on lessons in a presentation and report/poster

Questions? Thank you!
AEGIS website: http://research2.fit.edu/aegis-ret/
Contacts:
- Becky Dowell, dowell.jeanie@brevardschools.org
- Dr. Veton Këpuska, vkepuska@fit.edu
- Jacob Zurasky, jzuraksy@my.fit.edu