Speech Processing
AEGIS RET All-Hands Meeting
University of Central Florida, July 20, 2012
Applications of Images and Signals in High Schools

Contributors
  – Dr. Veton Këpuska, Faculty Mentor, FIT
  – Jacob Zurasky, Graduate Student Mentor, FIT
  – Becky Dowell, RET Teacher, BPS Titusville High

Speech Processing Project
Speech recognition first requires the speech to be characterized by a set of “features”. These features are then used to determine which words were spoken. Our project implements the feature-extraction stage of a speech processing application.

Timeline
1874: Alexander Graham Bell shows that the frequency harmonics in an electrical signal can be separated
1952: Bell Labs develops the first effective speech recognizer
1970s, DARPA: speech should be understood, not just recognized
1980s: call center and text-to-speech products become commercially available
1990s: PC processing power allows ordinary users to run speech recognition (SR) software
(Source: Timeline of Speech Recognition)

Applications
Call center speech recognition
Speech-to-text applications (e.g., dictation software)
Hands-free user interfaces (e.g., OnStar, Xbox Kinect, Siri)
  – Science fiction, 1968: Stanley Kubrick’s 2001: A Space Odyssey
  – Science fact, 2011: Siri on the Apple iPhone 4S
Medical applications
  – Parkinson’s Voice Initiative
  – Detection of sleep disorders

Difficulties
Continuous speech (word boundaries)
Noise
  – Background
  – Other speakers
Differences in speakers
  – Dialects/accents
  – Male/female

Speech Recognition
Speech → Front end (pre-processing) → Features → Back end (recognition) → Recognized speech
Front end: reduces the amount of data passed to the back end while keeping enough to describe the signal accurately. Its output is a feature vector; for example, a frame of 256 samples (a large amount of data) is reduced to 13 features.
Back end: statistical models classify each feature vector as a particular sound in speech.

Front-End Processing of a Speech Recognizer
Pre-emphasis → Window → FFT → Mel scale → log → IFFT
Pre-emphasis: a high-pass filter compensates for the roll-off of the higher frequencies in human speech
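As an illustration only (the SASE Lab itself is written in MATLAB, and its filter coefficient is not given here), pre-emphasis is commonly implemented as the first-order high-pass filter y[n] = x[n] - α·x[n-1]. The Python sketch below assumes α = 0.97, a typical value, and uses a hypothetical function name, `pre_emphasis`.

```python
def pre_emphasis(x, alpha=0.97):
    """First-order high-pass (pre-emphasis) filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed typical value; the sample before x[0] is taken as 0.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - alpha * prev)
        prev = sample
    return y


# A constant (0 Hz) signal is nearly cancelled after the first sample,
# while a rapidly alternating (high-frequency) signal is boosted.
dc = [1.0] * 5
alt = [1.0, -1.0, 1.0, -1.0, 1.0]
print(pre_emphasis(dc))
print(pre_emphasis(alt))
```

The contrast between the two test signals is the high-pass behavior the slide describes: low frequencies are attenuated relative to high ones.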

Window: separate the speech signal into frames, then apply a window to smooth the edges of each frame
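A minimal Python sketch of framing and windowing. It assumes frames of 256 samples (the frame size used in the earlier example), an assumed 50% overlap (hop of 128 samples), and a Hamming window; these choices and the helper names `hamming` and `frame_signal` are illustrative, not taken from the SASE code.

```python
import math


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping frames and multiply each frame by a Hamming window.

    frame_len and hop are assumed values; trailing samples that do not fill
    a whole frame are dropped.
    """
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        frames.append([s * wn for s, wn in zip(frame, w)])
    return frames


samples = [1.0] * 1024            # placeholder signal of 1024 samples
frames = frame_signal(samples)
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

Because the window tapers toward 0.08 at both ends, each frame starts and ends smoothly instead of being cut off abruptly.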

FFT: transform the signal from the time domain to the frequency domain, since the human ear perceives sound based on its frequency content
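One compact way to see this step is a recursive radix-2 (Cooley-Tukey) FFT, sketched below in Python. The 8 kHz sampling rate and the 1 kHz test tone are assumptions for illustration. A 256-sample frame gives a bin spacing of fs/N = 31.25 Hz, so the tone should land in bin 32.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


fs = 8000   # assumed sampling rate in Hz
N = 256     # frame length (a power of two)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(N)]
magnitude = [abs(c) for c in fft(tone)[:N // 2 + 1]]  # bins 0..N/2
peak_bin = max(range(len(magnitude)), key=magnitude.__getitem__)
print(peak_bin, peak_bin * fs / N)  # bin index and its frequency in Hz
```

Bin k corresponds to the frequency k·fs/N, which is how the peak index is converted back to hertz.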

Mel scale: convert the linear frequency scale (Hz) to the mel scale, which is roughly linear below about 1 kHz and logarithmic above it, to match human pitch perception
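The slides do not give the mel formula. One widely used form is m = 2595·log10(1 + f/700). The Python sketch below assumes that variant (which one SASE uses is not stated) and places the centers of a triangular mel filterbank evenly on the mel scale. The count of 26 filters and the names `hz_to_mel`, `mel_to_hz`, and `mel_filterbank` are illustrative assumptions.

```python
import math


def hz_to_mel(f):
    """Common mel-scale formula (assumed variant): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly in mel between 0 Hz and fs/2.

    Returns n_filters weight vectors, each of length n_fft // 2 + 1 (one per FFT bin).
    """
    low, high = hz_to_mel(0.0), hz_to_mel(fs / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / fs)) for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, center):    # rising edge of the triangle
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


print(round(hz_to_mel(1000.0)))  # the formula maps 1000 Hz to about 1000 mel
bank = mel_filterbank(26, 256, 8000)  # assumed: 26 filters, 256-point FFT, 8 kHz
print(len(bank), len(bank[0]))
```

Multiplying an FFT magnitude (or power) spectrum by each filter and summing gives one energy per mel band; the filters are narrow at low frequencies and widen as frequency increases.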

log: take the log of the magnitudes. The speech spectrum is approximately the product of an excitation (source) spectrum and a vocal-tract (filter) response; the log turns this multiplication into addition, so the two components can be separated
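The source-filter idea behind this step can be checked numerically: if a spectrum is the product of an excitation spectrum E and a vocal-tract response H, its log equals log E + log H. The Python sketch also clamps energies to a small floor (an assumed 1e-10) before taking the log, a common guard against log(0). The spectrum values and the name `log_energies` are made up for illustration.

```python
import math


def log_energies(energies, floor=1e-10):
    """Natural log of band energies, clamped at an assumed floor to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


excitation = [4.0, 0.5, 2.0]     # made-up excitation (source) spectrum values
vocal_tract = [0.25, 3.0, 1.5]   # made-up vocal-tract (filter) response values
product = [e * h for e, h in zip(excitation, vocal_tract)]

log_product = log_energies(product)
log_sum = [a + b for a, b in zip(log_energies(excitation), log_energies(vocal_tract))]
print(all(abs(p - s) < 1e-12 for p, s in zip(log_product, log_sum)))
print(log_energies([0.0]))  # the floor keeps a silent band finite
```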

IFFT: apply the inverse FFT to move into the cepstral domain; the result is the set of “features”
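A minimal Python sketch of this last step, following the slide's description (inverse transform of the log spectrum) and keeping the first 13 coefficients to match the 13-feature example earlier. This computes the real cepstrum; many MFCC implementations instead apply a discrete cosine transform (DCT-II) to the log mel energies, which plays the same role. The input log spectrum and the name `real_cepstrum` are made up for illustration.

```python
import cmath
import math


def real_cepstrum(log_spectrum):
    """Inverse DFT of a full (length-N, symmetric) log-magnitude spectrum.

    For a real, even-symmetric input the result is real, so only the real part is kept.
    """
    n = len(log_spectrum)
    return [
        (sum(log_spectrum[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)) / n).real
        for q in range(n)
    ]


N = 32
# Made-up smooth log spectrum, built symmetric so that S[k] == S[N - k].
log_spec = [math.cos(2 * math.pi * k / N) + 0.5 * math.cos(2 * math.pi * 3 * k / N)
            for k in range(N)]
cepstrum = real_cepstrum(log_spec)
features = cepstrum[:13]  # keep 13 coefficients, as in the 13-feature example
print([round(c, 3) for c in features])
```

Slow, smooth variations across the log spectrum (such as the vocal-tract envelope) map to the low-index coefficients, which is why keeping only the first few is enough to describe the frame.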

Speech Analysis and Sound Effects (SASE) Project
Graphical user interface (GUI)
Speech input
  – Record and save audio
  – Read sound files (*.wav, *.ulaw, *.au)
Graphs the entire audio signal
Processes a user-selected speech frame and displays the output of each processing stage
Displays a spectrogram
Applies audio effects

MATLAB Code
Graphical user interface (GUI)
  – GUIDE (GUI Development Environment)
  – Callback functions
Front-end speech processing
  – Modular functions for reusability
  – Graphs display the output of each stage
Sound effects
  – Echo, reverb, flange, chorus, vibrato, tremolo, voice changer
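Two of the listed effects reduce to short formulas, sketched below in Python rather than the project's MATLAB. An echo adds a delayed, attenuated copy, y[n] = x[n] + g·x[n-D], and tremolo multiplies the signal by a slowly varying sinusoidal gain. The delay, gain, rate, and depth values and the function names are illustrative assumptions, not SASE defaults.

```python
import math


def echo(x, delay_samples, gain=0.5):
    """Single echo: y[n] = x[n] + gain * x[n - delay_samples] (gain is an assumed value)."""
    return [s + (gain * x[n - delay_samples] if n >= delay_samples else 0.0)
            for n, s in enumerate(x)]


def tremolo(x, fs, rate_hz=5.0, depth=0.5):
    """Amplitude modulation: gain = 1 - depth/2 + (depth/2) * sin(2*pi*rate_hz*n/fs).

    The gain swings between 1 - depth and 1 at rate_hz (assumed parameter values).
    """
    return [s * (1 - depth / 2 + (depth / 2) * math.sin(2 * math.pi * rate_hz * n / fs))
            for n, s in enumerate(x)]


impulse = [1.0] + [0.0] * 9
print(echo(impulse, delay_samples=4))  # the click repeats 4 samples later at half level
fs = 8000
steady = [1.0] * fs                    # one second of a constant signal
wobble = tremolo(steady, fs)
print(min(wobble), max(wobble))        # gain stays between 1 - depth and 1
```

At a realistic sampling rate, an audible echo needs a delay of tens of milliseconds, for example 0.25 s × 8000 samples/s = 2000 samples.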

GUI Components
  – Plotting axes
  – Buttons

SASE Lab Demo
Record, play, and save audio to a file; open existing audio files
Select and process a speech frame; display graphs of each stage of front-end processing
Display the spectrogram for the entire speech signal or for a user-selectable 3-second sample
Play speech: all of it or the selected 3-second sample
Show how certain sounds (e.g., “a e i o u”) differ in the spectrogram and in the features, so the audience understands what these graphs tell us about the sounds
Apply sound effects and show their user-configurable parameters
Graph the spectrogram and speech processing of the sound effects
  – Show the echo effect in the spectrogram
Use as a teaching tool
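A spectrogram applies the framing, windowing, and transform steps across the whole signal, producing one short-time spectrum per frame. The self-contained Python sketch below uses a direct DFT, a Hann window, 64-sample frames with a 32-sample hop, an 8 kHz rate, and a made-up signal that switches from 500 Hz to 2000 Hz halfway through. All of these are assumptions for illustration, not SASE Lab settings.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) sum, fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(x, frame_len=64, hop=32):
    """Short-time spectra of Hann-windowed frames in dB (rows = frames, columns = bins)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(x[start:start + frame_len], window)]
        rows.append([20 * math.log10(m + 1e-12) for m in dft_magnitude(frame)])
    return rows


fs = 8000
# Made-up test signal: 500 Hz for the first half, 2000 Hz for the second half.
x = [math.sin(2 * math.pi * (500 if n < 512 else 2000) * n / fs) for n in range(1024)]
spec = spectrogram(x)
first_peak = max(range(len(spec[0])), key=spec[0].__getitem__)
last_peak = max(range(len(spec[-1])), key=spec[-1].__getitem__)
print(first_peak * fs / 64, last_peak * fs / 64)  # dominant frequency (Hz) early vs. late
```

Plotting `spec` as an image, with time on one axis and frequency on the other, gives the familiar spectrogram view; the change of pitch appears as the bright band jumping to a higher row.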

Future Work on SASE Lab
Audio effects
  – e.g., pitch removal
Noise filtering

Applications of Signal Processing in High Schools
Convey the relevance and importance of math to high school students
Bring knowledge of engineering, technological innovation, and academic research into high school classrooms
Give students the opportunity to acquire technical knowledge and analytical skills through hands-on exploration of real-world applications in signal processing
Encourage students to pursue higher education and careers in STEM fields

Unit Plan: Speech Processing
A collection of lesson plans that introduces high school students to the fundamentals of speech and sound processing
Connections to Pre-Calculus mathematics standards (NGSSS and Common Core)
  – Mathematical modeling
  – Trigonometric functions
  – Complex numbers in rectangular and polar form
  – Function operations
  – Logarithmic functions
  – Sequences and series
  – Matrices
Hands-on lessons involving MATLAB projects
Teacher notes

Unit Introduction
Students research, explore, and discuss current applications of speech and audio processing.

Lesson 1: The Sound of a Sine Wave
Modeling sound as a sinusoidal function
Concepts covered:
  – Continuous vs. discrete functions
  – Frequency of a sine wave
  – Composite signals
Connections to real-world applications:
  – Synthesis of digital speech and music

Lesson 1: The Sound of a Sine Wave
Student MATLAB Project
  – Create discrete sine waves with given frequencies
  – Create a composite signal from the sine waves
  – Plot graphs and play the sounds of the sine waves
  – Analyze the effect of frequency on the graphs and the sounds of the sine functions
Project Extensions
  – Play songs using sine waves
  – Synthesize vowel sounds with sine waves
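The lesson's project is written in MATLAB; for comparison, here is a Python sketch of the same steps: discrete sine waves, a composite signal, and a WAV file that any audio player can play. The 8 kHz rate, the note frequencies (A4 = 440 Hz, E5 ≈ 659.25 Hz), the output file name, and the helper names are assumptions. Python's standard `wave` module writes the 16-bit file.

```python
import math
import os
import struct
import tempfile
import wave

FS = 8000  # assumed sampling rate in samples per second


def sine_wave(freq_hz, duration_s, amplitude=0.5, fs=FS):
    """Discrete sine wave: x[n] = A * sin(2*pi*f*n/fs) for n = 0..N-1."""
    n_samples = int(duration_s * fs)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def composite(*waves):
    """Sample-by-sample sum of equal-length waves, rescaled only if it would exceed [-1, 1]."""
    summed = [sum(samples) for samples in zip(*waves)]
    peak = max(abs(s) for s in summed) or 1.0
    return [s / peak if peak > 1.0 else s for s in summed]


def save_wav(path, samples, fs=FS):
    """Write mono 16-bit PCM so the sound can be played back."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(fs)
        f.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples))


a4 = sine_wave(440, 1.0)     # A4 note, 1 second
e5 = sine_wave(659.25, 1.0)  # E5 note, 1 second
chord = composite(a4, e5)
out_path = os.path.join(tempfile.gettempdir(), "chord.wav")  # hypothetical file name
save_wav(out_path, chord)
```

Concatenating several `sine_wave` calls instead of summing them plays notes one after another, which is the basis of the "play songs using sine waves" extension.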

Lesson 2: Frequency Analysis
Use the Fourier transform to convert functions from the time domain to the frequency domain
Concepts covered:
  – Modeling harmonic signals as a series of sinusoids
  – Sine wave decomposition
  – Fourier transform
  – Euler’s formula
  – Frequency spectrum
Connections to real-world applications:
  – Speech processing and recognition
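The relations this lesson builds on can be summarized as follows. The symbols (A_m, f_0, φ_m for the harmonic amplitudes, fundamental frequency, and phases; x[n], N, f_s for the sampled signal, its length, and the sampling rate) are notation chosen for this summary rather than taken from the lesson plan.

```latex
e^{j\theta} = \cos\theta + j\sin\theta,
\qquad
\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}

x(t) = \sum_{m=1}^{M} A_m \sin\!\left(2\pi m f_0 t + \phi_m\right)

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1,
\qquad f_k = \frac{k\, f_s}{N}
```

The second form of Euler's formula is why a real sinusoid of frequency f appears in the spectrum as a pair of components at +f and -f.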

Lesson 2: Frequency Analysis
Student MATLAB Project
  – Create a composite signal from the sum of harmonic sine waves
  – Plot graphs and play the sounds of the sine waves
  – Compute the FFT of the composite signal
  – Plot and analyze the frequency spectrum
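A Python sketch of the Lesson 2 project steps, assuming an 8 kHz sampling rate, 400 samples (a 20 Hz bin spacing), a 200 Hz fundamental, and harmonic amplitudes of 1, 0.5, and 0.25. A direct DFT stands in for the FFT because the signal is short; both compute the same X[k]. Each harmonic falls exactly on a bin, so the spectrum should show three clean peaks whose heights match the amplitudes.

```python
import cmath
import math

FS = 8000  # assumed sampling rate (Hz)
N = 400    # assumed number of samples (0.05 s), giving FS / N = 20 Hz per bin
F0 = 200   # assumed fundamental frequency (Hz)

# Composite signal: fundamental plus 2nd and 3rd harmonics with decreasing amplitude.
amplitudes = {1: 1.0, 2: 0.5, 3: 0.25}
x = [sum(a * math.sin(2 * math.pi * h * F0 * n / FS) for h, a in amplitudes.items())
     for n in range(N)]


def dft(signal, n_bins):
    """Direct DFT for bins 0..n_bins-1: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_bins)]


X = dft(x, N // 2)
# One-sided amplitude spectrum: for bins other than 0, scaling by 2/N makes a sine
# of amplitude A show up with height A.
spectrum = [2 * abs(Xk) / N for Xk in X]
peaks = [(k * FS / N, round(a, 3)) for k, a in enumerate(spectrum) if a > 0.1]
print(peaks)  # (frequency in Hz, amplitude) for each significant bin
```

If the fundamental is changed so that it no longer falls on a bin (for example 210 Hz), the peaks spread into neighboring bins; this spectral leakage is a natural follow-up question for students.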

Lesson 3: Sound Effects
Concepts covered:
Connections to real-world applications:
  – Digital music effects and speech sound effects

Lesson 3: Sound Effects
Student MATLAB Project

Unit Conclusion
Student presentation and report or poster
  – Summarize and reflect on the lessons
  – Ask research questions
  – Develop new ideas for applications of speech processing

References
Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson.
Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson.
Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press.
Timeline of Speech Recognition.

AEGIS website: lesson plans available for download ?????
Contacts:
  – Becky Dowell
  – Dr. Veton Këpuska
  – Jacob Zurasky
AEGIS Project

Thank you! Questions?