ASSESS: a descriptive scheme for speech in databases Roddy Cowie.

To refresh people’s memory…
- ASSESS embodies an approach to processing the audio element of a database
- It is about going beyond the raw audio signal:
  - providing processing that a lot of people might want,
  - but that not everyone can do.

ASSESS covers several levels:
- Basic transformations of the signal;
- Key boundaries and the units that go with them;
- Properties of the units.
- The system generates a lot of files, but a lot of the things you might want are there if you know where to look

The processes ASSESS uses
- A reasonable model:
  - Developed for inconsiderate inputs
  - Robust
  - Maximise availability
  - Systematic rather than selective

ASSESS input characteristics
- Input file:
  - Reasonably long (up to 2.5 mins)
  - 20 kHz sampling rate
  - No header (.raw, not .wav)
- Messy, but conversion techniques are easily available
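
One way the header could be stripped is sketched below with Python's standard `wave` module. The function name `wav_to_raw` is hypothetical, and the slide does not state the sample width or byte order ASSESS expects, so this sketch simply copies the sample bytes unchanged and refuses files that are not already at 20 kHz (it does not resample).

```python
import wave

ASSUMED_RATE_HZ = 20000  # the sampling rate named on the slide


def wav_to_raw(wav_path, raw_path, expected_rate=ASSUMED_RATE_HZ):
    """Copy the sample data of a WAV file into a headerless file.

    Raises ValueError if the file is not at the expected rate, because
    this sketch does not resample. Returns the duration in seconds.
    """
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
        n_frames = wav.getnframes()
        frames = wav.readframes(n_frames)  # sample bytes only, no header
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return n_frames / rate
```

The returned duration is handy for checking a file against the 2.5 minute limit before processing.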

Using ASSESS
- Woefully undramatic
- Supply 3 command lines, e.g. for a file called ‘test’ lasting x secs:
    filterbank test.raw test.spc 20000
    howard test.raw test.tx
    stage2 test
- Wait about x/2 secs
- Admire outputs
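
For batch use, the three command lines could be wrapped as in the sketch below. The helper names `assess_commands` and `run_assess` are hypothetical, and reading the trailing `20000` as the sampling rate is an assumption based on the 20 kHz input requirement; the slide does not document the arguments.

```python
import subprocess


def assess_commands(stem, sample_rate=20000):
    """Return the three command lines shown on the slide for '<stem>.raw'.

    Assumption: the last filterbank argument is the sampling rate in Hz.
    """
    return [
        ["filterbank", f"{stem}.raw", f"{stem}.spc", str(sample_rate)],
        ["howard", f"{stem}.raw", f"{stem}.tx"],
        ["stage2", stem],
    ]


def run_assess(stem, sample_rate=20000):
    """Run the three stages in order, stopping at the first failure."""
    for cmd in assess_commands(stem, sample_rate):
        subprocess.run(cmd, check=True)
```

Calling `run_assess("test")` would need the ASSESS executables to be on the search path.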

Basic transformations and 1st order output
- Intensity
- 1/3 octave spectrum
- ‘Pulses’ corresponding to vocal cord openings
  - basis for estimating pitch
- 1st order output consists of 2 files:
  - intensity & 1/3 octave spectrum
  - estimated ‘pulses’
- Everything else ASSESS calculates is derived from those
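
As a reminder of what a 1/3 octave spectrum covers, the sketch below lists base-2 third-octave bands up to the 10 kHz Nyquist limit implied by 20 kHz sampling. The 1 kHz reference and the 100 Hz lower limit are assumptions chosen for illustration; the slides do not say which bands ASSESS actually uses.

```python
def third_octave_bands(f_low=100.0, nyquist=10000.0, ref=1000.0):
    """Return (lower, centre, upper) edges in Hz for third-octave bands.

    Centres are ref * 2**(k/3); each band spans a factor of 2**(1/6)
    either side of its centre. Only bands whose centre lies between
    f_low and nyquist are kept.
    """
    bands = []
    k = -30  # start well below any audible band
    while True:
        centre = ref * 2.0 ** (k / 3.0)
        if centre > nyquist:
            break
        if centre >= f_low:
            half_step = 2.0 ** (1.0 / 6.0)
            bands.append((centre / half_step, centre, centre * half_step))
        k += 1
    return bands
```

With the default arguments the centres run from about 125 Hz to 8 kHz.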

Conditioning 1st order outputs in ASSESS
- Raw intensity
  - Scaled by parameter derived from a ‘reference’ file
    - representing normal speaking level under same recording conditions
- Clumsy, but checks show it allows reasonable comparison across files
- Same scaling applied to spectrum
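
The slide does not say how the scaling parameter is computed. Purely as an illustration, the sketch below assumes the parameter is the RMS level of the reference recording and expresses each frame's intensity in dB relative to it. All function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def reference_level(reference_samples):
    """Assumed scaling parameter: RMS of the 'normal speaking level' file."""
    return rms(reference_samples)


def scaled_intensity_db(frame, ref_level, floor=1e-12):
    """Intensity of one frame in dB relative to the reference level.

    The floor avoids log10(0) on digitally silent frames.
    """
    return 20.0 * math.log10(max(rms(frame), floor) / ref_level)
```

Under this assumption a frame at the reference level reads 0 dB, and one at half the amplitude reads about -6 dB, whatever the recording gain.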

Conditioning 1st order outputs in ASSESS
- Raw pulse estimates cleaned
  - by selecting sequences where intervals are very close
- Results (in pink) comparable to standard autocorrelation, but easier to clean further
- High noise associated with frication filtered using spectrum
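
The idea of keeping only sequences with closely matching intervals could be sketched as follows. The 10% tolerance and the minimum run length are assumptions, not values taken from ASSESS, and `consistent_runs` is a hypothetical name.

```python
def _close(a, b, tolerance):
    """True when two intervals differ by at most `tolerance` of the larger."""
    return abs(a - b) <= tolerance * max(a, b)


def consistent_runs(pulse_times, tolerance=0.1, min_intervals=3):
    """Return runs of pulse times whose successive intervals are similar.

    Runs containing fewer than `min_intervals` intervals are discarded
    as unreliable.
    """
    intervals = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    runs = []
    start = 0  # index of the first interval in the current run
    for i in range(1, len(intervals) + 1):
        at_end = i == len(intervals)
        if at_end or not _close(intervals[i - 1], intervals[i], tolerance):
            if i - start >= min_intervals:
                # intervals start..i-1 are bounded by pulses start..i
                runs.append(pulse_times[start:i + 1])
            start = i
    return runs
```

Within a surviving run, the reciprocal of the mean interval gives a local pitch estimate.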

Conditioning 1st order outputs in ASSESS
- Fitting flexible ‘rope’ filters extremes, captures broad shape
- (zeroes mark pause boundaries – taken into account)

Conditioning 1st order outputs in ASSESS
- In contrast, standard methods try to correct for octave jumps
  - with the kind of result shown in the lower panel

Boundary finding in ASSESS
- Silences are found iteratively:
  - find an intensity level that separates a cluster of low-intensity samples (pauses) from a cluster of high-intensity samples (speech);
  - fine-tune using the spectrum of the definite pauses.
- Again, robust: in a comparison sample
  - a phonetician identified 503 pauses
  - ASSESS identified 498
- Difference between times of corresponding bounds averaged
  - 10.4 ms for pause starts
  - -1.7 ms for pause ends
- A similar approach is applied to frication
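
The first step, finding a level that separates two intensity clusters, might look like the sketch below. It uses a simple iterative midpoint-of-means rule, which is an assumption and not necessarily the procedure ASSESS follows, and it leaves out the spectral fine-tuning. The names `split_threshold` and `find_pauses` are hypothetical.

```python
def split_threshold(intensities, max_iter=100, eps=1e-6):
    """Iteratively find a level separating low and high intensity clusters.

    Starts midway between the extremes, then repeatedly moves to the
    midpoint of the two cluster means until the level stops changing.
    """
    threshold = (min(intensities) + max(intensities)) / 2.0
    for _ in range(max_iter):
        low = [v for v in intensities if v <= threshold]
        high = [v for v in intensities if v > threshold]
        if not low or not high:
            break  # only one cluster present
        new = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        converged = abs(new - threshold) < eps
        threshold = new
        if converged:
            break
    return threshold


def find_pauses(intensities, frame_s, threshold):
    """Return (start_s, end_s) for each run of frames at or below threshold."""
    pauses = []
    start = None
    for i, v in enumerate(intensities):
        if v <= threshold and start is None:
            start = i
        elif v > threshold and start is not None:
            pauses.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:  # the signal ends inside a pause
        pauses.append((start * frame_s, len(intensities) * frame_s))
    return pauses
```

The boundaries found this way are only as fine as the frame step, so the millisecond-level agreement reported above would depend on the later fine-tuning.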

2nd order output of ASSESS
- .exm files specify
  - pitch and intensity contours
    - in terms of local maxima and minima
    - and speech/silence boundaries
  - episodes with frication (boundaries & average spectra)
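
Reducing a contour to its local maxima and minima can be sketched as below. This only illustrates the idea and is not the .exm file format; the treatment of plateaus (reported at their first sample) and endpoints (not reported) is an assumption.

```python
def local_extrema(contour):
    """Return (index, value, kind) for each local maximum or minimum.

    kind is 'max' or 'min'. A flat stretch is reported at its first
    sample; the first and last samples are never reported.
    """
    extrema = []
    prev_dir = 0  # +1 rising, -1 falling, 0 not yet known
    turn = 0      # index reached by the most recent rise or fall
    for i in range(1, len(contour)):
        diff = contour[i] - contour[i - 1]
        if diff == 0:
            continue  # stay on the plateau
        direction = 1 if diff > 0 else -1
        if prev_dir and direction != prev_dir:
            kind = "max" if prev_dir > 0 else "min"
            extrema.append((turn, contour[turn], kind))
        prev_dir = direction
        turn = i
    return extrema
```

Applied to a smoothed F0 or intensity contour, the result is a compact list of turning points in place of a value for every frame.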

Describing units – 3rd order outputs of ASSESS
- Basic units:
  - Pauses
  - Tunes (structures between pauses lasting over 150 ms)
- Pauses have only duration
- Tunes have multiple attributes, and ASSESS covers them systematically
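
Given a list of detected pauses, tunes could be derived as in the sketch below. It assumes the 150 ms criterion applies to the pauses, so shorter silences stay inside a tune; the slide wording also permits the reading that the tune itself must last over 150 ms. The function name is hypothetical.

```python
MIN_PAUSE_S = 0.150  # threshold from the slide, read here as a pause duration


def tunes_between_pauses(pauses, total_s, min_pause_s=MIN_PAUSE_S):
    """Return (start_s, end_s) for each tune in a recording.

    `pauses` is a time-ordered list of non-overlapping (start_s, end_s)
    pairs. Pauses no longer than `min_pause_s` are ignored, so the
    speech on either side of them stays in one tune.
    """
    tunes = []
    cursor = 0.0  # start of the tune currently being built
    for start, end in pauses:
        if end - start <= min_pause_s:
            continue  # too short to count as a tune boundary
        if start > cursor:
            tunes.append((cursor, start))
        cursor = end
    if cursor < total_s:
        tunes.append((cursor, total_s))
    return tunes
```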

Describing units – 3rd order outputs of ASSESS
- Basic module of description (in .psg file)
  - Pattern repeated for pitch, & for each tune

Describing units – structural properties
- Tune properties include
  - global slope & curvature of pitch contour,
  - movement at start and end,
  - measures of spectral balance & change
- Relations between tunes include
  - abruptness of change from last tune
  - ‘crescendo’ …
- etc.
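
One plausible way to obtain a global slope and curvature for a tune is a least-squares quadratic fit to its pitch contour, sketched below. The slides do not say how ASSESS defines these measures, so the fit, the centring on the mean time and the function names are all assumptions.

```python
def fit_quadratic(times, values):
    """Least-squares fit of values ~ a + b*t + c*t**2; returns (a, b, c).

    Needs at least three distinct times, otherwise the system is singular.
    """
    # Sums of powers of t for the 3x3 normal equations.
    s = [sum(t ** k for t in times) for k in range(5)]
    rhs = [sum(v * t ** k for t, v in zip(times, values)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def tune_shape(times, f0):
    """Return (slope, curvature) of a quadratic fitted to an F0 contour.

    Slope is the first derivative at the mean time; curvature is the
    (constant) second derivative of the fitted curve.
    """
    mid = sum(times) / len(times)
    _, b, c = fit_quadratic([t - mid for t in times], f0)
    return b, 2.0 * c
```

A positive curvature then indicates a dipping (fall-rise) contour and a negative one an arching (rise-fall) contour.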

Summary
- ASSESS is part system, part philosophy
- The system delivers robust estimates of spectrum, F0 and intensity contours, key boundaries, and properties of the units they define
- The philosophy is using signal processing expertise to make multiple alternatives at multiple levels available to others.