Department of Computer Science University of California, San Diego

Slides:



Advertisements
Similar presentations
Chapter 11 Signal Processing with Wavelets. Objectives Define and illustrate the difference between a stationary and non-stationary signal. Describe the.
Advertisements

CS335 Principles of Multimedia Systems Audio Hao Jiang Computer Science Department Boston College Oct. 11, 2007.
Digital Signal Processing
Sound can make multimedia presentations dynamic and interesting.
Part A Multimedia Production Rico Yu. Part A Multimedia Production Ch.1 Text Ch.2 Graphics Ch.3 Sound Ch.4 Animations Ch.5 Video.
Chapter 4: Representation of data in computer systems: Sound OCR Computing for GCSE © Hodder Education 2011.
Franz de Leon, Kirk Martinez Web and Internet Science Group  School of Electronics and Computer Science  University of Southampton {fadl1d09,
CMPS1371 Introduction to Computing for Engineers PROCESSING SOUNDS.
I Power Higher Computing Multimedia technology Audio.
1 Copyright 2011 G.Tzanetakis Music Information Retrieval George Tzanetakis Associate Professor, IEEE Senior Member.
DFT/FFT and Wavelets ● Additive Synthesis demonstration (wave addition) ● Standard Definitions ● Computing the DFT and FFT ● Sine and cosine wave multiplication.
Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.
Audio Meets Image Retrieval Techniques Dave Kauchak Department of Computer Science University of California, San Diego
Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification.
Content-Based Classification, Search & Retrieval of Audio Erling Wold, Thom Blum, Douglas Keislar, James Wheaton Presented By: Adelle C. Knight.
1 Audio Compression Techniques MUMT 611, January 2005 Assignment 2 Paul Kolesnik.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Copyright Nov. 2002, George Tzanetakis Digital Music & Music Processing George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon.
DEVON BRYANT CS 525 SEMESTER PROJECT Audio Signal MIDI Transcription.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
Introduction to Wavelets -part 2
1 Music Classification Using SVM Ming-jen Wang Chia-Jiu Wang.
Improving Musical Genre Classification with RBF Networks Douglas Turnbull Department of Computer Science and Engineering University of California, San.
CELLULAR COMMUNICATIONS DSP Intro. Signals: quantization and sampling.
Digital Audio Multimedia Systems (Module 1 Lesson 1)
Image Processing David Kauchak cs458 Fall 2012 Empirical Evaluation of Dissimilarity Measures for Color and Texture Jan Puzicha, Joachim M. Buhmann, Yossi.
Representing Acoustic Information
Audio Retrieval David Kauchak cs458 Fall Administrative Assignment 4 Two parts Midterm Average:52.8 Median:52 High:57 In-class “quiz”: 11/13.
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Audio and Music Representations (Part 2) 1.
LE 460 L Acoustics and Experimental Phonetics L-13
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Introduction to Interactive Media 10: Audio in Interactive Digital Media.
COMP Representing Sound in a ComputerSound Course book - pages
Multiresolution STFT for Analysis and Processing of Audio
Media Representations - Audio
The Story of Wavelets.
Preprocessing Ch2, v.5a1 Chapter 2 : Preprocessing of audio signals in time and frequency domain  Time framing  Frequency model  Fourier transform 
COSC 1P02 Introduction to Computer Science 4.1 Cosc 1P02 Week 4 Lecture slides “Programs are meant to be read by humans and only incidentally for computers.
MUMT611: Music Information Acquisition, Preservation, and Retrieval Presentation on Timbre Similarity Alexandre Savard March 2006.
TEMPLATE DESIGN © Zhiyao Duan 1,2, Lie Lu 1, and Changshui Zhang 2 1. Microsoft Research Asia (MSRA), Beijing, China.2.
Part I: Image Transforms DIGITAL IMAGE PROCESSING.
Wavelet transform Wavelet transform is a relatively new concept (about 10 more years old) First of all, why do we need a transform, or what is a transform.
Hidden Markov Classifiers for Music Genres. Igor Karpov Rice University Comp 540 Term Project Fall 2002.
ECE472/572 - Lecture 13 Wavelets and Multiresolution Processing 11/15/11 Reference: Wavelet Tutorial
CS Spring 2009 CS 414 – Multimedia Systems Design Lecture 3 – Digital Audio Representation Klara Nahrstedt Spring 2009.
Audio Tempo Extraction Presenter: Simon de Leon Date: February 9, 2006 Course: MUMT611.
Encoding and Simple Manipulation
Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The.
MMDB-8 J. Teuhola Audio databases About digital audio: Advent of digital audio CD in Order of magnitude improvement in overall sound quality.
CSCI-100 Introduction to Computing Hardware Part II.
MSc Project Musical Instrument Identification System MIIS Xiang LI ee05m216 Supervisor: Mark Plumbley.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
Fourier and Wavelet Transformations Michael J. Watts
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 3 – Digital Audio Representation Klara Nahrstedt Spring 2014.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 4 – Audio and Digital Image Representation Klara Nahrstedt Spring 2010.
ADAPTIVE BABY MONITORING SYSTEM Team 56 Michael Qiu, Luis Ramirez, Yueyang Lin ECE 445 Senior Design May 3, 2016.
Lifecycle from Sound to Digital to Sound. Characteristics of Sound Amplitude Wavelength (w) Frequency ( ) Timbre Hearing: [20Hz – 20KHz] Speech: [200Hz.
Audio Processing Mitch Parry. Similar to Image Processing? For images a pixel is the smallest unit The color is a distribution of the spectrum of visible.
Automatic Classification of Audio Data by Carlos H. L. Costa, Jaime D. Valle, Ro L. Koerich IEEE International Conference on Systems, Man, and Cybernetics.
Spectrum Analysis and Processing
ARTIFICIAL NEURAL NETWORKS
Fourier and Wavelet Transformations
Multi-resolution analysis
Wavelet transform Wavelet transform is a relatively new concept (about 10 more years old) First of all, why do we need a transform, or what is a transform.
Lecture 2: Frequency & Time Domains presented by David Shires
AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION
Presenter: Simon de Leon Date: March 2, 2006 Course: MUMT611
Govt. Polytechnic Dhangar(Fatehabad)
Measuring the Similarity of Rhythmic Patterns
Presentation transcript:

Department of Computer Science University of California, San Diego Automatic Music Genre Classification of Audio Signals George Tzanetakis, Georg Essl & Perry Cook Presented by: Dave Kauchak Department of Computer Science University of California, San Diego dkauchak@cs.ucsd.edu

Image Classification ? ? ?

Audio Classification ? ? ? Rock Classical Country

Hierarchy of Sound Sound Music Speech Other? ? Jazz Country Sports Announcer Male Rock Classical Female Disco Hip Hop Choir Orchestra String Quartet Piano

Classification Procedure Raw audio Digitally encode Extract features Build class models Preprocessing Decide class Raw audio Digitally encode Extract features Input processing

Digitally Encoding Raw Sound is simply a longitudinal compression wave traveling through some medium (often, air). Must be digitized to be processed WAV MIDI MP3 Others…

WAV Simple encoding Sample sound at some interval (e.g. 44 KHz). High sound quality Large file sizes

MIDI Musical Instrument Digital Interface MIDI is a language Sentences describe the channel, note, loudness, etc. 16 channels (each can be though of and recorded as a separate instrument) Common for audio retrieval an classification applications

MIDI Example Music Melodies Tempo Instrument Sequence of Notes Channel Pitch amplitude Duration

MP3 Common compression format 3-4 MB vs. 30-40 MB for uncompressed Perceptual noise shaping The human ear cannot hear certain sounds Some sounds are heard better than others The louder of two sounds will be heard

MP3 Example

Extract Features Mel-scaled cepstral coefficients (MFCCs) Musical surface features Rhythm Features Others…

Tools for Feature Extraction Fourier Transform (FT) Short Term Fourier Transform (STFT) Wavelets

Fourier Transform (FT) Time-domain Frequency-domain

Another FT Example Time Frequency

Problem? 

Problem with FT FT contains only frequency information No Time information is retained Works fine for stationary signals Non-stationary or changing signals cause problems FT shows frequencies occurring at all times instead of specific times

Solution: STFT How can we still use FT, but handle non-stationary signals? How can we include time? Idea: Break up the signal into discrete windows Each signal within a window is a stationary signal Take FT over each part

STFT Example Window functions

Better STFT Example

Problem: Resolution We can vary time and frequency accuracy Narrow window: good time resolution, poor frequency resolution Wide window: good time resolution, poor frequency resolution So, what’s the problem?

Varying the resolution

Where’s the problem? How do you pick an appropriate window? Too small = poor frequency resolution Too large may result in violation of stationary condition Different resolutions at different frequencies?

Solution: Wavelet Transform Idea: Take a wavelet and vary scale Check response of varying scales on signal

Wavelet Example: Scale 1

Wavelet Example: Scale 2

Wavelet Example: Scale 3

Wavelet Example Scale = 1/frequency Translation  Time

Discrete Wavelet Transform (DWT) Wavelet comes in pairs (high pass and low pass filter) Split signal with filter and downsample

DWT cont. Continue this process on the high frequency portion of the signal

DWT Example

How did this solve the resolution problem? Higher frequency resolution at high frequencies Higher time frequency at low frequencies

Don’t Forget… Why did we do we need these tools (FT, STFT & DWT)? Features extraction: Mel-frequency cepstral coefficients (MFCCs) Musical surface features Rhythm Features

MFCC Common for speech Pre-Emphasis Window then FFT Mel-scaling Filter out high frequencies to imitate ear Window then FFT Mel-scaling Run frequency signal through bandpass filters Filters are designed to mimic “critical bandwidths” in human hearing Cepstral coefficients Normalized Cosine transform

Musical surface features Represents characteristics of music Texture Timbre Instrumentation Statistics over spectral distribution Centroid Rolloff Flux Zero Crossings Low Energy

Calculating Surface Features Calculate feature for window … Divide into windows Calculate mean and std. dev. over windows FFT over window Signal

Surface Features Centroid: Measures spectral brightness Rolloff: Spectral Shape R such that: M[f] = magnitude of FFT at frequency bin f over N bins

More surface features Flux: Spectral change Zero Crossings: Noise in signal Low Energy: Percentage of windows that have energy less than average Where, Mp[f] is M[f] of the previous window

Rhythm Features Wavelet Transform Full Wave Rectification Low Pass Filtering Downsampling Normalize

Rhythm Features cont. Autocorrelation – The cross-correlation of a signal with itself (i.e. portions of a signal with it’s neighbors) Take first 5 peaks Histogram over windows of the signal

Actual Rhythm Features Using the “beat” histogram… Period0 - Period in bpm of first peak Amplitude0 - First peak divided by sum of amplitude RatioPeriod1 - Ratio of periodicity of first peak to second peak Amplitude1- Second peak divided by sum of amplitudes RatioPeriod2, Amplitude2, RatioPeriod3, Amplitude3

Experimental Setup Songs collected from radio, CDs and Web 50 samples for each class, 30 sec. Long 15 genres Music genres: Surface and rhythm features Classical: MFCC features Speech: MFCC features Gaussian classifier 10 Fold cross validation

General Results Music vs. Speech Genres Voices Classical Random 50 16 33 25 Gaussian 86 62 74 76

Results: Musical Genres Classic Country Disco Hiphop Jazz Rock 86 2 4 18 1 57 5 12 13 6 55 15 28 90 7 37 19 11 27 48 Pseudo-confusion matrix

Results: Classical Choral Orchestral Piano String 99 10 16 12 53 2 5 1 53 2 5 1 20 75 3 17 7 80 Confusion matrix

Analysis of Features

GUI for Audio Classification Genre Gram Graphically present classification results Results change in real time based on confidence Texture mapped based on category Genre Space Plots sound collections in 3-D space PCA to reduce dimensionality Rotate and interact with space

Genre Gram

Genre Space

Summary Audio retrieval is a relatively new field Wide range of genres and types of audio A number of digital encoding formats Various different types of features Tools for feature extraction FT STFT Wavelet Transform

Thanks Robi Polikar for his tutorial (http://www.public.iastate.edu/~rpolikar/WAVELETS/WTtutorial.html) Karlheinz Brandenburg for developing mp3 