Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University


Bits of the history of bits Hello world Understanding multimedia content = Web Multimedia

Music ➢ 4 million recorded CDs ➢ 4000 CDs / month ➢ 60-80% ISP bandwidth ➢ Global ➢ Pervasive ➢ Complex

The not so far future of MIR ➢ Library of all recorded music ➢ Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate ➢ Examples:

Talk Outline ➢ Hearing: Signal Processing ➢ Understanding: Machine Learning ➢ Dancing: Human Computer Interaction ➢ Topics: MP3 feature extraction, DWT, Beat Histograms, QBE similarity retrieval, Genre Classification, Content & Context Aware, Timbregrams, Timbrespaces

Hearing - Feature extraction Feature Space Feature vector

Timbral Texture ➢ Timbre = what differentiates sounds of the same pitch and loudness ➢ Timbral Texture = what differentiates mixtures of sounds (possibly with the same or similar rhythmic and pitch content) ➢ Global, statistical and fuzzy properties

Time-domain waveform ➢ Decompose to building blocks [Figure labels: Input, Time, Frequency]

Spectrum and Shape Descriptors ➢ Centroid ➢ Rolloff ➢ Flux ➢ Bandwidth ➢ Moments ... ➢ Feature vector = point in feature space [Figure: spectrum with axes M and F, centroid marked]
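
The shape descriptors named on this slide can be sketched in a few lines. This is a minimal illustration, not the MARSYAS implementation: the function names are hypothetical, the spectrum is assumed to be a plain list of non-negative magnitudes indexed by frequency bin, and the 0.85 rolloff fraction and the normalization used for flux are assumptions made for the example.

```python
def spectral_centroid(mags):
    """Magnitude-weighted mean bin index (the spectrum's center of mass)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, fraction=0.85):
    """Smallest bin index at which the running sum reaches `fraction` of the total."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1


def spectral_flux(mags, prev_mags):
    """Squared difference between two sum-normalized magnitude spectra."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v] if s else list(v)

    a, b = normalize(mags), normalize(prev_mags)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_vector(mags, prev_mags):
    """Stack the descriptors into one point of the feature space."""
    return [
        spectral_centroid(mags),
        float(spectral_rolloff(mags)),
        spectral_flux(mags, prev_mags),
    ]
```

Calling `feature_vector` on each analysis frame (with the previous frame's spectrum as the second argument) yields one feature vector per frame; the centroid and rolloff here are in bin units and would need scaling by the bin width to be expressed in Hz.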

Time-Frequency Analysis Fourier Transform

Short Time Fourier Transform ➢ Time-varying spectra ➢ Fast Fourier Transform (FFT) [Figure labels: Input, Time (t, t+1, t+2), Filters, Oscillators, Output, Amplitude, Frequency]
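
As a rough sketch of how time-varying spectra are obtained, the code below slices a signal into overlapping frames, applies a window, and takes the magnitude spectrum of each frame. The frame size, hop size, Hann window and function names are assumptions chosen for the example, and a direct O(N^2) DFT stands in for the FFT to keep the sketch dependency-free; a real system would use an FFT routine.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins (direct O(N^2) sum)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def stft(signal, frame_size=64, hop=32):
    """Return one magnitude spectrum per windowed, overlapping frame."""
    window = hann(frame_size)
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        spectra.append(dft_magnitudes(frame))
    return spectra
```

For a sinusoid that completes 8 cycles per 64-sample frame, every spectrum returned by `stft` peaks at bin 8. Longer frames sharpen frequency resolution at the cost of time resolution, which is the trade-off the following slide refers to.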

STFT - Wavelets ➢ Heisenberg uncertainty ➢ Time – Frequency

MPEG Audio Feature Extraction Analysis Filterbank Psychoacoustics Model Available bits Perceptual Audio Coding (slow encoding, fast decoding) MP3 Pye ICASSP 00 Tzanetakis & Cook ICASSP 00

Summary of Timbral Texture Features ➢ Time-Frequency analysis ➢ Signal processing (STFT, DWT) ➢ Perceptual (MFCC, MPEG) ➢ Statistics over “texture” window

Rhythm ➢ Rhythm = movement in time ➢ Origins in poetry (iamb, trochee, ...) ➢ Foot tapping ➢ Hierarchical semi-periodic structure ➢ Linked to motion ➢ Running vs global

Wavelet-based Rhythm Analysis ➢ Input Signal ➢ DWT ➢ Envelope Extraction: Full Wave Rectification, Low Pass Filtering, Normalization ➢ Autocorrelation ➢ Peak Picking ➢ Beat Histogram Tzanetakis et al AMTA01; Goto, Muraoka CASA98; Foote, Uchihashi ICME01; Scheirer JASA98
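
The processing chain on this slide can be approximated with the sketch below. It is a deliberate simplification: the DWT subband decomposition is omitted and a single band is processed, the low-pass filter is a one-pole smoother with an arbitrary coefficient, normalization is reduced to mean removal, and all function names and parameters are hypothetical.

```python
def envelope(signal, alpha=0.8):
    """Full-wave rectify, smooth with a one-pole low-pass, remove the mean."""
    env, y = [], 0.0
    for x in signal:
        y = (1.0 - alpha) * abs(x) + alpha * y
        env.append(y)
    mean = sum(env) / len(env)
    return [e - mean for e in env]


def autocorrelation(x, max_lag):
    """Biased autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [
        sum(x[t] * x[t - lag] for t in range(lag, n)) / n
        for lag in range(max_lag + 1)
    ]


def strongest_lag(acf, min_lag):
    """Lag and value of the highest positive local maximum at or above min_lag."""
    best_lag, best_val = None, 0.0
    for lag in range(max(min_lag, 1), len(acf) - 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_val:
            best_lag, best_val = lag, acf[lag]
    return best_lag, best_val


def beat_histogram(windows, sample_rate, min_lag, max_lag):
    """Accumulate the peak strength of each analysis window into BPM bins."""
    hist = {}
    for w in windows:
        acf = autocorrelation(envelope(w), max_lag)
        lag, val = strongest_lag(acf, min_lag)
        if lag is not None:
            bpm = round(60.0 * sample_rate / lag)
            hist[bpm] = hist.get(bpm, 0.0) + val
    return hist
```

With a click every 50 samples at a sample rate of 100 Hz, the histogram has its maximum at 120 BPM. The height and position of the largest bin correspond to the max(h(i)) and argmax(h(i)) features listed on the Beat Histograms slide.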

Beat Histograms Tzanetakis et al AMTA01 max(h(i)), argmax(h(i))

Musical Content Features ➢ Timbral Texture (19) ➢ Spectral Shape ➢ MFCC (perceptually motivated features, ASR) ➢ Rhythmic structure (6) ➢ Beat Histogram Features ➢ Harmonic content (5) ➢ Pitch Histogram Features

Understanding Musical Piece Trajectory Point

Query-by-Example Content-based Retrieval Rank List Collection of 3000 clips

Automatic Musical Genre Classification ➢ Categorical music descriptions created by humans ➢ Fuzzy boundaries ➢ Statistical properties ➢ Timbral texture, rhythmic structure, harmonic content ➢ Evaluate musical content features ➢ Structure audio collections

Statistical Supervised Learning ➢ Decision boundary ➢ Partitioning of feature space ➢ Bayes rule: $P(c \mid x) = \frac{p(x \mid c)\,P(c)}{p(x)}$, with class $c$ (e.g. Music, Speech) and feature vector $x$

Non-parametric classifiers ➢ $P(c \mid x) = \frac{p(x \mid c)\,P(c)}{p(x)}$, with class $c$ and feature vector $x$ ➢ Nearest-neighbor classifiers (K-NN)
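
A nearest-neighbor classifier needs no model fitting: it labels a query by majority vote among the closest training points. The sketch below assumes Euclidean distance, a hypothetical `knn_classify` helper, and training data given as (feature vector, label) pairs; none of these details come from the slides.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because distances are computed in the raw feature space, features with large numeric ranges dominate the vote unless the vectors are normalized first.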

Parametric classifiers ➢ $P(c \mid x) = \frac{p(x \mid c)\,P(c)}{p(x)}$, with class $c$ and feature vector $x$ ➢ Gaussian Classifier ➢ Gaussian Mixture Models
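
A single-Gaussian classifier applies Bayes' rule with a Gaussian class-conditional density per class. The sketch below is illustrative only: it assumes a diagonal covariance (one variance per feature), estimates priors from class frequencies, and uses hypothetical function names. A Gaussian mixture model would replace each single Gaussian with a weighted sum of several.

```python
import math


def fit_gaussian(vectors):
    """Per-dimension mean and variance (diagonal covariance assumption)."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(x, means, variances):
    """log p(x | class) for a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        for xi, mu, var in zip(x, means, variances)
    )


def train(labelled):
    """`labelled` maps each class label to a list of feature vectors."""
    total = sum(len(vs) for vs in labelled.values())
    return {
        label: (math.log(len(vs) / total), *fit_gaussian(vs))
        for label, vs in labelled.items()
    }


def posteriors(model, x):
    """P(class | x) via Bayes' rule, computed in the log domain for stability."""
    scores = {
        label: log_prior + log_likelihood(x, means, variances)
        for label, (log_prior, means, variances) in model.items()
    }
    top = max(scores.values())
    unnorm = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(unnorm.values())  # plays the role of p(x) up to a constant factor
    return {label: u / z for label, u in unnorm.items()}
```

The predicted class is the label with the largest posterior, for example `max(post, key=post.get)` on the dictionary returned by `posteriors`.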

Classification Evaluation – 10 genres ➢ Automatic (different collection): Gaussian Mixture Model (GMM), 10-fold cross-validation, 61% (70%) Tzanetakis & Cook, TSAP 10(5) 2002 ➢ Manual (52 subjects): 0.25 seconds 40%, 3 seconds 70% Perrot & Gjerdingen, M.Cognition 99

GenreGram DEMO Dynamic real time 3D display for classification of radio signals

Audio Segmentation ➢ Segmentation = changes of sound "texture" ➢ News: Music, Male Voice, Female Voice

Multifeature Segmentation Methodology ➢ Time series of feature vectors $V(t)$ ➢ $f(t) = D(V(t), V(t-1))$ ➢ $D(x, y) = (x - y)\,C^{-1}\,(x - y)^{T}$ (Mahalanobis) ➢ $df/dt$ peaks correspond to texture changes Tzanetakis & Cook, WASPAA 99
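
The methodology above can be sketched as follows. To stay dependency-free, the covariance matrix C is assumed to be diagonal (per-feature variances estimated over the whole series), which is a simplification of the full Mahalanobis distance; the threshold and the function names are hypothetical.

```python
def diagonal_variances(vectors):
    """Per-feature variance over the whole series (diagonal of C)."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-12)
        for d in range(dim)
    ]


def mahalanobis_diag(x, y, variances):
    """(x - y) C^-1 (x - y)^T with C assumed diagonal."""
    return sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances))


def segment_boundaries(vectors, threshold):
    """Frame indices where the derivative of f(t) has a peak above threshold."""
    variances = diagonal_variances(vectors)
    # f(t): distance between successive feature vectors, with f(0) = 0.
    f = [0.0] + [
        mahalanobis_diag(vectors[t], vectors[t - 1], variances)
        for t in range(1, len(vectors))
    ]
    # df(t): first difference of f, a discrete stand-in for df/dt.
    df = [0.0] + [f[t] - f[t - 1] for t in range(1, len(f))]
    return [
        t
        for t in range(1, len(df) - 1)
        if df[t] > threshold and df[t] > df[t - 1] and df[t] >= df[t + 1]
    ]
```

For a series of ten identical vectors followed by ten different identical vectors, the only reported boundary is frame 10, the first frame of the new texture.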

Interaction ➢ Automatic results not perfect ➢ Music listening subjective ➢ Browsing vs retrieval ➢ Adapt UI to audio “Content & Context” ➢ Computer Audition ➢ Visualization Cooledit

Content and Context ➢ Content ~ file ➢ Genre, male voice, high frequency ➢ Context ~ file and collection ➢ Similarity ➢ Slow – fast ➢ Multiple visualizations ➢ Same content, different context: Billie Holiday, Ella Fitzgerald, Christina Aguilera

Principal Component Analysis (PCA) ➢ Projection matrix ➢ Eigenanalysis of collection correlation matrix
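
A minimal sketch of the projection step, under stated assumptions: it uses the covariance matrix of the mean-centered collection rather than the correlation matrix named on the slide, it finds only the first principal axis, and it uses power iteration (which assumes the starting vector is not orthogonal to that axis) instead of a full eigendecomposition. Function names are hypothetical.

```python
import math


def covariance_matrix(vectors):
    """Feature means and covariance matrix of a collection of vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    centered = [[v[d] - means[d] for d in range(dim)] for v in vectors]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
        for i in range(dim)
    ]
    return means, cov


def principal_axis(cov, iterations=200):
    """Dominant eigenvector of `cov`, approximated by power iteration."""
    dim = len(cov)
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    return axis


def project(vectors, means, axis):
    """Coordinate of each mean-centered vector along the principal axis."""
    return [
        sum((v[d] - means[d]) * axis[d] for d in range(len(axis)))
        for v in vectors
    ]
```

Because the axis is computed from the whole collection, the coordinate assigned to a file depends on the other files around it, which is one way to read the "content & context" remark on the next slide.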

Timbregrams and Timbrespaces PCA = content & context Tzanetakis & Cook DAFX00, ICAD01

Integration

Implementation ➢ MARSYAS : free software framework for computer audition research ➢ Server in C++ (numerical signal processing and machine learning) ➢ Client in JAVA (GUI) ➢ Linux, Solaris, Irix and Wintel (VS, Cygwin) ➢ Apr downloads, 2300 different hosts, 30 countries since March 2001 Tzanetakis & Cook Organized Sound 4(3) 00

Marsyas users Desert Island Jared Hoberock Dan Kelly Ben Tietgen Marc Cardle Music-driven motion editing Real time music-speech discrimination

Current Work-Collaborations ➢ CMU ➢ Structural analysis ➢ Query-by-humming ➢ MIR over P2P networks (Jun Gao) ➢ Informedia ➢ Princeton: sound fx analysis-synthesis (P. Cook) ➢ Rochester: machine learning (Tao Li) ➢ Northwestern: perception of musical genre (R. Gjerdingen) Tzanetakis, Hu and Dannenberg, WIAMIS 03

Future Work ➢ Music ➢ Singer identification ➢ Chord progression detection ➢ Intermediate representations ➢ Motion capture signals ➢ Biological signals – time series in general ➢ Content and context aware multimedia UI

Auditory Scene Analysis Albert Bregman