Music Information Retrieval
George Tzanetakis, Associate Professor, IEEE Senior Member
Copyright 2011 G.Tzanetakis

George Tzanetakis, Tier II Canada Research Chair
Computer Science Department (also in Music, ECE)
University of Victoria, Canada

MIR
‣ Interdisciplinary science of retrieving information from music
‣ ISMIR: Int. Symposium on MIR -> Int. Conf. on MIR -> Int. Conf. of the Society for MIR
‣ First ISMIR in 2000
‣ Increasing presence in ICASSP, ICME, ACM MM, TMM, TASLP, MMTA
‣ All proceedings are freely available online

Connections
‣ Machine Learning, Signal Processing, Psychology, Computer Science, Information Science and Human-Computer Interaction, all connected to MUSIC

Music today
‣ Music is produced, distributed and consumed digitally
‣ In 2011, digital music sales exceeded physical album sales

Industry

Music Collections
‣ Personal music collections: ~thousands of tracks
‣ Streaming music sites, stores: ~millions
‣ The great celestial jukebox in the sky: all of recorded music in human history
‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers (CD-quality stereo: 44,100 samples per second × 2 channels × 300 seconds)
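
The 26-million figure follows from CD-quality audio. The slide does not state the format, so this quick check assumes 44.1 kHz sampling and two channels:

```python
# Assumption: CD-quality audio, i.e. 44,100 samples per second and 2 channels.
SAMPLE_RATE_HZ = 44_100
NUM_CHANNELS = 2
DURATION_S = 5 * 60  # a 5-minute track

num_samples = SAMPLE_RATE_HZ * NUM_CHANNELS * DURATION_S
print(f"{num_samples:,} numbers")  # 26,460,000 numbers
```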

Overview
‣ Focus on signal processing and audio
‣ Audio Feature Extraction: Timbre, Pitch, Rhythm
‣ Analysis: Similarity, Classification, Modelling Time
‣ Tasks: Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment

Audio Feature Extraction
‣ Sound and sine waves
‣ Timbral Features: Short Time Fourier Transform (STFT), Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Audio Compression
‣ Pitch and Harmony
‣ Rhythm

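Linear Systems and Sinusoids
‣ Linearity (superposition): if in1 produces out1 and in2 produces out2, then in1 + in2 produces out1 + out2
‣ A sine wave is described by its amplitude, frequency (period = 1 / frequency) and phase
‣ True sine waves last forever
‣ sine wave -> LTI system -> new sine wave of the same frequency, with possibly different amplitude and phase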
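
Here is a small numerical sketch of the two properties above. It uses NumPy and an arbitrarily chosen 5-tap moving-average FIR filter as the LTI system. The filter, sample rate and test frequencies are illustrative assumptions, not taken from the slides. The code checks superposition and confirms that a 440 Hz sine leaves the filter still at 440 Hz:

```python
import numpy as np

def lti_filter(x):
    """Example LTI system: a 5-tap moving-average FIR filter (an arbitrary choice)."""
    h = np.ones(5) / 5.0
    return np.convolve(x, h, mode="full")[: len(x)]  # causal output, same length as input

fs = 8000                         # sample rate in Hz (assumed)
t = np.arange(fs) / fs            # 1 second of sample times
in1 = np.sin(2 * np.pi * 440 * t)
in2 = 0.5 * np.sin(2 * np.pi * 1000 * t + 0.3)

# Superposition: filtering in1 + in2 gives out1 + out2.
out_sum = lti_filter(in1 + in2)
print(np.allclose(out_sum, lti_filter(in1) + lti_filter(in2)))  # True

# Sine in -> sine out at the same frequency (only amplitude and phase change).
steady = lti_filter(in1)[fs // 2:]   # last 0.5 s, well past the 4-sample start-up transient
peak_bin = np.argmax(np.abs(np.fft.rfft(steady)))
peak_hz = peak_bin * fs / len(steady)
print(peak_hz)  # 440.0
```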

Fourier Transform

Short Time Fourier Transform
‣ Time-varying spectra: the Fast Fourier Transform (FFT) is applied to successive short frames of the input at times t, t+1, t+2, ...
[Figure: input frames over time; filters and oscillators; output as amplitude vs. frequency]
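
A minimal STFT sketch in NumPy. It assumes a Hann window and fixed frame and hop sizes, which are common choices but are not specified on the slide. Each row of the result is the magnitude spectrum of one frame, so a note change shows up as a moving spectral peak:

```python
import numpy as np

def stft_magnitude(x, frame_size=1024, hop_size=512):
    """Magnitude STFT: Hann-windowed frames -> FFT.

    Returns an array of shape (num_frames, frame_size // 2 + 1); row k is the
    spectrum of the frame starting at sample k * hop_size.
    """
    window = np.hanning(frame_size)
    num_frames = 1 + (len(x) - frame_size) // hop_size
    frames = np.stack([x[k * hop_size : k * hop_size + frame_size] * window
                       for k in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000
t = np.arange(fs) / fs
x = np.concatenate([np.sin(2 * np.pi * 440 * t),    # 1 s of 440 Hz ...
                    np.sin(2 * np.pi * 880 * t)])   # ... then 1 s of 880 Hz

spec = stft_magnitude(x, frame_size=800, hop_size=400)  # 10 Hz per frequency bin
print(spec.shape)                                    # (39, 401)
print(np.argmax(spec[0]), np.argmax(spec[-1]))       # 44 88  (440 Hz and 880 Hz)
```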

Spectrum and Shape Descriptors
‣ Descriptors of the shape of the magnitude (M) vs. frequency (F) spectrum: Centroid, Rolloff, Flux, Bandwidth, Moments, ...
‣ The descriptors of each frame form a feature vector, i.e. a point in a feature space
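
The descriptors can be computed per frame from the magnitude spectrum. The sketch below uses common textbook definitions:

- centroid: the magnitude-weighted mean frequency
- bandwidth: the spread around the centroid
- rolloff: the frequency below which 85% of the cumulative magnitude lies
- flux: the squared difference between consecutive normalized spectra

The exact variants and the 85% threshold are assumptions. Higher moments are omitted.

```python
import numpy as np

def spectral_descriptors(mag, prev_mag, freqs, rolloff_fraction=0.85):
    """Centroid, rolloff, flux and bandwidth of one magnitude spectrum.

    mag, prev_mag: magnitude spectra of the current and previous frame.
    freqs: center frequency (Hz) of each bin.
    """
    eps = 1e-12
    total = np.sum(mag) + eps
    centroid = np.sum(freqs * mag) / total                              # weighted mean frequency
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * mag) / total)  # spread around centroid
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])]
    flux = np.sum((mag / total - prev_mag / (np.sum(prev_mag) + eps)) ** 2)  # frame-to-frame change
    return np.array([centroid, rolloff, flux, bandwidth])

# Usage: one feature vector (a point in feature space) per frame.
fs, n = 8000, 800
freqs = np.arange(n // 2 + 1) * fs / n
frame1 = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)
frame2 = frame1 + 0.5 * np.sin(2 * np.pi * 2500 * np.arange(n) / fs)
mag1 = np.abs(np.fft.rfft(frame1 * np.hanning(n)))
mag2 = np.abs(np.fft.rfft(frame2 * np.hanning(n)))
feature_vector = spectral_descriptors(mag2, mag1, freqs)
```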

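Mel Frequency Cepstral Coefficients
‣ Mel-scale filterbank: 13 linearly spaced filters (each spanning CF - 130 to CF + 130 around its center frequency CF) and 27 log-spaced filters (edges at CF divided and multiplied by a constant ratio)
‣ Pipeline: Mel-filtering -> Log -> DCT -> MFCCs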
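
A compact sketch of the Mel-filtering, Log, DCT chain for one frame. It makes several assumptions that differ from the slide:

- It uses a common HTK-style mel formula, mel = 2595 log10(1 + f/700), with 40 triangular filters equally spaced on that scale. This replaces the exact 13 linear + 27 log-spaced layout on the slide, though the total of 40 filters matches.
- It uses the FFT magnitude rather than the power.
- The orthonormal DCT-II is written out in NumPy.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters equally spaced on the (HTK-style) mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), num_filters + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fbank = np.zeros((num_filters, len(bin_freqs)))
    for i in range(num_filters):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, num_coeffs):
    """Orthonormal DCT-II of a 1-D array, keeping the first num_coeffs coefficients."""
    n = len(x)
    k = np.arange(num_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    scale = np.full(num_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def mfcc_frame(frame, fs, num_filters=40, num_coeffs=13):
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))          # magnitude spectrum
    mel_energies = mel_filterbank(num_filters, len(frame), fs) @ mag    # mel-filtering
    log_energies = np.log(mel_energies + 1e-10)                         # log (avoid log 0)
    return dct_ii(log_energies, num_coeffs)                             # DCT -> MFCCs

fs = 16000
frame = np.sin(2 * np.pi * 1000 * np.arange(512) / fs)
coeffs = mfcc_frame(frame, fs)
print(coeffs.shape)  # (13,)
```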

Audio Feature Extraction

Traditional Music Representations

Pitch content
‣ Harmony and melody are pitch concepts
‣ Music theory: score = music
‣ Bridge to symbolic MIR
‣ Automatic music transcription
‣ Non-transcriptive arguments
‣ Split the octave into discrete, logarithmically spaced intervals
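
Splitting the octave into 12 equal, logarithmically spaced semitones gives the familiar equal-tempered mapping between frequency and note number. This sketch assumes A4 = 440 Hz and MIDI numbering (A4 = 69, C4 = 60):

```python
import math

A4_HZ = 440.0  # tuning reference (assumed)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz):
    """Fractional MIDI note number: 12 equal semitones per octave (doubling = +12)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)

def midi_to_hz(midi):
    return A4_HZ * 2.0 ** ((midi - 69.0) / 12.0)

def note_name(freq_hz):
    """Nearest equal-tempered note, e.g. 261.63 Hz -> 'C4'."""
    midi = round(hz_to_midi(freq_hz))
    return f"{PITCH_CLASSES[midi % 12]}{midi // 12 - 1}"

print(note_name(261.63))          # C4
print(round(midi_to_hz(60), 2))   # 261.63
```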

Pitch Detection
‣ Approaches: time-domain, frequency-domain, perceptual
‣ Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency

Time Domain
[Figure: waveforms of a C4 clarinet note and a C4 sine wave]
‣ Counting zero-crossings estimates the fundamental frequency but is sensitive to noise, so it needs a low-pass filter (LPF)
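
Below is a sketch of zero-crossing pitch estimation that also shows why a low-pass filter helps. Two assumptions are illustrative only: a 5 kHz sinusoid stands in for noise, and the LPF is a 31-tap Hamming-windowed sinc with a 1 kHz cutoff.

```python
import numpy as np

def zero_crossing_f0(x, fs):
    """Estimate F0 from the zero-crossing count: a sine crosses zero twice per period."""
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * fs / (2.0 * len(x))

def low_pass(x, fs, taps=31, cutoff_hz=1000.0):
    """Zero-phase windowed-sinc FIR low-pass filter (Hamming window, unit DC gain)."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(taps)
    h /= h.sum()
    return np.convolve(x, h, mode="same")

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 261.63 * t)               # C4 sine
noisy = clean + 0.5 * np.sin(2 * np.pi * 5000 * t)   # high-frequency "noise"

clean_est = zero_crossing_f0(clean, fs)
noisy_est = zero_crossing_f0(noisy, fs)
filtered_est = zero_crossing_f0(low_pass(noisy, fs), fs)
print(clean_est, noisy_est, filtered_est)
```

On the clean sine the estimate lands close to 261.63 Hz. The added high-frequency component creates extra sign changes near each zero of the sine, which inflates the count several times over. Filtering removes most of those extra crossings.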

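Autocorrelation
‣ Efficient computation using the FFT (fastest when the FFT length is a power of 2):
F(f) = FFT(X(t))
S(f) = F(f) F*(f), where F* is the complex conjugate of F
R(l) = IFFT(S(f))
‣ Zero-padding a frame of N samples to at least 2N - 1 samples makes this circular computation equal to the ordinary (linear) autocorrelation
‣ Peaks of R(l) at non-zero lags l indicate the period, and hence the fundamental frequency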
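
The three equations translate directly into NumPy. This sketch zero-pads to the next power of 2 that is at least 2N - 1, so the circular correlation computed by the FFT equals the linear one. As one simple assumed pitch rule, it then picks the highest peak of R(l) within the lag range corresponding to 50-1000 Hz:

```python
import numpy as np

def autocorrelation_fft(x):
    """R(l) for l = 0..N-1 via F = FFT(x), S = F F*, R = IFFT(S)."""
    n = len(x)
    n_fft = 1 << (2 * n - 1).bit_length()    # power of 2 >= 2N - 1: avoids circular wrap-around
    F = np.fft.rfft(x, n_fft)
    S = np.abs(F) ** 2                       # F(f) F*(f), the power spectrum
    return np.fft.irfft(S, n_fft)[:n]

def autocorrelation_f0(x, fs, fmin=50.0, fmax=1000.0):
    """Pick the lag with the largest R(l) between fs/fmax and fs/fmin samples."""
    R = autocorrelation_fft(x)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + int(np.argmax(R[lag_min:lag_max + 1]))
    return fs / lag

fs = 16000
x = np.sin(2 * np.pi * 220 * np.arange(2048) / fs)
print(autocorrelation_f0(x, fs))   # about 219 Hz: the period is rounded to a whole number of samples
```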

Frequency Domain
‣ The fundamental frequency (as well as pitch) corresponds to peaks in the spectrum
‣ The fundamental does not necessarily have the highest amplitude
[Figure: spectra of a C4 sine wave and a C4 clarinet note]
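
One frequency-domain technique that copes with a weak fundamental is the harmonic product spectrum (HPS). It is swapped in here for illustration; the slides do not name it. HPS multiplies the magnitude spectrum by downsampled copies of itself, which reinforces bins whose integer multiples also contain energy. The test signal below, with a fundamental weaker than its second harmonic, is made up:

```python
import numpy as np

def hps_f0(x, fs, num_harmonics=3, fmin=50.0):
    """Harmonic product spectrum: F0 = argmax_f |X(f)| * |X(2f)| * ... * |X(Rf)|."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    n = len(mag) // num_harmonics              # bins f for which R*f is still in range
    hps = mag[:n].copy()
    for r in range(2, num_harmonics + 1):
        hps *= mag[::r][:n]                    # mag[::r][k] == mag[r * k]
    freqs = np.arange(n) * fs / len(x)         # bin k -> k * fs / N Hz
    hps[freqs < fmin] = 0.0                    # ignore DC and very low bins
    return freqs[np.argmax(hps)]

fs = 8000
t = np.arange(fs) / fs
# Made-up harmonic tone: 200 Hz fundamental weaker than the 400 Hz second harmonic.
x = sum(a * np.sin(2 * np.pi * f * t) for f, a in [(200, 0.3), (400, 1.0), (600, 0.8), (800, 0.5)])

mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
plain_peak_hz = np.argmax(mag) * fs / len(x)
print(plain_peak_hz)   # 400.0 (the strongest peak is the second harmonic)
print(hps_f0(x, fs))   # 200.0
```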

Chroma – Pitch perception
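
Chroma folds pitch into the 12 pitch classes of the octave and discards the octave itself. In this minimal sketch, each FFT bin is mapped to its nearest equal-tempered pitch class, and magnitudes are summed per class. It assumes A4 = 440 Hz and a 55-5000 Hz analysis range:

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_vector(mag, freqs, fmin=55.0, fmax=5000.0, a4_hz=440.0):
    """Sum spectral magnitude into 12 pitch classes (index 0 = C), normalized to sum to 1."""
    valid = (freqs >= fmin) & (freqs <= fmax)
    midi = 69.0 + 12.0 * np.log2(freqs[valid] / a4_hz)   # fractional note numbers
    pitch_class = np.round(midi).astype(int) % 12          # nearest semitone, octave folded away
    chroma = np.bincount(pitch_class, weights=mag[valid], minlength=12)
    total = chroma.sum()
    return chroma / total if total > 0 else chroma

fs = 8000
t = np.arange(fs) / fs
triad = sum(np.sin(2 * np.pi * f * t) for f in (261.63, 329.63, 392.00))  # C4, E4, G4
mag = np.abs(np.fft.rfft(triad * np.hanning(len(triad))))
freqs = np.arange(len(mag)) * fs / len(triad)
c = chroma_vector(mag, freqs)
print([PITCH_CLASSES[i] for i in sorted(np.argsort(c)[-3:])])  # ['C', 'E', 'G']
```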

Automatic Rhythm Description

Beat Histograms (Tzanetakis et al., AMTA 2001)
‣ Beat Histogram Features, e.g. max(h(i)) and argmax(h(i)) of the histogram h
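
Here is a simplified beat-histogram sketch that loosely follows the idea on the slide. It autocorrelates an onset-strength envelope, converts each lag to beats per minute (BPM), and accumulates positive autocorrelation values into integer-BPM bins. max(h(i)) and argmax(h(i)) then serve as features. The envelope input, its 100 Hz rate and the 40-200 BPM range are assumptions, and the envelope extraction step is not shown:

```python
import numpy as np

def beat_histogram(envelope, env_fs, bpm_min=40, bpm_max=200):
    """Accumulate positive autocorrelation of an onset envelope into integer-BPM bins."""
    env = envelope - np.mean(envelope)
    n = len(env)
    acf = np.correlate(env, env, mode="full")[n - 1:]     # acf[lag] for lag = 0..n-1
    hist = np.zeros(bpm_max + 1)                           # hist[b] = strength at b BPM
    for lag in range(1, n):
        bpm = 60.0 * env_fs / lag                          # period of `lag` samples -> BPM
        if bpm_min <= bpm <= bpm_max and acf[lag] > 0:
            hist[int(round(bpm))] += acf[lag]
    return hist

env_fs = 100                          # envelope sample rate in Hz (assumed)
envelope = np.zeros(10 * env_fs)      # 10 s of a made-up onset envelope ...
envelope[::50] = 1.0                  # ... with an onset every 0.5 s, i.e. 120 BPM
hist = beat_histogram(envelope, env_fs)
features = (hist.max(), int(np.argmax(hist)))  # max(h(i)), argmax(h(i))
print(features[1])                    # 120
```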

Analysis Overview
‣ A musical piece can be represented as a trajectory of feature vectors over time, or as a point cloud in feature space

Content-based Similarity Retrieval (or query-by-example)
‣ Each audio file is represented as a point in feature space
‣ Input: query example
‣ Output: ranked list of similar audio files based on feature vector similarity
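
A query-by-example sketch in three steps:

1. Represent each file by one feature vector.
2. Z-normalize each feature over the collection, so that no single feature dominates because of its units.
3. Rank the files by Euclidean distance to the query.

The feature values and file names are hypothetical. Euclidean distance is only one of several possible similarity measures.

```python
import numpy as np

def rank_by_similarity(query, collection, names):
    """Return (name, distance) pairs sorted from most to least similar to the query."""
    mean = collection.mean(axis=0)
    std = collection.std(axis=0) + 1e-12                  # avoid division by zero
    z_collection = (collection - mean) / std
    z_query = (query - mean) / std
    distances = np.linalg.norm(z_collection - z_query, axis=1)
    return [(names[i], float(distances[i])) for i in np.argsort(distances)]

# Hypothetical per-file features: [spectral centroid (Hz), rolloff (Hz), flux]
names = ["song_a", "song_b", "song_c"]
collection = np.array([[1500.0, 3000.0, 0.02],
                       [800.0, 1700.0, 0.01],
                       [3000.0, 6000.0, 0.05]])
query = np.array([850.0, 1800.0, 0.012])
ranking = rank_by_similarity(query, collection, names)
print([name for name, _ in ranking])   # ['song_b', 'song_a', 'song_c']
```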

Classification
‣ Partitioning of feature space by a decision boundary (e.g., Music vs. Speech)
‣ Generative vs. discriminative models
‣ Bayes rule: P(class | x) = p(x | class) * P(class) / p(x)
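
A minimal generative classifier that applies Bayes rule directly. Each class-conditional density p(x | class) is modeled as a diagonal Gaussian fitted to training data. The density is multiplied by the prior P(class) and normalized by p(x). The two-dimensional Music/Speech training features are synthetic and generated purely for illustration:

```python
import numpy as np

class GaussianBayesClassifier:
    """Generative classifier: p(x | class) is a diagonal Gaussian; prediction uses Bayes rule."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.priors = np.array([np.mean(y == c) for c in self.classes])  # P(class)
        return self

    def posteriors(self, X):
        # log p(x | class) under a diagonal Gaussian, shape (num_points, num_classes)
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * self.vars)[None, :, :]
            + (X[:, None, :] - self.means[None, :, :]) ** 2 / self.vars[None, :, :],
            axis=2)
        log_joint = log_lik + np.log(self.priors)[None, :]  # log p(x | class) + log P(class)
        log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
        joint = np.exp(log_joint)
        return joint / joint.sum(axis=1, keepdims=True)     # dividing by p(x)

    def predict(self, X):
        return self.classes[np.argmax(self.posteriors(X), axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.1, 0.2], 0.05, size=(50, 2)),   # synthetic "music" features
               rng.normal([0.4, 0.6], 0.05, size=(50, 2))])  # synthetic "speech" features
y = np.array(["music"] * 50 + ["speech"] * 50)
clf = GaussianBayesClassifier().fit(X, y)
print(clf.predict(np.array([[0.12, 0.22], [0.38, 0.58]])).tolist())  # ['music', 'speech']
```

A discriminative model would instead learn the decision boundary, or P(class | x), directly without modeling p(x | class).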

Classification
‣ Genre/Style
‣ Emotion/Mood
‣ Artist
‣ Instrument
‣ MIREX genres (… second clips, … / genre)

Multi-tag annotation
‣ Free-form tags (e.g., female voice, woman singing)
‣ Multi-label classification problems with twists
‣ Issues: synonyms, subpart relations, sparse, noisy
‣ Cold start problem
‣ Typically each tag is treated independently as a classification problem
‣ The inverse is also interesting (query-by-keywords)
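
This sketch follows the usual approach on the slide and treats each tag as an independent binary classification problem, often called binary relevance. Each tag gets one logistic-regression model trained by plain gradient descent. The tag names, features and training settings are hypothetical. Ranking songs by a tag's predicted probability gives the inverse, query-by-keywords direction:

```python
import numpy as np

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def train_tag_models(X, Y, lr=0.5, steps=2000):
    """One logistic-regression model per tag (column of Y). Columns never interact,
    so this is equivalent to training each tag's classifier separately."""
    Xb = add_bias(X)
    W = np.zeros((Xb.shape[1], Y.shape[1]))
    for _ in range(steps):
        P = 1.0 / (1.0 + np.exp(-Xb @ W))      # predicted probability for every (song, tag)
        W -= lr * Xb.T @ (P - Y) / len(X)      # gradient step on the mean log-loss
    return W

def tag_probabilities(X, W):
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ W))

# Hypothetical 2-D song features and two hypothetical tags.
tags = ["female voice", "guitar"]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W = train_tag_models(X, Y)
P = tag_probabilities(X, W)

# Query-by-keywords: rank songs by the predicted probability of one tag.
guitar = tags.index("guitar")
print(np.argsort(-P[:, guitar]))
```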

Stacking

Polyphonic Audio-Score Alignment
‣ Representation: time series of chroma
‣ Matching procedure: Dynamic Time Warping

Dynamic Time Warping
[Figure: aligned performances of the same orchestral piece; attempted alignment of two different orchestral pieces]
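
A plain O(nm) dynamic time warping sketch for aligning two chroma time series. It uses Euclidean frame distances and the three standard steps (match, insertion, deletion). The toy inputs are made up for illustration: 12-dimensional one-hot chroma vectors for a "score" C, E, G, C and for a "performance" that holds some notes longer.

```python
import numpy as np

def dtw(A, B):
    """Dynamic time warping between feature sequences A (n, d) and B (m, d).

    Returns the accumulated cost and the warping path as a list of (i, j) index pairs.
    """
    n, m = len(A), len(B)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # local frame distances
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(D[n, m]), path[::-1]

chroma = np.eye(12)                       # one-hot chroma vectors, index 0 = C
score = chroma[[0, 4, 7, 0]]              # C E G C
performance = chroma[[0, 0, 4, 7, 7, 7, 0]]
total_cost, path = dtw(score, performance)
print(total_cost)  # 0.0
print(path)        # [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (2, 5), (3, 6)]
```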

Query-by-humming
‣ User sings a melody
‣ Computer searches a database for the song containing the melody
‣ The challenge of difficult queries

The MUSART system
‣ Query preprocessing
  ‣ Pitch contour extraction (audio)
  ‣ Note segmentation (symbolic)
‣ Target preprocessing (symbolic)
  ‣ Theme extraction
‣ Model-forming, representation
‣ Search to find approximate match
  ‣ Dynamic Time Warping, HMMs

Conclusions
‣ Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature
‣ The tasks covered in this presentation are representative of existing work, and commercial implementations of them already exist; many more tasks are being actively investigated
‣ Music is a complex and fascinating signal, and we are just beginning to understand it better using computers