Query by Singing and Humming System


Query by Singing and Humming System LIN CHIAO WEI 2015/12/02

QBSH
Retrieves a song when the user has forgotten the names of the singer and the song.
The system extracts information from the hummed input, compares it with a database, and ranks the candidates by similarity.
It includes three main parts:
- Onset detection
- Pitch estimation
- Melody matching

system diagram

Outline
Onset detection
- Magnitude Method
- Short-term Energy Method
- Surf Method
- Envelope Match Filter
Pitch estimation
- Autocorrelation Function
- Average Magnitude Difference Function
- Harmonic Product Spectrum
- Proposed Method
Melody matching
- Hidden Markov Model
- Dynamic Programming
- Linear Scaling


Onset
An onset is the beginning of a sound or musical note. Onset detection captures sudden changes of volume in the music signal.
[1] J. P. Bello, L. Daudet, S. Abdallah et al., “A tutorial on onset detection in music signals,” Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 5, pp. 1035-1047, 2005.

Magnitude Method
Uses volume as the feature.
Steps:
(1) Find the envelope amplitude: $A_k = \max_{k n_0 \le n \le (k+1) n_0} \mathrm{LPF}\{x[n]\}$
(2) Magnitude difference: $D_k = A_k - A_{k-1}$
(3) If $D_k > \text{threshold}$, $k n_0$ is recognized as the location of an onset.
When the difference exceeds the threshold, there is a sudden, sufficient growth in energy, which marks the position of the onset.
Disadvantage: highly affected by background noise and by the chosen threshold value.
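The three steps of the magnitude method can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the low-pass filter is assumed to be a moving average of $|x[n]|$ (the slide does not specify the LPF), frames are taken as non-overlapping blocks of $n_0$ samples, and the names `envelope_amplitude`, `magnitude_onsets`, and `lpf_len` are hypothetical.

```python
import numpy as np

def envelope_amplitude(x, n0, lpf_len=32):
    """Step (1): A_k = max of the low-passed signal within frame k.

    Assumption: the LPF is a moving average applied to |x[n]|.
    Frames are non-overlapping blocks of n0 samples.
    """
    kernel = np.ones(lpf_len) / lpf_len
    smoothed = np.convolve(np.abs(x), kernel, mode="same")
    n_frames = len(x) // n0
    frames = smoothed[: n_frames * n0].reshape(n_frames, n0)
    return frames.max(axis=1)

def magnitude_onsets(x, n0, threshold):
    """Steps (2)-(3): return sample positions k*n0 where D_k > threshold."""
    A = envelope_amplitude(x, n0)
    D = np.diff(A)                        # D[k-1] = A_k - A_{k-1}
    ks = np.flatnonzero(D > threshold) + 1
    return ks * n0

# Usage: silence followed by a constant-amplitude segment starting at sample 1050.
x = np.concatenate([np.zeros(1050), np.ones(950)])
onsets = magnitude_onsets(x, n0=100, threshold=0.3)
```

The detected position is quantized to the frame grid, so the time resolution is $n_0$ samples.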


Short-term Energy Method
Uses energy as the feature.
Disadvantage: sensitive to noise and to the chosen threshold value.
There are two ways to implement it.

Short-term Energy Method (1)
Type 1: similar to the magnitude method.
Steps:
(1) $E_k = \sum_{n = k n_0}^{(k+1) n_0 - 1} x^2[n]$
(2) $D_k = E_k - E_{k-1}$
(3) If $D_k > \text{threshold}$, $k n_0$ is recognized as the location of an onset.

Short-term Energy Method (2)
Type 2: convert the energy into a binary sequence.
Steps:
(1) $E_k = \sum_{n = k n_0}^{(k+1) n_0 - 1} x^2[n]$
(2) $D_k = 1$ if $E_k > \text{threshold}$, and $D_k = 0$ if $E_k \le \text{threshold}$
(3) For each run of consecutive 1s, set the first one as the onset and the last one as the offset.
Assumption: there is always silence between two notes.
[Figure: binary sequence with the onset and offset of each run of 1s marked.]
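The Type 2 procedure can be sketched as below. This is an illustrative sketch under the slide's assumption that notes are separated by silence; the function names and the test signal are hypothetical.

```python
import numpy as np

def short_term_energy(x, n0):
    """E_k = sum of x[n]^2 over frame k (non-overlapping frames of n0 samples)."""
    n_frames = len(x) // n0
    frames = np.asarray(x, dtype=float)[: n_frames * n0].reshape(n_frames, n0)
    return np.sum(frames ** 2, axis=1)

def energy_onsets_offsets(x, n0, threshold):
    """Return (onset, offset) sample positions, one pair per run of 1s in D_k."""
    E = short_term_energy(x, n0)
    D = (E > threshold).astype(int)              # binary sequence
    change = np.diff(np.concatenate(([0], D, [0])))
    first = np.flatnonzero(change == 1)          # first frame of each run
    last = np.flatnonzero(change == -1) - 1      # last frame of each run
    return [(int(a) * n0, int(b) * n0) for a, b in zip(first, last)]

# Usage: two "notes" separated by silence.
x = np.concatenate([np.zeros(300), np.ones(200), np.zeros(300),
                    np.ones(100), np.zeros(100)])
notes = energy_onsets_offsets(x, n0=100, threshold=50.0)
```

Each pair gives the starting sample of the first and last frame of a note, so a one-frame note has equal onset and offset.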

Short-term Energy Method

Surf Method
Uses the slope of the envelope to detect onsets.
Disadvantage: requires more computation time.
[2] S. Pauws, “CubyHum: a fully operational ‘query by humming’ system,” ISMIR, pp. 187-196, 2002.

Surf Method
Steps:
(1) Find the envelope amplitude: $A_k = \max_{k n_0 \le n \le (k+1) n_0} \mathrm{LPF}\{x[n]\}$
(2) Approximate $A_m$ for $m = k-2, \dots, k+2$ by a second-order polynomial $p(m) = a_k + b_k (m-k) + c_k (m-k)^2$. The coefficient $b_k$ is the slope at the center ($m = k$), given by $b_k = \sum_{\tau=-2}^{2} A_{k+\tau}\,\tau \Big/ \sum_{\tau=-2}^{2} \tau^2$.
(3) If $b_k > \text{threshold}$, $k n_0$ is recognized as the location of an onset.
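The closed form for $b_k$ in step (2) is the linear coefficient of a least-squares quadratic fit over five points centered on $k$. The sketch below computes it directly and compares it against a generic polynomial fit (`numpy.polyfit`); the function name `surf_slopes` and the sample envelope are illustrative assumptions.

```python
import numpy as np

def surf_slopes(A):
    """b_k = sum_{tau=-2..2} A[k+tau]*tau / sum_{tau=-2..2} tau^2.

    Returns NaN for the first and last two frames, where the
    five-point window does not fit.
    """
    A = np.asarray(A, dtype=float)
    taus = np.arange(-2, 3)
    b = np.full(len(A), np.nan)
    for k in range(2, len(A) - 2):
        b[k] = np.dot(A[k - 2 : k + 3], taus) / np.dot(taus, taus)
    return b

# Usage: compare with the linear coefficient of a least-squares quadratic fit.
A = np.array([0.0, 0.0, 0.0, 1.0, 4.0, 9.0, 9.0, 9.0])
b = surf_slopes(A)
k = 3
fit = np.polyfit(np.arange(-2, 3), A[k - 2 : k + 3], 2)  # [c_k, b_k, a_k]
```

Because the five offsets are symmetric around zero, the odd moments vanish and the slope decouples from the other two coefficients, which is why the formula needs only one weighted sum.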

Surf Method

Envelope Match Filter

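Envelope Match Filter
Steps:
(1) Find the envelope amplitude: $A_k = \max_{k n_0 \le n \le (k+1) n_0} x[n]$
(2) Normalization: $B_k = \left( \dfrac{A_k}{0.2 + 0.1\,A_k} \right)^{0.7}$
(3) $C_k = \mathrm{convolution}(B_k, f)$, where $f$ is the match filter.
(4) If $C_k > \text{threshold}$, then $k n_0$ is recognized as the location of an onset.
Note on $B$: normalization also amplifies fluctuations in the non-onset parts, hence the power of 0.7.
Note: the autocorrelation of $f$ is $f(t) * \overline{f(-t)}$, where $*$ denotes convolution.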
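Steps (2) to (4) can be sketched as follows, starting from an already computed envelope $A_k$. The slides do not give the shape of the match filter, so the filter `f_example` below is purely hypothetical (a short rising-edge detector), and the function names are illustrative only.

```python
import numpy as np

def normalize_envelope(A):
    """Step (2): B_k = (A_k / (0.2 + 0.1 * A_k)) ** 0.7."""
    A = np.asarray(A, dtype=float)
    return (A / (0.2 + 0.1 * A)) ** 0.7

def match_filter_onsets(A, f, threshold):
    """Steps (3)-(4): convolve B with f and return frames where C_k > threshold."""
    B = normalize_envelope(A)
    C = np.convolve(B, f, mode="same")
    return C, np.flatnonzero(C > threshold)

# Hypothetical filter: recent frames weighted +1, older frames weighted -1,
# so the response is large where the envelope rises.
f_example = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Usage: an envelope that jumps from 0 to 1 at frame 10.
A = np.concatenate([np.zeros(10), np.ones(10)])
C, onset_frames = match_filter_onsets(A, f_example, threshold=5.0)
```

Multiplying a detected frame index by $n_0$ gives the onset position in samples, as in step (4).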



Pitch extraction
Estimate the fundamental frequency of each note.
Sounds produced by humming contain harmonics, which interfere with the estimation of the fundamental frequency.

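Autocorrelation Function
$\mathrm{ACF}(n) = \dfrac{1}{N-n} \sum_{k=0}^{N-1-n} x(k)\,x(k+n)$
where $N$ is the length of the signal $x$ and $n$ is the time lag.
Each value is the inner product of the overlapping parts of the signal and its shifted copy.
If the ACF has its highest value (apart from the trivial peak at $n = 0$) at $n = K$, then $K$ is the period of the signal and the fundamental frequency is $1/K$ (in cycles per sample when $K$ is measured in samples).
[4] J.-S. R. Jang, “Audio signal processing and recognition,” Information on http://www.cs.nthu.edu.tw/~jang, 2011.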
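A minimal sketch of ACF-based pitch estimation follows. The lag search is restricted to a plausible pitch range so that the trivial peak at lag 0 and multiples of the period are excluded; the range parameters `f_min` and `f_max`, their defaults, and the function names are assumptions for illustration, not values from the slides.

```python
import numpy as np

def acf(x, max_lag):
    """ACF(n) = 1/(N-n) * sum_{k=0}^{N-1-n} x(k) x(k+n), for n = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[: N - n], x[n:]) / (N - n)
                     for n in range(max_lag + 1)])

def estimate_pitch_acf(x, fs, f_min=80.0, f_max=1000.0):
    """Return the pitch in Hz from the lag K that maximizes the ACF."""
    lag_min = int(fs / f_max)          # shortest period considered
    lag_max = int(fs / f_min)          # longest period considered
    values = acf(x, lag_max)
    K = lag_min + int(np.argmax(values[lag_min:]))
    return fs / K                      # 1/K cycles per sample = fs/K Hz

# Usage: a 200 Hz sine sampled at 8 kHz has a period of 40 samples.
fs = 8000
t = np.arange(800) / fs
f0 = estimate_pitch_acf(np.sin(2 * np.pi * 200 * t), fs, f_min=150.0)
```

Limiting `f_min` matters: the ACF of a periodic signal also peaks at 2K, 3K, and so on, so an overly wide lag range can return a sub-octave of the true pitch.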

Average Magnitude Difference Function
$\mathrm{AMDF}(n) = \dfrac{1}{N-n} \sum_{k=0}^{N-1-n} \left| x(k) - x(k+n) \right|$
If the AMDF has a low value close to 0 at $n = K$, then $K$ is the period of the signal and the fundamental frequency is $1/K$.
Implementation note (MATLAB): take the maximum of max(amdf)-amdf-max(amdf)*linspace(0,1,length(amdf))'
[4] J.-S. R. Jang, “Audio signal processing and recognition,” Information on http://www.cs.nthu.edu.tw/~jang, 2011.
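The AMDF criterion can be sketched in the same way as the ACF, with a minimum in place of a maximum. As before, the search range (`f_min`, `f_max`) and the function names are illustrative assumptions.

```python
import numpy as np

def amdf(x, max_lag):
    """AMDF(n) = 1/(N-n) * sum_{k=0}^{N-1-n} |x(k) - x(k+n)|, n = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.mean(np.abs(x[: N - n] - x[n:]))
                     for n in range(max_lag + 1)])

def estimate_pitch_amdf(x, fs, f_min=80.0, f_max=1000.0):
    """Return the pitch in Hz from the lag K that minimizes the AMDF."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    values = amdf(x, lag_max)
    K = lag_min + int(np.argmin(values[lag_min:]))
    return fs / K

# Usage: the same 200 Hz test tone as a sanity check.
fs = 8000
t = np.arange(800) / fs
f0 = estimate_pitch_amdf(np.sin(2 * np.pi * 200 * t), fs, f_min=150.0)
```

The AMDF needs only subtractions and absolute values, which is its main attraction over the multiplication-based ACF.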

Harmonic Product Spectrum
A pitch extraction method that works in the frequency domain.
[4] J.-S. R. Jang, “Audio signal processing and recognition,” Information on http://www.cs.nthu.edu.tw/~jang, 2011.
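The slide gives no formula, so the following is a sketch of the commonly described form of the harmonic product spectrum: the magnitude spectrum is multiplied by copies of itself downsampled by 2, 3, ..., so that the harmonics line up at the fundamental. The number of harmonics, the lower frequency bound `f_min`, and the absence of a window function are all simplifying assumptions here.

```python
import numpy as np

def hps_pitch(x, fs, n_harmonics=3, f_min=50.0):
    """Estimate pitch as the peak of the harmonic product spectrum."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    L = len(spectrum) // n_harmonics       # bins available for every factor
    hps = spectrum[:L].copy()
    for h in range(2, n_harmonics + 1):
        hps *= spectrum[::h][:L]           # spectrum downsampled by h
    start = int(np.searchsorted(freqs, f_min))  # skip DC and very low bins
    peak = start + int(np.argmax(hps[start:L]))
    return float(freqs[peak])

# Usage: a tone with a 200 Hz fundamental and two harmonics.
fs = 8000
t = np.arange(4000) / fs
x = (np.sin(2 * np.pi * 200 * t)
     + 0.8 * np.sin(2 * np.pi * 400 * t)
     + 0.6 * np.sin(2 * np.pi * 600 * t))
f0 = hps_pitch(x, fs)
```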

Proposed Method
A frequency-domain method.
Find the top three spectral peaks, at $f_1$, $f_2$, $f_3$.
Fundamental frequency $= \min(f_1, f_2, f_3)$.
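A possible reading of the proposed rule is sketched below. The slide does not say how peaks are picked, so this sketch assumes a peak is a local maximum of the FFT magnitude spectrum and that "top 3" means the three largest such maxima; the function name is hypothetical.

```python
import numpy as np

def top3_min_pitch(x, fs):
    """Return min(f1, f2, f3), where f1..f3 are the three strongest spectral peaks."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    inner = spectrum[1:-1]
    # Assumption: a peak is a bin larger than its left neighbor and
    # at least as large as its right neighbor.
    is_peak = (inner > spectrum[:-2]) & (inner >= spectrum[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    strongest = peak_bins[np.argsort(spectrum[peak_bins])[-3:]]
    return float(freqs[strongest].min())

# Usage: the second harmonic is stronger than the fundamental,
# yet the lowest of the three peaks is still the fundamental.
fs = 8000
t = np.arange(4000) / fs
x = (0.5 * np.sin(2 * np.pi * 200 * t)
     + 1.0 * np.sin(2 * np.pi * 400 * t)
     + 0.8 * np.sin(2 * np.pi * 600 * t))
f0 = top3_min_pitch(x, fs)
```

The example shows the motivation for taking the minimum: picking only the single largest peak would return 400 Hz here.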


Melody Matching
Convert the extracted pitch sequence into MIDI note numbers.
Compare the numeric sequence of the sung input with those in the database.
Difficulties: singing in the wrong key, singing too many or too few notes, or starting from an arbitrary part of the song.
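The pitch-to-MIDI conversion is not spelled out on the slide; the sketch below uses the standard equal-temperament mapping, in which A4 = 440 Hz corresponds to MIDI note 69 and each semitone is a factor of $2^{1/12}$. Rounding to the nearest integer note is an assumption about how the sequence is quantized.

```python
import numpy as np

def hz_to_midi(f_hz):
    """MIDI note number = 69 + 12 * log2(f / 440), rounded to the nearest note."""
    f_hz = np.asarray(f_hz, dtype=float)
    return np.rint(69 + 12 * np.log2(f_hz / 440.0)).astype(int)

# Usage: A4, middle C, and A3.
notes = hz_to_midi([440.0, 261.63, 220.0])
```

On this scale a key change is a constant offset in semitones, which is why the matching stage can compensate for singing in the wrong key by shifting the whole sequence.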

Dynamic Programming
A method for finding an optimal solution to a multi-stage decision problem. It applies as long as the solution can be refined recursively.
Used in DNA sequence matching (alphabet {A, T, C, G}).
The alignment matrix is constructed from the query sequence Q and the target sequence T:
$\mathrm{AlignScore}(i,j) = \max \begin{cases} \mathrm{AlignScore}(i-1,j-1) + \mathrm{matchScore}(q_i, t_j) \\ \mathrm{AlignScore}(i-1,j) - 1 \\ \mathrm{AlignScore}(i,j-1) - 1 \end{cases}$
$\mathrm{matchScore}(q_i, t_j) = \begin{cases} 2, & \text{if } q_i = t_j \\ -2, & \text{otherwise} \end{cases}$

Dynamic Programming (example)
[Table: alignment matrix for the target G A B B and the query G D A C B, with border values −1, −2, −3, −4, −5.]
Example cell: $\mathrm{AlignScore}(i,j) = \max\{\,1 + \mathrm{matchScore}(q_i, t_j) = 3,\; 0 - 1 = -1,\; 0 - 1 = -1\,\} = 3$
Trace back from the final cell to recover the alignment.

Dynamic Programming (trace-back routes)
[Figure: alignment matrix with four trace-back routes marked.]
- Route 1: Target G - A B - B; Query G D A - C B
- Route 2: Target G - A - B B; Query G D A C - B
- Route 3: Target G - A B B; Query G D A C B
- Route 4: Target G - A - B B; Query G D A C B -
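The recurrence can be sketched as a table-filling routine. The border initialization (each leading gap costs −1) is an assumption inferred from the border values visible in the slide's matrix; the function name is hypothetical.

```python
def align_score_table(query, target, match=2, mismatch=-2, gap=-1):
    """Fill the AlignScore matrix; rows index the query, columns the target."""
    n, m = len(query), len(target)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # query symbols aligned to gaps
        S[i][0] = i * gap
    for j in range(1, m + 1):          # target symbols aligned to gaps
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,   # align q_i with t_j
                          S[i - 1][j] + gap,     # q_i against a gap
                          S[i][j - 1] + gap)     # t_j against a gap
    return S

# Usage with the sequences from the slides.
S = align_score_table("GDACB", "GABB")
final_score = S[5][4]
```

Each of the four routes listed above scores three matches (+6) and either three gaps (−3) or one gap plus one mismatch (−3), so they all reach the same final score.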

Markov Model
Markov model: a probability transition model.
Three basic elements:
(1) A set of states $S = \{s_1, s_2, \dots, s_N\}$
(2) A set of transition probabilities $T$
(3) An initial probability distribution $p$
[Table: example from/to transition probabilities among the states a, b, g, w; surviving values: 1, 0.5.]

Hidden Markov Model
Hidden Markov model: an extended version of the Markov model in which each state is a probability function over observations rather than a single fixed output.
Example observation sequence: RGBGGBBGRRR……
[8] Fundamentals of Speech Signal Processing, http://speech.ee.ntu.edu.tw/DSP2015Autumn/

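Hidden Markov Model for melody matching
No transition may have zero probability: transitions that do not occur in the observations are given a minimal probability $P_m$.
[Table: from/to transition matrices over the states a, b, g, w, t, before and after assigning the minimal probability; surviving values before: 0.05, 1, 0.5; after: 0.0425, 0.0434, 0.2, 0.8333, 0.4348.]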
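One way to apply the minimal probability is sketched below: zero entries are replaced by $P_m$ and each row is rescaled to sum to 1. The renormalization step is an assumption; it is consistent with the values 0.8333, 0.4348, and 0.2 that survive from the slide's second table when $P_m = 0.05$ and there are five states, but the slide does not state the procedure. The example rows are hypothetical.

```python
import numpy as np

def smooth_transitions(P, p_min=0.05):
    """Replace zero transition probabilities with p_min, then renormalize rows."""
    P = np.asarray(P, dtype=float).copy()
    P[P == 0] = p_min
    return P / P.sum(axis=1, keepdims=True)

# Usage: three illustrative rows over five states.
P = [[1.0, 0.0, 0.0, 0.0, 0.0],   # one observed transition
     [0.5, 0.5, 0.0, 0.0, 0.0],   # two equally likely transitions
     [0.0, 0.0, 0.0, 0.0, 0.0]]   # state never observed
smoothed = smooth_transitions(P)
```

Without this floor, a single transition in the query that never appeared in the training melody would force the whole path probability to zero.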

Linear Scaling
A straightforward frame-based method.
Three factors: the scaling factor, the scaling-factor bounds, and the resolution.
[4] J.-S. R. Jang, “Audio signal processing and recognition,” Information on http://www.cs.nthu.edu.tw/~jang, 2011.
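A rough sketch of how the three factors could interact: the query pitch vector is stretched or compressed by each candidate scaling factor within the bounds, the resolution sets how many factors are tried, and the best-matching version wins. The details below are assumptions rather than the presenter's settings: linear interpolation for resampling, comparison against the beginning of the target, a mean shift to cancel key differences, and mean absolute difference as the distance.

```python
import numpy as np

def resample(seq, new_len):
    """Linearly interpolate a pitch vector to new_len frames."""
    old_pos = np.linspace(0.0, 1.0, len(seq))
    new_pos = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_pos, old_pos, seq)

def linear_scaling_match(query, target, bounds=(0.5, 2.0), resolution=31):
    """Return (best distance, best scaling factor) of query against target."""
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    best_d, best_s = np.inf, None
    for s in np.linspace(bounds[0], bounds[1], resolution):
        L = int(round(s * len(query)))
        if L < 2 or L > len(target):
            continue                         # scaled query does not fit
        q = resample(query, L)
        ref = target[:L]
        q = q - q.mean() + ref.mean()        # key transposition via mean shift
        d = float(np.mean(np.abs(q - ref)))
        if d < best_d:
            best_d, best_s = d, float(s)
    return best_d, best_s

# Usage: the query is the target sung 25% slower and 3 semitones higher.
target = 60 + 5 * np.sin(2 * np.pi * np.arange(40) / 40)
query = resample(target, 50) + 3
distance, factor = linear_scaling_match(query, target)
```

A wider pair of bounds tolerates larger tempo deviations, and a higher resolution tries more factors at the cost of more distance computations.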

Conclusion
A Query-by-Singing-and-Humming system lets people search for the songs they want with a content-based method.
Onset detection methods: magnitude method, surf method, and envelope match filter.
Pitch detection methods: autocorrelation function, average magnitude difference function, harmonic product spectrum, and our proposed method.
Melody matching: dynamic programming, hidden Markov model, and linear scaling.
Onset detection: 98% TP (true-positive) rate.

Reference
[1] J. P. Bello, L. Daudet, S. Abdallah et al., “A tutorial on onset detection in music signals,” Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 5, pp. 1035-1047, 2005.
[2] S. Pauws, “CubyHum: a fully operational ‘query by humming’ system,” ISMIR, pp. 187-196, 2002.
[3] J.-J. Ding, C.-J. Tseng, C.-M. Hu et al., “Improved onset detection algorithm based on fractional power envelope match filter,” pp. 709-713.
[4] J.-S. R. Jang, “Audio signal processing and recognition,” Information on http://www.cs.nthu.edu.tw/~jang, 2011.
[5] X.-D. Mei, J. Pan, and S.-h. Sun, “Efficient algorithms for speech pitch estimation,” pp. 421-424.
[6] M. J. Ross, H. L. Shaffer, A. Cohen et al., “Average magnitude difference function pitch extractor,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 22, no. 5, pp. 353-362, 1974.
[7] M. R. Schroeder, “Period Histogram and Product Spectrum: New Methods for Fundamental-Frequency Measurement,” The Journal of the Acoustical Society of America, vol. 43, no. 4, pp. 829-834, 1968.
[8] Fundamentals of Speech Signal Processing, http://speech.ee.ntu.edu.tw/DSP2015Autumn/
[9] R. Bellman, “Dynamic programming and Lagrange multipliers,” Proceedings of the National Academy of Sciences of the United States of America, vol. 42, no. 10, pp. 767, 1956.
[10] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.