Soundprism: An Online System for Score-informed Source Separation of Music Audio


Soundprism: An Online System for Score-informed Source Separation of Music Audio
Zhiyao Duan and Bryan Pardo
EECS Dept., Northwestern Univ., Interactive Audio Lab
For presentation at MMIRG 2011, Evanston, IL
Based on a paper accepted by the IEEE Journal of Selected Topics in Signal Processing

From Prism to Soundprism

Potential Applications
– Personalize one's favorite mix in live concerts or broadcasts
– Music-Minus-One, then Music-Plus-One
– Music editing

Related Work
– Assume audio and score are well aligned
  – [Raphael, 2008]
  – [Hennequin, David & Badeau, 2011]
– Use Dynamic Time Warping (DTW), offline
  – [Woodruff, Pardo & Dannenberg, 2006]
  – [Ganseman, Mysore, Scheunders & Abel, 2010]
– To our knowledge, no existing work addresses online score-informed source separation

System Overview

Score Following
– Given a score, there is a 2-D performance space: score position (beats) × tempo (BPM)
– View a performance as a path in this space
– Task: estimate the path of the audio performance

Design the Model
– Decompose the audio into frames (46 ms long), which serve as the observations
– Create a state variable (score position, tempo) for each frame, to be estimated later
– Define a state process model (Markovian)
– Define an observation model
[Figure: hidden Markov process, with hidden states (tempo, score position) above the observed audio frames]
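To make the frame decomposition concrete, here is a minimal Python sketch. It assumes a 44.1 kHz sample rate, at which a 2048-sample frame lasts about 46 ms; the 441-sample (10 ms) hop and the function names `frame_signal` and `frame_duration_ms` are assumptions for illustration, not details given on the slide.

```python
def frame_signal(signal, frame_len=2048, hop=441):
    """Split a mono signal (a sequence of samples) into overlapping frames.

    frame_len=2048 samples is about 46 ms at an assumed 44.1 kHz sample rate;
    hop=441 samples (10 ms at 44.1 kHz) is an assumed hop size.
    A trailing partial frame is dropped.
    """
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]


def frame_duration_ms(frame_len, sample_rate):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate


frames = frame_signal([0.0] * 44100)  # one second of silence at 44.1 kHz
print(len(frames), len(frames[0]))    # 96 2048
print(round(frame_duration_ms(2048, 44100), 1))  # 46.4
```

Each frame then becomes one observation of the hidden Markov process.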

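Process Model
– Transition probability $p(s_n \mid s_{n-1})$ between the previous and current states, where the state $s_n = (x_n, v_n)$ holds the score position $x_n$ (beats) and the tempo $v_n$ (BPM)
– Dynamical system
  – Position: $x_n = x_{n-1} + l \cdot v_{n-1}$, where $l$ is the frame hop duration in minutes
  – Tempo: $v_n = v_{n-1} + n_v$ if the previous position just passed a score onset; $v_n = v_{n-1}$ otherwise
  – where $n_v \sim \mathcal{N}(0, \sigma_v^2)$ is the tempo noise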
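The process model can be sketched as a single sampling step in Python. This is an illustrative sketch, not the authors' code: the function names, the 10 ms hop expressed in minutes, the value of `sigma_v`, and the reading of "just passed a score onset" as an onset lying between the previous position and the newly advanced one are all assumptions.

```python
import bisect
import random


def passed_onset(x_from, x_to, onsets_sorted):
    """True if some score onset z satisfies x_from < z <= x_to (assumed reading)."""
    i = bisect.bisect_right(onsets_sorted, x_from)  # first onset strictly after x_from
    return i < len(onsets_sorted) and onsets_sorted[i] <= x_to


def step_state(x_prev, v_prev, hop_minutes, onsets_sorted, sigma_v, rng):
    """Sample the next state (position in beats, tempo in BPM) from the previous one."""
    # Position advances by tempo times hop; a hop in minutes makes the product beats.
    x_new = x_prev + hop_minutes * v_prev
    # Tempo changes only when an onset is passed; otherwise it is carried over.
    if passed_onset(x_prev, x_new, onsets_sorted):
        v_new = v_prev + rng.gauss(0.0, sigma_v)
    else:
        v_new = v_prev
    return x_new, v_new


rng = random.Random(0)
hop = 0.010 / 60.0  # hypothetical 10 ms hop, in minutes
print(step_state(0.0, 120.0, hop, [1.0, 2.0], 2.0, rng))   # about (0.02, 120.0): no onset passed
print(step_state(0.99, 120.0, hop, [1.0, 2.0], 2.0, rng))  # onset at beat 1.0 passed: tempo perturbed
```

Keeping the tempo fixed between onsets reflects the slide's point that tempo noise enters only when a score onset is passed.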

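Observation Model
– Generation probability $p(y_n \mid s_n)$ from the current state $s_n$ to the observation $y_n$ (the audio frame)
– Defined in two steps: the score position determines the score pitches (deterministic); the pitches generate the audio frame (probabilistic)
– The multi-pitch likelihood model was trained on thousands of isolated musical chords, as in [1]
[1] Z. Duan, B. Pardo, and C. Zhang, "Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 8, 2010.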
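A minimal Python sketch of this two-step structure follows. The score representation (a list of `(onset_beat, offset_beat, midi_pitch)` tuples), the function names, and the toy likelihood are hypothetical; a real system would plug a trained multi-pitch model such as the one in [1] in place of `toy_likelihood`.

```python
def pitches_at(score_notes, x):
    """Deterministic step: MIDI pitches sounding at score position x (beats).

    score_notes is a hypothetical list of (onset_beat, offset_beat, midi_pitch).
    """
    return sorted(p for onset, offset, p in score_notes if onset <= x < offset)


def observation_likelihood(frame, state, score_notes, multipitch_likelihood):
    """p(y_n | s_n) for a state s_n = (x_n, v_n).

    In this sketch only the score position is used: it fixes the chord
    deterministically, and multipitch_likelihood(frame, pitches), a placeholder
    for a trained multi-pitch model, scores the audio frame probabilistically.
    """
    x, _tempo = state
    return multipitch_likelihood(frame, pitches_at(score_notes, x))


def toy_likelihood(detected_pitches, chord):
    """Toy stand-in: the 'frame' is just a list of detected pitches."""
    return 0.9 if sorted(detected_pitches) == chord else 0.1


notes = [(0.0, 1.0, 60), (0.0, 2.0, 64), (1.0, 2.0, 62)]
print(pitches_at(notes, 0.5))  # [60, 64]
print(observation_likelihood([64, 60], (0.5, 100.0), notes, toy_likelihood))  # 0.9
print(observation_likelihood([64, 60], (1.5, 100.0), notes, toy_likelihood))  # 0.1
```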

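Inference
– Given the models, infer the hidden state from the observations so far, i.e., estimate the posterior $p(s_n \mid y_1, \ldots, y_n)$ of the current state $s_n$ given the audio frames $y_1, \ldots, y_n$, then decide on $s_n$
– By particle filtering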

System Overview

Source Separation
1. Accurately estimate the performed pitches
  – Search around the score pitches
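One way to picture this step is a local search: try candidate fundamental frequencies within a small range around each score pitch and keep the best-scoring one. The Python sketch below does this with a simple harmonic-sum salience as a stand-in for the multi-pitch likelihood; the salience function, the ±50-cent range, the 10-cent step, and the function names are assumptions, not the method from the slides.

```python
def midi_to_hz(midi):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def harmonic_salience(mag, bin_hz, f0, n_harmonics=10):
    """Stand-in score: sum of spectral magnitudes at the bins nearest to h * f0."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        b = int(round(h * f0 / bin_hz))
        if b < len(mag):
            total += mag[b]
    return total


def refine_pitch(mag, bin_hz, score_midi, search_cents=50, step_cents=10, n_harmonics=10):
    """Search f0 candidates within +/- search_cents of the score pitch; return the best one."""
    f_ref = midi_to_hz(score_midi)
    best_f0, best_score = f_ref, float("-inf")
    for cents in range(-search_cents, search_cents + 1, step_cents):
        f0 = f_ref * 2.0 ** (cents / 1200.0)
        score = harmonic_salience(mag, bin_hz, f0, n_harmonics)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0


# Toy magnitude spectrum with 1 Hz bins: a note played 20 cents sharp of A4.
f_true = 440.0 * 2.0 ** (20 / 1200.0)
mag = [0.0] * 5000
for h in range(1, 11):
    mag[int(round(h * f_true))] = 1.0
print(round(refine_pitch(mag, 1.0, 69), 2))  # about 445.1 Hz, i.e. 20 cents above 440 Hz
```

Restricting the search to a neighborhood of the score pitch is what lets the score guide the estimate while still tracking the performer's intonation.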

Reconstruct Source Signals
2. Allocate the mixture's spectral energy
  – Non-harmonic bins: to all sources, evenly
  – Non-overlapping harmonic bins: to the active source, solely
  – Overlapping harmonic bins: to the active sources, in inverse proportion to the square of their harmonic numbers
3. Inverse Fourier transform with the mixture's phase
[Figure: amplitude vs. frequency bins, marking the harmonic positions of Source 1 and Source 2]
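The allocation rules in step 2, together with reusing the mixture's phase in step 3, can be sketched in Python as follows. The sketch assumes each harmonic lands on the single nearest frequency bin, caps each source at a hypothetical `n_harmonics` harmonics, treats every source passed in as active, and stops before the inverse Fourier transform itself (a library routine such as NumPy's `irfft` would complete step 3); the function names are illustrative.

```python
import cmath
import math


def harmonic_bins(f0, bin_hz, n_bins, n_harmonics):
    """Map bin index -> lowest harmonic number of f0 falling in that bin (nearest-bin rule)."""
    bins = {}
    for h in range(1, n_harmonics + 1):
        b = int(round(h * f0 / bin_hz))
        if b >= n_bins:
            break
        bins.setdefault(b, h)
    return bins


def allocate_energy(mix_spec, bin_hz, f0s, n_harmonics=20):
    """Split each bin's energy |X[k]|^2 among the sources with fundamentals f0s."""
    n_bins, n_src = len(mix_spec), len(f0s)
    maps = [harmonic_bins(f0, bin_hz, n_bins, n_harmonics) for f0 in f0s]
    energies = [[0.0] * n_bins for _ in range(n_src)]
    for k, x in enumerate(mix_spec):
        e = abs(x) ** 2
        owners = {j: m[k] for j, m in enumerate(maps) if k in m}
        if not owners:
            # Non-harmonic bin: share evenly among all sources.
            for j in range(n_src):
                energies[j][k] = e / n_src
        else:
            # Harmonic bin: weights 1/h^2. A single owner receives everything
            # (non-overlapping case); several owners get the inverse-square split.
            w = {j: 1.0 / h ** 2 for j, h in owners.items()}
            total = sum(w.values())
            for j, wj in w.items():
                energies[j][k] = e * wj / total
    return energies


def source_spectra(mix_spec, energies):
    """Give each source magnitude sqrt(energy) and the mixture's phase, bin by bin."""
    return [
        [math.sqrt(e) * cmath.exp(1j * cmath.phase(x)) for e, x in zip(e_j, mix_spec)]
        for e_j in energies
    ]


mix = [1 + 0j] * 10  # toy spectrum: unit energy in every bin
E = allocate_energy(mix, bin_hz=100.0, f0s=[100.0, 200.0])
print(E[0][:3], E[1][:3])  # [0.5, 1.0, 0.2] [0.5, 0.0, 0.8]
```

With sources at 100 Hz and 200 Hz and 100 Hz bins, bin 2 is a harmonic of both (harmonic numbers 2 and 1), so its energy is split in the ratio 1/4 : 1, i.e., 20% and 80%; bin 1 belongs to the 100 Hz source alone, and bin 0 is shared evenly.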

Experiments on Real Performances
– Data source
  – Score: 10 of J.S. Bach's four-part chorales
  – Audio: played by a quartet (violin, clarinet, saxophone, bassoon); each part was recorded individually while the performer listened to the others
  – Score: constant tempo; audio: varying tempo, with fermatas
– Data set
  – All 15 combinations of the 4 parts of each piece
  – 150 pieces = 40 solo pieces + 60 duets + 40 trios + 10 quartets
– Ground-truth alignment
  – Manually annotated
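The dataset counts can be checked by enumerating the part combinations; this short Python sketch uses the instrument names from the slide as labels for the four parts.

```python
from itertools import combinations

INSTRUMENTS = ("violin", "clarinet", "saxophone", "bassoon")
N_CHORALES = 10


def pieces_by_size(parts, n_pieces):
    """Number of dataset pieces for each ensemble size (1 = solo, ..., 4 = quartet)."""
    return {k: n_pieces * len(list(combinations(parts, k))) for k in range(1, len(parts) + 1)}


counts = pieces_by_size(INSTRUMENTS, N_CHORALES)
print(counts)                # {1: 40, 2: 60, 3: 40, 4: 10}
print(sum(counts.values()))  # 150
```

The 15 combinations per piece are C(4,1) + C(4,2) + C(4,3) + C(4,4) = 4 + 6 + 4 + 1.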

Score Following Results
– Align Rate (AR): percentage of correctly aligned notes in the score (unit: %), where correctness is judged at each note's onset
– Scorealign: an offline DTW-based algorithm [2]
[2] N. Hu, R. B. Dannenberg, and G. Tzanetakis, "Polyphonic audio matching and alignment for music retrieval," in Proc. WASPAA, New Paltz, NY, USA, 2003.
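A small Python sketch of the Align Rate computation follows. It assumes a note counts as correctly aligned when its estimated onset time lies within a tolerance of the manually annotated onset; the tolerance the authors used is not given on this slide, so `tolerance_s` is a hypothetical parameter and the 50 ms in the usage line is only an example.

```python
def align_rate(est_onsets, true_onsets, tolerance_s):
    """Percentage of score notes whose estimated onset (s) is within tolerance_s of the truth."""
    if len(est_onsets) != len(true_onsets) or not true_onsets:
        raise ValueError("need one estimated onset per annotated note")
    correct = sum(1 for e, t in zip(est_onsets, true_onsets) if abs(e - t) <= tolerance_s)
    return 100.0 * correct / len(true_onsets)


print(align_rate([0.00, 1.02, 2.30, 3.00], [0.00, 1.00, 2.00, 3.04], tolerance_s=0.05))  # 75.0
```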

Source Separation Results
1. Soundprism
2. Ideally-aligned
  – Ground-truth alignment + separation
3. Ganseman10
  – Offline algorithm
  – DTW alignment
  – Source models trained on MIDI-synthesized audio
4. MPET (score not used)
  – Multi-pitch tracking + separation
5. Oracle (theoretical upper bound)
Results on 110 pieces (the 60 duets, 40 trios, and 10 quartets)

Examples
“Ach lieben Christen, seid getrost”, by J.S. Bach
– MIDI; audio; audio aligned with MIDI
– Separated sources

Examples (cont.)
Clarinet Quintet in B minor, op rd movement, by J. Brahms, from the RWC database
– MIDI; audio; audio aligned with MIDI
– Separated sources

Conclusions
– Soundprism: an online score-informed source separation algorithm
– A hidden Markov process model for score following
  – Views a performance as a path in the 2-D state space
  – Uses multi-pitch information in the observation model
– A simple algorithm for source separation
– Experiments on a real music dataset
  – Score following outperforms an offline algorithm
  – Source separation outperforms an offline score-informed source separation algorithm
– Opens interesting potential applications

Thank you!

Source Separation Results
1. Soundprism
2. Ideally-aligned
  – Ground-truth alignment + separation
3. Ganseman10
  – Offline algorithm
  – DTW alignment
  – Source models trained on MIDI-synthesized audio
4. MPET (score not used)
  – Multi-pitch tracking + separation
5. Oracle (theoretical upper bound)
Results on 60 duets

Inference by Particle Filtering
– Represent and update the distribution with a fixed number of particles
– Randomly initialize 1,000 particles at the 1st frame
– For the n-th frame:
  – Update the particles using the process model
  – Calculate weights using the observation model
  – Resample the particles according to their weights
  – Output the mean of the particles as the estimate of the current state
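The steps above can be sketched end to end in Python on a toy problem. Everything concrete here is an assumption for illustration: the one-pitch-per-beat toy score, the 0.9/0.1 match likelihood standing in for the multi-pitch observation model, the 10 ms hop, the initial tempo range of 60 to 180 BPM, the tempo noise, and the use of systematic resampling as the weighted-sampling step.

```python
import math
import random

HOP_MIN = 0.010 / 60.0  # assumed 10 ms frame hop, in minutes
SCORE_PITCHES = [60, 62, 64, 65, 67, 69, 71, 72]  # toy score: one note per beat


def pitch_at(x):
    """Deterministic score lookup: the pitch sounding at score position x (beats)."""
    i = math.floor(x)
    return SCORE_PITCHES[min(max(i, 0), len(SCORE_PITCHES) - 1)]


def likelihood(y, x):
    """Toy observation model: high if the observed pitch matches the score pitch."""
    return 0.9 if pitch_at(x) == y else 0.1


def propagate(x, v, sigma_v, rng):
    """Process model: advance the position; perturb the tempo only when an onset is passed."""
    x_new = x + HOP_MIN * v
    if math.floor(x_new) > math.floor(x):  # onsets of the toy score sit on integer beats
        v += rng.gauss(0.0, sigma_v)
    return x_new, v


def systematic_resample(particles, weights, rng):
    """Draw len(particles) particles with probability proportional to their weights."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    out, i, cum = [], 0, weights[0]
    for _ in range(n):
        while u > cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
        u += step
    return out


def particle_filter(observations, n_particles=1000, sigma_v=2.0, seed=0):
    rng = random.Random(seed)
    # Initialization: all particles at the start of the score, random tempi.
    particles = [(0.0, rng.uniform(60.0, 180.0)) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        particles = [propagate(x, v, sigma_v, rng) for x, v in particles]  # process model
        weights = [likelihood(y, x) for x, _ in particles]                 # observation model
        particles = systematic_resample(particles, weights, rng)           # resampling
        x_hat = sum(x for x, _ in particles) / n_particles                 # mean as estimate
        v_hat = sum(v for _, v in particles) / n_particles
        estimates.append((x_hat, v_hat))
    return estimates


# Synthetic "audio": the pitch sounding at each frame of a steady 100 BPM performance.
observations = [pitch_at((n + 1) * HOP_MIN * 100.0) for n in range(210)]
x_hat, v_hat = particle_filter(observations)[-1]
print(round(x_hat, 2), round(v_hat, 1))  # compare with the true values 3.5 and 100.0
```

Particles that run ahead of or behind the performance are down-weighted whenever their score pitch disagrees with the observation, so the particle cloud concentrates around the true path after a few onsets. Systematic resampling was chosen here because, when all weights are equal (between onsets in this toy), it keeps every particle exactly once instead of randomly discarding some.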