
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, March 2010. Presenter: Lan-Ying Yeh (99.04.12)

Introduction
- Music Information Retrieval (MIR)
- Singer identification and vocal-timbre similarity
- Feature extraction
- Influence from other (accompaniment) instruments

Related Studies
- Uses a statistically based speaker-identification method for speech signals in noisy environments:
  - First estimates an accompaniment-only model from interlude sections.
  - Then obtains a vocal-only model by subtracting the accompaniment-only model from the vocal-plus-accompaniment model.
- Assumes that the singing voice and the accompaniment sounds are statistically independent.
  - This assumption is not always satisfied, so the estimation can be unreliable.

Related Studies
- Uses a vocal separation method similar to their accompaniment sound reduction method.
- Did not deal with interlude sections; experiments were conducted using only vocal sections.

Method Overview

Accompaniment Sound Reduction
- F0 estimation with PreFEst (Predominant-F0 Estimation method):
  - Represent the observed power spectrum on a log-frequency axis in units of cents.
  - Apply a band-pass filter designed to cover the typical melody range.
  - Normalize the filtered spectrum into an observed pdf of frequency components.
  - Model each observed pdf as a weighted mixture of possible tone models.
  - Estimate the weights with the EM algorithm (MAP estimation) and regard them as the F0's pdf.
  - Track the dominant F0 over time (see the sketch below).
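Below is a minimal Python sketch of the log-frequency (cent) mapping and a naive per-frame peak-picking step. It is not the PreFEst algorithm itself (no tone-model mixture, no EM, no multi-agent tracking); the reference frequency and function names are assumptions for illustration only.

```python
import numpy as np

REF_HZ = 440.0 * 2 ** (3 / 12 - 5)  # ~16.35 Hz, a common reference for the cent scale (assumed)

def hz_to_cent(f_hz):
    """Map frequency in Hz onto a log-frequency axis in cents."""
    return 1200.0 * np.log2(np.asarray(f_hz) / REF_HZ)

def dominant_f0_track(f0_pdfs, cent_axis):
    """Pick the most dominant F0 per frame from per-frame F0 pdfs.

    f0_pdfs  : (n_frames, n_bins) array, each row an estimated F0 pdf
    cent_axis: (n_bins,) array of candidate F0 values in cents
    """
    # Simple per-frame peak picking; the real method tracks F0 more robustly over time.
    return cent_axis[np.argmax(f0_pdfs, axis=1)]
```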

Accompaniment Sound Reduction
- Harmonic structure extraction:
  - Extract the frequency and amplitude of the l-th overtone around l times the estimated F0.
  - Allow an error of r cents around each expected overtone frequency.
  - Take the local maximum of the amplitude spectrum within that search area (see the sketch below).
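A rough Python sketch of this search, assuming the amplitude spectrum and its frequency axis are already available; the harmonic count and tolerance values are placeholders, not the paper's settings.

```python
import numpy as np

def extract_harmonics(spectrum, freqs_hz, f0_hz, n_harmonics=20, r_cents=20.0):
    """Pick the peak frequency/amplitude near each overtone l*F0, within +/- r cents."""
    harmonics = []
    for l in range(1, n_harmonics + 1):
        target = l * f0_hz
        lo, hi = target * 2 ** (-r_cents / 1200), target * 2 ** (r_cents / 1200)
        idx = np.where((freqs_hz >= lo) & (freqs_hz <= hi))[0]
        if idx.size == 0:
            break  # overtone falls outside the analysis band (e.g., above Nyquist)
        peak = idx[np.argmax(spectrum[idx])]  # local maximum amplitude in the search area
        harmonics.append((freqs_hz[peak], spectrum[peak]))
    return harmonics
```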

Accompaniment Sound Reduction
- Re-synthesis:
  - Re-synthesize the extracted harmonic structure with a sinusoidal model.
  - Changes in phase are approximated by a quadratic function.
  - Changes in amplitude are approximated by a linear function.
  - A simplified sketch follows.
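A simplified sketch of sinusoidal re-synthesis for one partial: frequency and amplitude are interpolated linearly between analysis frames, so the accumulated phase is quadratic in time. The paper's method may additionally match measured phases at frame boundaries; this sketch does not.

```python
import numpy as np

def resynthesize_partial(freqs_hz, amps, hop, sr):
    """Re-synthesize one partial from per-frame frequency/amplitude tracks.

    freqs_hz, amps: per-frame frequency (Hz) and amplitude of the partial
    hop, sr       : hop size in samples and sampling rate in Hz
    """
    n_samples = (len(freqs_hz) - 1) * hop
    t_frames = np.arange(len(freqs_hz)) * hop
    t = np.arange(n_samples)
    inst_freq = np.interp(t, t_frames, freqs_hz)   # linear frequency -> quadratic phase
    inst_amp = np.interp(t, t_frames, amps)        # linear amplitude change
    phase = 2 * np.pi * np.cumsum(inst_freq) / sr  # integrate frequency to obtain phase
    return inst_amp * np.cos(phase)
```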

Accompaniment Sound Reduction
- Evaluation

To be continued…

Feature Extraction
- LPC-Derived Mel Cepstral Coefficients (LPMCCs): mel cepstral coefficients computed from the LPC spectrum.
- ΔF0s: frame-to-frame derivatives of the F0 trajectory (a rough sketch follows).
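One plausible way to compute such features in Python, assuming LPMCCs are mel cepstral coefficients taken from the LPC spectral envelope; the model order, filterbank size, and cepstral dimension below are illustrative guesses, not the paper's settings.

```python
import numpy as np
import librosa
import scipy.signal
import scipy.fftpack

def lpmcc(frame, sr, order=20, n_mels=25, n_ceps=12, n_fft=2048):
    """Mel cepstral coefficients derived from the LPC spectral envelope of one frame."""
    a = librosa.lpc(frame, order=order)                     # LPC coefficients (1, a1, ..., ap)
    _, h = scipy.signal.freqz(1.0, a, worN=n_fft // 2 + 1)  # LPC spectral envelope
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_env = mel_fb @ np.abs(h)                            # warp the envelope onto the mel scale
    ceps = scipy.fftpack.dct(np.log(mel_env + 1e-10), norm='ortho')
    return ceps[:n_ceps]

def delta_f0(f0_track):
    """Frame-to-frame change of the F0 trajectory (ΔF0)."""
    return np.gradient(np.asarray(f0_track))
```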

Reliable Frame Selection
- Feature vectors obtained from regions dominated by accompaniment sounds are unreliable.
- Train a vocal GMM and a nonvocal GMM.
- A feature vector x is judged reliable when the log-likelihood ratio between the vocal and nonvocal GMMs exceeds a threshold η.
- A global η is difficult to determine, so instead the top α% of all frames in each song are selected as reliable frames (see the sketch below).
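A small sketch of this per-song selection using scikit-learn GMMs; the mixture sizes, α value, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_reliable_frames(features, vocal_gmm, nonvocal_gmm, alpha=0.15):
    """Keep the top alpha fraction of frames ranked by the vocal/nonvocal log-likelihood ratio.

    features: (n_frames, n_dims) feature vectors of one song
    """
    llr = vocal_gmm.score_samples(features) - nonvocal_gmm.score_samples(features)
    n_keep = max(1, int(alpha * len(features)))
    keep = np.argsort(llr)[-n_keep:]  # per-song selection instead of a global threshold eta
    return np.sort(keep)

# Usage sketch (training data and mixture sizes are placeholder assumptions):
# vocal_gmm = GaussianMixture(n_components=32).fit(vocal_training_features)
# nonvocal_gmm = GaussianMixture(n_components=32).fit(nonvocal_training_features)
# reliable_idx = select_reliable_frames(song_features, vocal_gmm, nonvocal_gmm, alpha=0.15)
```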

Reliable Frame Selection
- Evaluation