Finding a single voice in music Christine Smit April 26, 2007.

Presentation transcript:

Finding a single voice in music Christine Smit April 26, 2007

Outline
Introduction
Classification strategies:
  Counting silent frequency bins
  Pitch cancellation
  MFCCs
Trading recall for precision
What worked and what didn’t

Introduction What am I doing?

What is a ‘single voice’? A single note sounding at a time.

Why do this? single voice finder + instrument identifier = instrument sample library

What are the data sets?
Training set: 10 1-minute samples
Test set: 10 1-minute samples
25% single voice, 75% multi-voice/silence
Mixture of classical and folk music

What characterizes a single voice? [Figure: segments labeled non-solo, solo, non-solo]

What characterizes a single voice?

Strategies

Strategy #1: Silence detection
[Block diagram: music → find silence → silence counts → silent → raw classification → HMM?]
Nothing really worked.
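The slide only names the idea of counting silent frequency bins, so the following is a minimal sketch of what such a feature could look like, not the method used in the project. It assumes a naive DFT over a short frame and a hypothetical threshold of 40 dB below the frame's strongest bin; the function names and test frames are made up for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def count_silent_bins(frame, rel_threshold_db=-40.0):
    """Count bins whose magnitude lies below the peak bin by more than the threshold.

    rel_threshold_db is an assumed value, not one taken from the slides.
    """
    mags = magnitude_spectrum(frame)
    peak = max(mags)
    if peak == 0.0:
        return len(mags)  # an all-zero frame: every bin is silent
    floor = peak * 10 ** (rel_threshold_db / 20.0)
    return sum(1 for m in mags if m < floor)


# Hypothetical frames: one sinusoid versus a sum of three sinusoids.
N = 64
single = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
mixture = [
    math.sin(2 * math.pi * 4 * t / N)
    + math.sin(2 * math.pi * 9 * t / N)
    + math.sin(2 * math.pi * 15 * t / N)
    for t in range(N)
]

print(count_silent_bins(single), count_silent_bins(mixture))
```

The intuition is that a frame containing one note leaves more bins empty than a frame containing several, so the count could feed a threshold or a classifier. Real instruments have many harmonics and noise, which may be part of why this strategy is reported as not working.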

Strategy #2: Pitch Cancellation
[Block diagram: music → filter pitch → filtered music → single voice? → raw classification → HMM → final classification]
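The slides give only the block diagram for pitch cancellation, so here is one possible sketch under stated assumptions. It assumes the "filter pitch" stage is a comb filter y[n] = x[n] - x[n - T] and that the period T is found by exhaustive search; both choices, along with every name below, are illustrative rather than taken from the talk. If one periodic note dominates, subtracting the signal delayed by its period leaves little energy behind.

```python
import math


def residual_ratio(x, period):
    """Energy left after the comb filter y[n] = x[n] - x[n - period], relative to input energy."""
    num = sum((x[n] - x[n - period]) ** 2 for n in range(period, len(x)))
    den = sum(x[n] ** 2 for n in range(period, len(x)))
    return num / den if den > 0 else 0.0


def best_cancellation(x, min_period, max_period):
    """Try every candidate period (in samples) and return (period, ratio) with the smallest residual."""
    return min(
        ((p, residual_ratio(x, p)) for p in range(min_period, max_period + 1)),
        key=lambda pr: pr[1],
    )


# Hypothetical signals: one note with a second harmonic (period 20 samples),
# and two simultaneous notes with unrelated periods (20 and 27 samples).
one_note = [
    math.sin(2 * math.pi * n / 20) + 0.5 * math.sin(4 * math.pi * n / 20)
    for n in range(400)
]
two_notes = [
    math.sin(2 * math.pi * n / 20) + math.sin(2 * math.pi * n / 27)
    for n in range(400)
]

print(best_cancellation(one_note, 10, 40))
print(best_cancellation(two_notes, 10, 40))
```

A small residual ratio suggests a single voice, while a large one suggests that no single period explains the frame. The ratio would then be thresholded to give the raw classification shown in the diagram.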

Strategy #3: MFCCs
[Block diagram: music → MFCC → 13 features → GMM → likelihood → HMM → final classification]
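To make the GMM stage concrete, here is a minimal sketch of how a per-frame likelihood could be computed from a feature vector. It assumes diagonal-covariance mixtures and one model per class (single voice versus everything else), which the slides do not specify. The parameters are invented and use 2 dimensions instead of the 13 MFCC features to keep the example short.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """log sum_k w_k N(x; mean_k, var_k), computed with the log-sum-exp trick.

    gmm is a list of (weight, mean, variance) tuples.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_likelihood_ratio(x, gmm_single, gmm_other):
    """Positive values favour the single-voice model."""
    return gmm_log_likelihood(x, gmm_single) - gmm_log_likelihood(x, gmm_other)


# Hypothetical, hand-picked parameters (a trained model would come from the training set).
GMM_SINGLE = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
GMM_OTHER = [(1.0, [4.0, 4.0], [1.0, 1.0])]

print(log_likelihood_ratio([0.2, 0.1], GMM_SINGLE, GMM_OTHER))
print(log_likelihood_ratio([3.9, 4.2], GMM_SINGLE, GMM_OTHER))
```

In practice the mixtures would be fitted to labelled training frames, and the per-frame scores would be passed on to the HMM stage rather than thresholded directly.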

Trading recall for precision

Quick reminder
Precision = out of the stuff we got, how much of it was right? Are Google’s results relevant?
Recall = out of all the right stuff, how much did we get? If I asked Google for the UN, did I get all the UN’s websites?
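The two definitions can be written out as a short sketch over per-frame labels. The function name and the example labels are hypothetical, and returning 0.0 when a denominator is empty is an assumed convention.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length sequences of 0/1 labels.

    precision = true positives / everything we labelled positive
    recall    = true positives / everything that really is positive
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention for "nothing retrieved"
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical frame labels: 1 = single voice, 0 = anything else.
predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 1]
print(precision_recall(predicted, actual))
```

Here two of the three frames labelled as single voice are correct (precision 2/3), and two of the four true single-voice frames were found (recall 1/2).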

Precision is important
If I have a large enough database, I can afford to have relatively low recall. But I want high precision so what I do get is what I want.
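One way to trade recall for precision is to raise the decision threshold on a classifier score until precision reaches a target. The sketch below illustrates that idea with invented scores and labels; it is not necessarily how the trade-off was implemented in the project.

```python
def operating_point(scores, labels, target_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least target_precision, or None."""
    best = None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, a in zip(predicted, labels) if p and a)
        fp = sum(1 for p, a in zip(predicted, labels) if p and not a)
        fn = sum(1 for p, a in zip(predicted, labels) if not p and a)
        # tp + fp >= 1 because the frame whose score equals the threshold is always predicted.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= target_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


# Hypothetical per-frame scores (higher = more likely single voice) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(operating_point(scores, labels, 0.9))
print(operating_point(scores, labels, 0.7))
```

Asking for higher precision pushes the threshold up and the recall down, which is acceptable when the database is large enough that a fraction of the single-voice material still yields plenty of samples.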

Strategy #2: Pitch Cancellation
[Block diagram: music → filter pitch → filtered music → single voice? → raw classification → HMM → final classification]

Strategy #3: MFCCs
[Block diagram: music → MFCC → 13 features → GMM → likelihood → HMM → final classification]
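Both diagrams end in an HMM that turns frame-level scores into the final classification. As a rough sketch of what that stage might do, the code below runs Viterbi decoding over a two-state HMM so that isolated frame-level glitches are smoothed away. The two-state layout, the uniform initial distribution, and the self-transition probability of 0.9 are all assumptions, not values from the slides.

```python
import math


def viterbi_two_state(log_lik, stay_prob=0.9):
    """Most likely state path for a 2-state HMM (needs at least one frame).

    log_lik[t][s] is the log-likelihood of frame t under state s
    (0 = other, 1 = single voice). stay_prob is a hypothetical
    self-transition probability; the initial distribution is uniform.
    """
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    score = [math.log(0.5) + log_lik[0][s] for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in log_lik[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == s else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + frame[s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical frame scores: the third frame weakly favours "other".
glitchy = [[-2.0, -1.0], [-2.0, -1.0], [-1.0, -1.5], [-2.0, -1.0], [-2.0, -1.0]]
print(viterbi_two_state(glitchy))
```

Because switching states is penalised, a single weakly contradictory frame is absorbed into the surrounding segment, whereas a sustained change in the scores still produces a state change.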

Results

Strategy #1: Silence detection (just for comparison)

Strategy #2: Pitch Cancellation

Strategy #3: MFCCs

Conclusion
Silence detection really didn’t work out.
MFCCs + GMM is really just as good as pitch cancellation.
At 90% precision, I get about 25% recall.

Acknowledgements
Many thanks to Professor Ellis for his assistance on this project.

Questions?