Music Information Retrieval from a Singing Voice Using Lyrics and Melody Information
Motoyuki Suzuki, Toru Hosoya, Akinori Ito, and Shozo Makino
EURASIP Journal on Advances in Signal Processing, Volume 2007
Presenter: 王崇喆

Slide 2: Outline
- Overview of the system
- Lyrics recognition based on a finite state automaton
- Verification of hypotheses using melody information
- Experiments
- Discussion

Slide 3: Overview of the system
Each hypothesis h carries the following information:
- Song name S(h)
- Recognized text W(h)
- Recognition score R(h)
- Time alignment information F(h)
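For concreteness, a hypothesis can be pictured as a small record. This is an illustrative sketch only, not the paper's implementation; all class and field names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One output hypothesis of the lyrics recognizer (illustrative)."""
    song_name: str            # S(h): which song the hypothesis points to
    recognized_text: str      # W(h): recognized word sequence
    recognition_score: float  # R(h): score assigned by the recognizer
    alignment: list           # F(h): per-word (start_frame, end_frame) pairs
```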

Slide 4: Lyrics recognition based on a finite state automaton
- The acoustic model was trained on read speech.
- The MLLR (maximum likelihood linear regression) method was used as the adaptation algorithm.
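As background (not from the slides): global MLLR adapts each HMM Gaussian mean with a shared affine transform, mu' = A mu + b, estimated on adaptation data. A minimal sketch of applying such a transform, with placeholder matrices:

```python
import numpy as np

def apply_mllr(means, A, b):
    """Apply a global MLLR mean transform: mu' = A @ mu + b.

    means: (n_gaussians, dim) array of HMM Gaussian means
    A:     (dim, dim) regression matrix estimated on adaptation data
    b:     (dim,) bias vector
    Returns the adapted means, one row per Gaussian.
    """
    return means @ A.T + b
```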

Slide 5: Verification of hypotheses using melody information
- The melody information (relative pitch and IOI of each note) can be calculated from the tune stored in the database; for a query, it is extracted using the estimated pitch sequence of the singing voice and the time alignment information.
- The pitch sequence is calculated frame by frame by the Praat system.
- The pitch of a note is defined as the median of the pitch sequence corresponding to that note.
- The IOI (inter-onset interval) of a note is obtained as the duration between note boundaries.
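A minimal sketch of the per-note feature extraction the slide describes, assuming frame-wise pitch values (e.g. from Praat) and note boundaries taken from the alignment; all names, the frame length, and the units are illustrative:

```python
import numpy as np

def note_features(pitch_frames, boundaries, frame_sec=0.01):
    """Per-note pitch (median of frame pitches) and IOI (gap between boundaries).

    pitch_frames: frame-wise pitch estimates for the sung query
    boundaries:   note boundary frame indices [b0, b1, ..., bN]
    Returns (pitches, iois); IOIs are in seconds.
    """
    pitches, iois = [], []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        pitches.append(np.median(pitch_frames[start:end]))  # median = note pitch
        iois.append((end - start) * frame_sec)              # duration between boundaries
    return np.array(pitches), np.array(iois)

# Relative pitch would then be the difference between adjacent note pitches:
# rel_pitch = np.diff(pitches)
```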

Slide 6: Experiments (1/2)
- The number of songs in the database was 156.
- Test queries: singing voices, each consisting of 5 words.
- Singers: 6 male university students; 110 choruses were collected as song data.
- The total number of test queries was …; … hypotheses were output from the lyrics recognizer.
- Some similar hypotheses were output as distinct hypotheses: W(h) is slightly different from W(h′), even though S(h) is exactly the same as S(h′).
- The maximum retrieval accuracy was limited to 97.4%.

Slide 7: Experiments (2/2)

Slide 8: Discussion
- The proposed system assumes that the input singing voice consists of a part of the correct lyrics, which the lyrics recognizer can recognize correctly because of the grammatical restriction imposed by the FSA (illustrated in the sketch below).
- The proposed system was evaluated on a very small database. When the system is put to practical use, two problems will arise:
  - Computation time in the lyrics recognition step: a pre-selection algorithm would be needed before lyrics recognition.
  - Deterioration of recognition performance: a large database contains many songs with similar lyrics. Such misrecognitions can be corrected by using melody information.
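To illustrate the grammatical restriction mentioned above, here is a toy sketch under my own assumptions, not the paper's code: the recognizer only accepts word sequences that appear contiguously in some song's lyrics, which an FSA over (song, word position) states can encode.

```python
def build_lyrics_fsa(songs):
    """Toy FSA accepting any contiguous fragment of any song's lyrics.

    songs: dict mapping song name -> list of lyric words (illustrative).
    States are (song, position); any position may start or end a fragment,
    so a sung query of, say, 5 consecutive lyric words is always accepted.
    """
    transitions, starts, finals = set(), set(), set()
    for name, words in songs.items():
        for i, word in enumerate(words):
            transitions.add(((name, i), word, (name, i + 1)))
            starts.add((name, i))      # a query may begin at any word
            finals.add((name, i + 1))  # ... and end after any word
    return transitions, starts, finals

# Example: build_lyrics_fsa({"song_a": "twinkle twinkle little star".split()})
# accepts "twinkle little" but rejects "little twinkle".
```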

Slide 9: The End \ ⊙▽⊙ /