Lyric alignment in popular songs Luong Minh Thang.

Lyric alignment in popular songs Luong Minh Thang WING group meeting 12 Oct, 2007.


Outline: Project description & techniques; Some important knowledge; Base-line system

Project description. Given: a textual transcription of the lyrics and the acoustic musical signal of a song. Purpose: find the timestamps of the beginning and ending points of each line of the song.
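The expected output can be pictured as one record per lyric line. The sketch below is only an illustration: the names `LineTiming` and `format_timing` and the bracketed timestamp layout are assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class LineTiming:
    """One aligned lyric line (hypothetical output record)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def format_timing(line):
    """Render a line as '[mm:ss.ss - mm:ss.ss] text'."""
    def stamp(seconds):
        minutes, secs = divmod(seconds, 60)
        return f"{int(minutes):02d}:{secs:05.2f}"
    return f"[{stamp(line.start)} - {stamp(line.end)}] {line.text}"


print(format_timing(LineTiming("first line of the song", 12.5, 75.25)))
```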

Techniques to be investigated: a repetition-based technique for detecting self-similarities in both the audio and the text. Dynamic time warping (a dynamic-programming method) will be employed to align the repetition analyses of the two media and produce an alignment.

Chroma vectors: one octave contains 12 semitones (C, C#, D, …, B), which gives a 12-dimensional chroma vector. [Figure: the pitch classes C, C#, D, …, B repeat in every octave as tone height increases.]
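As a rough illustration of how all octaves fold into 12 bins, the sketch below maps spectral peaks to pitch classes. It assumes equal temperament with A4 = 440 Hz and takes a hypothetical list of (frequency, magnitude) peaks as input; the slides do not say how the chroma vectors were actually computed.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Return the pitch-class index (0 = C, ..., 11 = B) nearest to freq_hz."""
    # MIDI note 69 is A4; one semitone is a frequency ratio of 2**(1/12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def chroma_vector(peaks):
    """Fold (frequency, magnitude) peaks into a normalized 12-bin vector."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class(freq_hz)] += magnitude
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# A4 and A5 land in the same bin; middle C lands in bin 0.
vector = chroma_vector([(440.0, 1.0), (880.0, 1.0), (261.63, 2.0)])
print(dict(zip(PITCH_CLASSES, vector)))
```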

Dynamic time warping algorithm: in practice, an application of dynamic programming.
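A minimal sketch of textbook dynamic time warping follows, for non-empty sequences and with the standard three-step recurrence (match, insertion, deletion). The function name `dtw` and the caller-supplied `dist` function are illustrative choices; the slides do not give the exact variant used in the project.

```python
def dtw(seq_a, seq_b, dist):
    """Return (total_cost, path) for aligning two non-empty sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


total, path = dtw([1, 2, 3], [1, 2, 2, 3], lambda a, b: abs(a - b))
print(total, path)
```

Each pair in `path` links an index in the first sequence to an index in the second, so one element may be stretched over several elements of the other sequence.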

Word similarity using a phoneme dictionary. Using the CMU phoneme dictionary, each word is decomposed into a phoneme sequence:
ALIGNMENT: AH0 L AY1 N M AH0 N T
ALGORITHMS: AE1 L G ER0 IH2 DH AH0 M Z
The similarity of two words is the similarity of their two phoneme sequences.
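The slide does not say how two phoneme sequences are compared. One plausible choice, shown here purely as an assumption, is a normalized edit distance over phoneme symbols. The tiny `PHONEMES` table holds only the two entries from the slide and stands in for a full dictionary lookup.

```python
# Hypothetical two-entry lexicon copied from the slide; a real system
# would load the full pronunciation dictionary instead.
PHONEMES = {
    "ALIGNMENT": "AH0 L AY1 N M AH0 N T".split(),
    "ALGORITHMS": "AE1 L G ER0 IH2 DH AH0 M Z".split(),
}


def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def word_similarity(word_a, word_b, lexicon=PHONEMES):
    """Similarity in [0, 1]; 1.0 means identical phoneme sequences."""
    p_a, p_b = lexicon[word_a.upper()], lexicon[word_b.upper()]
    longest = max(len(p_a), len(p_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(p_a, p_b) / longest


print(word_similarity("alignment", "algorithms"))
```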

Base-line system: overview. [Flowchart] Audio branch: musical signal input → chroma-vector calculations → simplification → music notation sequence → symbol self-similarity matrix. Text branch: text input → text processing → word sequence → word self-similarity matrix (using the phoneme dictionary). The two self-similarity matrices are then passed to the mapping and aligning steps.
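To make the self-similarity step concrete, here is a small sketch that builds a self-similarity matrix for any symbol sequence and scores how strongly the sequence repeats at a given lag. The helper names and the exact-match similarity are illustrative assumptions; the base-line system would use chroma-derived symbols on the audio side and phoneme-based word similarity on the text side.

```python
def self_similarity_matrix(items, similarity):
    """matrix[i][j] = similarity between items[i] and items[j]."""
    return [[similarity(a, b) for b in items] for a in items]


def diagonal_score(matrix, lag):
    """Mean similarity along the diagonal `lag` steps off the main one.

    A high score means the sequence tends to repeat itself after `lag` items.
    """
    n = len(matrix)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(matrix) - 1")
    values = [matrix[i][i + lag] for i in range(n - lag)]
    return sum(values) / len(values)


words = ["la", "di", "la", "di"]
matrix = self_similarity_matrix(words, lambda a, b: 1.0 if a == b else 0.0)
print(diagonal_score(matrix, 1), diagonal_score(matrix, 2))
```

Repeated sections such as a chorus show up as high-scoring off-diagonal stripes in both the audio and the text matrix, which is what gives the later alignment step something to match.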

Thank you!