“Which Line Is it Anyway”:

Presentation transcript:

“Which Line Is it Anyway”: Clustering for Staff Line Removal in Optical Music Recognition Sam Vinitsky

Optical Music Recognition: An Overview OMR is the task of taking an image of sheet music and converting it into a format readable by computers T: WTF is sheet music

Sheet Music Primer Sheet music is what musicians use to write down + read music. A visual language. T: Consists of staff lines and symbols

Sheet Music Primer: Staff Lines Train tracks that you follow along. See a symbol, do a thing.

Sheet Music Primer: Symbols Everything else is a “symbol” (black) Symbols tell you what to do What notes to play How loud to be Location with respect to staff lines matters T: Now that we’re all on the same page...

Back to Optical Music Recognition... Again, our goal... Computer readable format (many of these exist) T: You might be wondering “why do we want to do this”?

Applications of OMR: Handwritten Music Biggest application Taking handwritten music and converting it into a clean digital format (for mass production or archiving) People like to write by hand because it's easier, but it's time consuming to digitize Other cool applications... T: So how are we gonna actually do this?

Optical Music Recognition: Pipeline

Optical Music Recognition: Pipeline Find and remove staff lines They’re just gonna get in the way We do care where they are though (symbols located wrt them) Write down for later

Optical Music Recognition: Pipeline Find and remove staff lines Find symbols Using CCs (connected components)
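The talk doesn't show code for this step, so here is a minimal sketch of symbol-finding via OpenCV connected components, assuming a uint8 binary image with ink as non-zero pixels after staff removal; the function name and the noise threshold are hypothetical, not from the talk.

```python
import cv2
import numpy as np

def find_symbols(staffless: np.ndarray, min_area: int = 10):
    """Crop one sub-image per connected component of the staff-removed
    binary image (uint8, ink = non-zero). min_area is a hypothetical noise filter."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(staffless, connectivity=8)
    symbols = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:                    # drop tiny speckles
            symbols.append(staffless[y:y + h, x:x + w])
    return symbols
```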

Optical Music Recognition: Pipeline Find and remove staff lines Find symbols Determine type of each symbol 1 = note 2 = treble clef 3 = sharp 4 = sharp 5 = three 6 = note ...

Optical Music Recognition: Pipeline Find and remove staff lines Find symbols Determine type of each symbol Determine how symbols relate to each other 1 = note (G4) 2 = treble clef (first clef) 3 = sharp (key sig with 4) 4 = sharp (key sig with 3) 5 = three (triplet for 6 8 11) 6 = note (E5, triplet 5) ... Semantically Hierarchical structure

Optical Music Recognition: Pipeline Find and remove staff lines Find symbols Determine type of each symbol Determine how symbols relate to each other Convert into computer readable format (musicXML, MIDI, etc...) 1 = note (G4) 2 = treble clef (first clef) 3 = sharp (key sig with 4) 4 = sharp (key sig with 3) 5 = three (triplet for 6 8 11) 6 = note (E5, triplet 5) ... THIS IS HARD Each step depends on the last one T: We’re just gonna focus on the first step...

Optical Music Recognition: Pipeline *** Find and remove staff lines *** Find symbols Determine type of each symbol Determine how symbols relate to each other Convert into computer readable format (musicXML, MIDI, etc...) 1 = note (G4) 2 = treble clef (first clef) 3 = sharp (key sig with 4) 4 = sharp (key sig with 3) 5 = three (triplet for 6 8 11) 6 = note (E5, triplet 5) ... Even this is a formidable task! T: let’s go

Staff Line Removal: Overview Input: original score image Output: staffless image To reiterate: get rid of staff lines T: Now you might be thinking to yourself, “Hey Sam, didn’t we already learn how to find lines in an image????” (If you forgot, Hough transform, we did a HW on it…)

Reminder: Hough Transform... The details don’t matter, because it doesn’t work T: This isn’t gonna work… big reason is….
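For context, a minimal sketch of what the Hough-transform baseline being dismissed here might look like with OpenCV; the filename, threshold, and line-length parameters are hypothetical, and the backup slides at the end show why this approach breaks on real scores.

```python
import cv2
import numpy as np

# Hypothetical Hough baseline: binarize, detect long near-horizontal segments,
# and paint them out.
img = cv2.imread("score.png", cv2.IMREAD_GRAYSCALE)              # hypothetical filename
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=200,
                        minLineLength=binary.shape[1] // 2, maxLineGap=5)
staffless = binary.copy()
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(staffless, (x1, y1), (x2, y2), color=0, thickness=2)  # erase detected segment
```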

Staff “Line” Removal... Staff lines aren’t always lines… T: Let’s zoom

Staff “Line” Removal...

Staff “Line” Removal...

Staff “Line” Removal... Curved due to page (scanning from a book) Thin → everything ruins it Rotations (can’t just derotate) Distortion or noise from scanning Need to keep this in mind → robust T: so how we actually gonna do this?

Staff Line Removal: Algorithms Handcrafted: Run-length [Carter & Bacon ‘92] Shortest Path [Cardoso et al. ‘08] Etc… [see Dalitz et al. ‘08] Handcrafted (use domain knowledge) Good if the music is perfect Usually not that good in PRACTICE (most scores aren't perfect) Because they're not robust / don't generalize T: Work to use ML to get better generalization

Staff Line Removal: Algorithms Handcrafted: Run-length [Carter & Bacon ‘92] Shortest Path [Cardoso et al. ‘08] Etc… [see Dalitz et al. ‘08] Supervised Machine Learning: (State of the Art) Classification of pixels [Calvo-Zaragoza et al. ‘16] Deep learning on image Convolutional Neural Networks [Calvo-Zaragoza et al. ‘17] Generative Adversarial Networks [Bhunia et al. ‘18] Supervised learning: (SOA) At pixel level or image level Pixel level: use labelled data (pixels labelled staff + non-staff), learn a rule to discriminate between them GAN/CNN: image level Images are LARGE (3000x1000) = SUPER COMPUTATIONALLY INTENSIVE Both have issues: Lots of VARIATION in sheet music style / distortions Requires a good variety of labelled data to generalize well (and we don't have that because labelling is expensive) T: So I thought, how can we leverage the power of ML without needing so much labelled data?
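As a rough illustration of the pixel-classification family (a sketch of the general idea only, not the actual setup of Calvo-Zaragoza et al. '16): every labelled black pixel becomes a feature vector plus a staff / non-staff label, and an off-the-shelf classifier learns the decision rule. The choice of k-NN with k = 5 is an assumption for the sketch.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_pixel_classifier(pixel_features: np.ndarray, pixel_labels: np.ndarray):
    """pixel_features: one feature vector per labelled black pixel
    (e.g. the flattened window around it); pixel_labels: 1 = staff, 0 = non-staff."""
    clf = KNeighborsClassifier(n_neighbors=5)   # k = 5 is a hypothetical choice
    clf.fit(pixel_features, pixel_labels)
    return clf

# At test time: predict a label for every black pixel of a new score
# and erase the ones predicted as staff.
```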

Staff Line Removal: Algorithms Handcrafted: Run-length [Carter & Bacon ‘92] Shortest Path [Cardoso et al. ‘08] Etc… [see Dalitz et al. ‘08] Supervised Machine Learning: (State of the Art) Classification of pixels [Calvo-Zaragoza et al. ‘16] Deep learning on image Convolutional Neural Networks [Calvo-Zaragoza et al. ‘17] Generative Adversarial Networks [Bhunia et al. ‘18] Unsupervised Machine Learning: Clustering of pixels [Vinitsky ‘18] Unsupervised: Doesn't need labelled data Robust + generalizes well (FIXES BOTH OUR PROBLEMS) T: The algorithm I came up with is to use clustering

Staff Line Removal via Pixel Clustering Algorithm: (Vinitsky ‘18) Convert each black pixel into a feature vector Idea: convert each pixel into a point in n-dimensional space (like a dot) T: How? Image modified from http://sli.ics.uci.edu/Classes-CS178-Notes/LinearClassify

Staff Line Removal via Pixel Clustering Image source: Calvo-Zaragoza et al., ‘16 Lots of ways, just talk about one… Look at a window around each pixel, convert it into a row vector. This is a point in a space with one dimension per pixel in the window… T: so again, the idea...
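A minimal sketch of this window feature, assuming the score has been binarized to a 2-D array with 1 for ink and 0 for background; the half-width convention, zero padding, and names are mine, not the talk's.

```python
import numpy as np

def window_features(binary_img: np.ndarray, w: int = 3):
    """Return one flattened (2w+1) x (2w+1) window per ink pixel, plus the
    pixel coordinates, so cluster labels can be mapped back onto the image."""
    padded = np.pad(binary_img, w, mode="constant", constant_values=0)
    ys, xs = np.nonzero(binary_img)                 # coordinates of ink pixels
    feats = np.stack([padded[y:y + 2 * w + 1, x:x + 2 * w + 1].ravel()
                      for y, x in zip(ys, xs)])
    return feats, ys, xs
```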

Staff Line Removal via Pixel Clustering Algorithm: (Vinitsky ‘18) Convert each black pixel into a feature vector Each point is a pixel’s window Image modified from http://sli.ics.uci.edu/Classes-CS178-Notes/LinearClassify

Staff Line Removal via Pixel Clustering Algorithm: (Vinitsky ‘18) Convert each black pixel into a feature vector “Cluster” the feature vectors into two groups Clustering into groups These all look alike... Image modified from http://sli.ics.uci.edu/Classes-CS178-Notes/LinearClassify

Staff Line Removal via Pixel Clustering Algorithm: (Vinitsky ‘18) Convert each black pixel into a feature vector “Cluster” the feature vectors into two groups Figure out which cluster is “staff” and which is “non-staff” Remove pixels from the staff one Why should this work? Each image has hundreds of thousands of black pixels Staff and non-staff pixels look very different (with the right feature vector) T: How did we do? Image modified from http://sli.ics.uci.edu/Classes-CS178-Notes/LinearClassify
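Putting the three steps together, a minimal sketch using scikit-learn's K-means, continuing from the `window_features` sketch above; the "smaller cluster = staff" rule is the naive heuristic mentioned later under Future Work, which in practice was sometimes overridden by hand.

```python
import numpy as np
from sklearn.cluster import KMeans

def remove_staff_by_clustering(binary_img: np.ndarray, w: int = 3, seed: int = 0):
    feats, ys, xs = window_features(binary_img, w)           # step 1: feature vectors
    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=seed).fit_predict(feats)    # step 2: two clusters
    staff_label = np.argmin(np.bincount(labels))             # step 3: naive "smaller cluster = staff"
    out = binary_img.copy()
    out[ys[labels == staff_label], xs[labels == staff_label]] = 0   # erase staff pixels
    return out
```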

Staff Line Removal: Results * = implemented ** = original algorithm Used standard datasets (handwritten + typeset), sampled 12 Compared some of my other algorithms vs reported results of the state of the art (on the same handwritten data) If you don't know F1 score, think of it as accuracy from 0 to 1 (it's not) Ours looks worse, but... Only one of mine that did okay on both datasets Not as good as SOA, but pretty close T: Qualitative results
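For reference, pixel-level F1 is the harmonic mean of precision and recall over the pixels removed as staff, not plain accuracy. A minimal sketch, assuming `pred` and `truth` are boolean masks of predicted vs. ground-truth staff pixels:

```python
import numpy as np

def staff_f1(pred: np.ndarray, truth: np.ndarray) -> float:
    """Pixel-level F1 of the staff-pixel prediction against ground truth."""
    tp = np.logical_and(pred, truth).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```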

Staff Line Removal: Results Clustering output Ground truth staff-less Typeset: Got almost all of the staff lines Notes look good Chopped up C (classify) Hashtags (sharps) + clef degraded but recognizable T: handwritten

Staff Line Removal: Results Clustering output Ground truth staff-less Handwritten: didn't mess up any symbols too badly, but did mess up the lines, leaving some staff lines in… T: we did okay, how could we do better? Stuff I want to work on...

Future Work Clustering for Staff Line Removal: Other clustering methods I used K-means for finding the clusters… others out there
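For example, with hundreds of thousands of black pixels per page, mini-batch K-means or a Gaussian mixture would be natural drop-in replacements. These are hypothetical examples of the "others out there", not methods evaluated in the talk; `feats` is from the `window_features` sketch above.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.mixture import GaussianMixture

labels = MiniBatchKMeans(n_clusters=2, random_state=0).fit_predict(feats)
# or soft assignments from a 2-component Gaussian mixture:
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(feats)
```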

Future Work Clustering for Staff Line Removal: Other clustering methods Better feature vectors I tried a few different FV’s... Biggest factor

Future Work Clustering for Staff Line Removal: Other clustering methods Better feature vectors Picking the “staff” cluster There’s also the question of how to pick which cluster is staff First: picking the smaller one (naive) → ended up doing it by hand

Future Work Clustering for Staff Line Removal: Other clustering methods Better feature vectors Picking the “staff” cluster White pixels can be staff (due to noise) The issue is that noise can turn staff pixels white, so maybe use 3 clusters: background, staff, or non-staff

Future Work Clustering for Staff Line Removal: Other clustering methods Better feature vectors Picking the “staff” cluster White pixels can be staff (due to noise) Optical Music Recognition: The rest of the pipeline...

References
[1] Jorge Calvo-Zaragoza, Luisa Mico, and Jose Oncina. Music staff removal with supervised pixel classification. International Journal on Document Analysis and Recognition (IJDAR), 19(3):211–219, Sep 2016.
[2] Jorge Calvo-Zaragoza, Jose J. Valero-Mas, and Antonio Pertusa. End-to-end optical music recognition using neural networks. In ISMIR, 2017.
[3] Jaime Cardoso, Artur Capela, Ana Rebelo, and Carlos Guedes. A connected path approach for staff detection on a music score, 2008.
[4] I. Fujinaga, M. Droettboom, C. Dalitz, and B. Pranzas. A comparative study of staff removal algorithms. IEEE Transactions on Pattern Analysis & Machine Intelligence, 30:753–766, 2008.
[5] A. Konwer, A. K. Bhunia, A. Bhowmick, P. Banerjee, P. Pratim Roy, and U. Pal. Staff line removal using generative adversarial networks. ArXiv e-prints, 2018.
[6] J. Novotny and J. Pokorny. Introduction to optical music recognition: Overview and practical challenges. 2015.
[7] N.P. Carter and R.A. Bacon. Automatic recognition of printed music. In H.S. Baird, H. Bunke, and K. Yamamoto (editors), "Structured Document Image Analysis", pp. 454–465, Springer, 1992.

Staff Line Removal: Hough Transform (Issues) Original image Predicted staff (red) Actual staff (red) Suppose we DID find the line Hough will find ENTIRE line CROSSING over symbols, not stop at each symbol (we can modify this with trimming)

Staff Line Removal: Hough Transform (Issues) Original image Predicted staff-less Actual staff-less This will break each symbol into multiple CC’s...

Staff Line Removal: Hough Transform (Issues) Original image Predicted staff-less Actual staff-less So when we try to do step 2 (finding symbols), we’ll have trouble Step 3 (classification) is impossible There are techniques to choose which pixels to remove in a smart way (or put them back together), these tend to be #bad Keep this in mind when designing, & it affects evaluation (definition of success) T: Okay so Hough Transform was bad, can we do better?

Staff Line Removal: Results

Algorithm                    F1 - Typeset   F1 - Handwritten
Run-length*                  0.9690         0.5956
Run-length (smarter)**       0.9841         0.0191
Clustering (4-dist)**        0.9040         0.4912
Clustering (window, w=3)**   0.7535         0.8007
Classifier (k-nn)            n/a            0.8969
Classifier (SVM)             n/a            0.9603
Deep Learning (GANs)         n/a            0.9932

* = implemented ** = original algorithm
Used standard datasets (handwritten + typeset), sampled 12 Compared some of my other algorithms vs reported results of the state of the art (on the same handwritten data) If you don't know F1 score, think of it as accuracy from 0 to 1 (it's not) Ours looks worse, but... Only one of mine that did okay on both datasets Not as good as SOA, but pretty close T: Qualitative results
