Dan Rosenbaum, Nir Muchtar, Yoav Yosipovich. Faculty member: Prof. Daniel Lehmann. Industry Representative: Music Genome.

A short project overview:
- Song recognition: recognition of musical tracks by measuring the distance between signal fingerprints.
- Feature extraction: finding the most compatible fingerprints and thresholds in order to classify music into categories such as leading instrument, genre and mood.
We put an emphasis on automatically acquiring data from the music signal itself. Our assumption is that the music itself encapsulates sufficient information to enable us to recognize songs and features within songs.

The Product (a brief reminder): We are building a Matlab-based package which addresses these challenges:
- Building a fingerprinting system for recognition and estimation of the distance between musical tracks. The system is flexible enough to let us consider different types of parameters.
- Testing the different fingerprinting parameters and thresholds and finding the most compatible ones for each application (e.g. song recognition, feature extraction).
- Performing song recognition and demonstrating it using a Graphical User Interface.

Song Recognition: Upon receiving a song snippet (possibly noisy), find the closest match in a database of songs. First we record the song and obtain its signal over time:
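The project itself is Matlab-based; the short Python sketch below (in line with the Python port mentioned at the end) only illustrates capturing a snippet and obtaining the signal over time. It assumes the sounddevice package is available, and the duration and sample rate are illustrative choices, not the project's actual settings.

```python
# Sketch: capture a (possibly noisy) snippet and get the signal over time.
# Assumes the sounddevice package; duration and sample rate are illustrative.
import sounddevice as sd

fs = 22050          # sample rate (Hz), assumed
duration = 3.0      # snippet length in seconds, assumed

recording = sd.rec(int(duration * fs), samplerate=fs, channels=1)
sd.wait()                      # block until the recording is finished
signal = recording[:, 0]       # mono signal over time
print(signal.shape, signal.dtype)
```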

[Figures: Spectrogram (STFT); one frame's spectrum (3 sec); peaks vector.]
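A minimal sketch of what the figures show: the STFT spectrogram of a 3-second frame and a simple peaks vector derived from it. The project's actual STFT parameters and peak-picking rule are not given in the transcript; the window size, overlap and "strongest bin per time slice" rule below are assumptions.

```python
# Sketch: spectrogram (STFT) of one 3-second frame and a simple peaks vector.
# nperseg, noverlap and the peak-picking rule are illustrative assumptions.
from scipy.signal import spectrogram

def frame_peaks(frame, fs, nperseg=2048, noverlap=1024):
    """Return, for each STFT time slice, the frequency with maximum energy."""
    freqs, times, Sxx = spectrogram(frame, fs=fs, nperseg=nperseg, noverlap=noverlap)
    peak_bins = Sxx.argmax(axis=0)        # strongest frequency bin per time slice
    return freqs[peak_bins]               # the "peaks vector" for this frame

# Usage: peaks = frame_peaks(signal[: int(3 * fs)], fs)
```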

Test Results: We conducted a song-recognition test which takes 3 frames out of each song, at a random offset and with a random length (2-4 seconds), and for each frame:
- creates its fingerprint;
- compares the print against the DB to find the closest 3-second frame print;
- builds a vector of similarities between this frame and all DB songs (based on the closest frame of each song).
Results:
- Accurate: 99.4% success (number of frames successfully identified / total number of frames).
- Fast: about 2 seconds to recognize a song among 500 songs.
- Scalable: even without optimizations, our method scales to large databases; when testing with a 4000-song DB, no degradation was noticed.
- Robust: we also got good results for noisy analog recordings (using a computer microphone).
(A sketch of this evaluation loop is given below.)
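The sketch below follows the test procedure described above. The functions fingerprint() and similarity() are hypothetical stand-ins for the project's own fingerprinting and matching code; the frame counts and lengths follow the text (3 frames per song, 2-4 seconds each).

```python
# Sketch of the evaluation loop. fingerprint() and similarity() are hypothetical
# placeholders for the project's fingerprinting and matching routines.
import random

def evaluate(songs, db, fs, frames_per_song=3):
    """songs: dict song_id -> signal; db: dict song_id -> that song's frame prints."""
    correct = total = 0
    for song_id, signal in songs.items():
        for _ in range(frames_per_song):
            length = int(random.uniform(2.0, 4.0) * fs)            # random length, 2-4 s
            offset = random.randint(0, max(0, len(signal) - length))
            frame = signal[offset:offset + length]
            fp = fingerprint(frame)                                 # hypothetical
            # similarity of this frame to every DB song, via its closest frame print
            sims = {sid: similarity(fp, prints) for sid, prints in db.items()}
            predicted = max(sims, key=sims.get)
            correct += (predicted == song_id)
            total += 1
    return correct / total   # e.g. the 99.4% reported on the 500-song DB
```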

The resulting similarity matrix has size 1500 test frames x 500 songs. The maximal similarity in each row falls on the column of the frame's original song, i.e. the matrix has a clear diagonal structure, which indicates that each frame is most similar to the fingerprint of its original song in the DB.
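A tiny sketch of how that diagonal structure translates into the identification rate. The names S (the 1500 x 500 similarity matrix) and true_song (the index of each test frame's original song) are illustrative assumptions.

```python
# Sketch: verify the diagonal structure of the similarity matrix.
# S and true_song are assumed names, not variables from the project code.
import numpy as np

def identification_rate(S, true_song):
    predicted = S.argmax(axis=1)              # best-matching song per test frame
    return np.mean(predicted == true_song)    # fraction of frames identified correctly

# With 3 frames per song, true_song could be np.repeat(np.arange(500), 3).
```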

Feature extraction: classify music into categories such as leading instrument, genre, mood, etc. A computer vision approach: we observed that audio researchers commonly employ 2-D representations, such as the spectrogram, when analyzing sound or speech. We apply a current computer vision technique: boosted classifiers over local object-recognition features. By learning these “images”, we extract features from the song.

Create a raw vector of candidate filters: We first transform the audio data into a Mel Frequency Cepstral Coefficients (MFCC) matrix: 20 rows of MFCC coefficients x 1200 columns (40 frames per second x a 30-second snippet). On this matrix we apply a set of filters (as in Viola-Jones object detection) in order to capture important time-frequency characteristics of the audio; we apply roughly 3500 filters. The candidate filter set consists of Haar basis functions: the value of a two-rectangle feature is the difference between the sums of the values within two rectangular regions.
Training (offline): We apply a multi-label boosting algorithm (AdaBoost) which makes use of a weak learner in order to select a compact subset of these filters that best divides the data (with minimal error) according to a specific feature. The algorithm returns the thresholds for each feature.
Output: A function that takes a song snippet as input and returns, for instance, whether its genre is rock, or whether it includes a leading guitar or a flute.
(A sketch of the two-rectangle feature computation is given below.)
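A sketch of evaluating a two-rectangle (Haar-like) feature on an MFCC matrix via an integral image, as in Viola-Jones. The MFCC extraction uses librosa, which is an assumption (the project itself works in Matlab), and the rectangle geometry in two_rect_feature is illustrative rather than one of the project's ~3500 actual filters.

```python
# Sketch: two-rectangle Haar-like feature on an MFCC matrix via an integral image.
# librosa and the specific rectangle layout are assumptions, not the project's code.
import librosa

def mfcc_matrix(y, sr, n_mfcc=20):
    # 20 MFCC rows; hop length chosen so roughly 40 frames fall in each second
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=sr // 40)

def integral_image(M):
    # cumulative sums so any rectangle sum costs only 4 lookups
    return M.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of M[r0:r1, c0:c1] using the integral image ii (exclusive upper bounds)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r0, c0, height, width):
    """Difference between two horizontally adjacent rectangles of equal size."""
    left = rect_sum(ii, r0, c0, r0 + height, c0 + width)
    right = rect_sum(ii, r0, c0 + width, r0 + height, c0 + 2 * width)
    return left - right
```

For the training step, a decision stump thresholding each filter's response can serve as the weak learner; at each boosting round the stump with minimal weighted error is kept, which yields the compact filter subset and per-feature thresholds described above.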

Results: We have not yet achieved the expected results: the training error dropped to zero but the testing error stayed quite high, i.e. the classifier overfits the training set. Possible reasons: a problematic database; inappropriate feature selection.

Demonstration idea: Record the music playing in the background of a user's office (captured by the laptop microphone, for example), recognize the song, and send back relevant information over the internet. Flow (as in the diagram): music is playing in the user's living room -> record the music -> recognize the song from the songs DB -> send back online information.
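A minimal sketch of such a demonstration client: record from the microphone and send the snippet to a recognition service. The service URL and its JSON response format are hypothetical placeholders, not an existing API of the project.

```python
# Sketch of the demo client: record a snippet and post it to a recognition service.
# The URL and response format are hypothetical placeholders.
import io
import requests
import sounddevice as sd
from scipy.io import wavfile

FS = 22050
SNIPPET_SECONDS = 5

def record_and_recognize(url="http://localhost:8000/recognize"):   # placeholder URL
    audio = sd.rec(int(SNIPPET_SECONDS * FS), samplerate=FS, channels=1)
    sd.wait()
    buf = io.BytesIO()
    wavfile.write(buf, FS, audio)                 # package the snippet as WAV bytes
    resp = requests.post(url, files={"snippet": ("snippet.wav", buf.getvalue())})
    return resp.json()                            # e.g. song title + related info
```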

GUI Prototype: Porting the online algorithms to Python to enable a GUI demonstration (a minimal sketch of such a GUI follows):
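Purely illustrative Tkinter sketch of a GUI prototype: one button triggering the record-and-recognize flow above and a label for the result. The actual prototype's layout is not shown in the transcript.

```python
# Minimal Tkinter sketch of the GUI prototype (illustrative only).
import tkinter as tk

def build_gui(recognize_callback):
    root = tk.Tk()
    root.title("Song Recognition Demo")
    result = tk.StringVar(value="Press the button and play some music...")

    def on_click():
        info = recognize_callback()            # e.g. record_and_recognize()
        result.set(str(info))

    tk.Button(root, text="Record & Recognize", command=on_click).pack(padx=20, pady=10)
    tk.Label(root, textvariable=result, wraplength=400).pack(padx=20, pady=10)
    return root

# Usage: build_gui(record_and_recognize).mainloop()
```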