Audio Thumbnailing of Popular Music Using Chroma-Based Representations
Matt Williamson, Chris Scharf
Implementation based on: Mark A. Bartsch and Gregory H. Wakefield, IEEE Transactions on Multimedia, Vol. 7, No. 1, February 2005
Introduction
– Multimedia content is growing rapidly
– An efficient method of browsing is necessary
– Indexing and retrieval methods are media-dependent
Primary goal
– Minimize the audition time for a given type of media
Current methods
– Images: downsampling
  – Produces a smaller version of the image (a thumbnail)
  – Reduces the cost of delivery and display
Current methods
– Audio (speech): symbolic representation
  – Produces a transcript of the audio
What about music?
– Adapt an existing method: downsampling (time compression)
  – Results in highly distorted, unintelligible audio
What about music?
– Adapt an existing method (cont’d): symbolic representation (score transcription)
  – Extremely difficult
  – Results in essentially meaningless information
  – Does not convey other important elements:
    – Vocal style
    – Instruments used
    – Processing effects used
Essential problem
– Adapting existing methods cannot reduce the audition time for music while still conveying the “gist” of the song
Possible Solution
– Audio thumbnailing via chroma-based analysis
Audio thumbnailing
– Produces a short clip of the selection to represent the “gist” of the song
Chroma-based analysis
– Based on the extraction of chroma features from the audio
– Chroma Feature Extraction Algorithm (each step is sketched in code after its slide below):
  – Frame Segmentation
  – Feature Calculation
  – Correlation Calculation
  – Correlation Filtering
  – Thumbnail Selection
Chroma Feature Extraction
– Extract frequencies from the audio file
– Calculate chroma values from the frequencies
– Categorize chroma values into pitch classes
  – 12 pitch classes: A, A#/Bb, B, C, C#/Db, …, G#/Ab
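To make the pitch-class mapping concrete, here is a minimal sketch (not from the paper) in Python with NumPy, assuming an A440 reference: a frequency is converted to semitones above A4 and wrapped modulo 12, so octave-related frequencies collapse to the same class.

```python
import numpy as np

PITCH_CLASSES = ["A", "A#/Bb", "B", "C", "C#/Db", "D",
                 "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"]

def pitch_class(freq_hz, f_ref=440.0):
    """Map a frequency in Hz to one of the 12 pitch classes.

    Chroma discards octave information: two frequencies an octave
    apart map to the same class. f_ref = 440 Hz anchors class 0 at A.
    """
    # Distance from the reference in semitones, wrapped to one octave.
    semitones = 12.0 * np.log2(freq_hz / f_ref)
    return PITCH_CLASSES[int(np.round(semitones)) % 12]

# Example: 220 Hz (A3) and 880 Hz (A5) collapse to the same class.
print(pitch_class(220.0), pitch_class(880.0))  # A A
```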
Frame Segmentation
– Authors’ implementation:
  – Frame boundaries determined via a beat-tracking algorithm
  – Frame lengths range from 0.25 s to 0.56 s
– Our implementation:
  – Fixed frame length equal to the average of that range: 0.41 s
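A sketch of our fixed-length segmentation, assuming NumPy and a mono signal array; the function name and parameters are illustrative, and the paper's beat-tracked boundaries are replaced by the constant 0.41 s frame described above.

```python
import numpy as np

def segment_frames(signal, sample_rate, frame_len_s=0.41):
    """Split a mono signal into fixed-length, non-overlapping frames.

    The paper derives frame boundaries from a beat tracker
    (0.25-0.56 s); here we approximate with a constant 0.41 s frame.
    """
    frame_len = int(frame_len_s * sample_rate)
    n_frames = len(signal) // frame_len           # drop the ragged tail
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)
```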
Feature Calculation
– Calculate a 12-element chroma feature vector v_t for each frame:
  – Apply an FFT to each frame and fold the spectral energy into the 12 pitch classes
  – Constraints:
    – Minimum frequency: 20 Hz (the lower limit of human hearing)
    – Maximum frequency: 2000 Hz (higher frequencies adversely affect the perception of chroma)
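One plausible reading of this step, sketched with NumPy: take the FFT magnitude of a frame, keep only bins between 20 Hz and 2000 Hz, accumulate each bin's magnitude into its pitch class, and normalize. The magnitude weighting and the unit-norm normalization are our assumptions; the paper's exact feature computation was unclear to us (see Conclusion).

```python
import numpy as np

def chroma_vector(frame, sample_rate, f_min=20.0, f_max=2000.0):
    """12-element chroma feature vector for one frame.

    FFT magnitudes between f_min and f_max are accumulated into the
    pitch class of their bin frequency (A440 reference), then the
    vector is scaled to unit norm.
    """
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    v = np.zeros(12)
    in_band = (freqs >= f_min) & (freqs <= f_max)
    # Pitch-class index of each retained bin, relative to A440.
    classes = np.round(12.0 * np.log2(freqs[in_band] / 440.0)).astype(int) % 12
    np.add.at(v, classes, spectrum[in_band])

    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```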
Correlation Calculation
– Calculate the similarity matrix C
  – Each element is the correlation between two chroma feature vectors: C(i, j) = corr(v_i, v_j)
  – High correlation along diagonals of the matrix indicates repetition within the song
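A sketch of the similarity matrix, assuming one chroma vector per row of V: rows are mean-centered and unit-normalized, so each entry reduces to a Pearson correlation between two frames.

```python
import numpy as np

def similarity_matrix(V):
    """Frame-to-frame similarity matrix C.

    V is (n_frames, 12), one chroma vector per row. After centering
    and normalizing the rows, C[i, j] is the Pearson correlation
    between frames i and j. Repeated sections appear as high-valued
    diagonal stripes in C.
    """
    X = V - V.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    return X @ X.T
```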
Correlation Filtering
– Calculate the filtered time-lag matrix T
  – Exposes similarity between extended segments that are separated by a constant lag
  – Filtering is performed along the diagonals of C, using a symmetric rectangular windowing function (a uniform moving-average filter)
  – T is then “rotated” so that the diagonals are oriented vertically
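A sketch of the filtering step, assuming NumPy: each constant-lag diagonal of C is smoothed with a uniform moving average, and the result is stored with lag on the horizontal axis, which plays the role of the “rotation” above. The window length (20 frames, roughly 8 s at 0.41 s per frame) is our choice, not the paper's.

```python
import numpy as np

def time_lag_matrix(C, win=20):
    """Filtered time-lag matrix T from similarity matrix C.

    T[t, lag] is the similarity between frames t and t - lag,
    averaged over a length-`win` uniform window along the
    constant-lag diagonal. Long repeated segments become
    high-valued vertical columns of T.
    """
    n = C.shape[0]
    T = np.zeros((n, n))
    kernel = np.ones(win) / win
    for lag in range(1, n):
        diag = np.diagonal(C, offset=-lag)        # C[t, t - lag]
        if len(diag) >= win:
            T[lag:, lag] = np.convolve(diag, kernel, mode="same")
    return T
```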
Thumbnail Selection
– Select the maximum value in T
  – The location of this value indicates:
    – The occurrence time of the segment (the y-coordinate)
    – The lag to its repetition (the x-coordinate)
  – Constraints:
    – Minimum lag = 1/10 of the song length
    – Maximum start time = 3/4 of the song length (reduces susceptibility to a “fading repeat”)
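A sketch of the selection step under the constraints above, assuming the T produced by the previous sketch: invalid cells are masked out and the argmax gives the thumbnail's start time and lag. The 25 s clip length matches the segment lengths in the Results slide but is otherwise our assumption, not part of the selection itself.

```python
import numpy as np

def select_thumbnail(T, frame_len_s=0.41, clip_len_s=25.0):
    """Pick the thumbnail start time from the time-lag matrix T.

    The maximum of T locates a strongly repeated segment: its row is
    the occurrence time, its column the lag to the repeat. Cells with
    lag < 1/10 of the song or start > 3/4 of the song are excluded.
    """
    n = T.shape[0]
    masked = T.copy()
    masked[:, : n // 10] = -np.inf        # minimum lag: 1/10 of song
    masked[3 * n // 4 :, :] = -np.inf     # maximum start: 3/4 of song

    t, lag = np.unravel_index(np.argmax(masked), masked.shape)
    start_s = t * frame_len_s
    return start_s, start_s + clip_len_s, lag * frame_len_s
```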
Results (selected segments, [start, end] in seconds)
– Jimmy Buffett, “Math Suks”: system output [64, 89]
– Lifehouse, “You and Me”: system output [38, 63]
– Gavin DeGraw, “I Don’t Want To Be”: system output [95, 120]
– Super Mario Brothers Theme: system output [18, 43]
Conclusion
– Successfully extracted time segments that closely match the chorus of each song
– Feature Calculation issue: the authors’ implementation of this step is unclear
Possible Uses
– Audio domain:
  – Improved search capability (searching for similar songs)
  – Audio fingerprinting
– Other domains:
  – Detection of irregular heartbeats
Suggested Improvements and Alternatives
– Image-based analysis of the waveform
– Tested alternatives:
  – MSE on signal frequencies
  – Chroma-based analysis proved more accurate