Chris Burges, John Platt, Jon Goldstein, Erin Renshaw Microsoft Research Name That Tune: Stream Audio Fingerprinting.

Slides:



Advertisements
Similar presentations
Matthias Gruhne, Page 1 Fraunhofer Institut Integrierte Schaltungen Robust Audio Identification for Commercial Applications Matthias.
Advertisements

Learning Introductory Signal Processing Using Multimedia 1 Outline Overview of Information and Communications Some signal processing concepts Tools available.
Chapter 3: PCM Noise and Companding
A Graduate Course on Multimedia Technology 3. Multimedia Communication © Wolfgang Effelsberg Media Scaling and Media Filtering Definition of.
aperiodic periodic Production of /z/: Covariation and weighting of harmonically decomposed streams for ASR Introduction Pitch-scaled harmonic filter Recognition.
Part II (MPEG-4) Audio TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg.
Eigenfaces for Recognition Presented by: Santosh Bhusal.
EE 690 Design of Embodied Intelligence
                      Digital Audio 1.
03/18/2005ENEE408G Spring 2005 Multimedia Signal Processing 1 ENEE408G: Capstone Design Project: Multimedia Signal Processing Design Project 4: Digital.
Digital Signal Processing
Pro Tools 7 Session Secrets Chapter 6: After the Bounce or Life Outside of Pro Tools Life Outside of Pro Tools.
EigenFaces.
Speech Compression. Introduction Use of multimedia in personal computers Requirement of more disk space Also telephone system requires compression Topics.
Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.
1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification.
LYRIC-BASED ARTIST NETWORK METHODOLOGY Derek Gossi CS 765 Fall 2014.
Extracting Noise-Robust Features from Audio Data Chris Burges, John Platt, Erin Renshaw, Soumya Jana* Microsoft Research *U. Illinois, Urbana/Champaign.
FINGER PRINTING BASED AUDIO RETRIEVAL Query by example Content retrieval Srinija Vallabhaneni.
Consider Acquisition of Patent Rights Explore A Mutually Beneficial Business Opportunity 1.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
Principal Component Analysis
Redundant Bit Vectors for the Audio Fingerprinting Server John Platt Jonathan Goldstein Chris Burges.
Reduced Support Vector Machine
Detecting Image Region Duplication Using SIFT Features March 16, ICASSP 2010 Dallas, TX Xunyu Pan and Siwei Lyu Computer Science Department University.
Bioinformatics Challenge  Learning in very high dimensions with very few samples  Acute leukemia dataset: 7129 # of gene vs. 72 samples  Colon cancer.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
PCA Channel Student: Fangming JI u Supervisor: Professor Tom Geoden.
Protection for Web Delivered Music Patcharinee Tientrakool EE 6886: Topics in Signal Processing - Multimedia Security System.
The Chinese University of Hong Kong Department of Computer Science and Engineering Lyu0202 Advanced Audio Information Retrieval System.
Laurent Itti: CS599 – Computational Architectures in Biological Vision, USC Lecture 7: Coding and Representation 1 Computational Architectures in.
Self-organizing Learning Array based Value System — SOLAR-V Yinyin Liu EE690 Ohio University Spring 2005.
The audacious program Audacity Audacity might be worth a look. Suggest... Audacity is an easy to use audio production and mixing program, which enables.
Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.
Video Streaming © Nanda Ganesan, Ph.D..
BROADCASTING #1 way to reach your audience! NOTE BUY MORE TV & RADIO – IT WORKS! NOTE BUY MORE TV & RADIO – IT WORKS!
WINDOWS XP PROFESSIONAL Bilal Munir Mughal Chapter-1 1.
Blind Pattern Matching Attack on Watermark Systems D. Kirovski and F. A. P. Petitcolas IEEE Transactions on Signal Processing, VOL. 51, NO. 4, April 2003.
Copyright Protection of Images Based on Large-Scale Image Recognition Koichi Kise, Satoshi Yokota, Akira Shiozaki Osaka Prefecture University.
1 Recognition by Appearance Appearance-based recognition is a competing paradigm to features and alignment. No features are extracted! Images are represented.
Multimodal Information Analysis for Emotion Recognition
Dan Rosenbaum Nir Muchtar Yoav Yosipovich Faculty member : Prof. Daniel LehmannIndustry Representative : Music Genome.
Audio Thumbnailing of Popular Music Using Chroma-Based Representations Matt Williamson Chris Scharf Implementation based on: IEEE Transactions on Multimedia,
09/30/2005ENEE408G Fall 2005 Multimedia Signal Processing 1 ENEE408G: Capstone Design Project: Multimedia Signal Processing Design Project 2: Digital Audio.
AGENDA Pick Up Quiz from the Side of the Room
Hidden Markov Classifiers for Music Genres. Igor Karpov Rice University Comp 540 Term Project Fall 2002.
Basic Concepts of Audio Watermarking. Selection of Different Approaches Embedding Domain  time domain  frequency domain DFT, DCT, etc. Modulation Method.
Serverless Network File Systems Overview by Joseph Thompson.
Introduction Advantage of DSP: - Better signal quality & repeatable performance - Flexible  Easily modified (Software Base) - Handle more complex processing.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
EE Audio Signals and Systems Room Acoustics Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.
Audio Manipulation Through Gesticulation Garrett Fosdick, Jair Robinson José Sanchez Bradley University - Electrical & Computer Engineering October 6,
Singer Similarity Doug Van Nort MUMT 611. Goal Determine Singer / Vocalist based on extracted features of audio signal Classify audio files based on singer.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
© ExplorNet’s Centers for Quality Teaching and Learning 1 Objective % Understand advanced post-production methods for digital audio.
Robodog Frontal Facial Recognition AUTHORS GROUP 5: Jing Hu EE ’05 Jessica Pannequin EE ‘05 Chanatip Kitwiwattanachai EE’ 05 DEMO TIMES: Thursday, April.
Audio Fingerprinting Overview: RARE Algorithms, Resources Chris Burges, John Platt, Jon Goldstein, Erin Renshaw
Standard Demo 1 © Hacking Team All Rights Reserved.
Copyright © Texas Education Agency, All rights reserved.1 Digital & Interactive Media Digital Audio Editing.
[1] National Institute of Science & Technology Technical Seminar Presentation 2004 Suresh Chandra Martha National Institute of Science & Technology Audio.
Audio Fingerprinting Wes Hatch MUMT-614 Mar.13, 2003.
Digital & Interactive Media
CS262: Computer Vision Lect 09: SIFT Descriptors
Ken Gunnells, Ph.D. - Networking Paul Crigler - Programming
Machine Learning Feature Creation and Selection
Basic Concepts of Audio Watermarking
In summary C1={skin} C2={~skin} Given x=[R,G,B], is it skin or ~skin?
Analysis of Audio Using PCA
Neural Networks ICS 273A UC Irvine Instructor: Max Welling
ADBOT Advertisement Recognition FROM television and radio broadcast
Presentation transcript:

Chris Burges, John Platt, Jon Goldstein, Erin Renshaw Microsoft Research Name That Tune: Stream Audio Fingerprinting

Outline What is audio fingerprinting? Robust Audio Recognition Engine Demo Some ideas for applications Other companies doing audio fingerprinting How does RARE work? How well does RARE work?

What is Audio Fingerprinting? Prepare database: Given audio clips, extract a fingerprint from each clip, store Clip is not changed, unlike watermarking Run: Compare traces, computed from an audio stream, against your database of stored fingerprints, to identify whats playing. Confirm using second fingerprint.

Why Is This A Research Topic? - Audio Feature Extraction - Map high dim space of audio samples to low dim feature space Must be robust to likely input distortions Must be informative (different audio segments map to distant points) Must be efficient (use small fraction of resources on typical PC)

Identifies clips based on 6 second fingerprints 6 seconds mapped to 64 floats, generated once every seconds (or sec) Each 64-float vector checked against large database of fingerprints (e.g., 240,000 ) End-to-end system requires approx. 10% CPU on 746 MHz PC Confirmatory fingerprint can be used to add further robustness The RARE Fingerprinting System

How Confirmatory Fingerprint Works

Server Internet Client... Feature Extraction Audio stream Lookup Audio stream identity Example Architecture Pruning

RARE Demo 7 songs chosen at random, out of 239,209 Play clean song, then play distorted version Critical Mass: Good Morning: Clip to distort The Glands: Free Jane: Old Radio filter Eddie Cantor: Joe Is Here: 8% Time Compress Ella Fitzgerald: Somebody Loves Me: Add Reverb Perl Bailey: They Didnt Believe Me: Repeat Echo Next: Stop, Drop and Roll: 20dB Noise Gate Joy Division: Digital: Add treble to distort

Outline What is audio fingerprinting? Robust Audio Recognition Engine Demo Some possible applications Other companies doing audio fingerprinting How does RARE work? How well does RARE work?

Identifying Broadcast Music I Youre listening to FM radio at home Youd like to know what just played… …your eHome media center can tell you

Making WMP Smarter Youve made your own CDs by mixing and matching songs from your collection Youd like Windows Media Player to tell you whats playing!

Identifying Broadcast Music II Youre listening to FM radio on the road Youd like to know what just played, by holding a cell phone or PDA up to the source

Database Cleaning Youre managing a large database of audio Youd like to automatically identify and remove duplicates Youd like to add metadata automatically to the rest

Royalty Assessment / DRM / Commercial Tracking ASCAP wants to know what radio stations in a given area are playing Businesses want to know if the commercial they paid for aired, and when, and if it was lexiconed Content providers want to know when pirated content is played

Outline What is audio fingerprinting? Robust Audio Recognition Engine Demo Some ideas for applications Other companies doing audio fingerprinting How does RARE work? How well does RARE work?

Other AF Efforts AUDIBLE MAGIC: Partnered with Loudeye, CMJ, SESAC. RELATABLE: Used by Napster. Uses 30 sec. MOODLOGIC: File based, p2p royalty tracking. PHILIPS: Cell-phone based music recognition service. SHAZAM: Cell-phone based, UK market MSN Music: File based

Why use RARE? RARE is flexible. Features are NOT hard- wired – they can easily be re-learned to handle new kinds of noise RARE is very robust and very fast This is Microsofts own technology: easily molded to different applications, we own the code and IP (2 patents filed)

Outline What is audio fingerprinting? Robust Audio Recognition Engine Demo Some ideas for applications Other companies doing audio fingerprinting How does RARE work? How well does RARE work?

Feature Extraction Feature Extraction

De-Equalization BeforeAfter De-equalize by flattening the log spectrum.

De-Equalization Details Goal: Remove slow variation in frequency space

Perceptual Thresholding So: remove coefficients that are below a perceptual threshold to lower unwanted variance.

Feature Extraction, cont. Feature Extraction, cont.

Distortion Discriminant Analysis

Oriented PCA δ2δ2 δ1δ1 Want δ 1 much smaller than δ 2

The variance of the signal along unit vector is, where is the signal covariance matrix The sum of squared differences of (signal- noisy signal) projected along is, where is the correlation matrix of the noise Find that maximize Oriented PCA, Details

Distortion Discriminant Analysis DDA is a convolutional neural network with weights trained using OPCA

RARE Training Data s audio segments, catted (67 min) Compute 11 distortions Perform one layer of OPCA Subtract mean projections of train data, norm noise to unit variance Repeat Use 10 20s segments as validation set to compute scale factor for final projections

The Distortions Original 3:1 compress above 30dB Light Grunge Distort Bass Mackie Mid boost filter Old Radio Phone RealAudio© Compander

Outline What is audio fingerprinting? Robust Audio Recognition Engine Demo Some ideas for applications Other companies doing audio fingerprinting How does RARE work? How well does RARE work?

Test Results – Large Test Set 500 songs, 36 hours of music One stored fingerprint computed for every song, about 10s in, random start point Test alignment noise, false positive and negative rates, same with distortions Define distance between two clips as minimum distance between fingerprint of first, and all traces of the second Approx. 3.5 x10 8 chances for a false positive

DDA for Alignment Robustness Train second layer with alignment distortion:

Test Results: Positives

Test Results: Negatives Smallest negative score: 0.14, largest positive score: 0.026

Tests for Other Distortions Take all 500 test clips, grouped 10x50 Add random alignment shift to all clips Distort each group (10 different distortions) RESULTS: false positive rate , false negative rate 0.2% But that was without using confirmation!

Tests for Other Distortions, cont. First 500 out of ¼ million

Lookup: Computational Cost Overall system uses < 10% CPU on 850MHz P3 for 240K fingerprints Current lookup can handle 160 streams on a quad 2 GHz machine. With local caching, server caching, local pruning, and taking advantage of P4 architecture, expect at least a factor of 10 faster: 1,600 streams / machine

Bitvector yields 30x Speedup

Conclusions RARE gives Microsoft robust, fast, stream audio fingerprinting RARE is raring to go… For technical details: Contact aliases: cburges, jplatt