Audio Fingerprinting Overview: RARE Algorithms, Resources
Chris Burges, John Platt, Jon Goldstein, Erin Renshaw


Let’s agree on names… A ‘fingerprint’ is a vector that represents a given audio clip. It lives in a database with a lot of other fingerprints. A ‘confirmation fingerprint’ is a second fingerprint used to confirm a match. A ‘trace’ is generated from audio every 186 ms. It’s computed exactly the same way as a fingerprint.

Analyze a Stream
[Figure labels: 6 sec; 64 floats / frame; In Database?; Confirmed?; If 1, declare match]

Design of the Funnel
Find 64 good projections of 6 seconds of audio.
[Figure: 6 sec of Song A, 6 sec of distorted Song A, and 6 sec of Song B placed along a good projection, with distances δ1 and δ2 marked]
Good projections maximize δ2 / δ1.
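To make the δ2 / δ1 criterion concrete, here is a minimal sketch that scores one candidate projection. The slides do not define the two distances, so the code assumes δ1 is the projected distance between a clip and a distorted copy of it, and δ2 is the projected distance between clips of different songs. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math

def project(x, w):
    """Project feature vector x onto direction w (dot product)."""
    return sum(xi * wi for xi, wi in zip(x, w))

def separation_ratio(w, clean, distorted, other):
    """Return delta2 / delta1 for one candidate projection w.

    Assumed meaning (not stated in the slides):
      delta1 = |w.clean - w.distorted|  (same song, distorted copy)
      delta2 = |w.clean - w.other|      (different song)
    """
    d1 = abs(project(clean, w) - project(distorted, w))
    d2 = abs(project(clean, w) - project(other, w))
    return d2 / d1 if d1 > 0 else math.inf

# Toy features (made-up numbers): the distortion mostly hits coordinate 1.
song_a = [1.0, 0.0, 2.0]
song_a_distorted = [1.1, 0.9, 2.0]
song_b = [3.0, 0.1, -1.0]

w_good = [1.0, 0.0, 0.0]  # ignores the noisy coordinate
w_bad = [0.0, 1.0, 0.0]   # looks only at the noisy coordinate

ratio_good = separation_ratio(w_good, song_a, song_a_distorted, song_b)
ratio_bad = separation_ratio(w_bad, song_a, song_a_distorted, song_b)
```

Under this reading, a projection with a large ratio keeps distorted copies of a song close together while pushing other songs away, which is what the funnel needs from each of its 64 projections.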

Feature Extraction (186 ms)

De-Equalization De-equalize by flattening the log spectrum. [Figure: log spectrum before and after de-equalization]

De-Equalization Details Goal: Remove slow variation in frequency space
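One way to remove slow variation is sketched below, under an assumption: the slow part of the log spectrum is taken to be its lowest-order DCT components, and those are subtracted. The choice of 6 components follows the "first 6 coefficients" remark on the client resources slide. The exact modification used in RARE is not given, so treat `de_equalize` and its helpers as hypothetical.

```python
import math

def low_order_dct(x, k):
    """First k DCT-II coefficients of x, computed directly in O(k * N)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * j / n) for i in range(n))
        for j in range(k)
    ]

def smooth_component(coeffs, n):
    """Rebuild the slowly varying curve from the low-order coefficients."""
    out = []
    for i in range(n):
        s = coeffs[0] / n
        for j in range(1, len(coeffs)):
            s += 2.0 / n * coeffs[j] * math.cos(math.pi * (i + 0.5) * j / n)
        out.append(s)
    return out

def de_equalize(log_spectrum, k=6):
    """Subtract the smooth part of the log spectrum (assumed method)."""
    n = len(log_spectrum)
    smooth = smooth_component(low_order_dct(log_spectrum, k), n)
    return [v - s for v, s in zip(log_spectrum, smooth)]

# Toy log spectrum: a slow tilt (equalization) plus one narrow peak.
n = 64
spec = [0.05 * i + (1.0 if i == 40 else 0.0) for i in range(n)]
flat = de_equalize(spec)
```

Because only the first few coefficients are touched, the work is proportional to 6N rather than N log N, and the narrow peak survives while the tilt is removed.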

Perceptual Thresholding Remove coefficients that are below a perceptual threshold to lower unwanted variance. [Figure: coefficients marked as inaudible to human or audible to human]
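A minimal sketch of the thresholding step follows. It takes one plausible reading of "remove": each coefficient is measured relative to its threshold and floored at zero, so anything below the threshold contributes nothing. The function name, the per-band thresholds, and the numbers are all made up for illustration.

```python
def perceptual_threshold(log_spectrum, thresholds):
    """Keep only the part of each coefficient that rises above its threshold.

    Coefficients at or below the threshold become 0.0 (treated as inaudible).
    """
    if len(log_spectrum) != len(thresholds):
        raise ValueError("need one threshold per coefficient")
    return [max(v - t, 0.0) for v, t in zip(log_spectrum, thresholds)]

# Hypothetical log-magnitudes and per-band thresholds (made-up values).
log_spectrum = [12.0, 3.0, 7.5, 0.5]
thresholds = [5.0, 5.0, 6.0, 1.0]
audible = perceptual_threshold(log_spectrum, thresholds)
```

Flooring at zero means that noise in inaudible bands cannot change the features, which is the variance reduction the slide refers to.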

Project to 64 Floats

Bitvector yields 50x Speedup

Example Architecture [Figure: Client (audio stream, feature extraction, optional pruning), Internet, Server (lookup); the output is the audio stream identity]

Client: Resources Computing traces takes approximately 10% of the CPU on a 750 MHz Pentium III. However, we can get a speedup over the current DCT, since we’re only modifying the first 6 coefficients: O(N log N) → O(6N). Total data loaded by the client is 2.1 MB.

Client Side Options
What can be done on the client side to offload the server lookup? Three ideas (in addition to only querying untagged music, and adding ID3 tags when found):
1. Leverage Zipf’s law (if it holds!)
2. Reduce rate at which traces are sent
3. Prune traces on the client

Client Side Pruning – Local Lookup Having a database of fingerprints for e.g. the top 10,000 songs would significantly reduce server load, but we don’t know by how much. Also requires updates (e.g. weekly?) [Figure: Zipf’s Law, log(# times played) versus log(rank)]
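The size of the reduction can be estimated if one is willing to assume an ideal Zipf distribution. The sketch below assumes play counts proportional to 1/rank (exponent 1) and borrows the catalog size of 254,885 from the fingerprint count on the margin tree slide. Both choices are assumptions for illustration, not measurements, and `zipf_coverage` is a hypothetical helper.

```python
def zipf_coverage(top_k, catalog_size, exponent=1.0):
    """Fraction of all plays that fall on the top_k most popular songs,
    assuming play count is proportional to rank ** -exponent."""
    if not 0 < top_k <= catalog_size:
        raise ValueError("top_k must be between 1 and catalog_size")
    total = 0.0
    top = 0.0
    for rank in range(1, catalog_size + 1):
        weight = rank ** -exponent
        total += weight
        if rank <= top_k:
            top += weight
    return top / total

# Share of lookups a local top-10,000 database could answer (ideal Zipf).
coverage = zipf_coverage(top_k=10_000, catalog_size=254_885)
```

Under these assumptions the local database would answer roughly three quarters of lookups. The figure is sensitive to the exponent and to whether Zipf's law holds at all, which is why the slide leaves the number open.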

Client Options, cont. Can reduce the trace rate by a factor of 2 (interval from 186 ms to 372 ms) at some (likely small) loss in accuracy. This would halve both client CPU and server load.

Client Side Pruning – Margin Trees
Using a tree built from the first 24 components:
No overpopulating, but flip the 5 most error-prone bits in each trace
Gets a factor of 2 reduction in throughput at a 0.5% increase in false negatives for very noisy data
Number of nodes in tree (for 254,885 fingerprints) was found to be 1,531,508
Requires updates (e.g. weekly?)
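The bit-flipping idea can be sketched as follows, with strong caveats. The slides do not say how components become bits or how "error-prone" is measured. This sketch assumes a bit is the sign of a component and that the least reliable bits are those whose components lie closest to zero. It then enumerates every combination of those bits, giving 2^5 = 32 keys to try against the tree. `candidate_keys` is a hypothetical name.

```python
from itertools import product

def candidate_keys(trace, n_components=24, n_flip=5):
    """Enumerate bit keys for a trace, trying both values of the least
    reliable bits.

    Assumptions (not from the slides): a bit is the sign of a component,
    and a bit is "error-prone" when its component is close to zero.
    """
    head = trace[:n_components]
    bits = [1 if v >= 0 else 0 for v in head]
    # Indices of the components with the smallest magnitude (smallest margin).
    risky = sorted(range(len(head)), key=lambda i: abs(head[i]))[:n_flip]
    keys = []
    for choice in product((0, 1), repeat=len(risky)):
        key = list(bits)
        for idx, bit in zip(risky, choice):
            key[idx] = bit
        keys.append(tuple(key))
    return keys

# Hypothetical 64-float trace: component i has magnitude i + 1, alternating sign.
trace = [(i + 1) * (1 if i % 2 == 0 else -1) for i in range(64)]
keys = candidate_keys(trace)
```

A trace would be pruned on the client only if none of its candidate keys reaches a populated leaf. Trying the flipped variants is what keeps the false negative rate low on noisy audio without overpopulating the tree.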

A note on the code Upper bound: 22,000 lines of C++. File- and stream-based versions use the same libraries.