Thursday, November 13, 2008 ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals AudioDB: Scalable approximate nearest-neighbor.


Presentation transcript:

Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals
AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing
Michael A. Casey, Digital Musics, Dartmouth College, Hanover, NH

Scalable Similarity
- 8M tracks in commercial collection
- PByte of multimedia data
- Require passage-level retrieval (~ 2 bars)
- Require scalable nearest-neighbor methods

Specificity
- Partial track retrieval
- Alternate versions: remix, cover, live, album
- Task is mid-high specificity

Example: remixing
- Original Track
- Remix 1
- Remix 2
- Remix 3

Audio Shingles
- A shingle is the concatenation of l frames of m-dimensional features
- Shingles provide contextual information about features
- Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig, “Syntactic Clustering of the Web”, Computer Networks 29(8-13) (1997)
- Related to N-grams, overlapping sequences of features
- Applied to the audio domain by Casey and Slaney: Casey, M., Slaney, M., “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP 2006
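The construction of a shingle can be sketched as follows. This is a minimal illustration, not audioDB's code: the function name `make_shingles` and the `hop` parameter (how far the window advances between shingles) are assumptions introduced here, and the toy frames stand in for real audio features.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate each run of l consecutive m-dimensional frames into one l*m vector."""
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles = []
    # Slide a window of l frames over the track; hop=1 gives overlapping shingles.
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy input: 5 frames with m = 2 features each (hypothetical values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
shingles = make_shingles(frames, l=3)
```

With l = 3 and m = 2, each shingle has M = 6 dimensions, and neighbouring shingles share two frames, which is what makes them comparable to overlapping N-grams.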

Audio Shingle Similarity

Definitions:
- a query shingle drawn from a query track {Q}
- a database of audio tracks indexed by (n)
- a database shingle from track n
Shingles are normalized to unit vectors, so the distance between two shingles depends only on their inner product.
For shingles with M dimensions: M = l·m; m = 12, 20; l = 30, 40
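The slide's own equation did not survive in this transcript, but the standard identity for unit vectors is worth writing out. The symbols q (query shingle) and x (database shingle) are assumed names, not necessarily the notation used on the slide.

```latex
\[
\lVert q - x \rVert^{2}
  = \lVert q \rVert^{2} + \lVert x \rVert^{2} - 2\, q \cdot x
  = 2 - 2\, q \cdot x ,
\qquad \text{when } \lVert q \rVert = \lVert x \rVert = 1 .
\]
```

Squared distances between unit-norm shingles therefore lie in the interval [0, 4], and ranking by smallest distance is equivalent to ranking by largest inner product.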

AudioDB: Shingle Nearest Neighbor Search
- Open source: Google “audioDB”
- Management of tracks, sequences, salience
- Automatic indexing parameters
- OMRAS2, Yahoo!, AWAL, CHARM, more…
- Web-services interface (SOAP / JSON)
- Implementation of LSH for large N ~ 1B
- 1-10 ms whole-track retrieval from 1B vectors

Whole-track similarity
- Often want to know which tracks are similar
- Similarity depends on specificity of task:
  - Distortion / filtering / re-encoding (high)
  - Remix with new audio material (mid)
  - Cover song: same song, different artist (mid)


Whole-track resemblance: radius-bounded search
- Compute the number of shingle collisions between two tracks
- Requires a threshold for considering shingles to be related
- Need a way to estimate relatedness (threshold) for data set
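A brute-force sketch of collision counting is given below. The slide's exact formula is not preserved, so two things here are assumptions: a collision is taken to be any (query shingle, track shingle) pair whose squared Euclidean distance is at most `x_thresh`, and the threshold is applied to the squared distance. All function names are illustrative.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a shingle to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(query: Sequence[Sequence[float]],
                     track: Sequence[Sequence[float]],
                     x_thresh: float) -> int:
    """Count shingle pairs within the radius bound (assumed collision definition)."""
    return sum(1 for q in query for x in track
               if squared_distance(q, x) <= x_thresh)


# Toy 2-dimensional "shingles" (hypothetical values).
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
related = [normalize([1.0, 0.01]), normalize([0.02, 1.0])]
unrelated = [normalize([-1.0, 0.0]), normalize([1.0, 1.0])]

related_count = count_collisions(query, related, x_thresh=0.01)
unrelated_count = count_collisions(query, unrelated, x_thresh=0.01)
```

Ranking database tracks by this count gives a whole-track resemblance score. The exhaustive double loop is only for illustration and costs one distance computation per shingle pair.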

Statistical approaches to modeling distance distributions

Distribution of minimum distances
Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sample (1/ ) of the full histogram of all distances.

Radius-bounded retrieval performance: cover song (opus task)
- Performance depends critically on xthresh, the collision threshold
- Want to estimate xthresh automatically from unlabelled data

Order Statistics
- Minimum-value distribution is analytic
- Estimate the distribution parameters
- Substitute into minimum value distribution
- Define a threshold in terms of FP rate
- This gives an estimate of xthresh

Estimating xthresh from unlabelled data
- Use theoretical statistics
- Null hypothesis H0: shingles are drawn from unrelated tracks
- Assume elements i.i.d., normally distributed
- M-dimensional shingles, d effective degrees of freedom
- Squared distance distribution for H0

ML for background distribution
- Likelihood for N data points (distances squared)
- d = effective degrees of freedom
- M = shingle dimensionality

Background distribution parameters
- Likelihood for N data points (distances squared)
- d = effective degrees of freedom
- M = shingle dimensionality
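The likelihood from the slide is not preserved here, so the following sketch swaps in a simpler technique: a method-of-moments fit, not the maximum-likelihood estimate the talk describes. It assumes the background squared distances follow a scaled chi-square law, Y = scale · χ²(d), for which mean = scale·d and variance = 2·scale²·d. The function name and the synthetic data are illustrative only.

```python
import random
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def fit_scaled_chi_square(squared_distances: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of Y = scale * chi2(d); returns (d, scale)."""
    mean = fmean(squared_distances)
    var = pvariance(squared_distances, mu=mean)
    if mean <= 0.0 or var <= 0.0:
        raise ValueError("need positive mean and variance")
    d = 2.0 * mean * mean / var      # from mean = scale*d, var = 2*scale^2*d
    scale = var / (2.0 * mean)
    return d, scale


# Synthetic background sample: scale * chi2(d) is Gamma(shape=d/2, scale=2*scale).
rng = random.Random(0)
true_d, true_scale = 20.0, 0.1
samples = [rng.gammavariate(true_d / 2.0, 2.0 * true_scale) for _ in range(20000)]
d_hat, scale_hat = fit_scaled_chi_square(samples)
```

The estimated d is typically far below the shingle dimensionality M when feature dimensions are correlated, which is why the slides distinguish effective degrees of freedom from M.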

Minimum value over N samples
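For reference, the standard order-statistics result for the minimum of N independent samples is given below. The symbols F and f (the cumulative distribution and density of a single background squared distance) are assumed names, and the slide's own notation may differ.

```latex
\[
F_{\min}(x) = P\!\left(\min_{1 \le i \le N} X_i \le x\right) = 1 - \bigl(1 - F(x)\bigr)^{N},
\qquad
f_{\min}(x) = N\, f(x)\, \bigl(1 - F(x)\bigr)^{N-1}.
\]
```

As N grows, the minimum concentrates in the far left tail of F, which matches the separated left bump in the histogram of minimum distances.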

Minimum value distribution of unrelated shingles

Estimate of xthresh, false positive rate
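One way to turn a false-positive rate into a threshold is to invert the minimum-value distribution numerically. The sketch below assumes the background squared distance is scale · χ²(d) with an even integer d, so that the chi-square CDF has a closed form. It solves 1 − (1 − F(x))^N = FP rate by bisection. Function names and parameter values are illustrative, not taken from the talk.

```python
import math


def chi2_cdf_even(x: float, d: int) -> float:
    """Chi-square CDF for even d: 1 - exp(-x/2) * sum_{j<d/2} (x/2)^j / j!."""
    if d <= 0 or d % 2:
        raise ValueError("d must be a positive even integer")
    if x <= 0.0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, d // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def min_value_cdf(x: float, d: int, n: int) -> float:
    """P(minimum of n unrelated squared distances <= x), in chi-square units."""
    return 1.0 - (1.0 - chi2_cdf_even(x, d)) ** n


def estimate_x_thresh(d: int, n: int, fp_rate: float, scale: float = 1.0) -> float:
    """Threshold at which the minimum-value CDF equals fp_rate."""
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be in (0, 1)")
    # Per-sample CDF value needed: F(x) = 1 - (1 - fp_rate)^(1/n).
    target = -math.expm1(math.log1p(-fp_rate) / n)
    lo, hi = 0.0, float(d)
    while chi2_cdf_even(hi, d) < target:
        hi *= 2.0
    for _ in range(200):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, d) < target:
            lo = mid
        else:
            hi = mid
    return scale * lo


x_unit = estimate_x_thresh(d=20, n=1000, fp_rate=0.01)
```

Under these assumptions, a query shingle from an unrelated track has about a 1% chance that its nearest neighbour among 1000 background shingles falls below `x_unit`. Passing the fitted `scale` converts the result back to squared-distance units.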

Unlabelled data experiment
- Unlabelled data set, known to contain:
  - Cover songs (same work, different performer)
  - Near-duplicate recordings (misattribution, encoding)
- Estimate background distance distribution
- Estimate minimum value distribution
- Set xthresh so FP rate is <= 1%
- Whole-track retrieval based on shingle collisions

Cover song retrieval

Scaling
- Locality sensitive hashing
- Trade-off: accept approximate NN results in exchange for lower time complexity
- 3 to 4 orders of magnitude speed-up
- No noticeable degradation in performance for optimal radius threshold
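To illustrate why LSH needs a radius bound, here is a minimal sketch of one common Euclidean scheme: random Gaussian projections quantized into buckets of fixed width. This is an assumed, generic construction and not audioDB's implementation. The class name, default parameters, and toy vectors are all hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence


class EuclideanLSH:
    """Several hash tables; each key concatenates quantized random projections."""

    def __init__(self, dim: int, num_tables: int = 8, hashes_per_table: int = 4,
                 bucket_width: float = 0.5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = bucket_width
        self.points: List[List[float]] = []
        self.tables = []
        for _ in range(num_tables):
            projections = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                            rng.uniform(0.0, bucket_width))
                           for _ in range(hashes_per_table)]
            self.tables.append((projections, defaultdict(list)))

    def _key(self, projections, v: Sequence[float]) -> tuple:
        # h(v) = floor((a . v + b) / w) for each projection (a, b).
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in projections)

    def add(self, v: Sequence[float]) -> int:
        idx = len(self.points)
        self.points.append(list(v))
        for projections, buckets in self.tables:
            buckets[self._key(projections, v)].append(idx)
        return idx

    def query(self, q: Sequence[float], radius_sq: float) -> List[int]:
        """Collect bucket-mates from every table, then verify the radius exactly."""
        candidates = set()
        for projections, buckets in self.tables:
            candidates.update(buckets.get(self._key(projections, q), ()))
        return sorted(i for i in candidates
                      if sum((x - y) ** 2 for x, y in zip(self.points[i], q)) <= radius_sq)


index = EuclideanLSH(dim=3)
index.add([1.0, 0.0, 0.0])
index.add([0.0, 1.0, 0.0])
hits = index.query([1.0, 0.0, 0.0], radius_sq=0.01)
```

The bucket width has to be chosen relative to the search radius. Too narrow and true neighbours land in different buckets, too wide and each bucket approaches a linear scan. The final exact check removes false positives, but neighbours that never share a bucket are missed, which is the approximation being traded for speed.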

LSH

Remix retrieval via LSH

Current deployment
- Large commercial collections:
  - AWAL ~ 100,000 tracks
  - Yahoo! 2M+ tracks, related song classifier
- AudioDB: open-source, international consortium of developers
- Google: “audioDB”

Conclusions
- Radius-bounded retrieval model for tracks
- Shingles preserve temporal information, high d
- Implements mid-to-high specificity search
- Optimal radius threshold from order statistics
  - Null hypothesis: shingles are drawn from unrelated tracks
- LSH requires radius bound, automatic estimate
- Scales to 1B+ shingles using LSH

Thanks
- Malcolm Slaney, Yahoo! Research Inc.
- Christophe Rhodes, Goldsmiths, U. of London
- Michela Magas, Goldsmiths, U. of London
- Funding: EPSRC: EP/E02274X/1