Wednesday, January 21st, 2008. Comp. Sci. Colloquium. The Problem with Music: Modeling Distance Distributions of Large Music Collections. Prof. Michael Casey.

Presentation transcript:

Wednesday, January 21st, 2008. Comp. Sci. Colloquium. The Problem with Music: Modeling Distance Distributions of Large Music Collections. Prof. Michael Casey, Program in Digital Musics, Dartmouth College, Hanover, NH

a.k.a. The Problem with Multimedia: Music, Videos, Images

Scalable Similarity
- 8M tracks in commercial collection
- 6B images on WWW
- Require scalable nearest-neighbor methods
- Increase scale, decrease search complexity

Example: Hattogate

Example: Remixing / Sampling in Yahoo! Music
- Original Track
- Remix 1
- Remix 2
- Remix 3

Example: 3B Images in Flickr

Specificity
- Partial document (sub-track) retrieval
- Alternate versions: remix, cover, live, album
- Task is mid-high specificity

Machine Listening

Feature Extraction

Audio Shingles
- An audio shingle is the concatenation of l frames of m-dimensional features
- Shingles provide contextual information about features
- Originally used for Internet search engines: Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”. Computer Networks 29(8-13) (1997)
- Related to N-grams, overlapping sequences of features
- Applied to the audio domain by Casey and Slaney: Casey, M., Slaney, M. “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP 2006
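A minimal sketch of shingle construction, assuming the features arrive as a list of m-dimensional frames. The function name `make_shingles` and the `hop` parameter are illustrative choices, not names from the talk or from audioDB.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate each window of l consecutive m-dimensional frames into one l*m vector."""
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles: List[List[float]] = []
    # Slide a window of l frames over the track; each window becomes one shingle.
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy track: 5 frames of 2-dimensional features (real features would be 12d to 20d).
track = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
shingles = make_shingles(track, l=3)
print(len(shingles), "shingles of dimension", len(shingles[0]))
```

With overlapping windows (hop of 1), a track of T frames yields T - l + 1 shingles, which is why shingle counts grow roughly linearly with track length.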

Audio Shingle Similarity

- A query shingle is drawn from a query track {Q}
- The database of audio tracks is indexed by (n)
- A database shingle is drawn from track n
- Shingles are normalized to unit vectors
- Shingles have M dimensions (M = l·m); m = 12, 20; l = 30, 40
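The equation that followed "therefore" on this slide did not survive in the transcript. One standard consequence of unit normalization, shown here as an assumption about what the slide expressed, is that the squared Euclidean distance between two unit vectors equals 2 minus twice their dot product. The helper names below are illustrative.

```python
import math
from typing import List, Sequence


def unit_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = unit_normalize([3.0, 4.0, 0.0])   # stand-in for a query shingle
x = unit_normalize([0.0, 4.0, 3.0])   # stand-in for a database shingle

d2_direct = squared_distance(q, x)
d2_from_dot = 2.0 - 2.0 * dot(q, x)   # identity valid only for unit vectors
print(d2_direct, d2_from_dot)
```

Under this identity, squared distances between unit-norm shingles are confined to the range 0 to 4, so a single radius threshold is comparable across tracks.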

AudioDB: Shingle Nearest Neighbor Search

Whole-track similarity
- Often want to know which tracks are similar
- Similarity depends on specificity of task
  - Distortion / filtering / re-encoding (high)
  - Remix with new audio material (mid)
  - Cover song: same song, different artist (mid)

Whole-track resemblance: radius-bounded search
- Compute the number of shingle collisions between two tracks
- Requires a threshold for considering shingles to be related
- Need a way to estimate relatedness (threshold) for data set
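The collision count itself is not preserved in the transcript. A brute-force sketch, assuming a collision means a (query shingle, track shingle) pair whose squared distance is at most a threshold, could look like the following. `count_collisions` and `x_thresh` as a squared-distance bound are assumptions made for illustration.

```python
from typing import Sequence


def count_collisions(query_shingles: Sequence[Sequence[float]],
                     track_shingles: Sequence[Sequence[float]],
                     x_thresh: float) -> int:
    """Count shingle pairs whose squared Euclidean distance is at most x_thresh."""
    count = 0
    for q in query_shingles:
        for s in track_shingles:
            if sum((a - b) ** 2 for a, b in zip(q, s)) <= x_thresh:
                count += 1
    return count


# Toy 2-dimensional "shingles", approximately unit length.
query = [[1.0, 0.0], [0.0, 1.0]]
related = [[0.99, 0.14], [0.0, 1.0], [-1.0, 0.0]]
unrelated = [[-1.0, 0.0], [0.0, -1.0]]

related_score = count_collisions(query, related, x_thresh=0.05)
unrelated_score = count_collisions(query, unrelated, x_thresh=0.05)
print(related_score, unrelated_score)
```

This exhaustive version costs one distance computation per shingle pair, which is the quadratic cost that motivates the hashing approach later in the talk.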

SCALE
- Mazurkas: 10,000 tracks ms features
  - 3s clips (30 – 300 frames per vector)
  - 12d – 20d features (360 – 600d vectors)
- Yahoo! Music
  - 6M tracks
  - 1000 vectors per track
  - (6M x 1k)^2 search for near neighbours

LSH

Approximate Near Neighbor Matching

Approximate near neighbors
- In many applications we need only near neighbors
- We can exploit this by allowing a degree of approximation in retrieval

Space partitioning

Curse of dimensionality (figure: Pr(dist) plotted against dist. for d=4, d=8, d=1024)
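The effect the figure illustrates can be reproduced numerically. The sketch below assumes points drawn uniformly from the unit cube, which is not necessarily the data behind the slide's plot, and measures how the spread of pairwise distances shrinks relative to their mean as d grows.

```python
import math
import random
import statistics


def relative_spread(dim: int, pairs: int, rng: random.Random) -> float:
    """Return std/mean of Euclidean distances between random point pairs in [0, 1]^dim."""
    distances = []
    for _ in range(pairs):
        p = [rng.random() for _ in range(dim)]
        q = [rng.random() for _ in range(dim)]
        distances.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))))
    return statistics.pstdev(distances) / statistics.mean(distances)


rng = random.Random(0)
spread = {dim: relative_spread(dim, 200, rng) for dim in (4, 8, 1024)}
for dim, value in spread.items():
    print(f"d={dim:5d}  std/mean of distances = {value:.3f}")
```

As the ratio falls, the nearest and farthest points become hard to tell apart, which is why exact space partitioning degrades in high dimensions.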

Border effects in high d

ε-NN : approximate near neighbors

Setting the range

Hashing
- Types of hashes
  - String: put Bash vs Bush in different bins
  - Locality sensitive: close matches in same bin
  - High-dimensional and probabilistic
- Nearest Neighbor implementations
  - Pair-wise distance computation
    - 1,000,000,000,000 comparisons in 2M song database
  - Hash bucket collisions
    - 1,000,000,000 hash projections

Exact matching via hashing
- Audio fingerprinting
  - Shazam, etc.
- Make the feature robust
- Use exact matching on integer hash
- Find a sequence of hashes to identify specific recording or image
- Drawback: only exact matches possible

Locality-Sensitive Hashing (Indyk-Motwani ’98)
Hash functions are locality-sensitive if, for a randomly chosen hash function h and any pair of points p, q, we have:
- Pr[h(p)=h(q)] is “high” if p is “close” to q
- Pr[h(p)=h(q)] is “low” if p is “far” from q
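To make the definition concrete, here is a sketch using one common locality-sensitive family for Euclidean distance: project onto a random Gaussian direction, add a random offset, and quantize into buckets of width w. The slide does not say which family the talk used, so this choice, along with the names `make_hash` and `collision_rate`, is an assumption.

```python
import math
import random
from typing import Callable, Sequence


def make_hash(dim: int, w: float, rng: random.Random) -> Callable[[Sequence[float]], int]:
    """Draw one random hash h(v) = floor((a.v + b) / w) with Gaussian a and uniform b."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v: Sequence[float]) -> int:
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h


def collision_rate(p, q, dim: int, w: float, trials: int, rng: random.Random) -> float:
    """Estimate Pr[h(p) = h(q)] over randomly drawn hash functions."""
    hits = 0
    for _ in range(trials):
        h = make_hash(dim, w, rng)
        if h(p) == h(q):
            hits += 1
    return hits / trials


rng = random.Random(0)
p = [1.0, 0.0, 0.0, 0.0]
near = [0.9, 0.1, 0.0, 0.0]    # close to p
far = [-1.0, 0.0, 0.0, 0.0]    # far from p

near_rate = collision_rate(p, near, 4, 1.0, 2000, rng)
far_rate = collision_rate(p, far, 4, 1.0, 2000, rng)
print(f"near: {near_rate:.2f}  far: {far_rate:.2f}")
```

The near pair lands in the same bucket far more often than the far pair, which is exactly the "high" versus "low" behaviour the definition asks for.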

Locality Sensitive Hashing

Random Projections
- Random projections estimate distance
- Multiple projections improve estimate
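Both bullets can be checked with a small experiment. The sketch assumes Gaussian random directions: the squared projection of the difference vector onto one such direction has the true squared distance as its expectation, and averaging over k directions reduces the error. All names here are illustrative.

```python
import random
import statistics
from typing import Sequence


def projected_squared_distance(p: Sequence[float], q: Sequence[float],
                               k: int, rng: random.Random) -> float:
    """Estimate ||p - q||^2 by averaging k squared random Gaussian projections."""
    diff = [a - b for a, b in zip(p, q)]
    total = 0.0
    for _ in range(k):
        direction = [rng.gauss(0.0, 1.0) for _ in diff]
        projection = sum(d * u for d, u in zip(diff, direction))
        total += projection * projection
    return total / k


rng = random.Random(1)
p = [rng.random() for _ in range(16)]
q = [rng.random() for _ in range(16)]
true_d2 = sum((a - b) ** 2 for a, b in zip(p, q))

# Mean relative error of the estimate over 50 repetitions, for 1 and 256 projections.
mean_rel_error = {}
for k in (1, 256):
    errors = [abs(projected_squared_distance(p, q, k, rng) - true_d2) / true_d2
              for _ in range(50)]
    mean_rel_error[k] = statistics.mean(errors)
print(mean_rel_error)
```

A single projection is a very noisy estimate; a few hundred projections bring the typical relative error down to a few percent in this toy setting.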

h’s are locality-sensitive
- Pr[h(p)=h(q)] = (1 - D(p,q)/d)^k
- We can vary the probability by changing k
(figure: Pr plotted against distance for k=1 and k=2)
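The formula matches the bit-sampling hash family for Hamming distance, where h reads k randomly chosen coordinates of a d-dimensional binary vector. The slide does not name the family, so treating D as Hamming distance is an assumption. The sketch compares the formula with a simulation.

```python
import random
from typing import Sequence


def collision_probability(distance: int, dim: int, k: int) -> float:
    """Pr[h(p) = h(q)] = (1 - D/d)^k for k independently sampled coordinates."""
    return (1.0 - distance / dim) ** k


def sampled_bits_collide(p: Sequence[int], q: Sequence[int], k: int, rng: random.Random) -> bool:
    """Sample k coordinates with replacement; collide only if p and q agree on all of them."""
    for _ in range(k):
        i = rng.randrange(len(p))
        if p[i] != q[i]:
            return False
    return True


p = [0, 1, 1, 0, 1, 0, 0, 1]
q = [0, 1, 0, 0, 1, 0, 1, 1]   # differs from p in 2 of 8 positions, so D = 2, d = 8

rng = random.Random(2)
trials = 20000
empirical = sum(sampled_bits_collide(p, q, 2, rng) for _ in range(trials)) / trials
predicted = collision_probability(2, 8, 2)
print(f"predicted {predicted:.4f}, simulated {empirical:.4f}")
```

Raising k makes the curve fall off faster with distance, so far pairs collide less often, at the price of also losing some near pairs.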

LSH Random Projections 3d to 2d

Statistical approaches to modeling distance distributions

Distribution of minimum distances Database: 1.4 million shingles. The left bump is the minimum between 1000 randomly selected query shingles and this database. The right bump is a small sampling (1/ ) of the full histogram of all distances.

Radius-bounded retrieval performance: cover song (opus task)
- Performance depends critically on xthresh, the collision threshold
- Want to estimate xthresh automatically from unlabelled data

Order Statistics
- Minimum-value distribution is analytic
- Estimate the distribution parameters
- Substitute into minimum value distribution
- Define a threshold in terms of FP rate
- This gives an estimate of xthresh

Estimating xthresh from unlabelled data
- Use theoretical statistics
- Null Hypothesis:
  - H0: shingles are drawn from unrelated tracks
- Assume elements i.i.d., normally distributed
- M dimensional shingles, d effective degrees of freedom
- Squared distance distribution for H0

ML for background distribution
- Likelihood for N data points (distances squared)
- d = effective degrees of freedom
- M = shingle dimensionality

Background distribution parameters
- Likelihood for N data points (distances squared)
- d = effective degrees of freedom
- M = shingle dimensionality
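The likelihood expression is not preserved in the transcript. As a simpler stand-in for the maximum-likelihood fit, the sketch below uses moment matching, under the assumption that squared distances between unrelated shingles follow a scaled chi-squared distribution with d effective degrees of freedom. For such a variable the mean is scale·d and the variance is 2·scale²·d, which gives closed-form estimates. The function name and the synthetic data are illustrative.

```python
import random
import statistics
from typing import Sequence, Tuple


def fit_scaled_chi_squared(squared_distances: Sequence[float]) -> Tuple[float, float]:
    """Moment-matching estimates (dof, scale) for X = scale * chi-squared(dof)."""
    mean = statistics.mean(squared_distances)
    var = statistics.pvariance(squared_distances)
    dof = 2.0 * mean * mean / var      # from mean = scale*dof, var = 2*scale^2*dof
    scale = var / (2.0 * mean)
    return dof, scale


# Synthetic background: sums of 12 squared standard normals, scaled by 0.01.
rng = random.Random(3)
true_dof, true_scale = 12, 0.01
sample = [true_scale * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(true_dof))
          for _ in range(5000)]

dof_hat, scale_hat = fit_scaled_chi_squared(sample)
print(f"estimated d = {dof_hat:.2f}, scale = {scale_hat:.4f}")
```

On real shingles the estimated d would typically come out below the shingle dimensionality M, since neighbouring frames are correlated; that gap is what "effective" degrees of freedom refers to.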

Minimum value over N samples

Minimum value distribution of unrelated shingles

Estimate of xthresh, false positive rate
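The slide's formula is missing from the transcript, but the order-statistics step can be sketched from a standard result: if F is the distribution of one background squared distance, the minimum of N independent draws has distribution 1 - (1 - F(x))^N. Setting that equal to the target false positive rate and solving gives F(xthresh) = 1 - (1 - FP)^(1/N). The sketch reads that quantile from an empirical background sample rather than from a fitted analytic distribution, which is a simplification. The function names and N = 100 are assumptions.

```python
import math
import random
from typing import Sequence


def background_level(fp_rate: float, n: int) -> float:
    """Background CDF level whose minimum-of-n distribution equals fp_rate."""
    if not 0.0 < fp_rate < 1.0 or n < 1:
        raise ValueError("need 0 < fp_rate < 1 and n >= 1")
    # 1 - (1 - fp_rate)^(1/n), computed stably for tiny results.
    return -math.expm1(math.log1p(-fp_rate) / n)


def empirical_quantile(sorted_values: Sequence[float], level: float) -> float:
    """Smallest sample value whose empirical CDF is at least level."""
    index = math.ceil(level * len(sorted_values)) - 1
    index = max(0, min(len(sorted_values) - 1, index))
    return sorted_values[index]


# Synthetic background squared distances between unrelated shingles.
rng = random.Random(4)
background = sorted(0.01 * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(12))
                    for _ in range(20000))

level = background_level(fp_rate=0.01, n=100)   # 100 comparisons per query shingle
x_thresh = empirical_quantile(background, level)
print(f"background level = {level:.3e}, x_thresh = {x_thresh:.4f}")
```

Because the required level sits deep in the lower tail, an empirical quantile rests on only a handful of samples; evaluating a fitted analytic distribution there instead avoids that noise.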

Unlabelled data experiment
- Unlabelled data set
- Known to contain:
  - Cover songs (same work, different performer)
  - Near duplicate recordings (misattribution, encoding)
- Estimate background distance distribution
- Estimate minimum value distribution
- Set xthresh so FP rate is <= 1%
- Whole-track retrieval based on shingle collisions

Misattributions
- Joyce Hatto: 100% of known misattributions in first rank
- Sergio Fiorentino
  - Eleven out of twenty-six Mazurkas performances on another Concert Artists/Fidelio disc, issued under the name of Sergio Fiorentino, are in fact copies of recordings by other artists. This is the first time that such practices have been found in the Concert Artists/Fidelio recordings issued other than under the name of Joyce Hatto, and prompts speculation as to how much more misattributed material remains to be found in the Concert Artists/Fidelio catalogue.

Cover song retrieval

Scaling
- Locality sensitive hashing
  - Trade-off approximate NN for time complexity
  - 3 to 4 orders of magnitude speed-up
  - No noticeable degradation in performance for optimal radius threshold

Remix retrieval via LSH

AudioDB: Shingle Nearest Neighbor Search
- Open source: google: “audioDB”
- Management of tracks, sequences, salience
- Automatic indexing parameters
- OMRAS2, Yahoo!, AWAL, CHARM, more…
- Web-services interface (SOAP / JSON)
- Implementation of LSH for large N ~ 1B
- 1-10 ms whole-track retrieval from 1B vectors

Current deployment
- Large commercial collections
  - AWAL ~ 100,000 tracks
  - Yahoo! 2M+ tracks, related song classifier
  - Flickr 1B+ Images
- AudioDB: open-source, international consortium of developers
- Google: “audioDB”

Conclusions
- Radius-bounded retrieval model for tracks
- Shingles preserve temporal information, high d
- Implements mid-to-high specificity search
- Optimal radius threshold from order statistics
  - Null hypothesis: shingles are drawn from unrelated tracks
- LSH requires radius bound, automatic estimate
- Scales to 1B+ shingles using LSH

Thanks
- Malcolm Slaney, Yahoo! Research Inc.
- Christophe Rhodes, Goldsmiths, U. of London
- Michela Magas, Goldsmiths, U. of London
- Funding: EPSRC: EP/E02274X/1