Point-set algorithms for pattern discovery and pattern matching in music
David Meredith, Goldsmiths College, University of London



Uses of musical pattern discovery algorithms
- Indexing
  - Store themes, motives and other memorable patterns in an index to enable sub-linear retrieval times
- Transcription and music analysis
  - Beat tracking and metrical structure analysis: similar patterns have similar metrical structure
  - Grouping and phrasing: “parallelism” (Lerdahl and Jackendoff, 1983) is the most important factor in grouping
- Composer’s assistant, automatic improvisation
  - Cure composer’s block by suggesting new material based on patterns discovered in music already written
  - Automatically create new music that develops themes discovered in music already played

Importance of repeated patterns in music analysis and cognition
- Schenker (1954, p.5): repetition “is the basis of music as an art”
- Bent and Drabkin (1987, p.5): “the central act” in all forms of music analysis is “the test for identity”
- Lerdahl and Jackendoff (1983, p.52): “the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece”

Most musical repetitions are neither perceived nor intended

Interesting musical repetitions are structurally diverse
- We want to discover all and only the interesting repeated patterns
- The class of interesting repeated patterns is structurally diverse because
  - patterns vary widely in their structural characteristics
  - there are many ways of transforming a musical pattern to give another pattern that is perceived to be a version of it (e.g., truncated, augmented, diminished, inverted, embellished and even reversed)

Example of repeated motive

Example of thematic transformation

String-based algorithms for discovering musical patterns
- Most previous approaches assume that music is represented as strings
  - each string represents a voice or part
  - each character represents a note or an interval between two consecutive notes in a voice
- Similarity between two patterns is measured in terms of edit distance, calculated using dynamic programming
  - see, e.g., Lemström (2000), Hsu et al. (1998), Rolland (1999)
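As an illustration of the string-based approach described above, the following is a minimal sketch of edit distance computed by dynamic programming. It uses plain Levenshtein distance with unit costs, which is an assumption: the cited systems may use other cost schemes. The pitch sequences `theme` and `embellished` are hypothetical MIDI note numbers, not taken from the slides.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs), row by row."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical example: each symbol is the pitch of one note.
theme = [60, 62, 64, 65]
embellished = [60, 61, 62, 63, 64, 66, 65]
print(edit_distance(theme, embellished))
```

Even in this small case every added ornament note costs one edit, so a heavily embellished variant quickly ends up far from its theme.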

Problems with the string-based approach - Edit distance
- B is an embellished version of A
- If both patterns are represented as strings in which each symbol represents the pitch of a note, then the edit distance between A and B is 9
- If a pattern with 9 differences is allowed to count as a match, then many spurious hits result

Problems with the string-based approach - Polyphony
- If we are searching polyphonic music and either
  - do not know the voice to which each note belongs (e.g., a MIDI format 0 file), or
  - are interested in patterns containing notes from 2 or more voices,
- then there is a combinatorial explosion in the number of possible string representations
- If we do not use all possible representations, then we may not find all interesting patterns

Using multidimensional point sets to represent music (1)

Using multidimensional point sets to represent music (2)

SIA - Discovering all maximal translatable patterns (MTPs)
- A pattern is translatable by a vector v in a dataset if it can be translated by v to give another pattern in the dataset
- The MTP for a vector v contains all points mapped by v onto other points in the dataset
- O(kn^2 log n) time, O(kn^2) space
- O(kn^2) average time with hashing (Lemström)
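The definition of an MTP above can be sketched as follows. This is a minimal illustration, not the C implementation mentioned in the slides: it groups points by difference vector in a hash table (so it corresponds to the hashing variant), and the function name `sia` and the (onset, pitch) example points are assumptions made for the example.

```python
from collections import defaultdict


def sia(points):
    """Return {vector v: MTP for v} over all positive difference vectors.

    Points are tuples of equal length (for example (onset, pitch)).
    """
    pts = sorted(set(points))  # lexicographic order
    mtps = defaultdict(list)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            # q follows p in sorted order, so v is lexicographically positive
            v = tuple(b - a for a, b in zip(p, q))
            mtps[v].append(p)  # p is mapped by v onto q, a point in the dataset
    return dict(mtps)


# Hypothetical dataset: a two-note figure stated twice.
points = [(0, 60), (1, 62), (2, 60), (3, 62)]
for vector, pattern in sia(points).items():
    print(vector, pattern)
```

Each of the n(n-1)/2 ordered point pairs contributes one entry, which is where the quadratic factor in the stated complexity comes from.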

SIATEC - Discovering all occurrences of all MTPs
- The Translational Equivalence Class (TEC) of a pattern is the set of all translationally invariant occurrences of that pattern in the dataset
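A TEC can be represented as a pattern together with the set of vectors that translate it onto its occurrences. The sketch below computes this naively by testing every candidate vector for every distinct MTP; it is meant only to show what SIATEC outputs, not how the actual algorithm achieves its running time. The names `siatec` and `translators` and the example points are assumptions.

```python
def sub(q, p):
    """Vector from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))


def add(p, v):
    """Point p translated by vector v."""
    return tuple(a + b for a, b in zip(p, v))


def translators(pattern, dataset):
    """All vectors (including zero) that map every point of pattern into dataset."""
    first = pattern[0]
    candidates = (sub(d, first) for d in sorted(dataset))
    return [v for v in candidates if all(add(p, v) in dataset for p in pattern)]


def siatec(points):
    """Return one (pattern, translators) pair per translationally distinct MTP."""
    dataset = set(points)
    pts = sorted(dataset)
    mtps = {}
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            mtps.setdefault(sub(q, p), []).append(p)
    tecs = {}
    for pattern in mtps.values():
        # Normalise to the first point so translated copies share one key.
        key = tuple(sub(p, pattern[0]) for p in pattern)
        if key not in tecs:
            tecs[key] = (pattern, translators(pattern, dataset))
    return list(tecs.values())


points = [(0, 60), (1, 62), (2, 60), (3, 62)]
for pattern, vectors in siatec(points):
    print(pattern, vectors)
```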

Absolute running times of SIA and SIATEC
- SIA and SIATEC implemented in C, run on a 500MHz Sparc on 52 datasets (6≤n≤3456, 2≤k≤5)
- < 2 mins for SIA to process a piece with 3500 notes
- 13 mins for SIATEC to process a piece with 2000 notes

Need for heuristics to isolate interesting MTPs
- 2^n patterns in a dataset of size n
- SIA generates < n^2/2 patterns
  - => SIA generates a small fraction of all patterns in a dataset
- Many interesting patterns are derivable from patterns found by SIA
- BUT many of the patterns found by SIA are NOT interesting
  - 70,000 patterns found by SIA in Rachmaninoff’s Prelude in C# minor
  - probably about 100 are interesting
- => Need heuristics for isolating interesting patterns in the output of SIA and SIATEC

Heuristics for isolating musical themes and motives
[Figure labels: Cov=6, CR=6/5; Cov=9, CR=9/5; Comp=1/3; Comp=2/5; Comp=2/3]

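COSIATEC - Data compression using SIATEC
[Flowchart]
1. Start with the dataset
2. Run SIATEC on the dataset to obtain a list of pairs
3. Print out the best pattern, P, and its translators
4. Remove occurrences of P from the dataset
5. Is the dataset empty? If no, return to step 2; if yes, end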
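The loop in the flowchart can be sketched as below. Two things here are assumptions rather than the author's method: the TECs are computed by a naive stand-in for SIATEC, and the "best" pattern is taken to be the one whose occurrences cover the most points (ties broken by pattern size), whereas the slides indicate that several heuristics are involved. All function names are hypothetical.

```python
def sub(q, p):
    return tuple(b - a for a, b in zip(p, q))


def add(p, v):
    return tuple(a + b for a, b in zip(p, v))


def tecs(dataset):
    """Naive stand-in for SIATEC: (pattern, translators) per distinct MTP."""
    pts = sorted(dataset)
    by_vector = {}
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            by_vector.setdefault(sub(q, p), []).append(p)
    seen, result = set(), []
    for pattern in by_vector.values():
        key = tuple(sub(p, pattern[0]) for p in pattern)
        if key in seen:
            continue
        seen.add(key)
        vectors = [v for v in (sub(d, pattern[0]) for d in pts)
                   if all(add(p, v) in dataset for p in pattern)]
        result.append((pattern, vectors))
    return result


def covered(pattern, vectors):
    """Set of dataset points belonging to any occurrence of the pattern."""
    return {add(p, v) for p in pattern for v in vectors}


def cosiatec(points):
    """Repeatedly take the best TEC and remove its points until none remain."""
    dataset = set(points)
    encoding = []
    while dataset:
        candidates = tecs(dataset)
        if not candidates:  # a single remaining point has no MTPs
            p = next(iter(dataset))
            candidates = [([p], [tuple(0 for _ in p)])]
        # ASSUMPTION: "best" means largest coverage, then largest pattern.
        best = max(candidates, key=lambda t: (len(covered(*t)), len(t[0])))
        encoding.append(best)
        dataset -= covered(*best)
    return encoding


points = [(0, 60), (1, 62), (2, 60), (3, 62)]
for pattern, vectors in cosiatec(points):
    print(pattern, vectors)
```

The output is a list of (pattern, translators) pairs whose occurrences together account for every point in the input, which is the sense in which the procedure compresses the dataset.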

Using COSIATEC for finding themes and motives in music
[Figure panels: First iteration; Second iteration]

SIAM - Pattern matching using SIA (Finding maximal matches)
- O(knm log(nm)) time
- O(knm) space
- O(knm) average time with hashing
[Figure labels: Query pattern; Dataset]
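The idea of finding maximal matches can be sketched by computing the vector from every query point to every dataset point and grouping the query points by vector; the largest group is then a maximal partial match. This is a minimal hash-based illustration under that reading of the slide. The names `siam` and `best_match` and the example data are assumptions.

```python
from collections import defaultdict


def siam(query, dataset):
    """Map each translation vector to the query points it sends into the dataset."""
    matches = defaultdict(list)
    for q in sorted(set(query)):
        for d in set(dataset):
            v = tuple(b - a for a, b in zip(q, d))
            matches[v].append(q)  # q + v == d, so q is matched under v
    return dict(matches)


def best_match(query, dataset):
    """Return (vector, matched query points) for a largest match."""
    matches = siam(query, dataset)
    return max(matches.items(), key=lambda item: len(item[1]))


# Hypothetical example: three of the four query notes occur, transposed and
# shifted in time, in the dataset.
query = [(0, 0), (1, 2), (2, 4), (3, 9)]
dataset = [(10, 60), (11, 62), (12, 64), (20, 70)]
vector, matched = best_match(query, dataset)
print(vector, matched)
```

With m query points and n dataset points there are nm vectors to group, which matches the O(knm) average time quoted for the hashing variant.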

Improving SIAM - Ukkonen, Lemström & Mäkinen (2003)
- Use sweepline-like scanning of the dataset (Bentley and Ottmann, 1979)
- Generalized to approximate matching of sets of horizontal line segments
- Improved running time to O(mn log m) (without hashing) and working space to O(m)
- Implemented as algorithm P2 on the C-BRAHMS demo web site

Improving SIAM - MSM (Clifford et al., 2006)
- Finding the size of the maximal match is 3SUM-hard (i.e., O(n^2))
- Reduce the problem of multi-dimensional point-set matching to 1D binary wildcard matching
  - Random projection to 1D
  - Length reduction by universal hashing
  - Binary wildcard matching using FFTs
- Find the best match and check in O(m) time exactly how many points match at the location that can be inferred from this match
- Reduces time complexity to O(n log n)
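The FFT step listed above can be illustrated in isolation. The sketch below counts, for every alignment of a binary pattern against a binary text, how many 1s coincide, using a cross-correlation computed with a small pure-Python FFT. It does not implement MSM: the random projection, the universal hashing and the wildcard handling are all omitted, and the function names and example arrays are assumptions.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unnormalised."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def match_counts(text, pattern):
    """counts[s] = number of i with pattern[i] == 1 and text[s + i] == 1."""
    size = 1
    while size < len(text) + len(pattern):
        size *= 2  # pad so the circular convolution does not wrap around
    ft = fft(list(text) + [0] * (size - len(text)))
    # Reversing the pattern turns convolution into cross-correlation.
    fp = fft(list(reversed(pattern)) + [0] * (size - len(pattern)))
    prod = fft([x * y for x, y in zip(ft, fp)], invert=True)
    conv = [round((c / size).real) for c in prod]
    m = len(pattern)
    return conv[m - 1:len(text)]  # alignments where the pattern fits entirely


text = [1, 0, 1, 1, 0, 1, 0, 0]
pattern = [1, 0, 1]
print(match_counts(text, pattern))
```

All alignments are scored in O(n log n) time instead of the O(nm) needed to test each shift directly, which is the source of the speed-up this family of methods relies on.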

Evaluating MSM: Precision-Recall
- Compared with OMRAS (Pickens et al., 2003)
- Test set of 2338 documents, 480 used as queries
- All score encodings in strict score time
- Queries had notes deleted, transposed and inserted

Evaluating MSM: Running time
- Run on prefixes of various sizes of the first movement of Beethoven’s 3rd Symphony
- Each prefix matched against itself
- Compared with the largest common subset algorithm of Ukkonen, Lemström and Mäkinen (2003)
- MSM nearly 2 orders of magnitude faster (log scale)

Future work
- Compare SIA algorithms with methods developed in other, more mature fields (e.g., computer vision, graph matching)
- Improve time complexity of SIA and SIATEC by using randomization techniques
- Adapt algorithms for approximate matching and scaling (matching at different tempi)
- Adapt SIA and SIATEC for early pruning of uninteresting patterns