How to use… [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfile(data, subLen); Input data: input time series subLen: subsequence.

Slides:



Advertisements
Similar presentations
Prepared by Abdullah Mueen and Eamonn Keogh
Advertisements

Example 2.2 Estimating the Relationship between Price and Demand.
Order Structure, Correspondence, and Shape Based Categories Presented by Piotr Dollar October 24, 2002 Stefan Carlsson.
Content-Based Image Retrieval
DFT/FFT and Wavelets ● Additive Synthesis demonstration (wave addition) ● Standard Definitions ● Computing the DFT and FFT ● Sine and cosine wave multiplication.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
Regression Analysis Using Excel. Econometrics Econometrics is simply the statistical analysis of economic phenomena Here, we just summarize some of the.
LFDA Practical Session FMSP stock assessment tools Training Workshop.
Sorting and Searching Arrays CSC 1401: Introduction to Programming with Java Week 12 – Lectures 1 & 2 Wanda M. Kunkle.
Inputs to Signal Generation.vi: -Initial Distance (m) -Velocity (m/s) -Chirp Duration (s) -Sampling Info (Sampling Frequency, Window Size) -Original Signal.
GO BACK TO ACTIVITY SLIDE GO TO TEACHER INFORMATION SLIDE To move from one activity to the next, just click on the slide! PATTERNS OR CLICK ON A BUTTON.
WHAT IS VRAY? V-ray is a rendering engine that is used as an extension of certain 3D computer graphics software. The core developers of V-Ray are Vladimir.
The College of Saint Rose CSC 460 / CIS 560 – Search and Information Retrieval David Goldschmidt, Ph.D. from Programming Collective Intelligence by Toby.
by Chris Brown under Prof. Susan Rodger Duke University June 2012
How to make a presentation (Oral and Poster) Dr. Bernard Chen Ph.D. University of Central Arkansas July 5 th Applied Research in Healthy Information.
9/17/20151 Chapter 12 - Heaps. 9/17/20152 Introduction ► Heaps are largely about priority queues. ► They are an alternative data structure to implementing.
2 nd Order CFA Byrne Chapter 5. 2 nd Order Models The idea of a 2 nd order model (sometimes called a bi-factor model) is: – You have some latent variables.
Nelson Research, Inc – N. 88 th St. Seattle, WA USA aol.com Cell Current Noise Analysis – a Worked Example Regarding.
Moving Around in Scratch The Basics… -You do want to have Scratch open as you will be creating a program. -Follow the instructions and if you have questions.
CSC 211 Data Structures Lecture 13
Discovering Deformable Motifs in Time Series Data Jin Chen CSE Fall 1.
Hierarchical Clustering of Gene Expression Data Author : Feng Luo, Kun Tang Latifur Khan Graduate : Chien-Ming Hsiao.
Quantitative analysis of 2D gels Generalities. Applications Mutant / wild type Physiological conditions Tissue specific expression Disease / normal state.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.
1 The Decomposition of a House Price index into Land and Structures Components: A Hedonic Regression Approach by W. Erwin Diewert, Jan de Haan and Rens.
ADAPS multiple removal demo. The upper halves of the slides in this series show the input gathers.The bottoms show the same data with multiples removed.
Copyright © 2000, Department of Systems and Computer Engineering, Carleton University 1 Introduction An array is a collection of identical boxes.
Choose n out of these k objects For example:  Choose your three favorites out of these ten photographs  Of these fifty apps, which ten would you download.
Step 3: Tools Database Searching
Machine Vision Edge Detection Techniques ENT 273 Lecture 6 Hema C.R.
Document that explains the chosen concept to the animator 1.
Full Wave Rectifier NavigationTutorial: 6 of 8 The Full Wave Rectifier In the previous Power Diodes tutorial we discussed ways of reducing the ripple or.
Algorithm Design Techniques, Greedy Method – Knapsack Problem, Job Sequencing, Divide and Conquer Method – Quick Sort, Finding Maximum and Minimum, Dynamic.
Matrix Profile Examples
Fast Subsequence Matching in Time-Series Databases.
Environmental Remote Sensing GEOG 2021
Attend to Precision Introduction to Engineering Design
Cluster Analysis II 10/03/2012.
National Mathematics Day
Chapter 15 – Cluster Analysis
Prof. Mark Glauser Created by: David Marr
In this section you will:
What to do when a test fails
Supervised Time Series Pattern Discovery through Local Importance
^ About the.
Break and Noise Variance
Unsupervised Learning
OPSE 301: Lab13 Data Analysis – Fitting Data to Arbitrary Functions
Scripts & Functions Scripts and functions are contained in .m-files
Data Mining (and machine learning)
Graphs of Sine and Cosine Functions
Mathematical Modeling
Binary Search Back in the days when phone numbers weren’t stored in cell phones, you might have actually had to look them up in a phonebook. How did you.
Mathematical Modeling Making Predictions with Data
Free Surface Multiple Elimination
Dealing with Noisy Data
Mathematical Modeling Making Predictions with Data
Data Structures Review Session
Coding Concepts (Basics)
Sub-Quadratic Sorting Algorithms
Lesson 1-1 Linear Relations and Things related to linear functions
Data Mining – Chapter 4 Cluster Analysis Part 2
Discretisation Why Worry? Check Static Model Visually
PCA of Waimea Wave Climate
CSC 558 – Data Analytics II, Prep for assignment 1 – Instance-based (lazy) machine learning January 2018.
Matlab Basics Tutorial
Geology 491 Spectral Analysis
Biologically Inspired Computing: Operators for Evolutionary Algorithms
Simple Case Studies Using MASS
Software Development Techniques
Presentation transcript:

How to use… [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfile(data, subLen); Input data: input time series subLen: subsequence length Output matrixProfile: the approximated matrix profile when stopped or complete profileIndex: the approximated matrix profile index when stopped or complete motifIndex: the detected motif index when stopped or complete (3x2 cell) discordIndex: vector contains the discord’s index when stop or complete 1st motif 1st motif’s NN within certain radius 2nd motif 2nd motif’s NN within certain radius 3rd motif 3rd motif’s NN within certain radius

If this motif is not of interest to you, click discard, to force the tool to find another motif Here stop means stop the entire tool, and return all the patterns.

If you set a subsequence length that was too long, there simply would not be enough data to make three sets of motifs and three discords! So the subsequence length must be less than 1/20 the length of the time series length.

We have to impose a maximum number on subsequences that can appear in a single motif (we chose ten). Otherwise, having three different motifs would not be defined for certain pathological datasets. For example, a pure sine wave.

It actually converged at about 1%, I could not click fast enough ;-) To my eye, these results make sense. The three motifs are three variations on a theme. 800 datapoints of noise or low amplitude, then four cycles Weak cycles in the beginning, that get progressively more pronounced Five full strong cycles, that then (around 1400) fade The first discord is just the high frequency noise at the beginning of the data. The second two have cycles, but are clearly different to normal motifs. However, you might say that motif ‘1’ and ‘2’ are really the same, and should be merged, which begs the question, how did we define them in the first place (next slide…………) Here is the entire code you need to run this load dataset % Load the data TAG = data(140000:5:end,7); % Pull out a section of TAG 7 to examine subLen = 2000; % Choose the pattern length we are interested in [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfile(TAG, subLen);

a.      Creating an equivalent of the parameter that enabled the user to specify the similarity between instances of a motifs within a cluster (done. I did this a very sensible way, but there might be other ways) Below I explain how I did this. There might be other ways, but every other way I can think of, seems to be vulnerable to a pathologic dataset that would make it give strange results. ---- First, we have to set a parameter, R (range). Must be greater than or equal to 1.0, I made it 3.0 for all the slides today. When the motif tool finds the best pair A and B (here the pair at 30775 and 33077), we measure the Euclidean distance between them; Dist = EuclideanDistance (A,B ) Now we search the full time series, for any other patterns that are Within R*Dist of A or Within R*Dist of B All such patterns are part of the motif (these are the gray time series in the window). The second motif panel is searching for motifs, excluding anything in the first panel. The third motif panel is searching for motifs, excluding anything in the two panels.

8 degree ascending slope 4 degree ascending slope Constant slope Just for fun, I repeated the experiment in the last slide, but halved the subsequence length The three motifs are three variations on a theme. This makes sense, there is only one high-level pattern in the data. However it nicely seems to find subtle sub-motifs 8 degree ascending slope 4 degree ascending slope Constant slope 8 degrees 4 degrees 0 degrees Here is the entire code you need to run this load dataset % Load the data TAG = data(140000:5:end,7); % Pull out a section of TAG 7 to examine subLen = 1000; % Choose the pattern length we are interested in [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfile(TAG, subLen);

The tool seems quite robust to very noisy data, here discovering two motifs that seem ‘real’ to me, in spite of the fact that they are not “perfect”, and there is a lot of noise. {close to random}

Here are two identical experiments, except I first smoothed the data for the leftmost one. It is clear (and not surprising) that smoothing helps a little.

(no comments here, just another example)