Targeted Association Mining in Time-Varying Domains

1 Targeted Association Mining in Time-Varying Domains
Project proposal Fall 2016 Jennifer Lavergne

2 Project Goals
Phase 1: Modify Min-Max Itemset Tree Code. Modify the itemset tree code supplied for the project to find emerging and declining patterns.
Phase 2: Prepare datasets for testing. The datasets are cleaned but need to be separated into time intervals.
Phase 3: Emerging/declining pattern mining tests. Run tests on real-world and synthetic datasets.

3 Background

4 Association Mining
Large repositories of data contain hidden gems of information, and rarer information is harder to find. Association mining finds interesting correlations, such as items that sell together, contributors to fatal accidents, and symptoms of rare diseases (Agrawal et al.).

5 Association Rules
Association rule – an implication {X ⇒ Y, support, confidence}, where X and Y are subsets of the itemset I and X ∩ Y = Ø.
Example: {{bread, milk} ⇒ {cheese}, 30%, 75%}
Support = (number of occurrences of I in the database) / (number of rows in the database)
Minsup – the minimum support threshold for an itemset I to be considered frequent.
Confidence = Support(X ∪ Y) / Support(X), for itemset I = X ∪ Y.
Minconf – a user-specified threshold that indicates the interestingness of a candidate rule I: conf(I) > minconf.
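The two measures above can be computed directly from a list of transactions. The following is a minimal sketch, not the project's supplied code: the function names and the toy database are hypothetical, chosen so that the rule {bread, milk} ⇒ {cheese} reproduces the 30% support and 75% confidence of the slide's example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Support(X ∪ Y) / Support(X) for the rule X ⇒ Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical toy database of 10 rows (not from the presentation).
transactions = [
    {"bread", "milk", "cheese"},
    {"bread", "milk", "cheese"},
    {"bread", "milk", "cheese"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"cheese"},
    {"eggs"},
    {"eggs", "bread"},
    {"eggs", "milk"},
]

rule_support = support({"bread", "milk", "cheese"}, transactions)
rule_confidence = confidence({"bread", "milk"}, {"cheese"}, transactions)
```

Here 3 of the 10 rows contain all three items, and 4 rows contain {bread, milk}, so the rule's support is 3/10 and its confidence is 3/4 (up to floating-point rounding).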

6 Itemset Trees
A data structure that helps users query for a specific itemset and its support: Targeted Association Mining.
Items are mapped to numeric values: {bread} = {1}, {cheese} = {2}.
Numbers must be in ascending order within the itemset. Ex: I = {1, 2, 56, 120}
Note: Can be used to find all rules or specific rules within a dataset.
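The item-to-number mapping and the ascending-order requirement can be sketched as follows. This only illustrates the encoding step, not the itemset tree itself; the function names are hypothetical, and assigning ids in alphabetical order is an assumption (the slide does not say how ids are chosen).

```python
def build_item_ids(transactions):
    """Assign each distinct item a numeric id starting at 1.

    Assumption: ids follow alphabetical order of the item names.
    """
    items = sorted({item for t in transactions for item in t})
    return {item: i for i, item in enumerate(items, start=1)}


def encode(itemset, item_ids):
    """Return the itemset as a tuple of ids in ascending order."""
    return tuple(sorted(item_ids[item] for item in itemset))


# Hypothetical transactions.
raw = [{"milk", "bread"}, {"cheese"}, {"cheese", "bread", "milk"}]
item_ids = build_item_ids(raw)
encoded = [encode(t, item_ids) for t in raw]
```

With these three items, bread maps to 1, cheese to 2, and milk to 3, so {milk, bread} is stored as (1, 3) regardless of the order in which the items were read.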

7 Why Emerging?
By the time the data is analyzed, the value of the knowledge has greatly depreciated.
Velocity: Time to action vs. Value (Hackathorn, 2002)

8 Time Stream Data
Time stream – A stream of time-sensitive data on which the algorithm mines patterns.
Time window – A window delineating the current “interesting” time frame, made up of n time steps.
Time step – The increment of time by which the time window progresses forward.
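A time window that advances one time step at a time can be sketched as a generator over a list of time steps. This is an illustrative sketch only; `sliding_windows` and the step labels are hypothetical names, and the time stream is assumed to be already split into time steps.

```python
def sliding_windows(time_steps, window_size):
    """Yield each window of `window_size` consecutive time steps,
    moving forward one time step per iteration."""
    for start in range(len(time_steps) - window_size + 1):
        yield time_steps[start:start + window_size]


# Hypothetical stream of four time steps with a window of n = 3.
stream = ["t1", "t2", "t3", "t4"]
windows = list(sliding_windows(stream, 3))
```

For this stream the window first covers t1 to t3 and then, after moving forward one time step, t2 to t4.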

9 Time Stream Data: Moving Forward
[Figure: the time window at the current time step, and after moving forward 1 time step]

10 Dynamic Data Mining: Pattern Types
[Figure: pattern types; axis label: Time]

11 Project Description

12 Phase 1: Itemset Tree Code
Modify existing code or write your own:
Process queries over time streams.
Find emerging and declining patterns.
Supports are stored in a support array whose length equals the number of time steps.
Read papers: Itemset trees; Ordered Min-Max Itemset Tree; Dyn-TARM.
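The support array described above holds one support value per time step. The sketch below computes it by scanning transactions rather than querying an itemset tree, and it labels a pattern with a deliberately simple rule. That rule (monotone rise or fall across the window) is an assumption made for illustration; it is not the definition used in the Dyn-TARM paper, and all names are hypothetical.

```python
def support_array(itemset, time_steps):
    """Return one support value per time step (0.0 for an empty step)."""
    itemset = set(itemset)
    return [
        sum(1 for t in step if itemset <= set(t)) / len(step) if step else 0.0
        for step in time_steps
    ]


def classify_trend(supports):
    """Assumed rule: 'emerging' if support never drops and ends higher,
    'declining' if it never rises and ends lower, otherwise 'neither'."""
    if len(supports) < 2:
        return "neither"
    pairs = list(zip(supports, supports[1:]))
    if supports[-1] > supports[0] and all(b >= a for a, b in pairs):
        return "emerging"
    if supports[-1] < supports[0] and all(b <= a for a, b in pairs):
        return "declining"
    return "neither"


# Hypothetical window of three time steps, four transactions each.
window = [
    [{"a"}, {"b"}, {"b"}, {"b"}],
    [{"a"}, {"a"}, {"b"}, {"b"}],
    [{"a"}, {"a"}, {"a"}, {"b"}],
]
supports_a = support_array({"a"}, window)
supports_b = support_array({"b"}, window)
```

In this window the support of {a} rises from 0.25 to 0.75 and is labeled emerging, while {b} falls from 0.75 to 0.25 and is labeled declining.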

13 Phase 2: Preparing datasets
For real-world data: Separate into time steps based upon dates in the data.
For synthetic data: Divide data into equal parts to create time steps.
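Both preparation strategies can be sketched in a few lines. The function names are hypothetical, and using one calendar month per time step for dated data is an assumption; the slide only says to split based on the dates in the data.

```python
from collections import defaultdict
from datetime import date


def split_by_month(dated_rows):
    """Group (date, transaction) pairs into time steps, one per
    (year, month), returned in chronological order."""
    buckets = defaultdict(list)
    for day, transaction in dated_rows:
        buckets[(day.year, day.month)].append(transaction)
    return [buckets[key] for key in sorted(buckets)]


def split_equal(rows, n_steps):
    """Divide rows into n_steps contiguous parts whose sizes differ
    by at most one."""
    base, extra = divmod(len(rows), n_steps)
    steps, start = [], 0
    for i in range(n_steps):
        size = base + (1 if i < extra else 0)
        steps.append(rows[start:start + size])
        start += size
    return steps


# Hypothetical inputs.
dated = [
    (date(2016, 1, 5), "a"),
    (date(2016, 2, 1), "b"),
    (date(2016, 1, 20), "c"),
]
real_steps = split_by_month(dated)
synthetic_steps = split_equal(list(range(10)), 3)
```

When the number of rows is not divisible by the number of time steps, `split_equal` gives the earlier steps one extra row each, so ten rows in three steps yield sizes 4, 3, and 3.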

14 Phase 3: Tests (both datasets)
Question 1: What effect does modifying the input measures have? (0 < minSup, minConf < 1; 0 < α < 100; time step size; and time window size.)
Question 2: What is the trend of each pattern over the time stream? Is its emergence or decline cyclic, or sustained throughout?
Question 3: What is the distribution of emerging/declining patterns over each dataset?

15 Questions?
Jennifer Lavergne jjslavergne@louisiana.edu

