Targeted Association Mining in Time-Varying Domains Project proposal Fall 2016 Jennifer Lavergne
Project Goals Phase 1: Modify Min-Max Itemset Tree Code Modify itemset tree code supplied for the project to find emerging and declining patters. Phase 2: Prepare datasets for testing. Datasets cleaned but need to be separated into time intervals Phase 3: Emerging/Declining pattern mining tests. Tests on real world and synthetic datasets.
Background
Association Mining Large repositories of data containing hidden gems of information. Rarer information is harder to find. Find interesting correlations: Items selling together Contributors to fatal accidents Symptoms for rare diseases (Agrawal et al)
Association Rules Association rule - an implication {X ⇒ Y, support, confidence}. Where X and Y are subsets of the itemset I and X∩Y = Ø Example: {{bread, milk} ⇒ {cheese}, 30%, 75%} Support = #occurrences of I in database/#rows in database Minsup – The minimum support threshold for an itemset I to be considered frequent Confidence = Support(X ⋃ Y)/Support(X) for itemset I = X ⋃ Y. Minconf – a user specified threshold that indicates the interestingness of a candidate rule I: conf(I) > minconf
Itemset Trees A data structure which aids in users querying for a specific itemset and it’s support: Targeted Association Mining Item mapped to numeric values: {bread} = {1}, {cheese} = {2} Numbers must be in ascending order within the itemset Ex: I = {1, 2, 56, 120} Note: Can be used to find all or specific rules within a dataset.
Why Emerging? By the time the data is analyzed, the value of the knowledge has greatly depreciated. Velocity: Time to action vs. Value (Hackathorn, 2002)
Time Stream Data Time stream – A stream of time sensitive data on which the algorithm will mine patterns. Time window – A window delineating the current “interesting” time frame made up of n time steps. Time step – An increment of time in which the time windows progress forward.
Time Stream Data: Moving Forward Current time step: Move forward 1 time step:
Dynamic Data Mining: Pattern Types Time
Project Description
Phase 1: Itemset Tree Code Modify existing code or write your own: Process queries over time streams Find emerging and declining patterns Supports stored in a support array equal to the number of time steps Read papers: Itemset trees Ordered Min-Max Itemset Tree Dyn-TARM
Phase 2: Preparing datasets For real world data: Separate into time steps based upon dates in the data For synthetic data: Divide data into equal parts to create time steps
Phase 3: Tests (both datasets) Question 1: What effect does modifying the input measures have? (0 < minSup, minConf < 1, 0< α < 100 time step size, and time window size.) Question 2: What is the trend of each pattern over the time stream? Is their emergence/declining cyclic or maintained all through? Question 3: What is the distribution of emergence/declining patterns over each dataset?
Jennifer Lavergne jjslavergne@louisiana.edu Questions? Jennifer Lavergne jjslavergne@louisiana.edu