Published by Chester Brooks. Modified over 6 years ago.
1
Targeted Association Mining in Time-Varying Domains
Project proposal Fall 2016 Jennifer Lavergne
2
Project Goals
Phase 1: Modify the Min-Max Itemset Tree code. Modify the itemset tree code supplied for the project to find emerging and declining patterns.
Phase 2: Prepare datasets for testing. The datasets are cleaned but still need to be separated into time intervals.
Phase 3: Emerging/declining pattern mining tests. Run tests on real-world and synthetic datasets.
3
Background
4
Association Mining
Large repositories of data contain hidden gems of information. Rarer information is harder to find.
Find interesting correlations (Agrawal et al.):
Items selling together
Contributors to fatal accidents
Symptoms for rare diseases
5
Association Rules
Association rule: an implication {X ⇒ Y, support, confidence}, where X and Y are subsets of the itemset I and X ∩ Y = ∅.
Example: {{bread, milk} ⇒ {cheese}, 30%, 75%}
Support = (number of occurrences of I in the database) / (number of rows in the database)
Minsup: the minimum support threshold for an itemset I to be considered frequent.
Confidence = Support(X ∪ Y) / Support(X), for itemset I = X ∪ Y.
Minconf: a user-specified confidence threshold; a candidate rule I is considered interesting when conf(I) > minconf.
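The support and confidence definitions can be checked on a small example. The sketch below is illustrative only: the function names and the ten toy transactions are assumptions, chosen so that the rule {bread, milk} ⇒ {cheese} reproduces the 30% support and 75% confidence quoted on the slide.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def support(itemset, transactions):
    """Support = occurrences of the itemset / number of rows."""
    if not transactions:
        return 0.0
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y, i.e. Support(X u Y) / Support(X).

    Computed from raw counts; the row count cancels out of the ratio.
    """
    x_count = count(x, transactions)
    if x_count == 0:
        return 0.0
    return count(set(x) | set(y), transactions) / x_count


# Hypothetical toy database (not from the project's datasets).
transactions = [
    {"bread", "milk", "cheese"},
    {"bread", "milk", "cheese"},
    {"bread", "milk", "cheese"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"cheese"},
    {"eggs"},
    {"eggs", "milk"},
    {"bread", "eggs"},
]

rule_support = support({"bread", "milk", "cheese"}, transactions)
rule_confidence = confidence({"bread", "milk"}, {"cheese"}, transactions)
print(rule_support, rule_confidence)
```

With minsup = 0.2 and minconf = 0.6, for instance, this rule would pass both thresholds.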
6
Itemset Trees
A data structure that helps users query for a specific itemset and its support: targeted association mining.
Items are mapped to numeric values: {bread} = {1}, {cheese} = {2}
Numbers must be in ascending order within the itemset. Ex: I = {1, 2, 56, 120}
Note: Can be used to find all rules or only specific rules within a dataset.
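The item-to-number mapping and the ascending-order requirement can be sketched as follows. This is not the itemset tree itself: `targeted_support` is a plain linear scan standing in for the tree query, and all names here, along with the choice to assign ids in alphabetical order, are assumptions made for illustration.

```python
def build_item_ids(transactions):
    """Assign each distinct item a numeric id starting at 1.

    Assumption: ids follow alphabetical order of the item names.
    """
    items = sorted({item for t in transactions for item in t})
    return {item: i for i, item in enumerate(items, start=1)}


def encode(itemset, item_ids):
    """Map items to their ids and return them in ascending order."""
    return tuple(sorted(item_ids[item] for item in itemset))


def targeted_support(query, encoded_db):
    """Support of one specific encoded itemset (linear scan, not a tree)."""
    wanted = set(query)
    hits = sum(1 for row in encoded_db if wanted.issubset(row))
    return hits / len(encoded_db)


# Hypothetical toy data.
raw = [
    {"bread", "milk"},
    {"cheese", "bread"},
    {"milk"},
    {"bread", "cheese", "milk"},
]
item_ids = build_item_ids(raw)           # bread=1, cheese=2, milk=3
encoded_db = [encode(t, item_ids) for t in raw]
query = encode({"cheese", "bread"}, item_ids)
print(query, targeted_support(query, encoded_db))
```

Keeping every itemset sorted gives each one a single canonical form, which is what lets a tree share prefixes between transactions.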
7
Why Emerging?
By the time the data is analyzed, the value of the knowledge has greatly depreciated.
Velocity: Time to action vs. Value (Hackathorn, 2002)
8
Time Stream Data
Time stream: a stream of time-sensitive data on which the algorithm mines patterns.
Time window: a window of n time steps delineating the current “interesting” time frame.
Time step: the increment of time by which the time window progresses forward.
9
Time Stream Data: Moving Forward
[Figure: the time window at the current time step, and again after moving forward 1 time step.]
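Moving a time window of n time steps forward one step at a time can be sketched as below. The function name and the list-of-labels representation of the time stream are assumptions for illustration, not part of the supplied project code.

```python
def sliding_windows(time_steps, window_size, step=1):
    """Yield (start_index, window) pairs over a sequence of time steps.

    Each window holds `window_size` consecutive time steps, and the
    window advances by `step` time steps between yields.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(time_steps) - window_size
    for start in range(0, last_start + 1, step):
        yield start, time_steps[start:start + window_size]


# Hypothetical stream of five time steps, window of n = 3.
stream = ["t0", "t1", "t2", "t3", "t4"]
windows = list(sliding_windows(stream, window_size=3))
for start, window in windows:
    print(start, window)
```

Each move forward drops the oldest time step from the window and admits the next one from the stream.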
10
Dynamic Data Mining: Pattern Types
[Figure: pattern types plotted over time.]
11
Project Description
12
Phase 1: Itemset Tree Code
Modify the existing code or write your own:
Process queries over time streams
Find emerging and declining patterns
Supports are stored in a support array whose length equals the number of time steps
Read papers: Itemset trees, Ordered Min-Max Itemset Tree, Dyn-TARM
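A per-time-step support array might look like the sketch below. The slides do not define "emerging" or "declining", so the rule in `classify_trend` is purely an assumption: it compares the last support in the array with the first and treats α as a percentage-change threshold. The papers listed above give the actual definitions; all names here are hypothetical.

```python
def support_array(itemset, time_steps):
    """Return one support value per time step.

    `time_steps` is a list of time steps, each a list of transactions
    (sets of item ids). The result has len(time_steps) entries.
    """
    target = frozenset(itemset)
    supports = []
    for txns in time_steps:
        hits = sum(1 for t in txns if target <= t)
        supports.append(hits / len(txns) if txns else 0.0)
    return supports


def classify_trend(supports, alpha):
    """ASSUMED rule, not taken from the slides: compare last vs. first.

    A change of at least `alpha` percent upward is "emerging", at least
    `alpha` percent downward is "declining", anything else is "stable".
    """
    if not supports:
        raise ValueError("supports must not be empty")
    first, last = supports[0], supports[-1]
    if first == 0:
        return "emerging" if last > 0 else "stable"
    change_pct = (last - first) / first * 100.0
    if change_pct >= alpha:
        return "emerging"
    if change_pct <= -alpha:
        return "declining"
    return "stable"


# Hypothetical stream of three time steps with four transactions each.
time_steps = [
    [{1, 2}, {3}, {4}, {5}],
    [{1, 2}, {1, 2, 3}, {4}, {5}],
    [{1, 2}, {1, 2}, {1, 2, 4}, {3}],
]
supports = support_array({1, 2}, time_steps)
print(supports, classify_trend(supports, alpha=50))
```

Storing the whole array rather than a single aggregate keeps enough history to inspect a pattern's trend across any time window.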
13
Phase 2: Preparing datasets
For real-world data: separate into time steps based upon the dates in the data.
For synthetic data: divide the data into equal parts to create time steps.
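Both ways of creating time steps can be sketched in a few lines. The monthly granularity, the (date, transaction) record layout, and the function names are assumptions for illustration; the real datasets may call for a different step size or date field.

```python
from collections import defaultdict
from datetime import date


def split_by_month(records):
    """Group (date, transaction) records into chronological time steps.

    Assumption: one time step per calendar month. Months with no
    records produce no time step.
    """
    buckets = defaultdict(list)
    for day, txn in records:
        buckets[(day.year, day.month)].append(txn)
    return [buckets[key] for key in sorted(buckets)]


def split_equal(transactions, n_steps):
    """Divide transactions into n_steps contiguous, near-equal parts.

    Part sizes differ by at most one when the split is not exact.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    base, extra = divmod(len(transactions), n_steps)
    parts, start = [], 0
    for i in range(n_steps):
        size = base + (1 if i < extra else 0)
        parts.append(transactions[start:start + size])
        start += size
    return parts


# Hypothetical dated records (real-world case).
records = [
    (date(2016, 2, 1), {3}),
    (date(2016, 1, 5), {1, 2}),
    (date(2016, 1, 20), {2}),
]
monthly_steps = split_by_month(records)

# Hypothetical synthetic data: ten transactions into three time steps.
synthetic_steps = split_equal(list(range(10)), 3)
print(monthly_steps, synthetic_steps)
```

The resulting list of time steps has the same shape in both cases, so the same mining code can run on real-world and synthetic data.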
14
Phase 3: Tests (both datasets)
Question 1: What effect does modifying the input measures have? (0 < minSup, minConf < 1; 0 < α < 100; time step size; and time window size.)
Question 2: What is the trend of each pattern over the time stream? Is its emergence or decline cyclic, or sustained throughout?
Question 3: What is the distribution of emerging and declining patterns over each dataset?
15
Questions?
Jennifer Lavergne, jjslavergne@louisiana.edu