Chase Repp.  knowledge discovery  searching, analyzing, and sifting through large data sets to find new patterns, trends, and relationships contained.

Presentation transcript:

1 Chase Repp

2 Data mining: knowledge discovery; searching, analyzing, and sifting through large data sets to find new patterns, trends, and relationships contained within

3 Data mining differs from database querying in the following manner: database querying asks “what company purchased $100,000 worth of widgets last year?” while data mining asks “what company is likely to purchase over $100,000 of widgets next year and why?”

4

5 Term coined in the 1960s. Early data mining was used to find basic information from collections of data, such as total revenue over the last three years. Draws on: classic statistics, artificial intelligence, machine learning

6

7 Predictive data mining: has a target value; forecasts future trends. Descriptive data mining: no target value; focuses on relations

8 Predictive data mining: focuses on discovering relationships among independent variables and between dependent and independent variables. Used to forecast specific outcomes

9 Descriptive data mining: describes a data set in a brief but comprehensive way and presents interesting characteristics of the data without any predefined target. Focuses on relations

10 Association: patterns are discovered based on the relationship of a specific item with other items in the same transaction. Descriptive. Example: groceries

11 Classification: assigns each item in a data set to one of a predefined set of classes or groups. Often used with machine learning. Predictive. Example: cat person or dog person?
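
To make the idea of predefined classes concrete, here is a minimal sketch of one simple classifier, 1-nearest neighbor. The features (hours spent outdoors per week, number of pets owned) and all data values are hypothetical, invented only to illustrate the "cat or dog person" example; the slide does not name a particular classification algorithm.

```python
from math import dist

# Hypothetical labeled examples: (hours outdoors per week, pets owned) -> class.
# The classes are fixed in advance, which is what makes this classification.
training = [
    ((1.0, 1.0), "cat person"),
    ((2.0, 1.0), "cat person"),
    ((8.0, 2.0), "dog person"),
    ((9.0, 1.0), "dog person"),
]

def classify(point, examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, nearest_label = min(examples, key=lambda ex: dist(ex[0], point))
    return nearest_label

prediction = classify((7.5, 1.0), training)
print(prediction)
```

The new point sits closest to the "dog person" examples, so it receives that label. Real classifiers are trained on far more data and usually on more features.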

12 Clustering: unlike classification, the clustering technique also defines the classes itself and then puts objects in them. Descriptive. Example: a library
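
As a rough illustration of a technique that defines its own groups, the sketch below uses plain k-means, one common clustering algorithm (the slide does not say which algorithm is meant). The points and the choice of starting centroids are made up for the example.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: no classes are given in advance.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are supplied; the two groups emerge from the distances alone, much as books in a library can be shelved by similarity of topic.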

13 Regression: used to predict numeric values from data sets that have known target values. Predictive. Examples: sales, distance, temperature, value, etc.
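
One simple way to predict a number from data with known target values is an ordinary least-squares line fit. The sketch below assumes a single input variable and uses invented data (advertising spend against sales); it is an illustration, not the method the presentation prescribes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data with known target values (sales).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
predicted_sales = slope * 5.0 + intercept  # forecast for an unseen input
print(slope, intercept, predicted_sales)
```

Because the toy data lie exactly on the line y = 2x + 1, the fit recovers that line and forecasts 11 for an input of 5.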

14 Sequential pattern mining: discovers frequent sequences or subsequences as patterns in a sequence database. Descriptive. Derived from association mining
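
What "frequent subsequence" means can be sketched as a support count over a sequence database. The example assumes a simplified setting in which each sequence element is a single item (written as one character) and gaps between matched items are allowed; the database itself is hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order
    (other items may appear in between)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)

# Hypothetical sequence database; each string is one ordered sequence of items.
database = ["abcd", "acbd", "abd", "bca"]

print(support("abd", database))  # contained in 3 of the 4 sequences
print(support("ca", database))   # contained only in "bca"
```

A pattern is called frequent when its support reaches a chosen minimum threshold.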

15 The main sequential pattern mining techniques fall into three categories: Apriori-based, pattern-growth, and early-pruning

16 Apriori-based techniques follow the apriori property: all nonempty subsets of a frequent itemset must also be frequent. If {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Examples: AprioriAll, GSP, PSP, and SPAM

17 Transaction data
   t1: Beef, Chicken, Milk
   t2: Beef, Cheese
   t3: Cheese, Boots
   t4: Beef, Chicken, Cheese
   t5: Beef, Chicken, Clothes, Cheese, Milk
   t6: Chicken, Clothes, Milk
   t7: Chicken, Milk, Clothes
   Assume: minsup = 30%, minconf = 80%
   An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7, about 43%]
   Association rules from the itemset:
   Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
   ……
   Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
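
The support and confidence figures on this slide can be checked with a few lines of Python. The transactions are the slide's seven baskets; the function names `support` and `confidence` are chosen here for illustration.

```python
# The seven transactions t1..t7 from the slide.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Chicken", "Clothes", "Milk"}))        # 3/7, about 0.43
print(confidence({"Clothes"}, {"Milk", "Chicken"}))   # 3/3
print(confidence({"Clothes", "Chicken"}, {"Milk"}))   # 3/3
```

Both rules clear minsup = 30% and minconf = 80%, since every basket that contains Clothes also contains Chicken and Milk.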

18 Two steps: (1) find all itemsets that have minimum support (frequent itemsets); (2) use the frequent itemsets to generate rules. E.g., a frequent itemset {Chicken, Clothes, Milk} [sup = 3/7] and one rule from the frequent itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

19 Dataset T (minsup = 50%)
   TID   Items
   T100  1, 3, 4
   T200  2, 3, 5
   T300  1, 2, 3, 5
   T400  2, 5
   Notation: itemset:count
   1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
      → F1: {1}:2, {2}:3, {3}:3, {5}:3
      → C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
   2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
      → F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
      → C3: {2,3,5}
   3. scan T → C3: {2,3,5}:2
      → F3: {2,3,5}
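
The level-by-level trace on this slide can be reproduced with a compact sketch of the Apriori idea. It uses the slide's dataset T and a minimum count of 2 (50% of four transactions); the function name and the way candidates are joined are illustrative choices, not a reference implementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least
    `min_count` transactions, built level by level (C1/F1, C2/F2, ...)."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]          # C1
    frequent = {}
    k = 1
    while candidates:
        # Scan T: count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # F_k
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (apriori property): every (k-1)-subset must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(T, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is where the apriori property pays off: {1,2,3} and {1,3,5} are never counted against T because {1,2} and {1,5} are already known to be infrequent, leaving {2,3,5} as the only candidate in C3.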

20 Pattern-growth techniques use a divide-and-conquer strategy: they focus the search on a restricted portion of the initial database and generate as few candidate sequences as possible. Examples: FreeSpan, PrefixSpan, WAP-mine, and FS-Miner
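
The divide-and-conquer idea can be sketched in the spirit of PrefixSpan: grow a pattern one item at a time, and after each extension search only the projected database, that is, the suffixes that follow the pattern. This is a simplified illustration that assumes each sequence element is a single item; the database is hypothetical and the code is not the published algorithm in full.

```python
def prefix_growth(database, min_count):
    """Return {pattern_tuple: support_count} using prefix projection."""
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Restrict the search to what follows the first occurrence of `item`.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_projected)

    grow((), [tuple(s) for s in database])
    return results

# Hypothetical sequence database; each string is one ordered sequence of items.
database = ["abcd", "acbd", "abd", "bca"]
patterns = prefix_growth(database, min_count=3)
for pattern, count in sorted(patterns.items()):
    print("".join(pattern), count)
```

No candidate list is generated and tested against the whole database; each recursive call looks only at the shrinking projected portion, which is the restriction the slide describes.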

21 Early-pruning techniques utilize a sort of position induction to prune candidate sequences very early in the mining process and to avoid support counting as much as possible. Examples: LAPIN, HVSM, and DISC-all

22 Web mining: searching for patterns in data through content mining (search engines), structure mining (hyperlinks; HITS / PageRank), and usage mining (users' browser data and submitted forms)
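
For structure mining, a minimal power-iteration sketch of the PageRank idea shows how link structure alone can rank pages. The three-page link graph is hypothetical, the damping factor 0.85 is a commonly used value rather than something stated in the presentation, and the code assumes every link target also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute rank along hyperlinks.
    `links` maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical site: both inner pages link back to the home page.
links = {"home": ["about", "products"], "about": ["home"], "products": ["home"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get), ranks)
```

The home page ends up with the highest rank because both other pages link to it.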

23  One use is for finding user navigational patterns on the World Wide Web by extracting knowledge from web logs

24 An example of applying sequential pattern mining: S = {a, b, c, d, e, f}; [P1, ] [P2, ] [P3, ] [P4, ]; frequent pattern: abac

25 Visual data mining combines traditional mining methods and information visualization techniques; the user is directly involved. VDMS: simplicity, reliability, reusability, availability, and security

26 http://www.youtube.com/user/quiterian
   http://www.youtube.com/watch?v=MtJ4Xa4-J8g
   http://www.youtube.com/watch?v=_8HzwQCFFfw

27

