Introduction of Data Mining and Association Rules cs157 Spring 2009 Instructor: Dr. Sin-Min Lee Student: Dongyi Jia.


1 Introduction of Data Mining and Association Rules cs157 Spring 2009 Instructor: Dr. Sin-Min Lee Student: Dongyi Jia

2 What is data mining?
• The automated extraction of hidden predictive information from databases.
• Allows users to analyze large databases to solve business decision problems.
• An extension of statistics, with a few artificial intelligence and machine learning twists thrown in.
• Attempts to discover rules and patterns from data.

3 Data Mining: On What Kind of Data?
• In principle, data mining should be applicable to any kind of information repository:
  ● relational databases
  ● data warehouses
  ● transactional and advanced databases
  ● flat files
  ● World Wide Web

4 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
• Association Analysis
• Classification and Prediction
• Cluster Analysis
• Evolution Analysis

5 Applications of data mining
• Some applications require some sort of prediction: for example, when a person applies for a credit card, the credit-card company wants to predict whether the person is a good credit risk.
• Others look for associations: for example, if a customer buys a book, an online bookstore may suggest other associated books.

6 Association Rule Discovery
• Task: discovering association rules among items in a transaction database.
• How are association rules mined from a large database?
  1. Find all frequent itemsets: each of these itemsets occurs at least as frequently as a predetermined minimum support count.
  2. Generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence.
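
The two steps above can be sketched in Python as follows. This is a minimal brute-force illustration, not an efficient miner such as Apriori: it enumerates every possible itemset, which is only practical for a handful of items. The function names, the `baskets` data, and the threshold values are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support_count):
    """Step 1: return {itemset: count} for itemsets meeting the minimum support count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support_count:
                frequent[candidate] = count
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent's count is always available here.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical transaction data (each transaction is a set of items).
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread"},
    {"milk", "butter"},
]

frequent = frequent_itemsets(baskets, min_support_count=2)
for x, y, conf in strong_rules(frequent, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), conf)  # ['butter'] => ['milk'] 1.0
```

With these assumed thresholds, only butter => milk survives: both transactions containing butter also contain milk, while bread => milk reaches a confidence of only 2/3.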

7 Association Rules (cont.)
• Retail shops are often interested in associations between items that people buy.
  Someone who buys bread is quite likely also to buy milk.
  association rule: bread => milk
  A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.
  association rule: DSC => OSC

8 Association Rules (cont.)
• Two numbers:
• Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
• Confidence: a measure of how often the consequent is true when the antecedent is true.

9 Association Rules (cont.)
• Let $I = \{i_1, i_2, \ldots, i_m\}$ be the total set of items.
  $D$ is a set of transactions.
  $d$ is one transaction, consisting of a set of items: $d \subseteq I$.
• Association rule: $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$.
  $\text{support} = \dfrac{\#\text{ of transactions containing } X \cup Y}{|D|}$
  $\text{confidence} = \dfrac{\#\text{ of transactions containing } X \cup Y}{\#\text{ of transactions containing } X}$

10 Example
• Example of transaction data:
  1. CD player, music CD, music book
  2. CD player, music CD
  3. music CD, music book
  4. CD player
• I = {CD player, music CD, music book}
• |D| = 4
• # of transactions containing both CD player and music CD = 2
• # of transactions containing CD player = 3
• CD player => music CD (sup = 2/4, conf = 2/3)
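
The arithmetic in this example can be checked with a short Python sketch. The four transactions come from the slide (item names are written here as "CD player", "music CD" and "music book"). The helper names `count_containing`, `support` and `confidence` are assumptions made for illustration.

```python
from fractions import Fraction

# The four transactions from the example, each as a set of items.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return Fraction(count_containing(x | y, transactions), len(transactions))


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y.

    Assumes X occurs in at least one transaction; otherwise the
    denominator is zero and Fraction raises ZeroDivisionError.
    """
    return Fraction(
        count_containing(x | y, transactions),
        count_containing(x, transactions),
    )


x, y = {"CD player"}, {"music CD"}
print(support(x, y, transactions))     # 1/2 (that is, 2/4 reduced)
print(confidence(x, y, transactions))  # 2/3
```

Exact fractions are used rather than floats so that the results match the slide's 2/4 and 2/3 without rounding.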

11 Association Rules (cont.)
• Rule support and confidence reflect the usefulness and certainty of discovered rules.
• A support of 50% for an association rule means that in 50% of all the transactions under analysis, a CD player and a music CD are purchased together.
• A confidence of 67% means that 67% of the customers who purchased a CD player also bought a music CD.

12 Strong Association Rule
• User sets support and confidence thresholds.
• Rules above support threshold have LARGE support.
• Rules above confidence threshold have HIGH confidence.
• Rules satisfying both are said to be STRONG.
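
As a small sketch of this classification, the function below labels a rule given its support and confidence. It assumes that "above the threshold" means greater than or equal to the minimum, matching the "must satisfy minimum support and minimum confidence" wording used earlier. The function name and the threshold values in the usage lines are hypothetical.

```python
def classify_rule(support, confidence, min_support, min_confidence):
    """Label a rule as having large support, high confidence, and being strong."""
    large = support >= min_support          # assumption: meeting the minimum counts
    high = confidence >= min_confidence     # assumption: meeting the minimum counts
    return {"large_support": large, "high_confidence": high, "strong": large and high}


# A rule with support 2/4 and confidence 2/3, under two hypothetical threshold settings.
print(classify_rule(2 / 4, 2 / 3, min_support=0.4, min_confidence=0.6)["strong"])  # True
print(classify_rule(2 / 4, 2 / 3, min_support=0.4, min_confidence=0.7)["strong"])  # False
```

The same rule is strong under one setting and not under the other, which shows that "strong" depends on the thresholds the user picks and is not a property of the rule alone.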

13 References
• Professor Lee's lectures: http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html
• Rui Zhao, SJSU: http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers

14 Thank you !

