Data Mining Tri Nguyen. Agenda Data Mining As Part of KDD Decision Tree Association Rules Clustering Amazon Data Mining Examples.

Presentation transcript:

Data Mining Tri Nguyen

Agenda
- Data Mining As Part of KDD
- Decision Tree
- Association Rules
- Clustering
- Amazon Data Mining Examples

Data Mining and KDD
- Putting the results to practical use

What is Data Mining?
- “The automated extraction of hidden predictive information from large databases”
- Algorithms produce patterns and rules
- Predict future trends and behavior
- Used to make business decisions

Classification
- Items belong to classes
- Given the classification of past items, predict the class of a new item
- Example: issuing credit cards
- Use information such as income, educational background, age, and current debts
- Creditworthiness classes: bad, good, excellent

Decision Tree Classifiers
- Each internal node has a predicate
- Each leaf node is a class
- To classify an instance: start at the root node, make a decision at each internal node by evaluating its predicate, and traverse the tree until reaching a leaf node
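The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the `Leaf` and `Node` classes, the `classify` function, and the income and debt thresholds in `credit_tree` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # the class assigned to instances that reach this leaf


@dataclass
class Node:
    predicate: Callable[[dict], bool]  # test applied to the instance
    if_true: "Tree"                    # subtree followed when the test holds
    if_false: "Tree"                   # subtree followed otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, instance: dict) -> str:
    """Start at the root and follow predicates until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.predicate(instance) else node.if_false
    return node.label


# Hypothetical credit-risk tree; the thresholds are made up for illustration.
credit_tree = Node(
    predicate=lambda p: p["income"] >= 50_000,
    if_true=Node(
        predicate=lambda p: p["debts"] < 10_000,
        if_true=Leaf("excellent"),
        if_false=Leaf("good"),
    ),
    if_false=Leaf("bad"),
)

print(classify(credit_tree, {"income": 80_000, "debts": 2_000}))
```

Each call walks exactly one root-to-leaf path, so classification cost is bounded by the depth of the tree.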

Credit Risk Decision Tree

Decision Tree Construction: Some Definitions
- Purity: the greater the number of instances at a leaf that belong to only one class, the greater the purity
- Best split: the split giving the maximum information gain ratio (information gain / information content)
- Choose the attribute and condition resulting in maximum purity
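As a rough sketch of how a candidate split could be scored, the code below computes the information gain ratio using entropy as the purity measure. It assumes that "information content" means the entropy of the partition sizes produced by the split; the function names and the sample labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of labels; 0 for a pure or empty list."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(labels, branches):
    """Score a split of `labels` into `branches` (one label list per branch)."""
    total = len(labels)
    # Weighted entropy remaining after the split.
    remainder = sum(len(b) / total * entropy(b) for b in branches)
    info_gain = entropy(labels) - remainder
    # Information content of the split: entropy of the branch sizes (assumption).
    info_content = entropy([i for i, b in enumerate(branches) for _ in b])
    return info_gain / info_content if info_content > 0 else 0.0


# Hypothetical node with 4 "good" and 4 "bad" instances.
labels = ["good"] * 4 + ["bad"] * 4
pure_split = [["good"] * 4, ["bad"] * 4]
mixed_split = [["good", "good", "bad", "bad"], ["good", "good", "bad", "bad"]]

print(gain_ratio(labels, pure_split))
print(gain_ratio(labels, mixed_split))
```

The split that separates the classes completely scores higher than the one that leaves both branches mixed, which is the behavior the "best split" definition asks for.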

Decision Tree Construction

Association Rules
- antecedent → consequent (if → then)
- beer → diaper (Walmart)
- economy bad → higher unemployment
- higher unemployment → higher unemployment benefits cost
- Rules are associated with a population, support, and confidence

Association Rules
- Population: instances, such as grocery store purchases
- Support: % of the population satisfying both the antecedent and the consequent
- Confidence: % of cases in which the consequent is true when the antecedent is true

Association Rules
- Population: MS, MSA, MSB, MA, MB, BA
- M = Milk, S = Soda, A = Apple, B = Beer
- Support (M → S) = 3/6: (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB, BA)
- Confidence (M → S) = 3/5: (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB)
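The same arithmetic can be checked with a short script. This is a sketch: the `support` and `confidence` helpers are hypothetical names, and each purchase is modeled as a Python set of item letters.

```python
# The six purchases from the slide, one set of item letters per purchase.
population = [set("MS"), set("MSA"), set("MSB"), set("MA"), set("MB"), set("BA")]


def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    both = antecedent | consequent
    return sum(both <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of antecedent transactions that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(consequent <= t for t in with_antecedent) / len(with_antecedent)


print(support({"M"}, {"S"}, population))     # 0.5, i.e. 3/6
print(confidence({"M"}, {"S"}, population))  # 0.6, i.e. 3/5
```

Support divides by the whole population, while confidence divides only by the purchases that contain milk, which is why the two values differ.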

Clustering
“The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.”

Clustering: BIRCH Algorithm
- Points are inserted into a multidimensional tree
- Items are guided to leaf nodes "near" representative internal nodes
- Nearby points are clustered into one leaf node
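The last step, merging nearby points into one leaf, can be illustrated with a deliberately simplified sketch. It is not the full BIRCH algorithm: it keeps a flat list of leaf entries instead of a tree with internal nodes, and it uses a plain distance-to-centroid test. `LeafEntry`, `insert`, the `threshold` value, and the sample points are hypothetical.

```python
from math import dist


class LeafEntry:
    """Summary of one cluster: point count and per-dimension linear sum."""

    def __init__(self, point):
        self.count = 1
        self.linear_sum = tuple(point)

    @property
    def centroid(self):
        return tuple(s / self.count for s in self.linear_sum)

    def add(self, point):
        self.count += 1
        self.linear_sum = tuple(s + x for s, x in zip(self.linear_sum, point))


def insert(entries, point, threshold):
    """Absorb the point into the nearest entry, or start a new entry."""
    point = tuple(point)
    if entries:
        nearest = min(entries, key=lambda e: dist(e.centroid, point))
        if dist(nearest.centroid, point) <= threshold:
            nearest.add(point)
            return
    entries.append(LeafEntry(point))


entries = []
for p in [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9), (0.9, 1.1)]:
    insert(entries, p, threshold=1.0)

print([(e.count, e.centroid) for e in entries])
```

Because each entry stores only a count and a sum, a point is absorbed without keeping the raw data, which is the idea that lets a tree of such summaries handle large datasets in one pass.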

Clustering: Example of Clustering
Predict which new movies a person is interested in, based on:
1) the person’s past movie preferences
2) others with similar preferences
3) the preferences of those in the pool for new movies

Clustering
1) Cluster people with similar movie preferences
2) Given a new moviegoer, find a cluster of similar moviegoers
3) Then predict the cluster's new movie preferences
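Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced the clusters. Similarity is measured here with Jaccard overlap of liked movies, which is an assumption rather than something the slides specify; the function names, cluster names, and movie titles are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of liked movies, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def nearest_cluster(clusters, past_likes):
    """Step 2: pick the cluster whose members are most similar on average."""
    def avg_similarity(members):
        return sum(jaccard(past_likes, m["past"]) for m in members) / len(members)
    return max(clusters, key=lambda name: avg_similarity(clusters[name]))


def predict_new_movies(members, top_n=1):
    """Step 3: recommend the new movies liked by the most cluster members."""
    votes = Counter(movie for m in members for movie in m["new"])
    return [movie for movie, _ in votes.most_common(top_n)]


# Hypothetical clusters, as if produced by step 1.
clusters = {
    "sci-fi": [
        {"past": {"Alien", "Blade Runner"}, "new": {"Dune"}},
        {"past": {"Alien", "The Matrix"}, "new": {"Dune", "Arrival"}},
    ],
    "romance": [
        {"past": {"Titanic", "Notting Hill"}, "new": {"La La Land"}},
        {"past": {"Titanic", "Ghost"}, "new": {"La La Land"}},
    ],
}

cluster = nearest_cluster(clusters, {"Alien", "The Matrix"})
print(cluster, predict_new_movies(clusters[cluster]))
```

The new moviegoer never rated any new movie; the prediction comes entirely from the cluster they were matched to.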

Amazon Examples
