Data Mining – Day 2
Fabiano Dalpiaz
Department of Information and Communication Technology
University of Trento - Italy
Database e Business Intelligence A.A.

Knowledge Discovery (KDD) Process
[Figure: the KDD pipeline – Databases → Data Cleaning and Data Integration → Data Warehouse → Selection of task-relevant data → Data Mining → Pattern Evaluation. The steps up to the data warehouse were presented yesterday; data mining and pattern evaluation are today's topics.]

Outline
- Data Mining techniques
  - Frequent patterns, association rules; support and confidence
  - Classification and prediction: decision trees, Bayesian classifiers, Support Vector Machines, lazy learning
  - Cluster analysis
- Visualization of the results
- Summary

Data Mining techniques

Frequent pattern analysis
- What is it?
  - Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
  - Frequent pattern analysis: searching for frequent patterns
- Motivation: finding inherent regularities in data
  - Which products are bought together? (yesterday's wine and spaghetti example)
  - What are the subsequent purchases after buying a PC?
  - Can we automatically classify web documents?
- Applications: basket data analysis, cross-marketing, catalog design, sales campaign analysis

Basic Concepts: Frequent Patterns and Association Rules (1)

  Transaction-id   Items bought
  1                Wine, Bread, Spaghetti
  2                Wine, Cocoa, Spaghetti
  3                Wine, Spaghetti, Cheese
  4                Bread, Cheese, Sugar
  5                Bread, Cocoa, Spaghetti, Cheese, Sugar

- Itemsets (= transactions in this example)
- Goal: find all rules of the type X → Y between items in an itemset, with minimum:
  - Support s: the probability that an itemset contains both X and Y (i.e., X ∪ Y)
  - Confidence c: the conditional probability that an itemset containing X also contains Y

Basic Concepts: Frequent Patterns and Association Rules (2)
- Same transactions as in (1); suppose support s = 50% and confidence c = 50%
- Support is used to define frequent patterns (sets of products contained in at least s% of the itemsets):
  - {Wine} in itemsets 1, 2, 3 (support = 60%)
  - {Bread} in itemsets 1, 4, 5 (support = 60%)
  - {Spaghetti} in itemsets 1, 2, 3, 5 (support = 80%)
  - {Cheese} in itemsets 3, 4, 5 (support = 60%)
  - {Wine, Spaghetti} in itemsets 1, 2, 3 (support = 60%)

Basic Concepts: Frequent Patterns and Association Rules (3)
- Same transactions as in (1); suppose support s = 50% and confidence c = 50%
- Confidence defines association rules: rules X → Y over frequent patterns whose confidence is greater than c
- Suggestion: {Wine, Spaghetti} is the only frequent pattern to be considered. Why?
- Association rules (recomputed in the sketch below):
  - Wine → Spaghetti (support = 60%, confidence = 100%)
  - Spaghetti → Wine (support = 60%, confidence = 75%)
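To make the numbers above concrete, here is a minimal Python sketch (not from the original slides) that recomputes the support and confidence of the Wine/Spaghetti rules directly from the transaction table:

```python
# Minimal sketch: computing support and confidence for the transactions above.
transactions = [
    {"Wine", "Bread", "Spaghetti"},
    {"Wine", "Cocoa", "Spaghetti"},
    {"Wine", "Spaghetti", "Cheese"},
    {"Bread", "Cheese", "Sugar"},
    {"Bread", "Cocoa", "Spaghetti", "Cheese", "Sugar"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Conditional probability that a transaction containing `lhs` also contains `rhs`."""
    return support(lhs | rhs) / support(lhs)

print(support({"Wine", "Spaghetti"}))       # 0.6
print(confidence({"Wine"}, {"Spaghetti"}))  # 1.0
print(confidence({"Spaghetti"}, {"Wine"}))  # 0.75
```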

Advanced concepts in association rule discovery
- Algorithms must face scalability problems
  - Apriori principle: if an itemset is infrequent, its supersets should not be generated or tested! (sketched below)
- Advanced problems
  - Boolean vs. quantitative associations
    - age(x, "30..39") and income(x, "42..48K") → buys(x, "car") [s=1%, c=75%]
  - Single-level vs. multiple-level analysis
    - What brands of wine are associated with what brands of spaghetti?
- Are support and confidence clear?
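The Apriori principle can be illustrated with a short sketch (again not from the slides): candidate itemsets are generated level by level, and any candidate with an infrequent subset is pruned before its support is counted. Using the same transactions as before:

```python
# Sketch of the Apriori principle: frequent itemsets are grown level by level,
# and no candidate is kept if one of its subsets is infrequent.
from itertools import combinations

transactions = [
    {"Wine", "Bread", "Spaghetti"},
    {"Wine", "Cocoa", "Spaghetti"},
    {"Wine", "Spaghetti", "Cheese"},
    {"Bread", "Cheese", "Sugar"},
    {"Bread", "Cocoa", "Spaghetti", "Cheese", "Sugar"},
]
min_support = 0.5

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
frequent = list(level)
while level:
    # Candidate generation: join frequent k-itemsets into (k+1)-itemsets ...
    candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
    # ... keep a candidate only if ALL of its k-subsets are frequent (pruning),
    # and only then check its support against the threshold.
    level = [c for c in candidates
             if all(frozenset(s) in set(level) for s in combinations(c, len(c) - 1))
             and support(c) >= min_support]
    frequent += level

print(frequent)   # the frequent singletons plus {Wine, Spaghetti}, as in the example above
```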

Another example for association rules

  Transaction-id   Items bought
  1                Margherita, Beer, Coke
  2                Margherita, Beer
  3                Quattro stagioni, Coke
  4                Margherita, Coke

- Thresholds: support s = 40%, confidence c = 70%
- Frequent itemsets: {Margherita} = 75%, {Beer} = 50%, {Coke} = 75%, {Margherita, Beer} = 50%, {Margherita, Coke} = 50%
- Association rules: Beer → Margherita [s=50%, c=100%]

Classification vs. Prediction
- Classification
  - Characterizes (describes) a set of items belonging to a training set; these items are already classified according to a label attribute
  - The characterization is a model
  - The model can be applied to classify new data (predict the class they should belong to)
- Prediction
  - Models continuous-valued functions, i.e., it predicts unknown or missing values
- Applications: credit approval, target marketing, fraud detection

Classification: the process
- Model construction
  - The class label attribute defines the class each item should belong to
  - The set of items used for model construction is called the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage
  - Estimate the accuracy of the model, either on the training set or, better, on data that generalizes beyond the training set (a test set)
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification: the process – model construction
[Figure: the training data is fed to a classification algorithm, which produces a classifier (model), here the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes']

Classification: the process – model usage
[Figure: the classifier (IF rank = 'professor' OR years > 6 THEN tenured = 'yes') is checked against testing data and then applied to unseen data such as (Jeff, Professor, 4) to answer "Tenured?"; a toy sketch of this step follows]
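As a toy illustration of the model-usage step (a sketch, not part of the lecture material), the rule above can be applied to the unseen tuple (Jeff, Professor, 4):

```python
# Applying the learned rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
# to an unseen tuple, as in the figure above.
def classify(name, rank, years):
    """Return the predicted value of the class label attribute 'tenured'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

print(classify("Jeff", "Professor", 4))  # 'yes', because the rank is professor
```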

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Evaluating generated models
- Accuracy (see the sketch below)
  - Classifier accuracy: predicting the class label
  - Predictor accuracy: guessing the value of the predicted attributes
- Speed
  - Time to construct the model (training time)
  - Time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: understanding and insight provided by the model
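For the first criterion, classifier accuracy is simply the fraction of test tuples whose predicted class label matches the known label; a minimal sketch with hypothetical labels:

```python
# Classifier accuracy on a (hypothetical) test set: correct predictions / all predictions.
true_labels      = ["yes", "no", "yes", "yes", "no", "no"]
predicted_labels = ["yes", "no", "no",  "yes", "no", "yes"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(f"accuracy = {accuracy:.2f}")   # 4 out of 6 correct -> 0.67
```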

Classification techniques: Decision Trees (1)
[Figure: an example decision tree for choosing an investment type; the internal nodes test Income > 20K€, Age > 60, and Married?, and the leaves assign the classes Low risk, Mid risk, and High risk]

Classification techniques: Decision Trees (2)
- How are the attributes in a decision tree selected? Two well-known indexes are used:
  - Information gain: selects the attribute that is most informative in distinguishing the items between the classes; it is biased towards attributes with a large set of values (see the sketch below)
  - Gain ratio: addresses this limitation of information gain
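To give a flavor of how information gain works, the following sketch computes it on a small made-up training set loosely inspired by the investment-risk tree above (the attribute names and rows are hypothetical, purely for illustration):

```python
# Minimal sketch of information gain for attribute selection (hypothetical toy data).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, label):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    labels = [r[label] for r in rows]
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical training set in the spirit of the investment-risk example
rows = [
    {"income_gt_20k": "no",  "age_gt_60": "no",  "risk": "low"},
    {"income_gt_20k": "yes", "age_gt_60": "yes", "risk": "mid"},
    {"income_gt_20k": "yes", "age_gt_60": "no",  "risk": "high"},
    {"income_gt_20k": "no",  "age_gt_60": "yes", "risk": "low"},
]
for attr in ("income_gt_20k", "age_gt_60"):
    print(attr, round(information_gain(rows, attr, "risk"), 3))
# income_gt_20k gives the larger gain, so it would be chosen as the split attribute.
```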

Classification techniques: Bayesian classifiers (2)
- Bayesian classification
  - A statistical classification technique
  - Predicts class membership probabilities
  - Founded on Bayes' theorem (see the sketch below)
    - What if X = "red and rounded" and H = "Apple"?
- Performance: the simplest implementation (Naïve Bayes) is comparable to decision trees and neural networks
- Incremental: each training example can increase or decrease the probability that a hypothesis is correct
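Bayes' theorem states that P(H|X) = P(X|H) · P(H) / P(X). A tiny sketch for the slide's question, with made-up probabilities used only for illustration:

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# Illustrative numbers only (not from the slides): how probable is it that a
# fruit is an apple (H) given that it is red and rounded (X)?
p_h = 0.20          # prior: P(Apple)
p_x_given_h = 0.90  # likelihood: P(red and rounded | Apple)
p_x = 0.30          # evidence: P(red and rounded)

p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(Apple | red and rounded) = {p_h_given_x:.2f}")  # 0.60
```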

5-minute break!

Classification techniques: Support Vector Machines
- One of the most advanced classification techniques
- [Figure: two linear separators for the same two-class data set]
  - Left figure: a small margin between the classes is found
  - Right figure: the largest margin is found
- Support vector machines (SVMs) are able to identify the largest-margin separator of the right figure

Classification techniques: SVMs + Kernel Functions
- Is data always linearly separable? NO!!!
- Solution: SVMs + kernel functions (see the sketch below)
- [Figure: a data set that a plain SVM cannot split with a straight line, but that an SVM with a kernel function separates correctly]
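A quick way to see the effect of a kernel is to train two SVMs on data that is not linearly separable; this sketch assumes scikit-learn is installed and is not part of the original lecture:

```python
# Sketch (assumes scikit-learn is installed): a linear SVM vs. an SVM with an
# RBF kernel on data that is not linearly separable (two concentric circles).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))     # close to 1.0
```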

Classification techniques: Lazy learning
- Lazy learning
  - Simply stores the training data (or does only minor processing) and waits until it is given a test tuple
  - Less time in training, but more time in predicting
  - Uses a richer hypothesis space (many local linear functions), and hence the accuracy is higher
- Instance-based learning
  - A subcategory of lazy learning
  - Stores training examples and delays the processing ("lazy evaluation") until a new instance must be classified
  - An example: the k-nearest neighbor approach

Classification techniques: k-nearest neighbor
- All instances correspond to points in an n-dimensional space; x is the instance to be classified
- The nearest neighbors are defined in terms of the Euclidean distance dist(X1, X2)
- For discrete-valued classes, k-NN returns the most common class among the k training examples nearest to x
- The result depends on k! (see the sketch below)
  - [Figure: which class should the green circle belong to? With k=3 it is classified as red, with k=5 as blue]
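A minimal k-NN sketch on hypothetical 2-D points, arranged so that the prediction flips between k=3 and k=5, mirroring the figure:

```python
# Minimal k-NN sketch (hypothetical 2-D points, two classes "red" and "blue").
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

training = [
    ((0.5, 0.0), "red"),  ((0.0, 0.6), "red"),  ((2.0, 2.0), "red"),
    ((0.7, 0.0), "blue"), ((0.0, 0.8), "blue"), ((0.9, 0.0), "blue"),
]

def knn_classify(x, k):
    """Return the most common class among the k training points nearest to x."""
    neighbors = sorted(training, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

query = (0.0, 0.0)          # the "green circle" to be classified
print(knn_classify(query, k=3))  # red  (2 red vs. 1 blue among the 3 nearest)
print(knn_classify(query, k=5))  # blue (3 blue vs. 2 red among the 5 nearest)
```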

Prediction techniques: an overview
- Prediction is different from classification
  - Classification predicts categorical class labels
  - Prediction models continuous-valued functions
- Major method for prediction: regression
  - Models the relationship between one or more independent (predictor) variables and a dependent (response) variable
- Regression analysis
  - Linear and multiple regression
  - Non-linear regression
  - Other regression methods: generalized linear models, Poisson regression, log-linear models, regression trees
- No details here (a minimal linear-regression sketch follows)
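As a flavor of regression, here is a minimal least-squares fit of a simple linear model y ≈ a + b·x on made-up data:

```python
# Minimal sketch of simple linear regression (least squares) on hypothetical data:
# fit y ≈ a + b*x for a single predictor variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 4.2, 4.8]   # made-up response values

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))   # slope
a = mean_y - b * mean_x                       # intercept

print(f"y = {a:.2f} + {b:.2f} * x")
print("prediction for x = 6:", a + b * 6)     # a continuous-valued output
```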

What is cluster analysis?
- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
- Cluster analysis
  - Finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
  - It belongs to unsupervised learning
- Typical applications
  - As a stand-alone tool to get insight into the data distribution
  - As a preprocessing step for other algorithms (day 1 slides)

Examples of cluster analysis
- Marketing: help marketers discover distinct groups in their customer bases
- Land use: identification of areas of similar land use in an earth observation database
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost
- City planning: identifying groups of houses according to their house type, value, and geographical location

Good clustering
- A good clustering method will produce high-quality clusters with
  - high intra-class similarity
  - low inter-class similarity
- Dissimilarity/similarity metric: similarity is expressed in terms of a distance function, typically a metric d(i, j)
  - The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
  - It is hard to define "similar enough" or "good enough"

A small example
- How would you cluster this data? [Figure: a scatter plot of unlabeled points]
- This process is not easy in practice. Why? (a clustering sketch follows)
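One standard way to cluster such data is k-means, a partitioning method that these slides do not detail; the following is only a minimal sketch on hypothetical 2-D points:

```python
# Minimal k-means sketch (hypothetical 2-D points); one way to cluster such data.
from math import dist
from statistics import mean

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3),      # a first visual group
          (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]      # a second visual group

def kmeans(points, k, iterations=10):
    centroids = points[:k]                          # naive initialization
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            (mean(x for x, _ in c), mean(y for _, y in c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

clusters, centroids = kmeans(points, k=2)
print(centroids)   # roughly (1.03, 1.07) and (5.03, 5.07)
print(clusters)    # the two visual groups above
```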

Visualization of the results
- Presentation of the results or knowledge obtained from data mining in visual form
- Examples: scatter plots, association rules, decision trees, clusters (a scatter-plot sketch follows)
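As a minimal example of visualizing mining results, the sketch below (assuming matplotlib is available; not part of the original slides) draws a scatter plot colored by cluster labels:

```python
# Sketch (assumes matplotlib is installed): a scatter plot of two hypothetical
# attributes, coloured by cluster/class label, as a simple visualization of results.
import matplotlib.pyplot as plt

x = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
y = [1.0, 0.9, 1.3, 5.1, 4.8, 5.3]
labels = [0, 0, 0, 1, 1, 1]          # e.g. cluster assignments from a mining step

plt.scatter(x, y, c=labels)
plt.xlabel("attribute 1")
plt.ylabel("attribute 2")
plt.title("Mining results as a scatter plot")
plt.show()
```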

Scatter plots (SAS Enterprise Miner)

Association rules (SGI MineSet)

Decision trees (SGI MineSet)

Clusters (IBM Intelligent Miner)

Summary
- Why Data Mining?
- Data Mining and KDD
- Data preprocessing
- Classification
- Clustering
- Some scenarios