Classification and Prediction (cont.) Session 10. Course: M0614 / Data Mining & OLAP. Year: February 2010.



Bina Nusantara
Learning Outcomes
By the end of this session, students are expected to be able to apply the analysis techniques of classification by decision tree induction, Bayesian classification, classification by backpropagation, and lazy learners in data mining. (C3)

Acknowledgments
These slides have been adapted from Han, J., Kamber, M., & Pei, J. Data Mining: Concepts and Techniques.

Outline
Other classification methods: linear and non-linear regression
Accuracy and error methods
Summary

Data Mining: Concepts and Techniques, October 11, 2015

What Is Prediction?
(Numerical) prediction is similar to classification
  – construct a model
  – use the model to predict a continuous or ordered value for a given input
Prediction is different from classification
  – Classification predicts categorical class labels
  – Prediction models continuous-valued functions
Major method for prediction: regression
  – models the relationship between one or more independent (predictor) variables and a dependent (response) variable
Regression analysis
  – Linear and multiple regression
  – Non-linear regression
  – Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees

Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data
One can only predict value ranges or category distributions
Method outline:
  – Minimal generalization
  – Attribute relevance analysis
  – Generalized linear model construction
  – Prediction
Determine the major factors that influence the prediction
  – Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
Multi-level prediction: drill-down and roll-up analysis

Linear Regression
Linear regression involves a response variable $y$ and a single predictor variable $x$:
  $y = w_0 + w_1 x$
where $w_0$ (y-intercept) and $w_1$ (slope) are the regression coefficients
Method of least squares: estimates the best-fitting straight line
Multiple linear regression: involves more than one predictor variable
  – Training data is of the form $(X_1, y_1), (X_2, y_2), \ldots, (X_{|D|}, y_{|D|})$
  – Ex. For 2-D data, we may have: $y = w_0 + w_1 x_1 + w_2 x_2$
  – Solvable by an extension of the least squares method, or with statistical software such as SAS or S-Plus
  – Many nonlinear functions can be transformed into the above
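As an illustration of the method of least squares for a single predictor, the sketch below uses the standard closed-form estimates (slope = covariance of x and y divided by variance of x, intercept from the means). The function name `fit_simple_linear` and the toy data are hypothetical and chosen only for this example; they do not come from the slides.

```python
def fit_simple_linear(xs, ys):
    """Return least-squares estimates (w0, w1) for the line y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx               # slope (assumes the x values are not all equal)
    w0 = y_mean - w1 * x_mean    # y-intercept
    return w0, w1


# Hypothetical toy data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_simple_linear(xs, ys)
print(w0, w1)
```

On data that lie exactly on a line, the estimates recover that line; on noisy data they give the line minimizing the sum of squared vertical distances.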

Nonlinear Regression
Some nonlinear models can be modeled by a polynomial function
A polynomial regression model can be transformed into a linear regression model. For example,
  $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$
is convertible to linear form with the new variables $x_2 = x^2$, $x_3 = x^3$:
  $y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$
Other functions, such as power functions, can also be transformed to a linear model
Some models are intractably nonlinear (e.g., sum of exponential terms)
  – possible to obtain least squares estimates through extensive calculation on more complex formulae
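The transformation of a cubic model into a multiple linear regression can be sketched as follows. This is a minimal pure-Python illustration, assuming the normal equations $(X^T X) w = X^T y$ are solved by Gauss-Jordan elimination; the helper names (`solve_linear_system`, `fit_multiple_linear`, `fit_cubic`) and the data are hypothetical, and a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_linear(rows, ys):
    """Least squares for y = w0 + w1*x1 + ... + wk*xk via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_cubic(xs, ys):
    """Fit y = w0 + w1*x + w2*x^2 + w3*x^3 by introducing x2 = x^2, x3 = x^3."""
    rows = [(x, x ** 2, x ** 3) for x in xs]
    return fit_multiple_linear(rows, ys)


# Hypothetical data generated from y = 1 + 2x - x^2 + 0.5x^3
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x - x ** 2 + 0.5 * x ** 3 for x in xs]
weights = fit_cubic(xs, ys)
print(weights)
```

The model stays linear in the coefficients, which is why the same least-squares machinery applies once the powers of x are treated as separate predictor variables.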

Other Regression-Based Models
Generalized linear model:
  – Foundation on which linear regression can be applied to modeling categorical response variables
  – Variance of y is a function of the mean value of y, not a constant
  – Logistic regression: models the probability of some event occurring through a linear function of a set of predictor variables (the log-odds of the event is the linear function)
  – Poisson regression: models data that exhibit a Poisson distribution
Log-linear models (for categorical data):
  – Approximate discrete multidimensional probability distributions
  – Also useful for data compression and smoothing
Regression trees and model trees
  – Trees to predict continuous values rather than class labels
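To make the logistic regression bullet concrete, the sketch below shows how a linear function of the predictors is mapped to a probability with the logistic (sigmoid) function, and how taking the log-odds recovers the linear value. The weights, bias, and feature values are hypothetical and are not fitted to any data.

```python
import math


def logistic_probability(weights, bias, features):
    """Probability of the event: sigmoid of the linear function bias + sum(w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Log-odds (logit) of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients: linear part is -1.0 + 0.8*2.0 - 0.5*1.0 = 0.1
p = logistic_probability([0.8, -0.5], -1.0, [2.0, 1.0])
print(p, log_odds(p))
```

The output probability always lies strictly between 0 and 1, which is what makes this form suitable for a categorical (yes/no) response.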

Regression Trees and Model Trees
Regression tree: proposed in the CART system
  – CART: Classification And Regression Trees
  – Each leaf stores a continuous-valued prediction: the average value of the predicted attribute for the training tuples that reach the leaf
Model tree:
  – Each leaf holds a regression model, that is, a multivariate linear equation for the predicted attribute
  – A more general case than the regression tree
Regression and model trees tend to be more accurate than linear regression when the data are not represented well by a simple linear model
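The idea that a regression-tree leaf stores the average of the training values reaching it can be sketched with a one-split tree (a "stump"). This is a simplified illustration, not the CART algorithm itself: it assumes a single numeric predictor, at least two distinct x values, and a split chosen to minimize the sum of squared errors. The names `fit_stump` and `predict_stump` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors of the values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            # Each leaf stores the average target value of its training tuples
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data with two clearly separated groups
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.5))
```

A model tree would replace the stored averages with a linear equation fitted to the tuples in each leaf.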

Predictive Modeling in Multidimensional Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data
One can only predict value ranges or category distributions
Method outline:
  – Minimal generalization
  – Attribute relevance analysis
  – Generalized linear model construction
  – Prediction
Determine the major factors that influence the prediction
  – Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
Multi-level prediction: drill-down and roll-up analysis

Prediction: Numerical Data [figure]

Prediction: Categorical Data [figure]

Classifier Accuracy Measures
Accuracy of a classifier M, acc(M): percentage of test set tuples that are correctly classified by the model M
  – Error rate (misclassification rate) of M = 1 - acc(M)
  – Given m classes, $CM_{i,j}$, an entry in a confusion matrix, indicates the number of tuples in class i that are labeled by the classifier as class j

Real class \ Predicted class | buy_computer = yes | buy_computer = no | total | recognition (%)
buy_computer = yes           |                    |                   |       |
buy_computer = no            |                    |                   |       |
total                        |                    |                   |       |

Alternative accuracy measures (e.g., for cancer diagnosis)
  sensitivity = t-pos / pos    /* true positive recognition rate */
  specificity = t-neg / neg    /* true negative recognition rate */
  precision = t-pos / (t-pos + f-pos)
  accuracy = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
  – This model can also be used for cost-benefit analysis

Real class \ Predicted class | C1             | ~C1
C1                           | True positive  | False negative
~C1                          | False positive | True negative
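The alternative accuracy measures can be computed directly from the four cells of a two-class confusion matrix. The sketch below follows the formulas on the slide; the function name `classifier_measures` and the counts are hypothetical and are not the values of the buy_computer example.

```python
def classifier_measures(t_pos, f_neg, f_pos, t_neg):
    """Compute sensitivity, specificity, precision and accuracy from confusion-matrix counts."""
    pos = t_pos + f_neg   # all tuples whose real class is positive
    neg = f_pos + t_neg   # all tuples whose real class is negative
    sensitivity = t_pos / pos                  # true positive recognition rate
    specificity = t_neg / neg                  # true negative recognition rate
    precision = t_pos / (t_pos + f_pos)
    # Accuracy as the class-weighted combination of sensitivity and specificity
    accuracy = (sensitivity * pos / (pos + neg)
                + specificity * neg / (pos + neg))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
    }


# Hypothetical counts: 100 positive and 900 negative test tuples
m = classifier_measures(t_pos=90, f_neg=10, f_pos=30, t_neg=870)
print(m)
```

With imbalanced classes such as these, accuracy alone can look high even when precision is modest, which is why the separate rates are reported.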

Predictor Error Measures
Measure predictor accuracy: measure how far off the predicted value is from the actual known value
Loss function: measures the error between the actual value $y_i$ and the predicted value $y_i'$
  – Absolute error: $|y_i - y_i'|$
  – Squared error: $(y_i - y_i')^2$
Test error (generalization error): the average loss over the test set of $d$ tuples, where $\bar{y}$ is the mean of the actual values $y_i$
  – Mean absolute error: $\frac{1}{d} \sum_{i=1}^{d} |y_i - y_i'|$
  – Mean squared error: $\frac{1}{d} \sum_{i=1}^{d} (y_i - y_i')^2$
  – Relative absolute error: $\frac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}$
  – Relative squared error: $\frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}$
The mean squared error exaggerates the presence of outliers
The (square) root of the mean squared error is popularly used; similarly, the root relative squared error
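The error measures named on this slide can be sketched in a few lines. The code assumes the usual definitions: mean absolute and mean squared error are averages of the absolute and squared losses over the test set, and the relative variants divide the total loss by the loss of always predicting the mean of the actual values. The function name `error_measures` and the data are hypothetical.

```python
import math


def error_measures(actual, predicted):
    """Return common predictor error measures for paired actual/predicted values."""
    d = len(actual)
    y_bar = sum(actual) / d  # mean of the actual values (baseline predictor)
    abs_err = [abs(y - p) for y, p in zip(actual, predicted)]
    sq_err = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    mse = sum(sq_err) / d
    rse = sum(sq_err) / sum((y - y_bar) ** 2 for y in actual)
    return {
        "mae": sum(abs_err) / d,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "rae": sum(abs_err) / sum(abs(y - y_bar) for y in actual),
        "rse": rse,
        "rrse": math.sqrt(rse),
    }


# Hypothetical test set of four tuples
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
errors = error_measures(actual, predicted)
print(errors)
```

In this toy example the single error of size 2 contributes 4 of the 6 units of total squared error, which illustrates how squaring exaggerates outliers; taking the square root returns the measure to the units of the predicted attribute.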

Summary
Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.
Effective and scalable methods have been developed for decision tree induction, naive Bayesian classification, Bayesian belief networks, rule-based classifiers, backpropagation, support vector machines (SVM), pattern-based classification, nearest neighbor classifiers, and case-based reasoning, as well as other classification methods such as genetic algorithms and rough set and fuzzy set approaches.
Linear, nonlinear, and generalized linear models of regression can be used for prediction. Many nonlinear problems can be converted to linear problems by performing transformations on the predictor variables. Regression trees and model trees are also used for prediction.

Continued in Session 11: Cluster Analysis