PREDICTION Elsayed Hemayed Data Mining Course

Outline  Introduction  Regression Analysis  Linear Regression  Multiple Linear Regression  Predictor Error Measure  Evaluating the Accuracy a Predictor 2 Prediction

Introduction  “What if we would like to predict a continuous value, rather than a categorical label (like classification)?”  Numeric prediction is the task of predicting continuous (or ordered) values for given input.  The salary of college graduates with 10 years of work experience,  The potential sales of a new product given its price.  The most widely used approach for numeric prediction is regression, a statistical methodology 3 Prediction

Regression Analysis  Regression analysis can be used to model the relationship between one or more independent (predictor) variables and a dependent (response) variable, which is continuous-valued.  The predictor variables are the attributes of interest describing the tuple (their values are known).  The response variable is what we want to predict.  Given a tuple described by its predictor variables, we want to predict the associated value of the response variable.

Regression Analysis – cont.  We’ll discuss straight-line regression analysis (which involves a single predictor variable) and multiple linear regression analysis (which involves two or more predictor variables).  Several software packages exist to solve regression problems; examples include SAS, SPSS, and S-Plus.

Straight Line Regression (Linear Regression)

Straight line regression  Straight-line regression analysis involves a response variable, y, and a single predictor variable, x.  It is the simplest form of regression, and models y as a linear function of x.  That is, y = w0 + w1 x,  where the variance of y is assumed to be constant, and w0 and w1 are regression coefficients:  w0 is the y-intercept,  w1 is the slope of the line.

Straight line regression – cont.  These coefficients can be solved for by the method of least squares, which estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line.  Let D be a training set consisting of values of the predictor variable, x, for some population and their associated values of the response variable, y.  The training set contains |D| data points of the form (x1, y1), (x2, y2), …, (x|D|, y|D|).

Straight line regression – cont.  The regression coefficients are estimated as  w1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²  and  w0 = ȳ − w1 x̄,  where x̄ is the mean value of x1, x2, …, x|D|, and ȳ is the mean value of y1, y2, …, y|D|.
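A minimal sketch of these least-squares formulas in Python (the helper names fit_line and predict are illustrative, not from the slides):

```python
def fit_line(xs, ys):
    """Estimate w0, w1 for y = w0 + w1*x by the least-squares formulas above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    w1 = num / den
    w0 = y_bar - w1 * x_bar
    return w0, w1

def predict(w0, w1, x):
    """Apply the fitted line to a new predictor value x."""
    return w0 + w1 * x
```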

Example – Salary Data  Using the least squares method on salary data (in $1000s) versus years of experience, the fitted line is  y = 23.6 + 3.5 x.  Thus the predicted salary of a college graduate with, say, 10 years of experience is 23.6 + 3.5 × 10 = 58.6, i.e., $58,600.
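Using the predict helper sketched above, the slide's figure can be reproduced directly from the reported coefficients (w0 = 23.6 and w1 = 3.5, taken from this example):

```python
w0, w1 = 23.6, 3.5                 # fitted coefficients from the salary example
years = 10
salary_k = predict(w0, w1, years)  # 23.6 + 3.5 * 10 = 58.6 (in $1000s)
print(f"Predicted salary: ${salary_k * 1000:,.0f}")   # $58,600
```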

Multiple Linear Regression  It allows the response variable y to be modeled as a linear function of, say, n predictor variables or attributes, A1, A2, …, An, describing a tuple X (that is, X = (x1, x2, …, xn)).  An example of a multiple linear regression model based on two predictor attributes or variables, A1 and A2, is  y = w0 + w1 x1 + w2 x2,  where x1 and x2 are the values of attributes A1 and A2, respectively, in X.

Multiple Linear Regression – Least Squares  The least squares method can be extended to solve for w0, w1, and w2.  The equations, however, become long and are tedious to solve by hand.  Multiple regression problems are instead commonly solved with the use of statistical software packages, such as SAS, SPSS, and S-Plus.
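For illustration, the least-squares solve that such packages perform can be sketched with NumPy; the data values below are hypothetical, purely to show the mechanics:

```python
import numpy as np

# Hypothetical training tuples X = (x1, x2) with known response y.
X = np.array([[3.0, 1.0],
              [8.0, 0.0],
              [9.0, 2.0],
              [13.0, 1.0],
              [6.0, 3.0]])
y = np.array([30.0, 57.0, 64.0, 72.0, 43.0])

# Prepend a column of ones so the intercept w0 is estimated together with w1, w2.
X_aug = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find w minimizing ||X_aug @ w - y||^2.
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w0, w1, w2 = w
print(f"Fitted model: y = {w0:.2f} + {w1:.2f}*x1 + {w2:.2f}*x2")
```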

Predictor Error Measures  Let DT be a test set of the form (X1, y1), (X2, y2), …, (Xd, yd), where the Xi are the n-dimensional test tuples with associated known values, yi, of a response variable, y, and d is the number of tuples in DT.  With yi′ denoting the predicted value for Xi, two common measures are the mean absolute error, (1/d) Σi |yi − yi′|, and the mean squared error, (1/d) Σi (yi − yi′)².  The mean squared error exaggerates the presence of outliers, while the mean absolute error does not.

Predictor Error Measures – cont.  The total loss can also be normalized to give relative measures: the relative absolute error, Σi |yi − yi′| / Σi |yi − ȳ|, and the relative squared error, Σi (yi − yi′)² / Σi (yi − ȳ)²,  where ȳ is the mean value of the yi’s of the training data.
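A small sketch of these measures (standard definitions; y_true and y_pred are the test values and predictions, and y_train_mean is the training mean referred to above):

```python
def error_measures(y_true, y_pred, y_train_mean):
    """Mean absolute/squared error and their relative (normalized) versions."""
    d = len(y_true)
    abs_errs = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_errs = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mae = sum(abs_errs) / d
    mse = sum(sq_errs) / d
    # Normalize by the error of always predicting the training mean.
    rel_abs = sum(abs_errs) / sum(abs(t - y_train_mean) for t in y_true)
    rel_sq = sum(sq_errs) / sum((t - y_train_mean) ** 2 for t in y_true)
    return {"MAE": mae, "MSE": mse,
            "relative absolute error": rel_abs,
            "relative squared error": rel_sq}
```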

Evaluating the Accuracy of a Predictor – The Holdout Method  The given data are randomly partitioned into two independent sets, a training set and a test set.  Typically, two-thirds of the data are allocated to the training set, and the remaining one-third is allocated to the test set.  The training set is used to derive the model, whose accuracy is estimated with the test set.  The estimate is pessimistic because only a portion of the initial data is used to derive the model.
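A minimal sketch of this two-thirds/one-third holdout split, reusing the fit_line, predict, and error_measures sketches from earlier slides (the data pairs are hypothetical):

```python
import random

def holdout_split(data, train_fraction=2/3, seed=None):
    """Randomly partition data into independent training and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]                    # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (x, y) pairs, e.g. years of experience vs. salary in $1000s.
data = [(3, 30), (8, 57), (9, 64), (13, 72), (6, 43), (11, 59), (16, 83), (1, 20), (21, 90)]
train, test = holdout_split(data, seed=0)

w0, w1 = fit_line([x for x, _ in train], [y for _, y in train])
y_true = [y for _, y in test]
y_pred = [predict(w0, w1, x) for x, _ in test]
y_train_mean = sum(y for _, y in train) / len(train)
print(error_measures(y_true, y_pred, y_train_mean))
```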

Estimating accuracy with the holdout method

Random Subsampling  The holdout method is repeated k times.  The overall accuracy estimate is taken as the average of the accuracies obtained from each iteration.  (For prediction, we can take the average of the predictor error rates.)
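A sketch of random subsampling built on the holdout helper above (k repetitions, averaging the mean squared error; the names are illustrative):

```python
def random_subsampling(data, k=10):
    """Repeat the holdout method k times and average the error estimates."""
    mse_values = []
    for i in range(k):
        train, test = holdout_split(data, seed=i)   # a fresh random partition each time
        w0, w1 = fit_line([x for x, _ in train], [y for _, y in train])
        errs = [(y - predict(w0, w1, x)) ** 2 for x, y in test]
        mse_values.append(sum(errs) / len(errs))
    return sum(mse_values) / k   # overall estimate: average over the k iterations
```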

Homework due day 3  Prepare a database with several thousand records  Define a data mining application to run on your data  Download and install a free data mining tool  Use the tool to mine your data  Prepare a demo to present your findings to the class.

Summary  Introduction  Regression Analysis  Linear Regression  Multiple Linear Regression  Predictor Error Measure  Evaluating the Accuracy a Predictor 19 Prediction