11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 1 Prediction

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 2 Learning Objectives
Analyze datasets involving predictive tasks with linear regression
–Calculate predicted variables.
Analyze datasets involving predictive tasks with nearest neighbor.
Evaluate the prediction performance.
Interpret the analysis results.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 3 What Is Prediction?
Prediction is similar to classification
–First, construct a model
–Second, use the model to predict unknown values
The major method for prediction is regression
–Linear and multiple regression
–Non-linear regression
Prediction is different from classification
–Classification predicts categorical class labels
–Prediction models continuous-valued functions

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 4 Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X
–Two parameters, α and β, specify the line and are to be estimated by using the data at hand
–Fitting uses the least squares criterion on the known values of Y1, Y2, …, X1, X2, ….
Multiple regression: Y = b0 + b1 X1 + b2 X2
–Many nonlinear functions can be transformed into the above.
Log-linear models:
–The multi-way table of joint probabilities is approximated by a product of lower-order tables.
–Probability: p(a, b, c, d) = αab βac χad δbcd

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 5 Least Squares Method
Method of Least Squares (see the formulas below)
–With N data points of the form (x1, y1), (x2, y2), …, (xN, yN), estimate the coefficients in y = w0 + w1 x
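The formulas for the least-squares estimates did not survive the transcript; for reference, the standard closed-form solution for the simple linear model y = w0 + w1 x is:

```latex
% Closed-form least-squares estimates (standard result, added for reference)
\hat{w}_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2},
\qquad
\hat{w}_0 = \bar{y} - \hat{w}_1\,\bar{x}
```

where \bar{x} and \bar{y} are the sample means of the xi and yi.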

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 6 Prediction: Numerical Data

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 7 Prediction: Categorical Data

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 8 Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 9 Multivariate Parameters

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 10 Parameter Estimation
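The formulas on the Multivariate Parameters and Parameter Estimation slides were lost in the transcript; for reference, the standard definitions (which may differ in notation from the original slides) are:

```latex
% Standard multivariate parameters and their sample estimates (added for reference;
% the original slide formulas were not preserved in the transcript)
\mu = E[x], \qquad
\Sigma = E\!\left[(x - \mu)(x - \mu)^{T}\right]
% Sample estimates from N instances x^{(1)}, \dots, x^{(N)}:
m = \frac{1}{N}\sum_{t=1}^{N} x^{(t)}, \qquad
S = \frac{1}{N}\sum_{t=1}^{N}\left(x^{(t)} - m\right)\left(x^{(t)} - m\right)^{T}
```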

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 11 Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small
Use ‘missing’ as an attribute value: may give information
Imputation: fill in the missing value (both strategies are sketched below)
–Mean imputation: use the most likely value (e.g., the mean)
–Imputation by regression: predict the missing value based on the other attributes
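A rough numpy sketch of the two imputation strategies above. Using NaN to mark missing values and the function names are assumptions made for this example, not part of the lecture.

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column by that column's mean (mean imputation)."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        col[np.isnan(col)] = np.nanmean(col)
    return X

def regression_impute(X, target_col):
    """Fill NaNs in one column by regressing it on the other columns
    (imputation by regression), fitting on the complete rows only."""
    X = X.copy()
    y = X[:, target_col]
    others = np.delete(X, target_col, axis=1)
    ok = ~np.isnan(y) & ~np.isnan(others).any(axis=1)          # complete rows
    A = np.column_stack([np.ones(ok.sum()), others[ok]])
    w, *_ = np.linalg.lstsq(A, y[ok], rcond=None)              # least-squares fit
    fill = np.isnan(y) & ~np.isnan(others).any(axis=1)         # rows to impute
    X[fill, target_col] = np.column_stack([np.ones(fill.sum()), others[fill]]) @ w
    return X
```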

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 12 Multivariate Normal Distribution

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 13 Multivariate Regression
Multivariate linear model
Multivariate polynomial model (sketched below): define new higher-order variables z1 = x1, z2 = x2, z3 = x1², z4 = x2², z5 = x1 x2 and use the linear model in this new z space (basis functions, kernel trick, SVM)
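A minimal sketch of the higher-order-variable idea above: expand (x1, x2) into the z variables and fit an ordinary linear model in z space. The toy data and function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))            # N = 50 instances, d = 2 inputs
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 50)

def poly_features(X):
    """Map (x1, x2) to (1, x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

Z = poly_features(X)
w, *_ = np.linalg.lstsq(Z, y, rcond=None)       # linear least squares in z space
y_hat = poly_features(X) @ w                    # predictions from the expanded model
```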

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 14 When to Choose Multivariate Regression
One dependent variable; independent variables continuous, continuous & nominal, or nominal:
–Continuous dependent variable → multiple regression
–Nominal dependent variable → discriminant analysis or logistic regression

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 15 Dataset

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 16 Data Mining Questions
Can we predict men’s life expectancy – lifeexpm – in the world based on the following predictors:
–People living in cities – urban
–People who read – literacy
–Infant mortality – babymort
–Gross domestic product – gdp_cap
–Aids cases – aids
–Daily calorie intake – calories
Same question, omitting babymort.
Can we predict women’s life expectancy – lifeexpf – based on lifeexpm and the previous predictors?

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 17 Assumptions
Assumptions in multiple linear regression:
–There exists a linear relationship between the independent variables / predictors and the dependent variable.
–The error / residual is normally distributed → parametric prediction.
–The error is not correlated with the predictors.
–There is no multicollinearity between the independent variables → no pair or subset is correlated. Check the matrix of correlations between pairs of predictors.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 18 Different Methods
Simultaneous regression
–No prior ideas about the variables; small set of variables.
Hierarchical regression
–The data analyst has prior ideas about the predictive power of the different variables and can create an order between the variables. Questions to answer: how prediction by certain variables improves on prediction by others.
Stepwise regression
–Enters the variables sequentially; capitalizes on chance; large set of variables – not recommended.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 19 Simultaneous Method
Question: can we predict lifeexpm based on the following predictors: urban, literacy, babymort, gdp_cap, aids, calories?
Enter all the variables simultaneously.
Study the relative contribution of each variable to the prediction.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 20 Check the Assumptions
In SPSS, several assumptions can be checked during analysis by requesting
–Correlation matrix → pairwise collinearity
–Coefficients table → multicollinearity → consider combining these variables
–Study scatterplots of the data and look for linear relationships between each predictor and the dependent variable …

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 21 Collinearity
Collinearity can be checked before the regression analysis too (see the sketch below).
Analyze → Correlate → Bivariate; select the independent variables urban, literacy, babymort, gdp_cap, aids, calories; select Options → Missing values → Exclude cases listwise. Click Continue → OK.
Pearson correlation coefficient:
–r > 0.5 or r < -0.5 and significant at p < 0.05
–Eliminate correlations greater than 0.9 or smaller than -0.9, if significant.
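For readers working outside SPSS, a rough pandas analog of the collinearity check above might look like the following. The file name "world95.csv" is an assumption about how the dataset would be stored; the column names follow the variables named on the slide.

```python
import pandas as pd

predictors = ["urban", "literacy", "babymort", "gdp_cap", "aids", "calories"]
data = pd.read_csv("world95.csv")[predictors].dropna()    # exclude cases listwise

corr = data.corr(method="pearson")                        # pairwise Pearson correlations
# Flag pairs whose correlation exceeds the slide's 0.9 threshold in magnitude
strong = [(a, b, corr.loc[a, b])
          for i, a in enumerate(predictors)
          for b in predictors[i + 1:]
          if abs(corr.loc[a, b]) > 0.9]
print(corr.round(2))
print("Pairs to consider eliminating or combining:", strong)
```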

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 22 Collinearity

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 23 Simultaneous Method
Conduct the regression analysis with all these variables (a Python analog is sketched below).
Analyze → Regression → Linear.
Select the dependent variable and the independent variables.
Select Method → Enter (simultaneous).
Statistics → all selected except covariance matrix.
Continue → OK.
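Only as a rough analog of the SPSS "Enter" procedure above, not the course's own steps, a simultaneous regression could be sketched with statsmodels. The file name and the use of a CSV are assumptions; the variable names come from the slides.

```python
import pandas as pd
import statsmodels.api as sm

predictors = ["urban", "literacy", "babymort", "gdp_cap", "aids", "calories"]
data = pd.read_csv("world95.csv")

X = sm.add_constant(data[predictors])          # all predictors entered at once
y = data["lifeexpm"]
model = sm.OLS(y, X, missing="drop").fit()     # drop cases with missing values

# Summary shows R^2, adjusted R^2, the ANOVA F statistic, and the coefficients
# with their t and p values, roughly mirroring the SPSS output tables.
print(model.summary())
```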

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 24 Simultaneous Method

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 25 Simultaneous Method

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 26 Simultaneous Method

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 27 Simultaneous Method
Multiple correlation coefficient (R) is .955.
Adjusted R square of .905 indicates that 90.5% of the variance in average male life expectancy can be predicted from the predictors.
Maybe some predictors are not helping.
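For reference (not from the slide), the adjusted R square reported here is standardly computed from the ordinary R square, with N cases and k predictors, as:

```latex
% Standard definition of adjusted R^2 (added for reference)
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1}
```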

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 28 Simultaneous Method
ANOVA (ANalysis Of VAriance) indicates, with F = , that the predictors significantly predict the dependent variable – the F value should be greater than 1.0 at least.
It tests the fit of the model to the data.
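The F value itself did not survive the transcript; for reference, the regression ANOVA F statistic for k predictors and N cases is standardly:

```latex
% Regression ANOVA F statistic (added for reference), k predictors, N cases
F = \frac{R^2 / k}{(1 - R^2)\,/\,(N - k - 1)}
  = \frac{\mathrm{MS}_{\mathrm{regression}}}{\mathrm{MS}_{\mathrm{residual}}}
```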

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 29 Simultaneous Method
The Coefficients table indicates the standardized beta coefficients – this is the most important information about which variables contribute the most.
The t value, together with its Sig, indicates whether the contribution of a variable is significant – the Sig needs to be below the chosen significance level. Changing the variables may have an effect on these numbers.
Each VIF should be less than 10, and their average should not be a lot greater than 1 (see the definitions below).
If tolerance is < 1 – R² = .095, then there is a risk of multicollinearity. This is not the case here for any variable.
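For reference, the tolerance and VIF quoted here are standardly defined per predictor j, where R_j² is the R square obtained by regressing predictor j on the remaining predictors:

```latex
% Tolerance and variance inflation factor for predictor j (added for reference)
\mathrm{Tolerance}_j = 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```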

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 30 Simultaneous Method
Each variable should have most of its variance proportion in one dimension only – e.g., 86%, 80%, ....
Otherwise, this could indicate collinearity.
The variance proportions indicate how much each variable contributes to any collinearity in the model.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 31 Regression Result
Question: can we predict lifeexpm based on the following predictors: urban, literacy, babymort, gdp_cap, aids, calories?
Answer: Multiple regression was conducted to determine the best linear combination of urban, literacy, babymort, gdp_cap, aids, and calories for predicting lifeexpm. The descriptive statistics and correlations can be found in table … and indicate a strong correlation between babymort and literacy.

11/12/2012 ISC471 / HCI571 Isabelle Bichindaritz 32 Regression Result
This combination of variables significantly predicted lifeexpm, F(6, 73) = , p < . Only literacy and babymort significantly contributed to the prediction.
The beta weights, presented in table …, suggest that babymort contributes the most to predicting lifeexpm, and that literacy also contributes to this prediction.
The adjusted R squared value was .905, which indicates that 90.5% of the variance in lifeexpm was explained by the model. This is a very large effect.

33 Instance-Based Methods
Instance-based learning:
–Store training examples and delay the processing (“lazy evaluation”) until a new instance must be classified
Typical approaches
–k-nearest neighbor approach: instances represented as points in a Euclidean space
–Locally weighted regression: constructs local approximation
–Case-based reasoning: uses symbolic representations and knowledge-based inference

34 The k-Nearest Neighbor Algorithm
All instances correspond to points in the n-D space.
The nearest neighbors are defined in terms of Euclidean distance.
The target function could be discrete- or real-valued.
For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq.
Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples. (The figure of + and – training points around the query point xq is not reproduced here.)
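A minimal sketch of the k-NN classifier just described (Euclidean distance, majority vote); the data and names are illustrative, not from the lecture.

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Return the most common label among the k training points closest to x_query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]                     # indices of the k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny usage example with made-up 2-D points
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["-", "-", "+", "+"])
print(knn_classify(X_train, y_train, np.array([4.9, 5.1]), k=3))   # likely "+"
```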

35 Discussion on the k-NN Algorithm
The k-NN algorithm for continuous-valued target functions
–Calculate the mean value of the k nearest neighbors
Distance-weighted nearest neighbor algorithm
–Weight the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors
–Similarly for real-valued target functions
Robust to noisy data by averaging the k nearest neighbors
Curse of dimensionality: the distance between neighbors could be dominated by irrelevant attributes
–To overcome it, stretch the axes or eliminate the least relevant attributes
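A sketch of the distance-weighted variant for a real-valued target, using inverse-squared-distance weights; the weighting kernel and all names are choices made for this example, not prescribed by the lecture.

```python
import numpy as np

def knn_regress_weighted(X_train, y_train, x_query, k=3, eps=1e-12):
    """Predict a real value as the distance-weighted average of the k nearest neighbors."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] ** 2 + eps)   # closer neighbors count more
    return np.sum(weights * y_train[nearest]) / np.sum(weights)

# Example: noisy samples of y = 2x; the prediction at x = 1.5 should be near 3
X_train = np.array([[1.0], [1.4], [1.6], [2.0]])
y_train = np.array([2.1, 2.7, 3.3, 3.9])
print(knn_regress_weighted(X_train, y_train, np.array([1.5]), k=3))
```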

36 The k-Nearest Neighbor Algorithm
This algorithm can be used for classification tasks
–Example: word pronunciation
or for prediction tasks.