Data mining and statistical learning, lecture 2

Outline
- An example of data mining
- SAS Enterprise Miner



Daily electricity consumption in Sweden

ln daily electricity consumption in Sweden

Available data
- Daily levels of the total electricity consumption in Sweden
- Daily levels of temperature, wind speed, and precipitation at a large number of weather stations in Sweden
- Population in all municipalities in Sweden
- Calendar data (Julian day, weekdays, holidays)

Selecting, exploring, and modifying data
Too much weather data!
- We assigned a weather station to each municipality and computed population-weighted mean values for the temperature, wind speed, and precipitation in the whole of Sweden
- Then we examined the relationship between the electricity consumption and the population-weighted weather data
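The population weighting can be illustrated with a minimal Python sketch. The function name and the numbers are made up for illustration and are not taken from the lecture's data set; the only assumption is that each municipality contributes its assigned station's reading, weighted by its population.

```python
def population_weighted_mean(values, populations):
    """Return sum(p_i * v_i) / sum(p_i) over all municipalities."""
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return sum(p * v for p, v in zip(populations, values)) / total_population


# Hypothetical municipalities (illustrative numbers only).
populations = [900_000, 500_000, 100_000]
# Temperature (degrees C) at the weather station assigned to each municipality.
temperatures = [2.0, 4.0, -10.0]

national_temperature = population_weighted_mean(temperatures, populations)
unweighted_temperature = sum(temperatures) / len(temperatures)
print(national_temperature, unweighted_temperature)
```

In this toy example the cold but sparsely populated municipality pulls the unweighted mean well below the weighted one, which is why weighting matters when the response is national consumption.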

ln daily electricity consumption vs population-weighted mean temperature in Sweden

Cubic spline with one knot (at x=1)
- Between knots, the spline function is identical to a third-order polynomial
- At knots, the function and its first two derivatives are continuous
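One common way to write such a spline is the truncated power basis. The coefficient names below are assumptions for illustration and need not match the parametrization used on the slide.

```latex
s(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - 1)_+^3,
\qquad (u)_+ = \max(u, 0)
```

The term $\beta_4 (x-1)_+^3$ and its first two derivatives vanish at $x = 1$, so $s$, $s'$ and $s''$ are continuous there, while the third derivative jumps by $6\beta_4$.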

Some examples of additive models
- A nonlinear, additive model
- A mixed linear and nonlinear, additive model
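The formulas on this slide did not survive in the transcript. As a hedged illustration, the two model types are usually written in the following generic forms; the symbols are assumptions and not necessarily those used in the lecture.

```latex
% Nonlinear, additive model: each input enters through its own smooth function
Y = \alpha + f_1(X_1) + f_2(X_2) + \dots + f_p(X_p) + \varepsilon

% Mixed linear and nonlinear, additive model: X_1 enters linearly, X_2 through a smooth function
Y = \alpha + \beta_1 X_1 + f_2(X_2) + \varepsilon
```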

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden

    /* Fit an additive model with a smoothing spline of temperature */
    proc gam data=mining.electricity;
      model lnConsumption = spline(Mean_temp, df=20);
      ID Time(day);
      output out=smhiouttemp pred resid;  /* save predictions and residuals */
    run;

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden: residual analysis

Modelling ln daily electricity consumption in Sweden: residual analysis
- Spline of temperature
- Spline of Julian day
- Weekday dummies

Modelling ln daily electricity consumption in Sweden: residual analysis
Terms in the first model:
- Spline of temperature
- Spline of Julian day
- Weekday dummies
Terms in the second model:
- Splines of contemporaneous and time-lagged weather data
- Splines of Julian day and time
- Weekday and holiday dummies

Deviance analysis of the investigated models of ln daily electricity consumption in Sweden
- The residual deviance of a fitted model is minus twice its log-likelihood
- If the error terms are normally distributed with a given variance, the deviance reduces, up to an additive constant and a scale factor, to the sum of squared residuals
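The link between deviance and squared residuals can be made explicit. For $n$ independent observations with normally distributed errors of variance $\sigma^2$ and fitted values $\hat{y}_i$ (notation assumed here for illustration):

```latex
-2 \log L = n \log\left(2\pi\sigma^2\right) + \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
```

For a fixed $\sigma^2$ the first term is the same for every model, so comparing deviances amounts to comparing sums of squared residuals.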

Modelling ln daily electricity consumption in Sweden: time series plot of residuals

Model selection in data-rich environments
- Divide the given data set into two parts
- Use the training set to fit all potential models
- Use the test set to validate the fitted models
[Diagram: Training | Test]
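A minimal sketch of the two-part scheme, written in Python rather than SAS and run on synthetic data. The 70/30 split, the two candidate models, and all function names are assumptions chosen for illustration, not the lecture's setup.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(rows):
    """Candidate 1: predict the training mean of y for every x."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate 2: ordinary least squares fit of y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(rows, predict):
    return sum((y - predict(x)) ** 2 for x, y in rows) / len(rows)


# Synthetic data: a response that falls linearly with temperature, plus noise.
rng = random.Random(1)
temps = [rng.uniform(-20.0, 25.0) for _ in range(200)]
data = [(t, 10.0 - 0.05 * t + rng.gauss(0.0, 0.1)) for t in temps]

train, test = train_test_split(data)
candidates = {"mean only": fit_mean, "straight line": fit_line}
# Fit every candidate on the training set, then score it on the held-out test set.
test_mse = {name: mean_squared_error(test, fit(train)) for name, fit in candidates.items()}
for name, value in test_mse.items():
    print(f"{name}: test MSE = {value:.4f}")
```

The models never see the test rows during fitting, so the test errors compare how well each candidate generalizes rather than how closely it follows the training data.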

Model selection and unbiased estimation of the predictive power of the selected model
- Divide the given data set into three parts
- Use the training set to fit all potential models
- Use the validation set to select a model
- Use the test set to compute an unbiased estimate of the predictive power of the selected model
[Diagram: Training | Validation | Test]
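The three-part scheme can be sketched as follows, in Python on synthetic data. The candidate models here are polynomials of degree 0 to 5 fitted by least squares; this family, the 50/25/25 split, and all names are assumptions for illustration only.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(rows, degree):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x, _ in rows) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in rows) for i in range(k)]
    return solve(xtx, xty)


def mse(rows, coefs):
    """Mean squared prediction error of the polynomial with the given coefficients."""
    def predict(x):
        return sum(c * x ** i for i, c in enumerate(coefs))
    return sum((y - predict(x)) ** 2 for x, y in rows) / len(rows)


# Synthetic data: a quadratic signal plus noise.
rng = random.Random(2)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
data = [(x, 1.0 + 2.0 * x - 3.0 * x * x + rng.gauss(0.0, 0.2)) for x in xs]
rng.shuffle(data)
train, validation, test = data[:150], data[150:225], data[225:]

# 1. Fit all candidate models on the training set.
fits = {degree: fit_poly(train, degree) for degree in range(6)}
# 2. Select the model with the smallest validation error.
best_degree = min(fits, key=lambda degree: mse(validation, fits[degree]))
# 3. Estimate the predictive power of the selected model on the untouched test set.
test_error = mse(test, fits[best_degree])
print(best_degree, test_error)
```

The validation error of the winning model is optimistic because it was used to pick the winner; the test set plays no part in fitting or selection, which is what makes its error estimate unbiased.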

SAS Enterprise Miner
A toolbox for the five elements of data mining offering:
- Convenient handling of large and complex datasets
- Convenient comparison and assessment of many models
- Widely used procedures for prediction, classification and association analysis

SAS Enterprise Miner
Run the miner
- Import data
- Create a project
- Create a dataflow diagram
- Edit the nodes of the diagram
- Run a diagram
- Assess the results
Write and run SAS code