Data mining and statistical learning, lecture 2. Outline: an example of data mining; SAS Enterprise Miner.


1 Data mining and statistical learning, lecture 2
Outline
- An example of data mining
- SAS Enterprise Miner

2 Daily electricity consumption in Sweden

3 ln daily electricity consumption in Sweden

4 Available data
- Daily levels of the total electricity consumption in Sweden 2002-2006
- Daily levels of temperature, wind speed, and precipitation at a large number of weather stations in Sweden
- Population in all municipalities in Sweden
- Calendar data (Julian day, weekdays, holidays)

5 Selecting, exploring, and modifying data
Too much weather data!
- We assigned a weather station to each municipality, and computed population-weighted mean values for the temperature, wind speed and precipitation in the whole of Sweden
- Then we examined the relationship between the electricity consumption and the population-weighted weather data
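The population weighting described above can be sketched in a few lines. This is a Python illustration rather than the SAS workflow used in the lecture; the function name and the three-municipality figures are hypothetical, chosen only to show the arithmetic.

```python
def population_weighted_mean(values, populations):
    """Return sum(p_i * v_i) / sum(p_i), one (value, population) pair per municipality."""
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


# Hypothetical data: temperature at the station assigned to each municipality,
# and that municipality's population.
temperatures = [2.0, -5.0, 8.0]
populations = [100_000, 20_000, 80_000]
weighted_temperature = population_weighted_mean(temperatures, populations)
print(weighted_temperature)
```

The same function applies unchanged to wind speed and precipitation, computed once per day.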

6 ln daily electricity consumption vs population-weighted mean temperature in Sweden

7 Cubic spline with one knot (at x=1)
- Between knots, the spline function is identical to a third order polynomial
- At knots the function and its first two derivatives are continuous
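The slide's formula is not preserved in this transcript. One standard way to write a cubic spline with a single knot at x = 1 is the truncated power basis, shown here as an assumed form rather than the lecturer's own notation:

```latex
\[
  s(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - 1)_+^3,
  \qquad
  (x - 1)_+ = \max(0,\, x - 1).
\]
```

On each side of the knot s(x) is a cubic polynomial. The term (x - 1)_+^3 and its first two derivatives all vanish at x = 1, so s, s' and s'' are continuous there, while the third derivative jumps by 6β₄.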

8 Some examples of additive models
- A nonlinear, additive model
- A mixed linear and nonlinear, additive model
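The equations for these two examples did not survive in the transcript. With two predictors they would typically look as follows; the symbols α, β, f₁, f₂ and ε are assumed notation, not taken from the slide:

```latex
\[
  \text{Nonlinear, additive:}\qquad
  Y = \alpha + f_1(X_1) + f_2(X_2) + \varepsilon
\]
\[
  \text{Mixed linear and nonlinear, additive:}\qquad
  Y = \alpha + \beta X_1 + f_2(X_2) + \varepsilon
\]
```

Here f₁ and f₂ are smooth functions (for example splines) estimated from the data, and ε is the error term.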

9 Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden

    proc gam data=mining.electricity;
      /* smoothing spline in temperature with 20 degrees of freedom */
      model lnConsumption = spline(Mean_temp, df=20);
      ID Time(day);
      /* write predicted values and residuals to smhiouttemp */
      output out=smhiouttemp pred resid;
    run;

10 Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden: residual analysis

11 Modelling ln daily electricity consumption in Sweden - residual analysis
- Spline of temperature
- Spline of Julian day
- Weekday dummies

12 Modelling ln daily electricity consumption in Sweden - residual analysis
- Spline of temperature
- Spline of Julian day
- Weekday dummies
- Splines of contemporaneous and time-lagged weather data
- Splines of Julian day and time
- Weekday and holiday dummies

13 Deviance analysis of the investigated models of ln daily electricity consumption in Sweden
- The residual deviance of a fitted model is minus twice its log-likelihood
- If the error terms are normally distributed, the deviance is equal to the sum of squared residuals
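The link between the two statements can be made explicit. Assuming independent normal errors with variance σ², n observations yᵢ and fitted values ŷᵢ (notation introduced here, not on the slide):

```latex
\[
  -2 \log L
  = n \log\!\left(2\pi\sigma^2\right)
  + \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 .
\]
```

For a fixed σ² the first term is the same for every model, so comparing models by deviance amounts to comparing their sums of squared residuals. With σ² = 1 and the constant dropped, the two quantities coincide.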

14 Modelling ln daily electricity consumption in Sweden: time series plot of residuals

15 Model selection in data-rich environments
- Divide the given data set into two parts
- Use the training set to fit all potential models
- Use the test set to validate the fitted models
[Diagram: Training | Test]

16 Model selection and unbiased estimation of the predictive power of the selected model
- Divide the given data set into three parts
- Use the training set to fit all potential models
- Use the validation set to select a model
- Use the test set to compute an unbiased estimate of the predictive power of the selected model
[Diagram: Training | Validation | Test]
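A three-way partition of this kind can be sketched as below. This is a Python illustration, not SAS Enterprise Miner's own partitioning; the function name, the 60/20/20 fractions and the fixed seed are all assumptions made for the example.

```python
import random


def three_way_split(rows, frac_train=0.6, frac_valid=0.2, seed=0):
    """Shuffle rows and cut them into training, validation and test parts."""
    if frac_train <= 0 or frac_valid <= 0 or frac_train + frac_valid >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)                      # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_train = round(len(shuffled) * frac_train)
    n_valid = round(len(shuffled) * frac_valid)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]        # whatever remains
    return train, valid, test


# Hypothetical data: 100 daily observations identified by index.
days = list(range(100))
train, valid, test = three_way_split(days)
print(len(train), len(valid), len(test))
```

Models are fitted on `train`, compared on `valid`, and only the chosen model is scored on `test`. For time-ordered data such as daily consumption, cutting by date instead of shuffling may be the more appropriate choice.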

17 SAS Enterprise Miner
A toolbox for the five elements of data mining offering:
- Convenient handling of large and complex datasets
- Convenient comparison and assessment of many models
- Widely used procedures for prediction, classification and association analysis

18 SAS Enterprise Miner
Run the miner
- Import data
- Create a project
- Create a dataflow diagram
- Edit the nodes of the diagram
- Run a diagram
- Assess the results
Write and run SAS code

