1 Ka-fu Wong University of Hong Kong A Brief Review of Probability, Statistics, and Regression for Forecasting

2 Random variable A random variable is a mapping from the set of all possible outcomes to the real numbers. Today's Hang Seng Index can go up, go down, or stay the same as yesterday's. Consider the movement of the Hang Seng Index over a month of 22 trading days, and define a random variable Y as the number of days on which the index goes up. In this case, Y assumes 23 possible values: y = 0, 1, 2, …, 22. Discrete random variables can assume only a countable number of values. A discrete probability distribution describes the probability of occurrence of every event; for instance, $p_i$ is the probability that event i will occur. Continuous random variables can assume a continuum of values. A probability density function, f(y), is a nonnegative continuous function such that the area under f(y) between any two points a and b is the probability that Y assumes a value between a and b.
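A quick way to make the discrete case concrete (not from the slides): if we assume, purely for illustration, that each day's up-move is an i.i.d. Bernoulli(p) draw, then Y follows a Binomial(22, p) distribution. A minimal sketch under that assumption:

```python
# A minimal sketch of the slide's example. The i.i.d. Bernoulli
# assumption and p = 0.5 are illustrative, not from the slides;
# under them, Y ~ Binomial(22, p).
from scipy.stats import binom

T = 22    # trading days in the month
p = 0.5   # hypothetical probability that the index goes up on a day

# P(Y = k): probability the index rises on exactly k of the 22 days
for k in (0, 11, 22):
    print(f"P(Y = {k}) = {binom.pmf(k, T, p):.4f}")

# For a continuous random variable the analogous quantity is the area
# under f(y); for the discrete Y it is a sum of point probabilities:
print(f"P(10 <= Y <= 12) = {binom.cdf(12, T, p) - binom.cdf(9, T, p):.4f}")
```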

3 Moments
Mean (measures central tendency): $\mu = E(y)$
Variance (measures dispersion around the mean): $\sigma^2 = E[(y - \mu)^2]$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
Skewness (measures the amount of asymmetry in a distribution): $S = E[(y - \mu)^3] / \sigma^3$
Kurtosis (measures the thickness of the tails in a distribution): $K = E[(y - \mu)^4] / \sigma^4$

4 Multivariate Random Variables
Joint distribution: f(x, y)
Covariance (measures dependence between two variables): $\mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$
Correlation: $\mathrm{corr}(x, y) = \mathrm{cov}(x, y) / (\sigma_x \sigma_y)$
Conditional distribution: f(y|x) = f(x, y) / f(x)
Conditional mean: E(y|x)
Conditional variance: $\mathrm{var}(y|x) = E\{[y - E(y|x)]^2 \mid x\}$
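A small numerical sketch of these quantities, using numpy on made-up data:

```python
# Sample covariance, correlation, and a crude conditional mean;
# the data below are invented purely for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance
rho = np.corrcoef(x, y)[0, 1]         # sample correlation
print(f"cov(x, y) = {cov_xy:.3f}, corr(x, y) = {rho:.3f}")

# A conditional mean can be approximated by averaging y over a slice
# of x, e.g. E(y | x <= 3) from this sample:
print(f"E(y | x <= 3) ~ {y[x <= 3].mean():.3f}")
```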

5 Statistics
Sample mean: $\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t$
Sample variance: $\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2$, or $s^2 = \frac{1}{T-1} \sum_{t=1}^{T} (y_t - \bar{y})^2$
Sample standard deviation: $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$, or $s = \sqrt{s^2}$

6 Statistics
Sample skewness: $\hat{S} = \frac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^3}{\hat{\sigma}^3}$
Sample kurtosis: $\hat{K} = \frac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^4}{\hat{\sigma}^4}$
Jarque-Bera test statistic: $JB = \frac{T}{6} \left( \hat{S}^2 + \frac{(\hat{K} - 3)^2}{4} \right)$
Under the null of independent, normally distributed observations, JB is distributed in large samples as a chi-square with two degrees of freedom.
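A short sketch computing these sample statistics on simulated data and checking against scipy's implementation:

```python
# Sample moments and the Jarque-Bera statistic defined above, on
# simulated normal data (so JB should be small and the null retained).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
T = len(y)

ybar = y.mean()
sigma_hat = np.sqrt(((y - ybar) ** 2).mean())       # 1/T estimator
S_hat = ((y - ybar) ** 3).mean() / sigma_hat ** 3   # sample skewness
K_hat = ((y - ybar) ** 4).mean() / sigma_hat ** 4   # sample kurtosis
JB = (T / 6) * (S_hat ** 2 + (K_hat - 3) ** 2 / 4)

print(f"S = {S_hat:.3f}, K = {K_hat:.3f}, JB = {JB:.3f}")
print(stats.jarque_bera(y))   # library version, with chi-square(2) p-value
```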

7 Example What is our expectation of y given x=0?

8 Forecast Suppose we want to forecast the value of a variable y, given the value of a variable x. Denote that forecast by $y_{f|x}$.

9 Conditional expectation as a forecast Think of y and x as random variables jointly drawn from some underlying population. It seems reasonable to construct the forecast of y based on x as the expected value of y conditional on x, i.e., $y_{f|x} = E(y|x)$, the average population value of y given that value of x. E(y|x) is also called the population regression of y (on x).

10 Conditional expectation as a forecast The expected value of y conditional on x is $y_{f|x} = E(y|x)$. It turns out that in many reasonable forecasting settings this forecast has optimal properties (e.g., minimizing expected loss), and (approximating) this forecast guides our choice of forecast method.

11 Unbiasedness of the conditional expectation as a forecast The forecast error will be y − E(y|x). By the law of iterated expectations, the expected forecast error is E[y − E(y|x)] = E(y) − E[E(y|x)] = E(y) − E(y) = 0. Thus the conditional expectation is an unbiased forecast. Note that another name for E(y|x) is the population regression of y (on x).
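A small Monte Carlo sketch of this claim, under a joint distribution (linear conditional mean, normal noise) assumed purely for illustration:

```python
# Forecasting with the true E(y|x) makes the average forecast error
# approximately zero; the data-generating process here is made up.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # so E(y|x) = 1 + 2x

errors = y - (1.0 + 2.0 * x)             # y - E(y|x)
print(f"mean forecast error: {errors.mean():.4f}")   # close to 0
```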

12 Some operational assumptions about E(y|x) In order to proceed in this direction, we need to make some additional assumptions about the underlying population and, in particular, about the form of E(y|x). The simplest assumption is that the conditional expectation is a linear function of x, i.e., $E(y|x) = \beta_0 + \beta_1 x$. If $\beta_0$ and $\beta_1$ are known, then the forecast problem is completed by setting $y_{f|x} = \beta_0 + \beta_1 x$.

13 When parameters are unknown Even if the conditional expectation is linear in x, the parameters $\beta_0$ and $\beta_1$ will be unknown. The next best thing is to estimate $\beta_0$ and $\beta_1$ and use the estimated β's in place of the actual values to form the forecasts. This substitution will not provide as accurate a forecast, since we are introducing a new source of forecast error due to "estimation error" or "sampling error." However, under certain conditions the resulting forecast will still be unbiased and retain certain optimality properties.

14 When parameters are unknown Suppose we have access to a sample of T pairs of (x, y) drawn from the population from which the relevant value of y will be drawn: $(x_1, y_1), (x_2, y_2), \ldots, (x_T, y_T)$. In this case, a natural estimator of $\beta_0$ and $\beta_1$ is the ordinary least squares (OLS) estimator, obtained by minimizing the sum of squared residuals $\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_t)^2$ with respect to $\beta_0$ and $\beta_1$. The solutions are the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$. Then, for a given value of x, we can forecast y according to $y_{f|x} = \hat{\beta}_0 + \hat{\beta}_1 x$.
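A minimal sketch of this procedure, using the standard closed-form OLS solution on a simulated sample (the coefficients and data are made up):

```python
# Closed-form OLS for the simple regression; the sample is simulated
# with beta0 = 1 and beta1 = 2, so the estimates should be close.
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"beta0-hat = {b0:.3f}, beta1-hat = {b1:.3f}")

# Forecast y at a new value of x, as on the slide:
x_new = 0.5
print(f"y_f|x = {b0 + b1 * x_new:.3f}")
```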

15 Fitting a regression line: estimating $\beta_0$ and $\beta_1$.

16 When parameters are unknown This estimation procedure, also called the sample regression of y on x, will provide us with a "good" estimate of the conditional expectation of y given x (i.e., the population regression of y on x) and, therefore, a "good" forecast of y given x, provided that certain additional assumptions apply to the relationship between y and x. Let ε denote the difference between y and E(y|x). That is, ε = y − E(y|x), i.e., y = E(y|x) + ε, and $y = \beta_0 + \beta_1 x + \varepsilon$ if $E(y|x) = \beta_0 + \beta_1 x$.

17 When parameters are unknown The assumptions that we need pertain to these ε's (the "other factors" that determine y) and their relationship to the x's. For instance, so long as $E(\varepsilon_t | x_1, \ldots, x_T) = 0$ for t = 1, …, T, the OLS estimator of $\beta_0$ and $\beta_1$ based on the data $(x_1, y_1), \ldots, (x_T, y_T)$ will be unbiased and, as a result, the forecast constructed by replacing these "population parameters" with the OLS estimates will be unbiased. A standard set of assumptions that provides us with a lot of value: given $x_1, \ldots, x_T$, the errors $\varepsilon_1, \ldots, \varepsilon_T$ are i.i.d. $N(0, \sigma^2)$ random variables.

18 When parameters are unknown These ideas and procedures extend naturally to the setting where we want to forecast the value of y based on the values of k other variables, say $x_1, \ldots, x_k$. We begin by considering the conditional expectation, or population regression, of y on $x_1, \ldots, x_k$ to make our forecast. That is, $y_{f|x_1,\ldots,x_k} = E(y | x_1, \ldots, x_k)$. To operationalize this forecast, we first assume that the conditional expectation is linear, i.e., $E(y | x_1, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$.

19 When parameters are unknown The unknown β's are generally replaced by the estimates from a sample OLS regression. Suppose we have the data set $(y_1, x_{11}, \ldots, x_{k1}), (y_2, x_{12}, \ldots, x_{k2}), \ldots, (y_T, x_{1T}, \ldots, x_{kT})$. The OLS estimates of the unknown parameters are obtained by minimizing the sum of squared residuals $\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_{1t} - \cdots - \beta_k x_{kt})^2$. As in the case of the simple regression model, this procedure for estimating the population regression function will have good properties provided that the regression errors $\varepsilon_t = y_t - E(y_t | x_{1t}, \ldots, x_{kt})$, t = 1, …, T, have appropriate properties.
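A sketch of the multiple-regression case using numpy's least-squares solver; the design (k = 2 regressors) and coefficients are chosen only for illustration:

```python
# OLS with an intercept and two regressors via np.linalg.lstsq;
# simulated data with true coefficients (1.0, 2.0, -0.5).
import numpy as np

rng = np.random.default_rng(2)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x1, x2])   # intercept column first
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimates:", beta_hat)           # approx [1.0, 2.0, -0.5]

# Forecast at new regressor values (x1, x2) = (0.3, -1.0):
print("forecast:", np.array([1.0, 0.3, -1.0]) @ beta_hat)
```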

20 Example: multiple linear regression

21 Residual plots
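A minimal sketch of how such a plot might be produced (simulated data, matplotlib), plotting residuals against fitted values as the usual visual check of the error assumptions:

```python
# Residual plot for a simple OLS fit on simulated data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
residuals = y - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0.0, linestyle="--")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.title("Residual plot")
plt.show()
```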

22 Density Forecasts and Interval Forecasts The procedures we described above produce point forecasts of y. They can also be used to produce density and interval forecasts of y, provided that the x’s and the regression errors, i.e., the ε’s, meet certain conditions.
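As a rough sketch, under the normal-error assumption and treating the estimated parameters as known (i.e., ignoring estimation error), a 95% interval forecast is the point forecast plus or minus 1.96 estimated standard deviations; the numbers below are illustrative:

```python
# Interval and density forecasts under y | x ~ N(beta0 + beta1*x, sigma^2).
b0, b1, sigma_hat = 1.02, 1.97, 1.05   # assumed to come from an earlier OLS fit
x_new = 0.5

point = b0 + b1 * x_new                # point forecast
lo, hi = point - 1.96 * sigma_hat, point + 1.96 * sigma_hat
print(f"point forecast: {point:.2f}, 95% interval: ({lo:.2f}, {hi:.2f})")
# The density forecast is the full N(point, sigma_hat**2) distribution.
```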

23 End