1
Econ 482
2
Random variables: a random variable is any variable whose value cannot be predicted exactly.
1. Discrete random variables: random variables that take values in a specific, countable set of possible values. Examples?
2. Continuous random variables: random variables that can take any value in a continuous range of values.
3
Properties of discrete random variables
4
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
This sequence provides an example of a discrete random variable. Suppose that you have a red die which, when thrown, takes the numbers from 1 to 6 with equal probability. © Christopher Dougherty 1999–2006
5
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green 1 2 3 4 5 6 Suppose that you also have a green die that can take the numbers from 1 to 6 with equal probability. © Christopher Dougherty 1999–2006
6
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green 1 2 3 4 5 6 We will define a random variable X as the sum of the numbers when the dice are thrown. © Christopher Dougherty 1999–2006
7
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
For example, if the red die shows 4 and the green one shows 6, X is equal to 10. © Christopher Dougherty 1999–2006
8
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
Similarly, if the red die shows 2 and the green one shows 5, X is equal to 7. © Christopher Dougherty 1999–2006
9
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green The table shows all the possible outcomes. © Christopher Dougherty 1999–2006
10
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X 2 3 4 5 6 7 8 9 10 11 12 If you look at the table, you can see that X can be any of the numbers from 2 to 12. © Christopher Dougherty 1999–2006
11
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f 2 3 4 5 6 7 8 9 10 11 12 We will now define f, the frequencies associated with the possible values of X. © Christopher Dougherty 1999–2006
12
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f 2 3 4 5 4 6 7 8 9 10 11 12 For example, there are four outcomes which make X equal to 5. © Christopher Dougherty 1999–2006
13
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f 2 1 3 2 4 3 5 4 6 5 7 6 8 5 9 4 10 3 11 2 12 1 Similarly you can work out the frequencies for all the other values of X. © Christopher Dougherty 1999–2006
14
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f p 2 1 3 2 4 3 5 4 6 5 7 6 8 5 9 4 10 3 11 2 12 1 Finally we will derive the probability of obtaining each value of X. © Christopher Dougherty 1999–2006
15
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f p 2 1 3 2 4 3 5 4 6 5 7 6 8 5 9 4 10 3 11 2 12 1 If there is 1/6 probability of obtaining each number on the red die, and the same on the green die, each outcome in the table will occur with 1/36 probability. © Christopher Dougherty 1999–2006
16
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
red green X f p 2 1 1/36 3 2 2/36 4 3 3/36 5 4 4/36 6 5 5/36 7 6 6/36 8 5 5/36 9 4 4/36 10 3 3/36 11 2 2/36 12 1 1/36 Hence to obtain the probabilities associated with the different values of X, we divide the frequencies by 36. © Christopher Dougherty 1999–2006
17
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE
X: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12; p: 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36. The distribution is shown graphically. In this example it is symmetrical, highest for X equal to 7 and declining on either side. © Christopher Dougherty 1999–2006
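Not part of the original slides, but as a quick cross-check of the table and chart above: a short Python sketch that enumerates the 36 equally likely (red, green) outcomes and tallies the distribution of X.

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely (red, green) outcomes and tally X = red + green.
freq = Counter(red + green for red in range(1, 7) for green in range(1, 7))

for x in range(2, 13):
    p = Fraction(freq[x], 36)
    print(f"X = {x:2d}   f = {freq[x]}   p = {p}")
# The probabilities peak at 6/36 for X = 7 and fall off symmetrically on either side.
```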
18
EXPECTED VALUE OF A RANDOM VARIABLE
Definition of E(X), the expected value of X: E(X) = Σ xi pi, summing over the n possible values xi with probabilities pi. The expected value of a random variable, also known as its population mean, is the weighted average of its possible values, the weights being the probabilities attached to the values. © Christopher Dougherty 1999–2006
19
EXPECTED VALUE OF A RANDOM VARIABLE
Definition of E(X), the expected value of X: Note that the sum of the probabilities must be unity, so there is no need to divide by the sum of the weights. © Christopher Dougherty 1999–2006
20
EXPECTED VALUE OF A RANDOM VARIABLE
xi x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 This sequence shows how the expected value is calculated, first in abstract and then with the random variable defined in the first sequence. We begin by listing the possible values of X. © Christopher Dougherty 1999–2006
21
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi x1 p1 x2 p2 x3 p3 x4 p4 x5 p5 x6 p6 x7 p7 x8 p8 x9 p9 x10 p10 x11 p11 Next we list the probabilities attached to the different possible values of X. © Christopher Dougherty 1999–2006
22
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi xi pi x1 p1 x1 p1 x2 p2 x3 p3 x4 p4 x5 p5 x6 p6 x7 p7 x8 p8 x9 p9 x10 p10 x11 p11 Then we define a column in which the values are weighted by the corresponding probabilities. © Christopher Dougherty 1999–2006
23
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi xi pi x1 p1 x1 p1 x2 p2 x2 p2 x3 p3 x4 p4 x5 p5 x6 p6 x7 p7 x8 p8 x9 p9 x10 p10 x11 p11 We do this for each value separately. © Christopher Dougherty 1999–2006
24
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi xi pi x1 p1 x1 p1 x2 p2 x2 p2 x3 p3 x3 p3 x4 p4 x4 p4 x5 p5 x5 p5 x6 p6 x6 p6 x7 p7 x7 p7 x8 p8 x8 p8 x9 p9 x9 p9 x10 p10 x10 p10 x11 p11 x11 p11 Here we are assuming that n, the number of possible values, is equal to 11, but it could be any number. © Christopher Dougherty 1999–2006
25
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi xi pi x1 p1 x1 p1 x2 p2 x2 p2 x3 p3 x3 p3 x4 p4 x4 p4 x5 p5 x5 p5 x6 p6 x6 p6 x7 p7 x7 p7 x8 p8 x8 p8 x9 p9 x9 p9 x10 p10 x10 p10 x11 p11 x11 p11 Σ xi pi = E(X) The expected value is the sum of the entries in the third column. © Christopher Dougherty 1999–2006
26
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi: 2 1/36; 3 2/36; 4 3/36; 5 4/36; 6 5/36; 7 6/36; 8 5/36; 9 4/36; 10 3/36; 11 2/36; 12 1/36. Σ xi pi = E(X). The random variable X defined in the previous sequence could be any of the integers from 2 to 12 with probabilities as shown. © Christopher Dougherty 1999–2006
27
EXPECTED VALUE OF A RANDOM VARIABLE
X could be equal to 2 with probability 1/36, so the first entry in the calculation of the expected value is 2/36. © Christopher Dougherty 1999–2006
28
EXPECTED VALUE OF A RANDOM VARIABLE
The probability of X being equal to 3 was 2/36, so the second entry is 6/36. © Christopher Dougherty 1999–2006
29
EXPECTED VALUE OF A RANDOM VARIABLE
xi pi xi pi: 2 1/36 2/36; 3 2/36 6/36; 4 3/36 12/36; 5 4/36 20/36; 6 5/36 30/36; 7 6/36 42/36; 8 5/36 40/36; 9 4/36 36/36; 10 3/36 30/36; 11 2/36 22/36; 12 1/36 12/36. Similarly for the other 9 possible values. © Christopher Dougherty 1999–2006
30
EXPECTED VALUE OF A RANDOM VARIABLE
Σ xi pi = E(X) = 252/36. To obtain the expected value, we sum the entries in this column. © Christopher Dougherty 1999–2006
31
EXPECTED VALUE OF A RANDOM VARIABLE
Σ xi pi = E(X) = 252/36 = 7. The expected value turns out to be 7. Actually, this was obvious anyway. We saw in the previous sequence that the distribution is symmetrical about 7. © Christopher Dougherty 1999–2006
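A minimal Python sketch (added here, not from the slides) that reproduces this calculation by weighting each value of X by its probability and summing:

```python
from fractions import Fraction

values = range(2, 13)
# p(X = x) = (6 - |x - 7|)/36 for the sum of two fair dice.
probs = [Fraction(6 - abs(x - 7), 36) for x in values]

expected_value = sum(x * p for x, p in zip(values, probs))
print(expected_value)  # 7
```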
32
EXPECTED VALUE OF A RANDOM VARIABLE
Alternative notation for E(X): E(X) = μX. Very often the expected value of a random variable is represented by μ, the Greek letter mu. If there is more than one random variable, their expected values are differentiated by adding subscripts to μ. © Christopher Dougherty 1999–2006
33
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
Definition of E[g(X)], the expected value of a function of X: E[g(X)] = Σ g(xi) pi. To find the expected value of a function of a random variable, you calculate all the possible values of the function, weight them by the corresponding probabilities, and sum the results. © Christopher Dougherty 1999–2006
34
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
Definition of E[g(X)], the expected value of a function of X: E[g(X)] = Σ g(xi) pi. Example: E(X²) = Σ xi² pi. For example, the expected value of X² is found by calculating all its possible values, multiplying them by the corresponding probabilities, and summing. © Christopher Dougherty 1999–2006
35
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi x1 p1 x2 p2 x3 p3 … … xn pn First you list the possible values of X and the corresponding probabilities. © Christopher Dougherty 1999–2006
36
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi g(xi) x1 p1 g(x1) x2 p2 g(x2) x3 p3 g(x3) … … …... xn pn g(xn) Next you calculate the function of X for each possible value of X. © Christopher Dougherty 1999–2006
37
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi g(xi) g(xi ) pi x1 p1 g(x1) g(x1) p1 x2 p2 g(x2) x3 p3 g(x3) … … …... xn pn g(xn) Then, one at a time, you weight the value of the function by its corresponding probability. © Christopher Dougherty 1999–2006
38
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi g(xi) g(xi ) pi x1 p1 g(x1) g(x1) p1 x2 p2 g(x2) g(x2) p2 x3 p3 g(x3) g(x3) p3 … … …... ……... xn pn g(xn) g(xn) pn You do this individually for each possible value of X. © Christopher Dougherty 1999–2006
39
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi g(xi) g(xi) pi: x1 p1 g(x1) g(x1) p1; x2 p2 g(x2) g(x2) p2; x3 p3 g(x3) g(x3) p3; …; xn pn g(xn) g(xn) pn. Σ g(xi) pi = E[g(X)]. The sum of the weighted values is the expected value of the function of X. © Christopher Dougherty 1999–2006
40
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi: 2 1/36; 3 2/36; 4 3/36; 5 4/36; 6 5/36; 7 6/36; 8 5/36; 9 4/36; 10 3/36; 11 2/36; 12 1/36. The process will be illustrated for X², where X is the random variable defined in the first sequence. The 11 possible values of X and the corresponding probabilities are listed. © Christopher Dougherty 1999–2006
41
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi xi²: 2 1/36 4; 3 2/36 9; 4 3/36 16; 5 4/36 25; 6 5/36 36; 7 6/36 49; 8 5/36 64; 9 4/36 81; 10 3/36 100; 11 2/36 121; 12 1/36 144. First you calculate the possible values of X². © Christopher Dougherty 1999–2006
42
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
The first value is 4, which arises when X is equal to 2. The probability of X being equal to 2 is 1/36, so the weighted function is 4/36, which we shall write in decimal form as 0.11. © Christopher Dougherty 1999–2006
43
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
xi pi xi² xi² pi: 2 1/36 4 0.11; 3 2/36 9 0.50; 4 3/36 16 1.33; 5 4/36 25 2.78; 6 5/36 36 5.00; 7 6/36 49 8.17; 8 5/36 64 8.89; 9 4/36 81 9.00; 10 3/36 100 8.33; 11 2/36 121 6.72; 12 1/36 144 4.00. Similarly for all the other possible values of X. © Christopher Dougherty 1999–2006
44
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
The expected value of X² is the sum of its weighted values in the final column. It is equal to 54.83. It is the average value of the figures in the previous column, taking the differing probabilities into account. © Christopher Dougherty 1999–2006
45
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE
Note that E(X²) is not the same thing as E(X), squared. In the previous sequence we saw that E(X) for this example was 7. Its square is 49. © Christopher Dougherty 1999–2006
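The same sketch, extended to E(X²), makes the distinction concrete (an illustration added to the transcript, not from the slides):

```python
from fractions import Fraction

values = range(2, 13)
probs = [Fraction(6 - abs(x - 7), 36) for x in values]

e_x = sum(x * p for x, p in zip(values, probs))              # 7
e_x_squared = sum(x ** 2 * p for x, p in zip(values, probs)) # 1974/36

print(float(e_x_squared))  # 54.83...
print(float(e_x) ** 2)     # 49.0: not the same thing as E(X squared)
```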
46
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Population variance of X: σX² = E[(X – μX)²]. The previous sequence defined the expected value of a function of a random variable X. There is only one function that is of much interest to us, at least initially: the squared deviation from the population mean. The expected value of the squared deviation is known as the population variance of X. It is a measure of the dispersion of the distribution of X about its population mean. © Christopher Dougherty 1999–2006
47
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
xi pi: 2 1/36; 3 2/36; 4 3/36; 5 4/36; 6 5/36; 7 6/36; 8 5/36; 9 4/36; 10 3/36; 11 2/36; 12 1/36. We will calculate the population variance of the random variable X defined in the first sequence. We start as usual by listing the possible values of X and the corresponding probabilities. © Christopher Dougherty 1999–2006
48
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Next we need a column giving the deviations of the possible values of X about its population mean. In the second sequence we saw that the population mean of X was 7. © Christopher Dougherty 1999–2006
49
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
When X is equal to 2, the deviation is –5. © Christopher Dougherty 1999–2006
50
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
xi – μ: –5, –4, –3, –2, –1, 0, 1, 2, 3, 4, 5. © Christopher Dougherty 1999–2006
51
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Next we need a column giving the squared deviations. When X is equal to 2, the squared deviation is 25. © Christopher Dougherty 1999–2006
52
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
(xi – μ)²: 25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25. Similarly for the other values of X. © Christopher Dougherty 1999–2006
53
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Now we start weighting the squared deviations by the corresponding probabilities. What do you think the weighted average will be? Have a guess. © Christopher Dougherty 1999–2006
54
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
A reason for making an initial guess is that it may help you to identify an arithmetical error, if you make one. If the initial guess and the outcome are very different, that is a warning. © Christopher Dougherty 1999–2006
55
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
(xi – μ)² pi: 0.69, 0.89, 0.75, 0.44, 0.14, 0, 0.14, 0.44, 0.75, 0.89, 0.69. We calculate all the weighted squared deviations. © Christopher Dougherty 1999–2006
56
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
The sum of the weighted squared deviations, 5.83, is the population variance of X. © Christopher Dougherty 1999–2006
57
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Population variance of X: In equations, the population variance of X is usually written σX², σ being the Greek letter sigma. © Christopher Dougherty 1999–2006
58
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE
Standard deviation of X: The standard deviation of X is the square root of its population variance. Usually written σX, it is an alternative measure of dispersion. It has the same units as X. © Christopher Dougherty 1999–2006
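A small Python check of the variance and standard deviation just derived (an added illustration, assuming the two-dice X from the earlier slides):

```python
from fractions import Fraction
import math

values = range(2, 13)
probs = [Fraction(6 - abs(x - 7), 36) for x in values]

mu = sum(x * p for x, p in zip(values, probs))               # 7
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # 210/36 = 5.83...
sd = math.sqrt(var)                                          # about 2.42

print(float(var), sd)
```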
59
Properties of continuous random variables
60
CONTINUOUS RANDOM VARIABLES
A discrete random variable is one that can take only a finite set of values. The sum of the numbers when two dice are thrown is an example. Each value has associated with it a finite probability, which you can think of as a ‘packet’ of probability. The packets sum to unity because the variable must take one of the values. © Christopher Dougherty 1999–2006
61
CONTINUOUS RANDOM VARIABLES
However, most random variables encountered in econometrics are continuous. They can take any one of an infinite set of values defined over a range (or possibly, ranges). As a simple example, take the temperature in a room. We will assume that it can be anywhere from 55 to 75 degrees Fahrenheit with equal probability within the range. © Christopher Dougherty 1999–2006
62
CONTINUOUS RANDOM VARIABLES
In the case of a continuous random variable, the probability of it being exactly equal to any given value is always infinitesimal. For this reason, you can only talk about the probability of a continuous random variable lying between two given values. The probability is represented graphically as an area. © Christopher Dougherty 1999–2006
63
CONTINUOUS RANDOM VARIABLES
For example, you could measure the probability of the temperature being between 55 and 56, both measured exactly. Given that the temperature lies anywhere between 55 and 75 with equal probability, the probability of it lying between 55 and 56 must be 0.05. © Christopher Dougherty 1999–2006
64
CONTINUOUS RANDOM VARIABLES
Similarly, the probability of the temperature lying between 56 and 57 is 0.05. © Christopher Dougherty 1999–2006
65
CONTINUOUS RANDOM VARIABLES
And similarly for all the other one-degree intervals within the range. © Christopher Dougherty 1999–2006
66
CONTINUOUS RANDOM VARIABLES
The probability per unit interval is 0.05 and accordingly the area of the rectangle representing the probability of the temperature lying in any given unit interval is 0.05. The probability per unit interval is called the probability density and it is equal to the height of the unit-interval rectangle. © Christopher Dougherty 1999–2006
67
CONTINUOUS RANDOM VARIABLES
f(X) = 0.05 for 55 ≤ X ≤ 75; f(X) = 0 for X < 55 and X > 75. Mathematically, the probability density is written as a function of the variable, for example f(X). In this example, f(X) is 0.05 for 55 ≤ X ≤ 75 and it is zero elsewhere. © Christopher Dougherty 1999–2006
68
CONTINUOUS RANDOM VARIABLES
f(X) = 0.05 for 55 ≤ X ≤ 75; f(X) = 0 for X < 55 and X > 75. The vertical axis is given the label probability density, rather than height. f(X) is known as the probability density function and is shown graphically in the diagram as the thick black line. © Christopher Dougherty 1999–2006
69
CONTINUOUS RANDOM VARIABLES
f(X) = 0.05 for 55 ≤ X ≤ 75; f(X) = 0 for X < 55 and X > 75. Suppose that you wish to calculate the probability of the temperature lying between 65 and 70 degrees. To do this, you should calculate the area under the probability density function between 65 and 70. Typically you have to use the integral calculus to work out the area under a curve, but in this very simple example all you have to do is calculate the area of a rectangle. © Christopher Dougherty 1999–2006
70
CONTINUOUS RANDOM VARIABLES
f(X) = 0.05 for 55 ≤ X ≤ 75; f(X) = 0 for X < 55 and X > 75. The height of the rectangle is 0.05 and its width is 5, so its area is 0.05 × 5 = 0.25. © Christopher Dougherty 1999–2006
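For readers who want to see the "area under the density" idea numerically, here is a hedged Python sketch for the uniform temperature example (the function name f and the Riemann-sum approach are illustrative choices, not from the slides):

```python
def f(x):
    # Probability density for the room-temperature example: uniform on [55, 75].
    return 0.05 if 55 <= x <= 75 else 0.0

# Approximate the area under f between 65 and 70 with a midpoint Riemann sum.
n = 100_000
width = (70 - 65) / n
prob = sum(f(65 + (i + 0.5) * width) * width for i in range(n))
print(round(prob, 4))  # 0.25
```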
71
Properties of discrete random variables: Expected value: E(X) = Σ xi pi. Variance: σX² = E[(X – μX)²] = Σ (xi – μX)² pi.
What is population, outcome, expectation, population mean?
72
Properties of continuous random variables:
Expected value: E(X) = ∫ X f(X) dX, where f(X) is the probability density function. Variance: σX² = E[(X – μX)²] = ∫ (X – μX)² f(X) dX.
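As an illustration of these integrals (not in the original slides), the uniform temperature density from the previous section can be integrated symbolically; this sketch assumes the sympy library is available:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Rational(1, 20)  # uniform density on [55, 75] from the temperature example

mu = sp.integrate(x * f, (x, 55, 75))               # E(X)   = integral of x f(x) dx
var = sp.integrate((x - mu) ** 2 * f, (x, 55, 75))  # Var(X) = integral of (x - mu)^2 f(x) dx

print(mu, var)  # 65  100/3
```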
73
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) Example: E(red) = 3.5, hence E(X) = E(red + green) = 7 (much faster!) This sequence states the rules for manipulating expected values. First, the additive rule. The expected value of the sum of two random variables is the sum of their expected values. © Christopher Dougherty 1999–2006
74
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) Example generalization: E(W + X + Y + Z) = E(W) + E(X) + E(Y) + E(Z) This generalizes to any number of variables. An example is shown. © Christopher Dougherty 1999–2006
75
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) The second rule is the multiplicative rule. The expected value of a variable multiplied by a constant is equal to the constant multiplied by the expected value of the variable. © Christopher Dougherty 1999–2006
76
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) Example: E(3X) = 3E(X) For example, the expected value of 3X is three times the expected value of X. © Christopher Dougherty 1999–2006
77
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) 3. E(b) = b Finally, the expected value of a constant is just the constant. Of course this is obvious. © Christopher Dougherty 1999–2006
78
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) 3. E(b) = b Y = b1 + b2X E(Y) = E(b1 + b2X) Useful, because of the regression model Y = Xb + e. As an exercise, we will use the rules to simplify the expected value of an expression. Suppose that we are interested in the expected value of a variable Y, where Y = b1 + b2X. © Christopher Dougherty 1999–2006
79
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) 3. E(b) = b Y = b1 + b2X E(Y) = E(b1 + b2X) = E(b1) + E(b2X) We use the first rule to break up the expected value into its two components. © Christopher Dougherty 1999–2006
80
© Christopher Dougherty 1999–2006
EXPECTED VALUE RULES 1. E(X + Y) = E(X) + E(Y) 2. E(bX) = bE(X) 3. E(b) = b Y = b1 + b2X E(Y) = E(b1 + b2X) = E(b1) + E(b2X) = b1 + b2E(X) Then we use the second rule to replace E(b2X) by b2E(X) and the third rule to simplify E(b1) to just b1. This is as far as we can go in this example. © Christopher Dougherty 1999–2006
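A quick Monte Carlo sanity check of this result, using the two-dice X with E(X) = 7 and illustrative constants b1 = 10, b2 = 3 (a sketch added to the transcript, not part of the slides):

```python
import random

# Monte Carlo check of E(b1 + b2*X) = b1 + b2*E(X), with X the sum of two dice (E(X) = 7).
b1, b2 = 10, 3
draws = [b1 + b2 * (random.randint(1, 6) + random.randint(1, 6)) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 10 + 3*7 = 31
```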
81
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
σX² = E[(X – μ)²] = E(X² – 2μX + μ²) = E(X²) + E(–2μX) + E(μ²) = E(X²) – 2μE(X) + μ² = E(X²) – 2μ² + μ² = E(X²) – μ². Good training exercise! This sequence derives an alternative expression for the population variance of a random variable. It provides an opportunity for practising the use of the expected value rules. © Christopher Dougherty 1999–2006
82
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
We start with the definition of the population variance of X: σX² = E[(X – μ)²]. © Christopher Dougherty 1999–2006
83
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
We expand the quadratic: σX² = E(X² – 2μX + μ²). © Christopher Dougherty 1999–2006
84
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
Now the first expected value rule is used to decompose the expression into three separate expected values: σX² = E(X²) + E(–2μX) + E(μ²). © Christopher Dougherty 1999–2006
85
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
The second expected value rule is used to simplify the middle term and the third rule is used to simplify the last one: σX² = E(X²) – 2μE(X) + μ². © Christopher Dougherty 1999–2006
86
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
The middle term is rewritten, using the fact that E(X) and μX are just different ways of writing the population mean of X: σX² = E(X²) – 2μ² + μ². © Christopher Dougherty 1999–2006
87
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE
Hence we get the result: σX² = E(X²) – μ². © Christopher Dougherty 1999–2006
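A short numerical confirmation for the two-dice example (an added illustration): E(X²) − μ² should equal the 5.83 obtained earlier from the squared deviations.

```python
from fractions import Fraction

values = range(2, 13)
probs = [Fraction(6 - abs(x - 7), 36) for x in values]

mu = sum(x * p for x, p in zip(values, probs))
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))

print(e_x2 - mu ** 2)                                          # 35/6 = 5.83...
print(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))   # same value: 35/6
```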
88
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE
Population mean of X: E(X) = μX. In observation i, the random component is given by ui = xi – μX. Hence xi can be decomposed into fixed and random components: xi = μX + ui. Note that the expected value of ui is zero: E(ui) = E(xi – μX) = E(xi) + E(–μX) = μX – μX = 0. In this short sequence we shall decompose a random variable X into its fixed and random components. Let the population mean of X be μX. © Christopher Dougherty 1999–2006
89
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE
IMPORTANT property of the error term! The actual value of X in any observation will in general be different from μX. We will call the difference ui, so ui = xi – μX. © Christopher Dougherty 1999–2006
90
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE
Re-arranging this equation, we can write xi as the sum of its fixed component, μX, which is the same for all observations, and its random component, ui. © Christopher Dougherty 1999–2006
91
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE
The expected value of the random component is zero. It does not systematically tend to increase or decrease X. It just makes it deviate from its population mean. © Christopher Dougherty 1999–2006
92
INDEPENDENCE OF TWO RANDOM VARIABLES
Two random variables X and Y are said to be independent if and only if E[f(X)g(Y)] = E[f(X)] E[g(Y)] for any functions f(X) and g(Y). If and only if means BOTH directions! (important in next slide exercise) Two variables X and Y are independent if and only if, given any functions f(X) and g(Y), the expected value of the product f(X)g(Y) is equal to the expected value of f(X) multiplied by the expected value of g(Y). © Christopher Dougherty 1999–2006
93
INDEPENDENCE OF TWO RANDOM VARIABLES
Two random variables X and Y are said to be independent if and only if E[f(X)g(Y)] = E[f(X)] E[g(Y)] for any functions f(X) and g(Y). Special case: if X and Y are independent, E(XY) = E(X) E(Y). Draw a figure with an example of two random variables with expectation 0: if independent, then E(XY) = 0, otherwise not. As a special case, the expected value of XY is equal to the expected value of X multiplied by the expected value of Y if and only if X and Y are independent. © Christopher Dougherty 1999–2006
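A simulation sketch of the special case (added, not from the slides): with two independent dice E(XY) ≈ E(X)E(Y), but setting Y = X breaks the equality.

```python
import random

def mean(vals):
    return sum(vals) / len(vals)

n = 200_000
x = [random.randint(1, 6) for _ in range(n)]
y_indep = [random.randint(1, 6) for _ in range(n)]  # independent of x
y_dep = x                                           # y = x, so clearly not independent

for y in (y_indep, y_dep):
    print(mean([a * b for a, b in zip(x, y)]), mean(x) * mean(y))
# Independent case: both numbers are close to 3.5 * 3.5 = 12.25.
# Dependent case: E(XY) = E(X^2) = 91/6, about 15.17, which differs from E(X)E(Y) of about 12.25.
```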
94
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
The covariance of two random variables X and Y, often written σXY, is defined to be the expected value of the product of their deviations from their population means. © Christopher Dougherty 1999–2006
95
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
If two variables are independent, their covariance is zero. To show this, start by rewriting the covariance as the product of the expected values of its factors. We are allowed to do this because (and only because) X and Y are independent (see the earlier sequence on independence). Click back, and let the student do the exercise. © Christopher Dougherty 1999–2006
96
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
The expected values of both factors are zero because E(X) = μX and E(Y) = μY. E(μX) = μX and E(μY) = μY because μX and μY are constants. Thus the covariance is zero. © Christopher Dougherty 1999–2006
97
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X,W). 2. If Y = bZ, where b is a constant cov(X, Y) = bcov(X, Z) 3. If Y = b, where b is a constant, cov(X, Y) = 0 There are some rules that follow in a perfectly straightforward way from the definition of covariance, and since they are going to be used frequently in later chapters it is worthwhile establishing them immediately. First, the addition rule. © Christopher Dougherty 1999–2006
98
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X,W). 2. If Y = bZ, where b is a constant cov(X, Y) = bcov(X, Z) 3. If Y = b, where b is a constant, cov(X, Y) = 0 Next, the multiplication rule, for cases where a variable is multiplied by a constant. © Christopher Dougherty 1999–2006
99
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X,W). 2. If Y = bZ, where b is a constant cov(X, Y) = bcov(X, Z) 3. If Y = b, where b is a constant, cov(X, Y) = 0 Let student do the proof of rule 3. Finally, a primitive rule that is often useful. © Christopher Dougherty 1999–2006 7
100
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X, W). Proof: Since Y = V + W, μY = μV + μW. The proofs of the rules are straightforward. In each case the proof starts with the definition of cov(X, Y). © Christopher Dougherty 1999–2006
101
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X, W). Proof: Since Y = V + W, μY = μV + μW. We now substitute for Y and re-arrange. © Christopher Dougherty 1999–2006
102
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X, W). Proof: Since Y = V + W, μY = μV + μW. This gives us the result. © Christopher Dougherty 1999–2006
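The displayed algebra did not survive the transcription; a standard reconstruction of the argument the narration describes (using Y = V + W and μY = μV + μW) is:

```latex
\begin{aligned}
\operatorname{cov}(X, Y) &= E\!\left[(X - \mu_X)(Y - \mu_Y)\right] \\
  &= E\!\left[(X - \mu_X)\bigl((V - \mu_V) + (W - \mu_W)\bigr)\right] \\
  &= E\!\left[(X - \mu_X)(V - \mu_V)\right] + E\!\left[(X - \mu_X)(W - \mu_W)\right] \\
  &= \operatorname{cov}(X, V) + \operatorname{cov}(X, W)
\end{aligned}
```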
103
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 2. If Y = bZ, cov(X, Y) = b cov(X, Z). Proof: Since Y = bZ, μY = bμZ. Next, the multiplication rule, for cases where a variable is multiplied by a constant. The Y terms have been replaced by the corresponding bZ terms. © Christopher Dougherty 1999–2006
104
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 2. If Y = bZ, cov(X, Y) = b cov(X, Z). Proof: Since Y = bZ, μY = bμZ. b is a common factor and can be taken out of the expression, giving us the result that we want. © Christopher Dougherty 1999–2006
105
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
Covariance rules 3. If Y = b, cov(X, Y) = 0. Proof: Since Y = b, μY = b. The proof of the third rule is trivial. © Christopher Dougherty 1999–2006
106
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). 2. If Y = bZ, where b is a constant, var(Y) = b2var(Z). 3. If Y = b, where b is a constant, var(Y) = 0. 4. If Y = V + b, where b is a constant, var(Y) = var(V). Important! Corresponding to the covariance rules, there are parallel rules for variances. First the addition rule. © Christopher Dougherty 1999–2006
107
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). 2. If Y = bZ, where b is a constant, var(Y) = b2var(Z). 3. If Y = b, where b is a constant, var(Y) = 0. 4. If Y = V + b, where b is a constant, var(Y) = var(V). Next, the multiplication rule, for cases where a variable is multiplied by a constant. © Christopher Dougherty 1999–2006
108
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). 2. If Y = bZ, where b is a constant, var(Y) = b2var(Z). 3. If Y = b, where b is a constant, var(Y) = 0. 4. If Y = V + b, where b is a constant, var(Y) = var(V). A third rule to cover the special case where Y is a constant. © Christopher Dougherty 1999–2006
109
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). 2. If Y = bZ, where b is a constant, var(Y) = b2var(Z). 3. If Y = b, where b is a constant, var(Y) = 0. 4. If Y = V + b, where b is a constant, var(Y) = var(V). Finally, it is useful to state a fourth rule. It depends on the first three, but it is so often of practical value that it is worth keeping it in mind separately. © Christopher Dougherty 1999–2006
110
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). Proof: var(Y) = cov(Y, Y) = cov([V + W], Y) = cov(V, Y) + cov(W, Y) = cov(V, [V + W]) + cov(W, [V + W]) = cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W) = var(V) + 2cov(V, W) + var(W) The proofs of these rules can be derived from the results for covariances, noting that the variance of Y is equivalent to the covariance of Y with itself. © Christopher Dougherty 1999–2006
111
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). Proof: var(Y) = cov(Y, Y) = cov([V + W], Y) = cov(V, Y) + cov(W, Y) = cov(V, [V + W]) + cov(W, [V + W]) = cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W) = var(V) + 2cov(V, W) + var(W) We start by replacing one of the Y arguments by V + W. © Christopher Dougherty 1999–2006
112
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). Proof: var(Y) = cov(Y, Y) = cov([V + W], Y) = cov(V, Y) + cov(W, Y) = cov(V, [V + W]) + cov(W, [V + W]) = cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W) = var(V) + 2cov(V, W) + var(W) We then use covariance rule 1. © Christopher Dougherty 1999–2006
113
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). Proof: var(Y) = cov(Y, Y) = cov([V + W], Y) = cov(V, Y) + cov(W, Y) = cov(V, [V + W]) + cov(W, [V + W]) = cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W) = var(V) + 2cov(V, W) + var(W) We now substitute for the other Y argument in both terms and use covariance rule 1 a second time. © Christopher Dougherty 1999–2006
114
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W). Proof: var(Y) = cov(Y, Y) = cov([V + W], Y) = cov(V, Y) + cov(W, Y) = cov(V, [V + W]) + cov(W, [V + W]) = cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W) = var(V) + 2cov(V, W) + var(W) This gives us the result. Note that the order of the arguments does not affect a covariance expression and hence cov(W, V) is the same as cov(V, W). © Christopher Dougherty 1999–2006
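A numerical check of variance rule 1 (an added sketch; the choice of normally distributed V and a correlated W is purely illustrative):

```python
import random

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return mean([(v - m) ** 2 for v in vals])

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

n = 100_000
v = [random.gauss(0, 1) for _ in range(n)]
w = [0.5 * a + random.gauss(0, 1) for a in v]  # correlated with v
y = [a + b for a, b in zip(v, w)]

print(var(y))                           # the two printed numbers agree
print(var(v) + var(w) + 2 * cov(v, w))  # (up to floating-point rounding)
```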
115
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
2. If Y = bZ, where b is a constant, var(Y) = b²var(Z). Proof: var(Y) = cov(Y, Y) = cov(bZ, bZ) = b²cov(Z, Z) = b²var(Z). The proof of variance rule 2 is even more straightforward. We start by writing var(Y) as cov(Y, Y). We then substitute for both of the Y arguments and take the b terms outside as common factors. © Christopher Dougherty 1999–2006
116
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
3. If Y = b, where b is a constant, var(Y) = 0. Proof: var(Y) = cov(b, b) = 0. The third rule is trivial. We make use of covariance rule 3. Obviously if a variable is constant, it has zero variance. © Christopher Dougherty 1999–2006
117
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
4. If Y = V + b, where b is a constant, var(Y) = var(V). Proof: var(Y) = var(V) + 2cov(V, b) + var(b) = var(V) The fourth variance rule starts by using the first. The second term on the right side is zero by covariance rule 3. The third is also zero by variance rule 3. © Christopher Dougherty 1999–2006
118
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
4. If Y = V + b, where b is a constant, var(Y) = var(V). Proof: var(Y) = var(V) + 2cov(V, b) + var(b) = var(V). The intuitive reason for this result is easy to understand. If you add a constant to a variable, you shift its entire distribution by that constant. The expected value of the squared deviation from the mean is unaffected. © Christopher Dougherty 1999–2006
119
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
cov(X, Y) is unsatisfactory as a measure of association between two variables X and Y because it depends on the units of measurement of X and Y. A better measure of association is the population correlation coefficient, ρXY = σXY / √(σX² σY²), because it is dimensionless. The numerator possesses the units of measurement of both X and Y. The variances of X and Y in the denominator possess the squared units of measurement of those variables. © Christopher Dougherty 1999–2006
120
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION
However, once the square root has been taken into account, the units of measurement are the same as those of the numerator, and the expression as a whole is unit free. If X and Y are independent, ρXY will be equal to zero because σXY will be zero. If there is a positive association between them, σXY, and hence ρXY, will be positive. If there is an exact positive linear relationship, ρXY will assume its maximum value of 1. Similarly, if there is a negative relationship, ρXY will be negative, with a minimum value of –1. © Christopher Dougherty 1999–2006
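A small simulation sketch (added, with an arbitrary linear relationship between X and Y) showing the correlation coefficient computed from the covariance and variances:

```python
import math
import random

def mean(vals):
    return sum(vals) / len(vals)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * v + random.gauss(0, 1) for v in x]         # positively related to x

rho = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))  # correlation = covariance / (sd_x * sd_y)
print(rho)  # close to 2 / sqrt(5), about 0.89, and unchanged if x or y is rescaled
```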
121
SAMPLING AND ESTIMATORS
Suppose we have a random variable X and we wish to estimate its unknown population mean μX. Planning (beforehand concepts): Our first step is to take a sample of n observations {X1, …, Xn}. Before we take the sample, while we are still at the planning stage, the Xi are random quantities. We know that they will be generated randomly from the distribution for X, but we do not know their values in advance. So now we are thinking about random variables on two levels: the random variable X, and its random sample components. © Christopher Dougherty 1999–2006
123
SAMPLING AND ESTIMATORS
Suppose we have a random variable X and we wish to estimate its unknown population mean μX. Realization (afterwards concepts): Once we have taken the sample we will have a set of numbers {x1, …, xn}. This is called by statisticians a realization. The lower case is to emphasize that these are numbers, not variables. © Christopher Dougherty 1999–2006
124
SAMPLING AND ESTIMATORS
Suppose we have a random variable X and we wish to estimate its unknown population mean μX. Planning (beforehand concepts): Back to the plan. Having generated a sample of n observations {X1, …, Xn}, we plan to use them with a mathematical formula to estimate the unknown population mean μX. This formula is known as an estimator. In this context, the standard (but not only) estimator is the sample mean X̄ = (1/n)(X1 + … + Xn). An estimator is a random variable because it depends on the random quantities {X1, …, Xn}. © Christopher Dougherty 1999–2006
125
SAMPLING AND ESTIMATORS
Suppose we have a random variable X and we wish to estimate its unknown population mean mX. Planning (beforehand concepts) Back to the plan. Having generated a sample of n observations {X1, …, Xn}, we plan to use them with a mathematical formula to estimate the unknown population mean mX. This formula is known as an estimator. In this context, the standard (but not only) estimator is the sample mean An estimator is a random variable because it depends on the random quantities {X1, …, Xn}. © Christopher Dougherty 1999–2006
126
SAMPLING AND ESTIMATORS
Suppose we have a random variable X and we wish to estimate its unknown population mean μX. Realization (afterwards concepts): The actual number that we obtain, given the realization {x1, …, xn}, is known as our estimate. © Christopher Dougherty 1999–2006
127
SAMPLING AND ESTIMATORS
probability density function of X; probability density function of X̄ (both centred at μX). We will see why these distinctions are useful and important in a comparison of the distributions of X and X̄. We will start by showing that X̄ has the same mean as X. © Christopher Dougherty 1999–2006
128
SAMPLING AND ESTIMATORS
Proof that the estimator of the mean, X̄, is unbiased. We start by replacing X̄ by its definition and then using expected value rule 2 to take 1/n out of the expression as a common factor: E(X̄) = E[(1/n)(X1 + … + Xn)] = (1/n)E(X1 + … + Xn). © Christopher Dougherty 1999–2006
129
SAMPLING AND ESTIMATORS
Next we use expected value rule 1 to replace the expectation of a sum with a sum of expectations. © Christopher Dougherty 1999–2006
130
SAMPLING AND ESTIMATORS
Now we come to the bit that requires thought. Start with X1. When we are still at the planning stage, X1 is a random variable and we do not know what its value will be. © Christopher Dougherty 1999–2006
131
SAMPLING AND ESTIMATORS
All we know is that it will be generated randomly from the distribution of X. The expected value of X1, as a beforehand concept, will therefore be μX. The same is true for all the other sample components, thinking about them beforehand. Hence we write this line. © Christopher Dougherty 1999–2006
132
SAMPLING AND ESTIMATORS
Thus we have shown that the mean of the distribution of X̄ is μX: E(X̄) = (1/n)[E(X1) + … + E(Xn)] = (1/n)(nμX) = μX. © Christopher Dougherty 1999–2006
133
SAMPLING AND ESTIMATORS
probability density function of X; probability density function of X̄. We will next demonstrate that the variance of the distribution of X̄ is smaller than that of X, as depicted in the diagram. © Christopher Dougherty 1999–2006
134
SAMPLING AND ESTIMATORS
We start by replacing X̄ by its definition and then using variance rule 2 to take 1/n out of the expression as a common factor: var(X̄) = var[(1/n)(X1 + … + Xn)] = (1/n²)var(X1 + … + Xn). © Christopher Dougherty 1999–2006
135
SAMPLING AND ESTIMATORS
Next we use variance rule 1 to replace the variance of a sum with a sum of variances. In principle there are many covariance terms as well, but they are zero if we assume that the sample values are generated independently. © Christopher Dougherty 1999–2006
136
SAMPLING AND ESTIMATORS
Now we come to the bit that requires thought. Start with X1. When we are still at the planning stage, we do not know what the value of X1 will be. © Christopher Dougherty 1999–2006
137
SAMPLING AND ESTIMATORS
All we know is that it will be generated randomly from the distribution of X. The variance of X1, as a beforehand concept, will therefore be σX². The same is true for all the other sample components, thinking about them beforehand. Hence we write this line. © Christopher Dougherty 1999–2006
138
SAMPLING AND ESTIMATORS
Thus we have demonstrated that the variance of the sample mean is equal to the variance of X divided by n: var(X̄) = σX²/n, a result with which you will be familiar from your statistics course. © Christopher Dougherty 1999–2006
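A closing simulation sketch (added to the transcript) that illustrates both results at once for the two-dice X, using an arbitrary sample size n = 25:

```python
import random

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return mean([(v - m) ** 2 for v in vals])

def draw_x():
    # X is the sum of two dice: mu_X = 7, sigma_X^2 = 5.83...
    return random.randint(1, 6) + random.randint(1, 6)

n = 25          # sample size (arbitrary choice for the illustration)
reps = 20_000   # number of repeated samples
sample_means = [mean([draw_x() for _ in range(n)]) for _ in range(reps)]

print(mean(sample_means))  # close to 7:        X-bar is unbiased for mu_X
print(var(sample_means))   # close to 5.83/25:  var(X-bar) = sigma_X^2 / n
```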