TA: Natalia Shestakova October, 2007 Labor Economics Exercise session # 1 Artificial Data Generation.


1 TA: Natalia Shestakova October, 2007 Labor Economics Exercise session # 1 Artificial Data Generation

2 Overview
- Generating random variables
- Graphing
- Throwing seeds
- Generating random dummy variables from sample
- Drawing from multivariate distributions
- Loops and distribution of estimated coefficients

3 Generating random variables-1
Random-number functions:
    uniform() returns uniformly distributed pseudorandom numbers on the interval [0,1). uniform() takes no arguments, but the parentheses must be typed.
    invnormal(uniform()) returns normally distributed random numbers with mean 0 and standard deviation 1.
Reminder:
    Discrete uniform distribution: all values in a finite set of possible values are equally probable. Continuous uniform distribution: all intervals of the same length are equally probable.
    Normal distribution: a family of continuous probability distributions. Each member of the family is defined by two parameters, location and scale: the mean ("average") and the standard deviation ("variability"), respectively.

4 Generating random variables-2
Examples:
500 draws from the uniform distribution on [0,1):
    set obs 500
    gen x1 = uniform()
500 draws from the standard normal distribution (mean 0, variance 1):
    gen x2 = invnormal(uniform())
500 draws from a normal distribution with mean 1 and standard deviation 4 (the multiplier is the standard deviation, so the variance is 16; for N(1,2) with variance 2 the multiplier would be sqrt(2)):
    gen x3 = 1 + 4*invnormal(uniform())
500 draws from the uniform distribution between 3 and 12:
    gen x4 = 3 + 9*uniform()
500 observations of a variable that is a linear combination of other variables:
    gen z = 4 - 3*x4 + 8*x2
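The draws above can be checked against their theoretical moments. The following is a minimal sketch meant to be run from a do-file; the seed value 12345 is an arbitrary assumption and is not part of the slides.

```stata
clear
set obs 500
set seed 12345                        // arbitrary seed (assumption), for reproducibility only

gen x1 = uniform()                    // uniform on [0,1)
gen x2 = invnormal(uniform())         // standard normal
gen x3 = 1 + 4*invnormal(uniform())   // mean 1, standard deviation 4
gen x4 = 3 + 9*uniform()              // uniform on [3,12)

// Sample means should be near 0.5, 0, 1 and 7.5;
// sample standard deviations near 0.29, 1, 4 and 2.6.
summarize x1 x2 x3 x4
```

In more recent Stata releases the same draws are usually written with runiform() and rnormal(); uniform() is the older name used throughout these slides.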

5 Graphing

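6 Throwing seeds
=> Setting the seed allows you to generate a particular sample again at any time:
    set obs 500
    set seed 2
    gen z1 = invnormal(uniform())
    set seed 2
    gen z2 = invnormal(uniform())
    set seed 19840607
    gen z3 = invnormal(uniform())
    dotplot z1 z2 z3
z1 and z2 are identical because both are drawn immediately after set seed 2; z3 is drawn with a different seed and therefore differs.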
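The effect of the seed can also be verified without a plot. This is a small sketch, assuming a do-file and an empty dataset at the start; it repeats the commands from the slide and then compares the variables.

```stata
clear
set obs 500

set seed 2
gen z1 = invnormal(uniform())
set seed 2
gen z2 = invnormal(uniform())
set seed 19840607
gen z3 = invnormal(uniform())

assert z1 == z2        // same seed, same draws: stops with an error if any value differs
count if z1 != z3      // different seed: the number of differing observations
```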

7 Generating random dummy variables from sample
Task: generate a variable that indicates whether an individual smokes (smoke=1) or does not smoke (smoke=0).
(a) For period 1, assume that (s)he smokes with probability 30%.
(b) For each of the following 30 periods, there is a 65% chance that a smoker keeps smoking and a 5% chance that a non-smoker starts smoking.
Solution:
(a) Note that a variable uniformly distributed on [0,1) is less than 0.3 with probability 30%. Then:
    gen smoke = uniform()<.3
(b) First, give every individual an ID and create observations for 30 years (they will be the same); then, step by step, update the probability of smoking in every year for every ID:
    by pid: replace smoke=uniform() 1 [command truncated in source]
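One possible way to complete step (b) is sketched below. It is not the command from the slide, which is cut off in the transcript. The number of individuals (1000), the seed, the variable names period and lagsmoke, and the choice of 31 periods in total (period 1 plus the 30 following ones) are all assumptions made for illustration.

```stata
clear
set seed 2007                   // arbitrary seed (assumption)
set obs 1000                    // hypothetical number of individuals
gen pid = _n                    // individual ID
expand 31                       // period 1 plus the 30 following periods (assumption)
bysort pid: gen period = _n     // period counter within each individual
sort pid period

// (a) period 1: smoke with probability 30%; later periods are missing for now
gen smoke = uniform() < .3 if period == 1

// (b) replace works through the observations in order, so smoke[_n-1] is
// already filled in when observation _n is processed:
// smoker last period -> 65% chance, non-smoker last period -> 5% chance
by pid: replace smoke = uniform() < cond(smoke[_n-1] == 1, .65, .05) if _n > 1

// check the transition rates: row shares should be near 5% and 65%
by pid: gen lagsmoke = smoke[_n-1]
tabulate lagsmoke smoke, row
```

With these transition probabilities the share of smokers settles near 0.05/(0.05+0.35) = 12.5% in later periods, which gives a further check on the output.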

8 Drawing from multivariate distributions
Task: generate a number of variables that are correlated with each other (have a multivariate distribution).
Solution:
(a) drawnorm: draws a sample from a multivariate normal distribution with the desired means and covariance matrix
    drawnorm x y, n(1000) means(m) corr(C)
(b) corr2data: creates an artificial dataset with a specified correlation structure (it is not a sample from an underlying population with the specified summary statistics)
    corr2data x y, n(1000) means(m) corr(C)
Note: the matrices m and C can be specified using the matrix command (abbreviated mat).
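The difference between the two commands shows up when the matrices are defined and the results compared. The following is a sketch only; the means (1, 2), the correlation of 0.5 and the seed are assumed values chosen for illustration.

```stata
clear
set seed 2007                     // arbitrary seed (assumption)
matrix m = (1, 2)                 // hypothetical means of x and y
matrix C = (1, .5 \ .5, 1)        // hypothetical correlation matrix

// (a) a random sample: sample statistics are close to, but not equal to, m and C
drawnorm x y, n(1000) means(m) corr(C)
summarize x y
correlate x y

// (b) constructed data: sample statistics match m and C up to rounding
clear                             // removes the data but keeps the matrices
corr2data x y, n(1000) means(m) corr(C)
summarize x y
correlate x y
```

Because corr() is given without standard deviations, both commands use a standard deviation of 1 for each variable.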

9 Loops and distribution of estimated coefficients
Why use loops?
-> The estimate from a single randomly drawn sample is unlikely to coincide with the true value.
-> Drawing many samples, estimating the coefficient of interest in each, and averaging these estimates gives a result closer to the true value.
How to use loops?
    set obs 500 /* one observation of b1 per repetition */
    gen b1=0 /* all observations of b1 are assigned the value 0 */
    local i=1 /* i is the counter in the following loop */
    set more off /* so we do not have to hit Enter every time the regression runs */
    while `i'<=500 { /* start a loop of 500 repetitions */
        capture drop y x1 /* drop the variables drawn in the previous repetition so they can be generated again; drop _all would also erase b1 (y is an assumed name for the dependent variable) */
        /* generate random variables */
        /* regression */
        scalar d=_b[x1] /* store the estimated coefficient on x1 in a scalar */
        replace b1=scalar(d) if _n==`i' /* put the coefficient from the i-th regression into the i-th observation of b1 */
        local i=`i'+1 /* add 1 to the counter */
    } /* end of the loop */
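A complete version of such a loop might look as follows. This is a sketch under stated assumptions: the data-generating model y = 1 + 2*x1 + e, the error term e, the seed and the sample size of 500 per repetition are hypothetical choices that fill in the "generate random variables" and "regression" steps.

```stata
clear
set more off
set seed 2007                     // arbitrary seed (assumption)
set obs 500                       // 500 observations per sample, 500 slots in b1
gen b1 = .                        // will hold one estimated coefficient per repetition

local i = 1
while `i' <= 500 {
    capture drop x1 e y           // remove the previous draw but keep b1
    gen x1 = invnormal(uniform())
    gen e  = invnormal(uniform())
    gen y  = 1 + 2*x1 + e         // hypothetical model: true coefficient on x1 is 2
    quietly regress y x1
    quietly replace b1 = _b[x1] in `i'
    local i = `i' + 1
}

summarize b1                      // the mean of the 500 estimates should be close to 2
histogram b1                      // sampling distribution of the estimated coefficient
```

The variable b1 starts as missing rather than 0 here, so that a repetition that failed to store its estimate would be visible in the summary.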

10 Any questions???

