TA: Natalia Shestakova October, 2007 Labor Economics Exercise session # 1 Artificial Data Generation.

Presentation transcript:

TA: Natalia Shestakova October, 2007 Labor Economics Exercise session # 1 Artificial Data Generation

Overview
    Generating random variables
    Graphing
    Throwing seeds
    Generating random dummy variables from sample
    Drawing from multivariate distributions
    Loops and distribution of estimated coefficients

Generating random variables-1

Random-number functions:
    uniform() returns uniformly distributed pseudorandom numbers on the interval [0,1). uniform() takes no arguments, but the parentheses must be typed.
    invnormal(uniform()) returns normally distributed random numbers with mean 0 and standard deviation 1.

Reminder:
    Uniform distribution. Discrete: all values of a finite set of possible values are equally probable. Continuous: all intervals of the same length are equally probable.
    Normal distribution: a family of continuous probability distributions. Each member of the family is defined by two parameters, location and scale: the mean ("average") and the standard deviation ("variability"), respectively.
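Why invnormal(uniform()) produces standard normal draws follows from the inverse-transform argument. A short derivation, where U (the uniform draw), Z (the result) and \Phi (the standard normal CDF, whose inverse is what invnormal() computes) are notation introduced here rather than taken from the slides:

```latex
\begin{align*}
U &\sim \mathrm{Uniform}[0,1), \qquad Z = \Phi^{-1}(U), \\
\Pr(Z \le z) &= \Pr\bigl(\Phi^{-1}(U) \le z\bigr)
              = \Pr\bigl(U \le \Phi(z)\bigr)
              = \Phi(z).
\end{align*}
```

Since Z has CDF \Phi, it is distributed N(0,1).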

Generating random variables-2

Examples:
    500 draws from the uniform distribution on [0,1):
        set obs 500
        gen x1 = uniform()
    500 draws from the standard normal distribution (mean 0, variance 1):
        gen x2 = invnormal(uniform())
    500 draws from the distribution N(1,2), i.e. mean 1 and variance 2; the multiplier is the standard deviation, sqrt(2):
        gen x3 = 1 + sqrt(2)*invnormal(uniform())
    500 draws from the uniform distribution between 3 and 12:
        gen x4 = 3 + 9*uniform()
    500 observations of a variable that is a linear combination of other variables:
        gen z = 4 - 3*x4 + 8*x2
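The shifting and scaling pattern can be checked with summarize. The following is only a sketch: the locals mu, sigma, a and b and the seed value are names and values chosen here for illustration. Newer Stata releases document runiform() in place of uniform().

```stata
clear
set obs 500
* arbitrary seed, only so that the run can be repeated
set seed 2

* normal draws: mean + standard deviation * standard normal draw
local mu = 1
local sigma = sqrt(2)
gen x3 = `mu' + `sigma'*invnormal(uniform())

* uniform draws on [a,b): lower bound + width * uniform draw
local a = 3
local b = 12
gen x4 = `a' + (`b' - `a')*uniform()

* sample mean and std. dev. of x3 should be near 1 and sqrt(2);
* min and max of x4 should lie between 3 and 12
summarize x3 x4
```

Writing the scale as a standard deviation makes the distinction between N(mean, variance) notation and the multiplier in the gen statement explicit.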

Graphing
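One way the generated variables might be graphed is sketched below. It assumes the definitions of x2, x4 and z from the examples slide; the choice of plots and the seed are illustrative and not taken from the presentation.

```stata
clear
set obs 500
* arbitrary seed, only so that the run can be repeated
set seed 2
gen x2 = invnormal(uniform())
gen x4 = 3 + 9*uniform()
gen z  = 4 - 3*x4 + 8*x2

* histogram of the normal draws with a normal density overlaid
histogram x2, normal
* scatter plot of z against one of its components
scatter z x4
* dot plots of two variables side by side
dotplot x2 x4
```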

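Throwing seeds

Setting a seed allows you to generate exactly the same sample again at any time:
    set obs 500
    set seed 2
    gen z1 = invnormal(uniform())
    set seed 2
    gen z2 = invnormal(uniform())
    set seed #                    /* # is a seed value, not specified here */
    gen z3 = invnormal(uniform())
    dotplot z1 z2 z3

z1 and z2 are identical because both are generated immediately after set seed 2.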
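The reproducibility claim can be verified directly rather than read off a plot. In this sketch the third seed, 12345, is an arbitrary value picked for illustration.

```stata
clear
set obs 500
set seed 2
gen z1 = invnormal(uniform())
set seed 2
gen z2 = invnormal(uniform())
* 12345 is an arbitrary seed chosen for this sketch
set seed 12345
gen z3 = invnormal(uniform())

* stops with an error unless z1 and z2 match in every observation
assert z1 == z2
* number of observations in which z3 happens to equal z1
count if z3 == z1
```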

Generating random dummy variables from sample

Task: generate a variable that indicates whether an individual smokes (smoke=1) or does not smoke (smoke=0).
    (a) In period 1, assume that (s)he smokes with probability 30%.
    (b) In each of the following 30 periods, there is a 65% chance that a smoker keeps smoking and a 5% chance that a non-smoker starts smoking.

Solution:
    (a) Note that a variable uniformly distributed on [0,1) is less than 0.3 with probability 30%. Then:
            gen smoke = uniform()<.3
    (b) First, give every individual an ID and create observations for 30 years (they will be the same); then, step by step, update the smoking status in every year for every ID. The probability of smoking is .65 if the individual smoked in the previous year and .05 otherwise:
            by pid: replace smoke = uniform()<(.05+.6*smoke[_n-1]) if _n>1
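Put together, the two steps might look like the sketch below. The number of individuals (1000), the variable name period, the seed, and the reading of the task as 31 rows per person (period 1 plus the 30 following periods) are assumptions made here.

```stata
clear
* arbitrary seed, only so that the run can be repeated
set seed 2
* 1000 individuals is an arbitrary choice
set obs 1000
gen pid = _n

* (a) period 1: smoke with probability .3
gen smoke = uniform() < .3

* (b) copy each individual's row so that there are 31 rows per pid
expand 31
bysort pid: gen period = _n

* replace works through the observations in order, so smoke[_n-1]
* is the already-updated status of the previous period
bysort pid (period): replace smoke = uniform() < (.05 + .6*smoke[_n-1]) if _n > 1

* share of smokers in each period
tabstat smoke, by(period)
```

With these transition probabilities the share of smokers should drift from about .3 towards the stationary value .05/(1 - .65 + .05) = .125.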

Drawing from multivariate distributions

Task: generate several variables that are correlated with each other (i.e. that follow a multivariate distribution).

Solution:
    (a) drawnorm: draws a sample from a multivariate normal distribution with the desired means and covariance (or correlation) matrix; the example passes a correlation matrix through corr():
            drawnorm x y, n(1000) means(m) corr(C)
    (b) corr2data: creates an artificial dataset with a specified correlation structure (it is not a sample from an underlying population with the specified summary statistics):
            corr2data x y, n(1000) means(m) corr(C)

Note: the matrices m and C can be defined with the mat (matrix) command.
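A minimal sketch of both commands, with the matrices defined first. The means (1, 2), the correlation .5 and the seed are values invented here for illustration.

```stata
clear
* hypothetical mean vector and correlation matrix
matrix m = (1, 2)
matrix C = (1, .5 \ .5, 1)

* (a) a random sample: the sample correlation is only close to .5
set seed 2
drawnorm x y, n(1000) means(m) corr(C)
summarize x y
correlate x y

* (b) a constructed dataset: the summary statistics match the
*     specification, but the data are not a random sample
clear
corr2data x y, n(1000) means(m) corr(C)
summarize x y
correlate x y
```

Comparing the two correlate outputs illustrates the difference noted in the text: drawnorm gives sampling variation around the target values, while corr2data is built to reproduce them.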

Loops and distribution of estimated coefficients

Why use loops?
    -> A single randomly drawn sample is unlikely to give an estimate that coincides with the true coefficient.
    -> Drawing many samples, estimating the coefficient of interest in each, and averaging these estimates gives a value closer to the true one.

How to use loops?
    gen b1 = 0                    /* all observations of b1 are assigned the value 0 */
    local i = 1                   /* i is the counter used in the loop below */
    set more off                  /* so we do not have to hit a key every time the regression runs */
    while `i' <= 500 {            /* start a loop of 500 repetitions */
        /* drop the variables generated in the previous repetition so they can be drawn again;
           keep b1 (drop _all would remove it too) */
        /* generate random variables */
        /* regression */
        scalar d = _b[x1]         /* store the estimated coefficient on x1 in a scalar */
        replace b1 = scalar(d) if _n == `i'   /* put the coefficient from the ith regression into the ith observation of b1 */
        local i = `i' + 1         /* add 1 to the counter */
    }                             /* end of the loop */
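A complete version of such a loop could look like the sketch below. The data-generating process (y = 4 + 8*x1 + e with standard normal x1 and e), the variable names e and y, and the seed are assumptions made here to fill the two placeholder steps; they are not taken from the slides.

```stata
clear
* 500 observations per sample, also used as storage for 500 estimates
set obs 500
* arbitrary seed, only so that the run can be repeated
set seed 2
set more off
gen b1 = .
local i = 1
while `i' <= 500 {
    * remove last repetition's draws; capture ignores the error on the first pass
    capture drop x1 e y
    * generate random variables (hypothetical model with true slope 8)
    gen x1 = invnormal(uniform())
    gen e  = invnormal(uniform())
    gen y  = 4 + 8*x1 + e
    * regression
    quietly regress y x1
    * store the ith slope estimate in the ith observation of b1
    quietly replace b1 = _b[x1] if _n == `i'
    local i = `i' + 1
}
* distribution of the 500 estimates; the mean should be close to 8
summarize b1
histogram b1, normal
```

Initialising b1 to missing rather than 0 means an observation that was never filled cannot be mistaken for an estimate of zero.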

Any questions???