Planning a Simulation Study


Planning a Simulation Study Naomi Altman Department of Statistics

A small example
Comparing 2 regression methods:
1) Take the range of X, divide it in half, and use the line through the point $(\tilde{x}, \tilde{y})$ in each half, where ~ denotes the median.
2) Use least squares regression.
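
The two methods can be sketched in R as below. This is only an illustration, not the code used in class: the function names `median_slope` and `ls_slope` are hypothetical, the rule that ties at the midpoint go to the lower half is an assumption, and the line, sample size, and error SD are illustrative values.

```r
# Hypothetical helper: slope of the line through the two median points.
median_slope <- function(x, y) {
  mid <- (min(x) + max(x)) / 2   # midpoint of the range of x
  lower <- x <= mid              # assumption: ties at the midpoint go to the lower half
  (median(y[!lower]) - median(y[lower])) /
    (median(x[!lower]) - median(x[lower]))
}

# Least squares slope from the textbook formula Sxy / Sxx.
ls_slope <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
}

# One simulated data set (illustrative settings).
set.seed(1)
x <- 1:25
y <- 3 + 7 * x + rnorm(25, 0, 2)
c(least_squares = ls_slope(x, y), medians = median_slope(x, y))
```

On noise-free data both functions return the true slope, which is a quick sanity check before running any simulation.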

A small example
Estimated slope:
  least squares  1.009
  my method      1.018

Where do we go from here?
What line should we use, and does it matter? Some lines used by the class:
  2+5x
  3+7x (HW 6)
  3x
  10x
  slope=rnorm(1000,10,3)
What sample size should we use, and does it matter? Some choices by the class:
  25 (HW 6)
  30
  1000

Where do we go from here?
What error SD should we use, and does it matter? Some choices by the class: 2, 3
What x-values should be used?
  1:25
  rnorm(25,0,10) regenerated for each iteration
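
One way to see whether these choices matter is to make each of them an argument of a single function and rerun it under several settings. The sketch below assumes fixed x-values 1:n and normal errors; `run_condition` and `median_slope` are hypothetical names, and 1000 simulated samples per condition is an arbitrary choice.

```r
# Hypothetical helper: slope through the median points of the two halves of the x-range.
median_slope <- function(x, y) {
  lower <- x <= (min(x) + max(x)) / 2
  (median(y[!lower]) - median(y[lower])) /
    (median(x[!lower]) - median(x[lower]))
}

# Run one simulation condition and summarise both slope estimators.
run_condition <- function(n, beta0, beta1, sigma, nsim = 1000) {
  x <- 1:n
  est <- replicate(nsim, {
    y <- beta0 + beta1 * x + rnorm(n, 0, sigma)
    c(ls = cov(x, y) / var(x), med = median_slope(x, y))
  })
  # est is a 2 x nsim matrix: one row per estimator
  rbind(mean = rowMeans(est), sd = apply(est, 1, sd))
}

set.seed(2)
run_condition(n = 25, beta0 = 3, beta1 = 7, sigma = 2)
run_condition(n = 30, beta0 = 2, beta1 = 5, sigma = 3)
```

Comparing the two tables shows directly which settings change the relative performance of the estimators and which do not.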

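Working from what you know
Properties of the least squares estimator (LSE) $\hat\beta_1$, with errors $\varepsilon$:
  If $E(\varepsilon) = 0$, then $E(\hat\beta_1) = \beta_1$.
  If $\mathrm{Var}(\varepsilon) = \sigma^2$, then $\mathrm{Var}(\hat\beta_1) = \sigma^2 / S_{xx}$.
  If $\varepsilon \sim N(0, \sigma^2)$, then $\hat\beta_1$ has minimum MSE among unbiased estimators.
  $SSR = \hat\beta_1^2 S_{xx}$
  Signal-to-noise ratio $\sim \beta_1^2 S_{xx} / \sigma^2$
Properties of the slope of medians:
  If $X_L$ and $X_U$ are the median X values, the values of the true line at those points are $\beta_0 + \beta_1 X_k$, $k = L, U$.
  What is the median of the Y's?
  What is the variance of the median of the Y's?
  Could we show that this estimator is unbiased?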
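
Known theory also gives a check on the simulation itself. The sketch below compares the simulated variance of the least squares slope with the theoretical value, error variance divided by Sxx. The line 3+7x, x = 1:25, error SD 2, and 5000 replications are illustrative choices.

```r
set.seed(3)
x <- 1:25
sigma <- 2
Sxx <- sum((x - mean(x))^2)   # equals 1300 for x = 1:25

# Least squares slope for each of 5000 simulated samples.
b1 <- replicate(5000, {
  y <- 3 + 7 * x + rnorm(length(x), 0, sigma)
  cov(x, y) / var(x)
})

c(simulated_mean = mean(b1), true_slope = 7)
c(simulated_var = var(b1), theory_var = sigma^2 / Sxx)
```

If the simulated and theoretical values disagree by much more than Monte Carlo error, the simulation code is the first suspect.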

What is interesting
Errors that are not normal:
  heavy tails
  skewed
Errors that are not independent
Different sample sizes?

Picking Simulation Conditions
We also know that regression is defined as the mean of Y for each value of X. If Y does not have a mean (e.g. Cauchy), this might be interesting.
We do not know anything about the median slope estimator. Statistical theory could be used to derive some properties. But there is one obvious property: since we split at the midpoint, it will not be resistant to leverage. X-distributions which are skewed or have a high leverage point might be interesting.
It would be easy to show that the intercept does not matter.

What is interesting
How do we compare N(0,1) with:
  t(1): no moments
  t(2): mean 0, variance ∞
  t(d), d > 2: mean 0, variance d/(d-2)
What about Bin(n,p): mean = np, variance = np(1-p)?
We probably want to separate issues regarding distribution from issues about variance: all distributions should be scaled to have the same variance.
Errors need to have the same meaning, e.g. additive errors should have mean 0.
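
A minimal sketch of this scaling, assuming we want every error distribution to have mean 0 and variance 1. It uses the fact that t(d) has variance d/(d-2) when d > 2; the choices d = 5, Bin(10, 0.2), and 100000 draws are illustrative.

```r
set.seed(4)
n <- 1e5

e_norm <- rnorm(n)                       # N(0,1): already mean 0, variance 1

d <- 5
e_t <- rt(n, df = d) / sqrt(d / (d - 2)) # t(d) has variance d/(d-2) for d > 2

size <- 10
p <- 0.2
# Centre the binomial at its mean and divide by its SD.
e_bin <- (rbinom(n, size, p) - size * p) / sqrt(size * p * (1 - p))

sapply(list(normal = e_norm, t5 = e_t, binomial = e_bin),
       function(e) c(mean = mean(e), var = var(e)))
```

This only works when the variance exists, so it cannot be applied to t(1) or t(2).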

What is interesting
If the distributions we are comparing do not have moments, we need to be careful about using the variance and MSE of the estimator.
Also, it would be better to match center and "variability" based on quantiles, which exist for all distributions.
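
One possible quantile-based matching, sketched under the assumption that "center" means the median and "variability" means the interquartile range (IQR). The helper name `iqr_scale` is hypothetical; it returns the factor that gives a distribution the same IQR as N(0,1).

```r
# Hypothetical helper: q is a quantile function such as qt; extra arguments go to q.
iqr_scale <- function(q, ...) {
  (qnorm(0.75) - qnorm(0.25)) / (q(0.75, ...) - q(0.25, ...))
}

s_cauchy <- iqr_scale(qt, df = 1)   # t(1) is the Cauchy distribution
s_t2     <- iqr_scale(qt, df = 2)

set.seed(5)
e_cauchy <- s_cauchy * rt(1e5, df = 1)   # median 0, IQR matched to N(0,1)

c(target_iqr = qnorm(0.75) - qnorm(0.25), cauchy_iqr = IQR(e_cauchy))
```

The same idea applies to the summaries of the estimator: the median and IQR of the simulated slopes remain meaningful even when the mean and variance do not exist.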

What should vary?
Typically in a regression study, we vary the errors. But many regression methods that perform well for non-normal data are very sensitive to the choice of X.

Summary
Some things to think about:
What should the data "look like"?
You might use "realistic" data by doing a descriptive analysis of a real data set, and then simulating data to "look like" the real data.
You might want to explore some property like skewness or long tails.

Skewness, etc.
There are lots of skewed distributions. Which one should you pick? Does it matter?

Sample Size
How big does "n" have to be to get "asymptotic" results?
How many "n" values do you need?
How many samples do you need to simulate?
How does computing time depend on n?
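
For the question of how many samples to simulate, one common guide is the Monte Carlo standard error of the summary being reported. The sketch below estimates it for the mean of the least squares slope at three simulation sizes; the line, x-values, error SD, and the sizes themselves are illustrative assumptions.

```r
set.seed(6)
x <- 1:25

# One simulated least squares slope (illustrative settings).
one_slope <- function() {
  y <- 3 + 7 * x + rnorm(length(x), 0, 2)
  cov(x, y) / var(x)
}

nsims <- c(100, 1000, 10000)
mc_se <- sapply(nsims, function(nsim) {
  b1 <- replicate(nsim, one_slope())
  sd(b1) / sqrt(nsim)   # Monte Carlo SE of the simulated mean slope
})

data.frame(nsim = nsims, mc_se = signif(mc_se, 2))
```

The standard error shrinks like one over the square root of the number of samples, so ten times the precision costs about one hundred times the computing.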

Computing Time
How slow is "too slow"?
Can you improve your computer code?
Should you switch to a faster computer system and if so, how much of your time will that take?

Computing
Should you write your functions "from scratch" or use someone else's code that may not be 100% what you want?
If you write your own code, how can you be sure it is right?
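
One way to gain confidence in your own code is to test it against a trusted implementation and against cases with a known answer. A minimal sketch, assuming a hand-written least squares slope (the hypothetical `my_slope`) checked against R's `lm`:

```r
# Hand-written least squares slope (hypothetical name).
my_slope <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
}

# Check against lm() on several random data sets.
set.seed(7)
for (i in 1:20) {
  x <- rnorm(25, 0, 10)
  y <- 3 + 7 * x + rnorm(25, 0, 2)
  stopifnot(isTRUE(all.equal(my_slope(x, y), unname(coef(lm(y ~ x))[2]))))
}

# Known-answer check: noise-free data must return the true slope.
stopifnot(isTRUE(all.equal(my_slope(1:10, 2 + 5 * (1:10)), 5)))
```

For a new method with no reference implementation, only the known-answer checks are available, so it is worth designing several of them.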

What do you "save"?
If you save everything, you may slow down the simulation too much.
If you save only what you need now, you may need to rerun the simulation.
For example, many people computed both the slope and intercept, even though we needed only the slope.

How do you vary conditions?
E.g. if you are looking at different error variances, you could generate normal errors with SD 1, 3, and 20 independently or (and usually better) generate N(0,1) errors and then multiply them by 3 and by 20 to obtain the others. Because every condition then uses the same random draws, this reduces the variance in your comparisons.
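
A sketch of the two approaches for the least squares slope, assuming x = 1:25, the line 3+7x, and 2000 simulated samples (all illustrative). With shared errors, the slope error at SD 3 is exactly 3 times the slope error at SD 1, so the comparison between conditions carries no extra simulation noise.

```r
set.seed(8)
x <- 1:25
nsim <- 2000
sds <- c(1, 3, 20)
slope <- function(y) cov(x, y) / var(x)

# (a) Shared draws: generate N(0,1) errors once per sample, then rescale.
shared <- t(replicate(nsim, {
  z <- rnorm(length(x))
  sapply(sds, function(s) slope(3 + 7 * x + s * z))
}))

# (b) Independent draws: fresh errors for every SD.
indep <- t(replicate(nsim, {
  sapply(sds, function(s) slope(3 + 7 * x + rnorm(length(x), 0, s)))
}))

# Estimated ratio of the estimator's SD at error SD 3 versus error SD 1 (true value 3).
ratio_shared <- sd(shared[, 2]) / sd(shared[, 1])
ratio_indep  <- sd(indep[, 2]) / sd(indep[, 1])
c(shared = ratio_shared, independent = ratio_indep)
```

The shared version recovers the ratio exactly, while the independent version is off by Monte Carlo error.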

Summarizing Results
What plots and tables will you use to summarize the results?
How will you assess if something "interesting" has happened?
I generally try to plan these early in the study, so I do not have to redo the simulations.