An Introduction to R: Monte Carlo Simulation MWERA 2012 Emily A. Price, MS Marsha Lewis, MPA Dr. Gordon P. Brooks.

Objectives and/or Goals
Three main parts
  – Data generation in R
  – Basic Monte Carlo programming (e.g., loops)
  – Running simulations (e.g., investigating Type I errors)

Why Use Monte Carlo Methods?
According to Mooney (1997), Monte Carlo simulations are useful to
  – Make inferences when only weak statistical theory exists for an estimator
  – Test null hypotheses under a variety of plausible conditions
  – Assess the quality of an inference method
  – Assess the robustness of parametric inference to assumption violations
  – Compare estimators’ properties

What are Monte Carlo Methods?
Experiments composed of random numbers to evaluate mathematical expressions (Gentle, 2003)
Empirically determine the sampling distribution of a test statistic
Computer-based methods for approximating values and properties of random variables (Braun & Murdoch, 2007)

Logic of Monte Carlo
Mooney (1997) presents five steps
1. Specify the pseudo-population in symbolic terms in such a way that it can be used to generate samples; that is, write code to generate data in a specific manner.
2. Sample from the pseudo-population in ways that reflect the topic of interest.
3. Calculate the estimate of θ in a pseudo-sample and store it in a vector.
4. Repeat steps 2 and 3 t times, where t is the number of trials.
5. Construct a relative frequency distribution of the resulting values, which is a Monte Carlo estimate of the sampling distribution of that estimate under the conditions specified by the pseudo-population and the sampling procedures.
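
The five steps can be sketched in a few lines of base R. This is only an illustration, not code from the presentation: the Normal(50, 10) pseudo-population, the sample size, the number of trials, and the choice of the median as the statistic are all assumptions.

```r
set.seed(2012)                 # fix the RNG state so the run can be repeated

# Step 1: pseudo-population (assumed here: Normal with mean 50 and sd 10)
pop_mean <- 50
pop_sd   <- 10
n        <- 25                 # size of each pseudo-sample (assumed)
trials   <- 5000               # t, the number of trials (assumed)

# Steps 2-4: draw a pseudo-sample, compute the statistic, store it, repeat
theta_hat <- numeric(trials)
for (i in seq_len(trials)) {
  pseudo_sample <- rnorm(n, mean = pop_mean, sd = pop_sd)
  theta_hat[i]  <- median(pseudo_sample)
}

# Step 5: relative frequency distribution of the stored estimates
h        <- hist(theta_hat, breaks = 30, plot = FALSE)
rel_freq <- h$counts / sum(h$counts)

mean(theta_hat)   # centre of the Monte Carlo sampling distribution
sd(theta_hat)     # Monte Carlo estimate of the standard error of the median
```

Swapping `median` for another statistic, or `rnorm` for another generator, changes the study without changing the structure of the loop.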

Practical Issues/Considerations
What software to use?
How much time to run the simulation?
Reproducibility of results
Adequacy of random number generator

Why use R?
It’s FREE
It is a flexible language that can be controlled by the user
It uses a vector-based approach
Depending on the package, there are built-in commands that the user can access to minimize the amount of programming required for MC simulation
  – Make sure to load the required packages at the beginning of the session
The R community has a plethora of information: help websites, listservs, textbooks, blogs
  – Manuals for R available at

Part 1: Data Generation
RNG and setting seed
  – Purpose of the seed is to reproduce results
Initialize all parameters of interest
Loops
Print results
Access output
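
A small sketch of why the seed matters, using only base R. The seed value 123 is arbitrary and the object names are made up for illustration.

```r
set.seed(123)            # any fixed integer works; 123 is an arbitrary choice
first_run <- rnorm(5)

set.seed(123)            # resetting the same seed restarts the same stream
second_run <- rnorm(5)

identical(first_run, second_run)   # TRUE: the earlier draws are reproduced

third_run <- rnorm(5)    # without resetting, the stream simply continues
identical(first_run, third_run)    # FALSE: these are new draws
```

Setting the seed once at the top of a simulation script is usually enough to make the whole run repeatable.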

Generating a Single Random Variable
For each distribution, R provides four functions: PDF, CDF, quantile function, and random number generation
  – dnorm, pnorm, qnorm, rnorm respectively
rnorm(n,mean=0,sd=1)
runif(20,min=2,max=5)
Distributions: normal, uniform, Poisson, beta, gamma, chi-square, Weibull, exponential
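
The d/p/q/r naming pattern might be explored like this. The particular arguments (1.96, lambda = 3, and so on) are chosen only for illustration.

```r
dnorm(0)             # density of Normal(0, 1) at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)          # cumulative probability P(Z <= 1.96)
qnorm(0.975)         # quantile: the value with 97.5% of the area below it

set.seed(1)
z <- rnorm(5)        # five random draws from Normal(0, 1)

# The same prefixes apply to the other distributions
u <- runif(20, min = 2, max = 5)   # uniform draws between 2 and 5
k <- rpois(10, lambda = 3)         # Poisson counts with mean 3
qchisq(0.95, df = 1)               # chi-square critical value
pexp(1, rate = 1)                  # exponential CDF at 1, equal to 1 - exp(-1)
```

Note that `pnorm` and `qnorm` are inverses of each other, so `pnorm(qnorm(p))` returns `p`.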

Try it, you’ll like it!
rnorm(n,mean=0,sd=1)
  – Generate a normal distribution of 50 values with a mean of 50 and sd of 10
x <- sample(1:2,20,TRUE,prob=c(1/2,1/2))
  – Generate data that mimics rolling a die
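
One possible answer to each exercise, as a sketch rather than the presenters' own solution. The object names and the choice of 20 rolls are assumptions.

```r
set.seed(42)

# 50 values from a normal distribution with mean 50 and sd 10
scores <- rnorm(50, mean = 50, sd = 10)

# The slide's example: 20 draws from two equally likely outcomes (a coin)
coin <- sample(1:2, 20, replace = TRUE, prob = c(1/2, 1/2))

# A fair six-sided die: six outcomes, equal probabilities by default
die <- sample(1:6, 20, replace = TRUE)
table(die)           # how often each face came up
```

Leaving out `prob` gives equal weights, which is what a fair die needs.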

Generating Correlated Data
X ~ Normal(20, 5), Y ~ Normal(40, 10), corr(X,Y) = 0.6
  – 4 inputs: sample size, mean, variance-covariance matrix, and method
  – 3 methods of data generation: eigenvalue (default), singular value, and Cholesky
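
To see what the Cholesky method does under the hood, the two-variable case can be worked by hand in base R. This sketch assumes the second parameter of each distribution is a standard deviation (the slide does not say whether it is an sd or a variance), and the sample size is arbitrary.

```r
set.seed(2012)
n   <- 10000
mu  <- c(X = 20, Y = 40)
sds <- c(5, 10)                          # ASSUMPTION: 5 and 10 are sds
R   <- matrix(c(1, 0.6,
                0.6, 1), nrow = 2)       # target correlation matrix

# Variance-covariance matrix: sigma = D R D, with D = diag(sds)
sigma <- diag(sds) %*% R %*% diag(sds)

# chol() returns the upper-triangular U with t(U) %*% U equal to sigma
U  <- chol(sigma)
z  <- matrix(rnorm(n * 2), nrow = n)     # independent Normal(0, 1) columns
xy <- sweep(z %*% U, 2, mu, "+")         # impose covariance, then add means
colnames(xy) <- names(mu)

colMeans(xy)      # should be close to 20 and 40
cor(xy)[1, 2]     # should be close to 0.6
```

The covariance between X and Y here is 0.6 * 5 * 10 = 30, which is the off-diagonal entry of `sigma`.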

Try it, you’ll like it!
rmvnorm(n, mean, sigma, method)
Generate data for 3 variables such that X ~ Normal(20, 5), Y ~ Normal(40, 10), Z ~ Normal(60, 15) and Corr(X,Y) = 0.6, Corr(X,Z) = 0.7, Corr(Y,Z) = 0.8
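
A possible solution is sketched below. `rmvnorm` comes from the mvtnorm package, which is not part of base R, so the sketch falls back to a hand-rolled Cholesky draw if the package is not installed. As before, the second parameter of each distribution is assumed to be a standard deviation, and n is arbitrary.

```r
set.seed(2012)
n   <- 10000
mu  <- c(20, 40, 60)
sds <- c(5, 10, 15)                      # ASSUMPTION: these are sds
R   <- matrix(c(1.0, 0.6, 0.7,
                0.6, 1.0, 0.8,
                0.7, 0.8, 1.0), nrow = 3, byrow = TRUE)

# Element-wise: sigma[i, j] = sds[i] * sds[j] * R[i, j]
sigma <- outer(sds, sds) * R

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  xyz <- mvtnorm::rmvnorm(n, mean = mu, sigma = sigma, method = "chol")
} else {
  # Fallback without the package: same idea, done by hand
  xyz <- sweep(matrix(rnorm(n * 3), nrow = n) %*% chol(sigma), 2, mu, "+")
}
colnames(xyz) <- c("X", "Y", "Z")

round(colMeans(xyz), 1)   # compare with 20, 40, 60
round(cor(xyz), 2)        # compare with the target matrix R
```

Not every set of pairwise correlations forms a valid matrix; `chol(sigma)` raises an error when the matrix is not positive definite, which is a useful check before simulating.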

Part 2: Basic MC Programming
Four steps (Braun & Murdoch, 2007)
1. Understand the problem
2. Work out a general idea of how to solve it
  – Flow charts
3. Translate your general idea into a detailed implementation
  – Turn the flowchart into code
4. Check: Does it work?

Programming Commands*
Loops
  – for, while, repeat
Conditionals
  – if, ifelse
Control statements
  – break, next
* We can’t cover all programming aspects but wanted to mention other commands
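
A compact tour of these commands in base R. The variable names and thresholds are invented for illustration.

```r
# for: repeat a fixed number of times
squares <- numeric(5)
for (i in 1:5) squares[i] <- i^2

# while: repeat as long as a condition holds
total <- 0
k     <- 0
while (total < 20) {
  k     <- k + 1
  total <- total + k
}

# repeat with break: loop until told to stop
set.seed(7)
draws <- 0
repeat {
  draws <- draws + 1
  if (runif(1) > 0.9) break      # leave the loop on the first draw above 0.9
}

# next: skip the rest of the current iteration
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next          # even numbers are skipped
  odd_sum <- odd_sum + i
}

# if tests one condition; ifelse is vectorized over a whole vector
x <- c(-2, 0, 3)
sign_label <- ifelse(x < 0, "negative", "non-negative")
```

In simulations, `for` is the usual workhorse because the number of trials is known in advance.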

Functions
They are “self-contained units with a well-defined purpose” (Braun & Murdoch, 2007, p. 59)
  – Take an input, do some calculations, and produce an output
In R, functions are objects and can be manipulated like other more common objects such as vectors, matrices, and lists.
  – R provides source code for its own functions
R allows you to write your own functions
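
A short sketch of these points. The helper `se_mean` is a hypothetical function written for this example, not something from the presentation or from base R.

```r
# A user-written function: input -> calculation -> output
se_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]       # optionally drop missing values
  sd(x) / sqrt(length(x))            # standard error of the mean
}

y <- c(2, 4, 4, 4, 5, 5, 7, 9)
se_mean(y)

# Functions are objects: they can be stored in a list and passed around
stat_list <- list(mean = mean, median = median, se = se_mean)
results   <- sapply(stat_list, function(f) f(y))
results

# Typing a function's name without parentheses prints its source
se_mean
sd
```

Wrapping the statistic of interest in a function like this keeps the simulation loop itself short and easy to check.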

Part 3: Running Simulations
Trimmed mean sampling distribution
Replicating a published Monte Carlo study in R
  – Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.
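
The presenters' own simulation code is not included in this transcript, so the following is only a rough sketch of the two kinds of simulation named here. It does not reproduce Zimmerman's (2004) design. The heavy-tailed t population with 3 df, the 20% trimming, the sample size, and the number of replications are all assumptions.

```r
set.seed(2012)
reps <- 2000      # number of Monte Carlo trials (assumed)
n    <- 20        # observations per sample (assumed)

# Part A: sampling distribution of the 20% trimmed mean vs. the ordinary
# mean when the population is heavy-tailed (t distribution, 3 df)
trimmed_means <- replicate(reps, mean(rt(n, df = 3), trim = 0.2))
plain_means   <- replicate(reps, mean(rt(n, df = 3)))
c(sd_trimmed = sd(trimmed_means), sd_mean = sd(plain_means))

# Part B: empirical Type I error rate of the pooled-variance t test
# when the null hypothesis is true and both groups are Normal(0, 1)
alpha    <- 0.05
p_values <- replicate(reps, {
  g1 <- rnorm(n, mean = 0, sd = 1)
  g2 <- rnorm(n, mean = 0, sd = 1)
  t.test(g1, g2, var.equal = TRUE)$p.value
})
type1_rate <- mean(p_values < alpha)   # proportion of false rejections
type1_rate
```

A rejection rate near the nominal alpha suggests the test is behaving as advertised under these conditions; changing the group variances or sample sizes in Part B is the natural next step toward a robustness study.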

Questions Thank you for your time

References
Braun, W. J., & Murdoch, D. J. (2007). A first course in statistical programming with R. New York: Cambridge University Press.
Gentle, J. E. (2003). Random number generation and Monte Carlo methods (2nd ed.). New York: Springer-Verlag.
Mooney, C. Z. (1997). Monte Carlo simulation (Sage University Paper series on Quantitative Applications in the Social Sciences, series no ). Thousand Oaks, CA: Sage.
Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.

Our code