Distributions, Iteration, Simulation: Why R will rock your world (if it hasn’t already)



Simulation: Sampling, Calculation, Iteration, Data storage, Summation

General structure of common functions on distributions
There are many distributions in R (e.g. norm is the Gaussian).
For every statistical distribution in R, there are 4 related functions, named by prefix:
  d: density
  p: cumulative probability (distribution function)
  q: quantile
  r: random number generation
e.g. rnorm(10) draws ten values randomly from a standard normal distribution.
Like most R functions, one can alter the defaults by specifying additional parameters. Which parameters can be altered varies by distribution.

    > dnorm(0)      # input; default mean=0, sd=1
    [1] 0.3989423   # output
    > pnorm(2)      # input; default mean=0, sd=1
    [1] 0.9772499   # output

    > qnorm(.40)    # input
    [1] -0.2533471  # output
    > rnorm(100, mean=0, sd=1)    # input
    (output: 100 random values, different on every run)
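A minimal sketch of how the four prefixes relate, using the normal distribution. The seed, the non-default mean and sd, and the variable names are illustrative choices, not taken from the slides.

```r
# The four function families for the normal distribution (defaults: mean = 0, sd = 1)
d0   <- dnorm(0)          # density at 0, equal to 1 / sqrt(2 * pi)
p2   <- pnorm(2)          # P(Z <= 2), the cumulative probability
q40  <- qnorm(0.40)       # the value z with P(Z <= z) = 0.40
back <- pnorm(q40)        # pnorm() and qnorm() are inverses, so this is 0.40 again

upper <- 1 - pnorm(1)     # P(Z > 1), the upper-tail area

set.seed(1)               # arbitrary seed so the random draws are reproducible
x <- rnorm(100, mean = 50, sd = 10)   # 100 draws with non-default parameters
```

The same d/p/q/r pattern applies to the other distributions, e.g. dbinom(), pbinom(), qbinom() and rbinom().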

Simulating random variables in R
  rnorm(): univariate normal
  mvrnorm(): multivariate normal
  rbinom(): binomial
  runif(): uniform
  rpois(): Poisson
  rchisq(): chi-squared
  rnbinom(): negative binomial
  rlogis(): logistic
  rbeta(): beta
  rgamma(): gamma
  rgeom(): geometric
  rlnorm(): log normal
  rweibull(): Weibull
  rt(): t
  rf(): F
  …
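As a rough illustration of how the parameters differ between generators, the sketch below draws from four of the listed distributions and compares sample means with the theoretical ones. The parameter values and the seed are arbitrary assumptions.

```r
set.seed(42)                                  # arbitrary seed for reproducibility
n <- 10000
coin  <- rbinom(n, size = 10, prob = 0.3)     # successes in 10 trials, p = 0.3
unif  <- runif(n, min = 0, max = 5)           # uniform on [0, 5]
count <- rpois(n, lambda = 4)                 # Poisson with mean 4
chisq <- rchisq(n, df = 3)                    # chi-squared with 3 degrees of freedom

# Sample means should be close to the theoretical means 3, 2.5, 4 and 3
round(c(mean(coin), mean(unif), mean(count), mean(chisq)), 2)
```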

Loops
for() and while() are the most commonly used; others (e.g. repeat) are less useful.
    for (i in [vector]) {
      do something
    }
    while ([some condition is true]) {
      do something else
    }

for loop
    for (i in 10:1) {
      cat("Matt is here for", i, "more days\n")
    }
    cat("Matt is gone\n")
Output:
    Matt is here for 10 more days
    Matt is here for 9 more days
    Matt is here for 8 more days
    Matt is here for 7 more days
    Matt is here for 6 more days
    Matt is here for 5 more days
    Matt is here for 4 more days
    Matt is here for 3 more days
    Matt is here for 2 more days
    Matt is here for 1 more days
    Matt is gone
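The loop on this slide only prints. In a simulation one usually stores each iteration's result instead, so here is a small sketch of a for loop filling a pre-allocated vector, plus a while loop that runs until a condition fails. The variable names and numbers are made up for illustration.

```r
# for loop: fill a pre-allocated vector with running totals of 1..10
totals <- numeric(10)
for (i in 1:10) {
  totals[i] <- sum(1:i)
}

# while loop: keep halving until the value drops below 1
x <- 100
steps <- 0
while (x >= 1) {
  x <- x / 2
  steps <- steps + 1
}
```

Pre-allocating totals avoids growing the vector inside the loop, which becomes slow when the number of iterations is large.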

The elementary conditional: if
    if ([condition is true]) {
      do this
    }
    if ([condition is true]) {
      do this
    } else {
      do this instead
    }
    ifelse([condition is true], do this, do this instead)

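Conditional operators
  ==            : is equal to (NOT THE SAME as '=')
  !             : not
  !=            : not equal
  >, <, >=, <=  : inequalities
  &             : and, element by element (if a vector)
  |             : or, element by element
  &&            : and, for the object as a whole (expects single TRUE/FALSE values)
  ||            : or, for the object as a whole (expects single TRUE/FALSE values)
  xor(x,y)      : x or y true, but not both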
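A short sketch contrasting if/else with the vectorized ifelse(), and the element-wise operators with their single-value counterparts. The vector x and the labels are invented for the example.

```r
x <- c(-2, 0, 3, 7)

# if/else tests a single condition
if (length(x) > 3) {
  msg <- "long vector"
} else {
  msg <- "short vector"
}

# ifelse() is vectorized: one result per element of x
sign_label <- ifelse(x >= 0, "non-negative", "negative")

# & and | compare element by element
in_range <- x > -1 & x < 5          # FALSE TRUE TRUE FALSE
outside  <- x < 0 | x > 5           # TRUE FALSE FALSE TRUE

# && and || expect single TRUE/FALSE values, as in an if() condition
both <- (length(x) == 4) && (max(x) > 5)

# xor(): exactly one of the two conditions is true
one_only <- xor(x > -1, x > 2)      # FALSE TRUE FALSE FALSE
```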

Other useful functions:
  sample(): sample elements from a vector, with or without replacement, optionally with probability weights
  subset(): extract a subset of a data frame based on logical criteria
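Both functions are easiest to see in use. The sketch below is illustrative only: the allele labels, the weights and the small twins data frame are hypothetical, made up for this example.

```r
set.seed(7)                                        # arbitrary seed
alleles <- c("A", "a")
# 20 draws with replacement; "A" is three times as likely as "a"
draws <- sample(alleles, size = 20, replace = TRUE, prob = c(0.75, 0.25))
# Default is without replacement: a random permutation of 1..10
perm <- sample(1:10)

# Hypothetical data frame, invented for illustration
twins <- data.frame(id    = 1:6,
                    zyg   = c(0, 0, 0, 1, 1, 1),
                    score = c(1.2, -0.4, 0.8, 0.1, -1.5, 2.0))
# Keep rows with zyg == 0 and a positive score, and only two of the columns
mz_pos <- subset(twins, zyg == 0 & score > 0, select = c(id, score))
```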

Time to get your hands dirty
Open up the R file sim.R
Different scripts are separated by lines of "#":
  Two approaches to the CLT
  Exercise #1
  Simple ACE simulator
  Simple factor model/IRT simulator
  Exercise #2
"Solutions" to the exercises are at the bottom
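The contents of sim.R are not reproduced in this transcript, so the following is only a guess at what two approaches to the CLT might look like: one with a for loop and one that draws everything into a matrix at once. The exponential parent distribution, the sample sizes and all names are assumptions.

```r
set.seed(123)            # arbitrary seed
nsim <- 2000             # number of simulated samples
n <- 30                  # observations per sample

# Approach 1: for loop with a pre-allocated results vector
means_loop <- numeric(nsim)
for (i in 1:nsim) {
  means_loop[i] <- mean(rexp(n, rate = 1))   # skewed parent distribution, mean 1
}

# Approach 2: draw everything at once into a matrix, then average each row
draws <- matrix(rexp(nsim * n, rate = 1), nrow = nsim, ncol = n)
means_mat <- rowMeans(draws)

# Both sets of means should centre on 1 with sd near 1 / sqrt(n)
c(mean(means_loop), sd(means_loop), mean(means_mat), sd(means_mat), 1 / sqrt(n))
```

Calling hist(means_mat) afterwards should show a roughly bell-shaped histogram even though the individual draws are strongly skewed.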

Exercise #1: Create a simple genetic drift simulator for a biallelic locus
The frequency of an allele at t+1 depends on its frequency at t.
The binomial distribution might come in handy.
There are many ways to model this phenomenon, more or less close to reality.
[Figure: genetic drift plots for N=100 and N=1000 (source: Wikipedia)]
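One possible starting point, not the solution from sim.R: a Wright-Fisher style sketch in which each generation's allele count is a binomial draw based on the previous frequency. It assumes N diploid individuals (2N allele copies); the function name drift() and its arguments are hypothetical.

```r
# Wright-Fisher style drift for one biallelic locus.
# Assumption: N diploid individuals carry 2N allele copies, and each generation
# the allele count is a binomial draw with probability equal to the current frequency.
drift <- function(N, p0 = 0.5, generations = 100) {
  p <- numeric(generations + 1)     # frequencies at t = 0, 1, ..., generations
  p[1] <- p0
  for (g in 1:generations) {
    p[g + 1] <- rbinom(1, size = 2 * N, prob = p[g]) / (2 * N)
  }
  p
}

set.seed(2024)                      # arbitrary seed
small <- drift(N = 100)             # smaller population: larger steps per generation
large <- drift(N = 1000)            # larger population: smaller steps per generation
```

Plotting both, e.g. plot(small, type = "l", ylim = c(0, 1)) followed by lines(large), gives a picture in the spirit of the N=100 and N=1000 panels.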

Simulating Path Diagrams
Simulating data based on a path model is usually fairly easy.
Latent variables are sources of variance and are usually standard normal.
Path coefficients represent the strength of the effect from the causal variable to the effect variable.
Drawing out simulations as path diagrams (or something similar) can help formalize the structure of your simulation.

ACE Model
2 measured variables
6 sources of variance
3 levels of correlation: 1, .5, or 0
Standard normal latent variables
Effect of latent variable on phenotype = factor score * path coefficient
[Path diagram: Phenotype #1, Phenotype #2]
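The recipe "factor score * path coefficient" can be sketched directly with rnorm(). This is an illustration under assumed values (a2 = .5, c2 = .3, e2 = .2), not the simulator from sim.R; sim_ace() and its arguments are hypothetical names.

```r
# Hypothetical helper: simulate n twin pairs under an ACE model.
# rA is the correlation between the twins' additive genetic factors (1 or .5).
sim_ace <- function(n, a2, c2, e2, rA) {
  a_path <- sqrt(a2); c_path <- sqrt(c2); e_path <- sqrt(e2)  # path coefficients
  A1 <- rnorm(n)                                 # additive genetic factor, twin 1
  A2 <- rA * A1 + sqrt(1 - rA^2) * rnorm(n)      # twin 2, correlated rA with A1
  C  <- rnorm(n)                                 # shared environment (correlation 1)
  E1 <- rnorm(n); E2 <- rnorm(n)                 # unique environment (correlation 0)
  cbind(P1 = a_path * A1 + c_path * C + e_path * E1,
        P2 = a_path * A2 + c_path * C + e_path * E2)
}

set.seed(99)                                     # arbitrary seed
mz <- sim_ace(5000, a2 = 0.5, c2 = 0.3, e2 = 0.2, rA = 1)
dz <- sim_ace(5000, a2 = 0.5, c2 = 0.3, e2 = 0.2, rA = 0.5)

# Expected twin correlations: a2 + c2 = 0.80 (rA = 1), 0.5 * a2 + c2 = 0.55 (rA = .5)
c(MZ = cor(mz[, "P1"], mz[, "P2"]), DZ = cor(dz[, "P1"], dz[, "P2"]))
```

The six sources of variance are A, C and E for each twin; the three correlation levels are 1 (C, and A when rA = 1), .5 (A when rA = .5) and 0 (E).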

mvrnorm()
Part of the MASS library
Generates an n x p matrix of draws from a p-dimensional multivariate normal distribution
Also useful for simulating unrelated random normal variables -> efficient code and less work

mvrnorm() == simplicity
mvrnorm(n, mu, Sigma, …)
  n = number of samples
  mu = vector of means for some number of variables
  Sigma = covariance matrix for these variables
Example:
    library(MASS)   # provides mvrnorm()
    mu <- rep(0, 3)
    Sigma <- matrix(c(1, .5, .25, .5, 1, .25, .25, .25, 1), 3, 3)
    ex <- mvrnorm(n = 500, mu = mu, Sigma = Sigma)
    pairs(ex)
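To go with the example above, a short sketch that checks what mvrnorm() returns: unrelated variables via an identity covariance matrix, correlated variables whose sample correlations approximate Sigma, and the empirical = TRUE argument. It assumes the MASS package is installed; the sample sizes and seed are arbitrary.

```r
library(MASS)                       # mvrnorm() lives in MASS

set.seed(11)                        # arbitrary seed
# Unrelated standard normal variables: identity covariance matrix
indep <- mvrnorm(n = 2000, mu = rep(0, 4), Sigma = diag(4))
dim(indep)                          # rows are samples, columns are variables

# Correlated variables: sample correlations approximate Sigma
Sigma <- matrix(c(1, .5, .25, .5, 1, .25, .25, .25, 1), 3, 3)
ex <- mvrnorm(n = 2000, mu = rep(0, 3), Sigma = Sigma)
round(cor(ex), 2)

# empirical = TRUE makes the sample covariance match Sigma
# (up to floating-point error) instead of only approximating it
exact <- mvrnorm(n = 2000, mu = rep(0, 3), Sigma = Sigma, empirical = TRUE)
```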

[Figure: output of pairs(ex)]

Back to R script

Factor Models
Similar principles apply.
In a simple model, each observed variable has 2 sources of variance (factor + "error").
Psychometric models often require binary/ordinal data.
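A minimal sketch of a one-factor model with continuous items that are then cut into binary responses. The loadings, the threshold of 0 and all names are assumptions for illustration; the sim.R version may differ.

```r
set.seed(5)                                   # arbitrary seed
n <- 5000                                     # number of subjects
loadings <- c(0.8, 0.7, 0.6, 0.5)             # illustrative factor loadings

f <- rnorm(n)                                 # standard normal latent factor
# Each item = loading * factor + error; the error variance 1 - loading^2
# gives every item unit variance
items <- sapply(loadings, function(l) l * f + sqrt(1 - l^2) * rnorm(n))

# Dichotomize at a threshold of 0 to mimic binary item responses
binary <- (items > 0) * 1

# The correlation between items 1 and 2 should be near 0.8 * 0.7 = 0.56
cor(items[, 1], items[, 2])
```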

Back to R script

Exercise #2: Generate a simulator for measuring the accuracy of Falconer estimation in predicting variance components
[Figure: MZ and DZ panels]

Hints:
Simulation:
  The EEAsim6.R script contains a twin simulator that will speed things up.
  source("EEAsim6.R") will load it.
  The function is twinsim().
  The "zyg" variable encodes zygosity: MZ=0, DZ=1.
  Variables required here are numsubs, a2, c2, and e2.
Calculation:
  cor(x,y): calculate the Pearson correlation between x and y
  Falconer estimates:
    a^2 = 2*(cor_MZ - cor_DZ)
    c^2 = (2*cor_DZ) - cor_MZ
    e^2 = 1 - a^2 - c^2
Iteration: a "for" loop will work just fine
Data storage: perhaps an nsim x nestimates matrix?
Visualization: the prior graph used plot and boxplot; do whatever you want
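The hints can be strung together roughly as follows. Because EEAsim6.R is not part of this transcript, the sketch does not call twinsim(); it uses a hypothetical stand-in, sim_pairs(), whose name and arguments are invented here. The true variance components and the simulation sizes are also assumptions.

```r
# Hypothetical stand-in for twinsim(): returns an n x 2 matrix of twin phenotypes
sim_pairs <- function(n, a2, c2, e2, rA) {
  A1 <- rnorm(n)
  A2 <- rA * A1 + sqrt(1 - rA^2) * rnorm(n)   # genetic correlation rA (1 = MZ, .5 = DZ)
  C  <- rnorm(n)                              # shared environment
  cbind(sqrt(a2) * A1 + sqrt(c2) * C + sqrt(e2) * rnorm(n),
        sqrt(a2) * A2 + sqrt(c2) * C + sqrt(e2) * rnorm(n))
}

set.seed(321)                                  # arbitrary seed
nsim <- 200; npairs <- 500                     # illustrative sizes
a2 <- 0.5; c2 <- 0.3; e2 <- 0.2                # assumed true variance components
est <- matrix(NA, nrow = nsim, ncol = 3,
              dimnames = list(NULL, c("a2", "c2", "e2")))   # nsim x nestimates

for (i in 1:nsim) {
  mz <- sim_pairs(npairs, a2, c2, e2, rA = 1)
  dz <- sim_pairs(npairs, a2, c2, e2, rA = 0.5)
  r_mz <- cor(mz[, 1], mz[, 2])
  r_dz <- cor(dz[, 1], dz[, 2])
  est[i, "a2"] <- 2 * (r_mz - r_dz)            # Falconer estimates
  est[i, "c2"] <- 2 * r_dz - r_mz
  est[i, "e2"] <- 1 - est[i, "a2"] - est[i, "c2"]
}

colMeans(est)                                  # compare with the true 0.5, 0.3, 0.2
```

boxplot(est) then shows how widely the estimates scatter around the true values for a given number of pairs.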