Programming and Simulations Frank Witmer 6 January 2011.


Outline
General programming tips
Programming loops
Simulation
– Distributions
– Sampling
– Bootstrapping

General Programming Tips
Use meaningful variable names
Include more comments than you think necessary
Debugging your code
– Since R is interpreted, variables defined outside functions remain available for inspection if execution terminates
– Built-in debugging support: debug(), browser(), trace()
– But generally, adding print statements in functions is sufficient
Syntax highlighting!

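Loops
Because R is an interpreted language, every statement in a loop body is re-evaluated by the interpreter on each iteration, which adds overhead that vectorized functions (which run in compiled code) avoid
So avoid loops for computationally intensive analysis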
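As a small illustration of the advice above, the sketch below computes the same sum of squares with an explicit loop and with a vectorized expression. The data vector `x` and the variable names are made up for the example; on a vector this small the speed difference is negligible, and the point is only the shape of the code.

```r
# Hypothetical data, chosen only for illustration
x <- c(1.5, 2.5, 4.0)

# Loop version: accumulate the sum of squares one element at a time
total_loop <- 0
for (value in x) {
  total_loop <- total_loop + value^2
}

# Vectorized version: square every element at once, then sum
total_vec <- sum(x^2)

# Both approaches give the same answer
print(c(loop = total_loop, vectorized = total_vec))
```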

For & While loop syntax
for (variable in sequence) { expression }
while (condition) { expression }
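A minimal sketch of both loop forms follows. The variable names (`squares`, `value`, `n_doublings`) are invented for this example.

```r
# for loop: fill a preallocated vector with the squares of 1 to 5
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# while loop: keep doubling until the value is at least 100
value <- 1
n_doublings <- 0
while (value < 100) {
  value <- value * 2
  n_doublings <- n_doublings + 1
}

print(squares)
print(c(value = value, n_doublings = n_doublings))
```

Preallocating `squares` with `numeric(5)` avoids growing the vector inside the loop, which is a common source of slowness in R loops.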

if/else control statements
if ( condition1 ) { expression1 }
else if ( condition2 ) { expression2 }
else { expression3 }
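The three-branch pattern can be sketched as a small function. `classify_sign` is a hypothetical helper written for this example, not part of R.

```r
# Hypothetical helper: label a single number by its sign
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(classify_sign(3))
print(classify_sign(-2))
print(classify_sign(0))
```

Note that when typing at the top level of a script, `else` has to sit on the same line as the closing brace of the preceding block, as it does here; otherwise R treats the `if` statement as already complete.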

Ways to avoid loops (sometimes)
tapply: apply a function (FUN) to a variable based on a grouping variable
lapply: apply a function (FUN) to each element of a given list; returns a list
– sapply: same as lapply, but the output is more user-friendly (simplified to a vector or matrix where possible)
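A short sketch of the three functions, using made-up data (`height`, `group`, and `scores` are hypothetical):

```r
# Hypothetical measurements and a grouping variable
height <- c(150, 160, 170, 180)
group  <- c("a", "a", "b", "b")

# tapply: mean height within each group
group_means <- tapply(height, group, mean)

# A list whose elements have different lengths
scores <- list(first = c(1, 2, 3), second = c(10, 20))

# lapply: returns a list with one mean per element
means_list <- lapply(scores, mean)

# sapply: same computation, simplified to a named numeric vector
means_vec <- sapply(scores, mean)

print(group_means)
print(means_list)
print(means_vec)
```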

Data simulation
Can simulate data using standard distribution functions, e.g. core names norm, pois
Use the 'r' prefix to generate random values from the distribution
– rnorm(numVals, mean, sd)
– rpois(numVals, lambda), where lambda is the mean
Use set.seed() if you want your simulated data to be reproducible
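A minimal sketch of simulating reproducible data. The seed value, sample sizes, and distribution parameters are arbitrary choices for illustration.

```r
# Fix the random number generator state so the draws can be reproduced
set.seed(42)
normal_draws <- rnorm(1000, mean = 10, sd = 2)
count_draws  <- rpois(1000, lambda = 3)

# Resetting the same seed reproduces exactly the same normal draws
set.seed(42)
normal_again <- rnorm(1000, mean = 10, sd = 2)

print(identical(normal_draws, normal_again))
print(summary(normal_draws))
print(table(count_draws))
```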

Standard distribution functions
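The table from this slide is not reproduced in the transcript. As a reminder of R's general naming convention, each core distribution name can be combined with four prefixes: `d` (density), `p` (cumulative probability), `q` (quantile), and `r` (random draws). The sketch below illustrates the first three for the normal and Poisson distributions; the particular arguments are arbitrary.

```r
# d: density of the standard normal at 0
density_at_zero <- dnorm(0)

# p: cumulative probability P(Z <= 1.96) for the standard normal
prob_below <- pnorm(1.96)

# q: quantile function, the inverse of p
upper_quantile <- qnorm(0.975)

# The same prefixes work for other distributions, e.g. Poisson
prob_zero_events <- dpois(0, lambda = 2)

print(c(density_at_zero, prob_below, upper_quantile, prob_zero_events))
```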

Sampling
Sample from a dataset using: sample(dataset, numItems, replace = TRUE or FALSE)
Can use to simulate survey results or bootstrap statistical estimates
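A brief sketch of `sample()` on a vector and on the rows of a data frame. The survey responses and the data frame `df` are made up for the example. Sampling row indices with `sample(nrow(df), ...)` is used here because calling `sample()` directly on a data frame samples its columns rather than its rows.

```r
set.seed(1)

# Hypothetical survey responses
responses <- c("yes", "no", "no", "yes", "yes", "no")

# Draw 3 responses without replacement (the default)
subsample <- sample(responses, 3)

# Draw a resample of the same size with replacement (as in bootstrapping)
resample <- sample(responses, length(responses), replace = TRUE)

# To sample rows of a data frame, sample row indices first
df <- data.frame(id = 1:6, answer = responses)
rows <- sample(nrow(df), 4)
df_sub <- df[rows, ]

print(subsample)
print(resample)
print(df_sub)
```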

Bootstrap overview
Method to measure the accuracy of estimates from a sample empirically
For a sample of size n, draw many random samples, also of size n, with replacement
Two ways to bootstrap regression estimates
– residual resampling: add resampled regression residuals to the fitted values to form a new dep. var. & re-estimate
– data resampling: sample complete cases of the original data and estimate coefficients
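The two approaches can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the data-generating model, the number of bootstrap replicates (`n_boot`), and the variable names. The residual version adds resampled residuals to the fitted values of the original model, which is the usual formulation of residual resampling.

```r
set.seed(123)

# Simulated data from a known linear model (hypothetical)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)
n_boot <- 500

# Data resampling: resample complete rows and refit each time
slope_cases <- numeric(n_boot)
for (b in seq_len(n_boot)) {
  rows <- sample(n, n, replace = TRUE)
  slope_cases[b] <- coef(lm(y ~ x, data = dat[rows, ]))[["x"]]
}

# Residual resampling: keep x fixed, build a new response from
# the fitted values plus resampled residuals, then refit
fitted_y <- fitted(fit)
res <- residuals(fit)
slope_resid <- numeric(n_boot)
for (b in seq_len(n_boot)) {
  y_star <- fitted_y + sample(res, n, replace = TRUE)
  slope_resid[b] <- coef(lm(y_star ~ x, data = dat))[["x"]]
}

# Bootstrap standard errors of the slope from each approach
se_cases <- sd(slope_cases)
se_resid <- sd(slope_resid)
print(c(estimate = coef(fit)[["x"]], se_cases = se_cases, se_resid = se_resid))
```

The loops are kept explicit here for readability; with only a few hundred small regressions the run time is short.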

Recall: Boston Metadata
CRIM: per capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: nitrogen oxide concentration (parts per 10 million)
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built prior to 1940
DIS: weighted distances to five Boston employment centres
RAD: index of accessibility to radial highways
TAX: full-value property-tax rate per $10,000
PTRATIO: pupil-teacher ratio by town
B: 1000(Bk − 0.63)^2, where Bk is the proportion of blacks by town
LSTAT: % lower status of the population
MEDV: median value of owner-occupied homes in $1000's