SAS Beyond TFLs: An Application as a Statistician’s Tool William Coar Date: 10/15/2009 Denver SAS User’s Group.



2 Outline
• Motivation
• Normal Distribution
• Ratio of normally distributed random variables
• Simulation
• Visualization
• Brief Application

3 Motivation
• SAS is the standard for analysis and reporting in the pharmaceutical industry
  – Generation of Tables, Figures, and Listings (TFLs)
• Data generally reside in SAS format
• The statistician is a link between the data and the report
  – Use of TFLs
  – Ad hoc analyses
  – Etc.
• SAS as a tool beyond reporting and analysis

4 Motivation
• Drug to improve exercise capacity in subjects with pulmonary arterial hypertension
• The treatment effect is the change from baseline (cfb) in distance walked in 6 minutes
• The Food and Drug Administration approves the drug but mandates additional information
  – Is the treatment effect greater when the blood concentration of drug is higher?
• Suppose less than ½ of the effect is retained when concentrations are low
  – Consider changing the dosing schedule?

5 Motivation
• Consider the ratio of two treatment effects
  – Effect at peak concentration
  – Effect at trough concentration
• Allow for a placebo correction
• Not uncommon to assume that the treatment effect (change from baseline) follows an approximately normal distribution
• What happens when we have a ratio of two normally distributed random variables?

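6 Motivation
• Consider the following random variables (change from baseline in 6-minute walk distance):
  – x: drug, measured at trough concentration
  – y: drug, measured at peak concentration
  – z: placebo
• x, y, z can be individual observations or sample means; x and y may be correlated
• Possible correlation
  – Peak and trough for the same treated subject
  – Successive measurements in subjects on placebo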

7 Motivation
• Wish to estimate $R = (\mu_x - \mu_z)/(\mu_y - \mu_z)$ with $r = (x - z)/(y - z)$
• By assumption, we expect $0 \le R \le 1$
• $x - z$ and $y - z$ are both normal random variables
• r is a random variable from some distribution
  – Possibility of zero in the denominator

8 Normal Distribution
• Normal distribution:
  – Bell-shaped continuous probability distribution
  – Its (theoretical) frequency distribution is symmetric and bell-shaped
    ◦ Frequency distribution: a set of intervals, usually adjacent and of equal width, each associated with a frequency indicating the number of measurements in that interval
  – Also called the Gaussian distribution
  – Many desirable statistical properties

9 Normal Distribution

10 Normal Distribution
• Denoted $N(\mu, \sigma^2)$
  – Mean ($\mu$)
  – Variance ($\sigma^2$)
• Continuous range over $(-\infty, +\infty)$
  – A small neighborhood around 0 is always possible, though it may not be probable
  – The likelihood around zero depends on $\mu$ and $\sigma$
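For reference, the density behind this notation, and a first-order approximation for how much probability lies near zero, can be written as below. The numbers plug in $\mu = 10$, $\sigma = 5$ purely as an illustration; $\phi$ denotes the standard normal density.

$$
\begin{aligned}
f(x;\mu,\sigma^2) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\\
P(|X| < \varepsilon) &\approx 2\varepsilon\, f(0;\mu,\sigma^2) \quad \text{for small } \varepsilon,\\
\mu = 10,\ \sigma = 5:\quad f(0) &= \frac{\phi(-2)}{5} \approx \frac{0.0540}{5} \approx 0.0108,
\qquad P(|X| < 1) \approx 2 \times 0.0108 \approx 0.0216 .
\end{aligned}
$$

The exact value of $P(|X| < 1)$ for these parameters is about 0.0220, so the approximation is close: small, but not negligible when a simulation draws thousands of denominators.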

11 Ratio of Normal Random Variables
• Problems arise when it is possible for the denominator to be zero
• Example
  – Let $x \sim N(\mu_x = 5, \sigma_x^2 = 25)$ and $y \sim N(\mu_y = 10, \sigma_y^2 = 25)$, where x and y are independent
  – Each has about a 95% chance of falling within two standard deviations of its mean: x between (-5, 15) and y between (0, 20)
  – Expect to see a ratio around $\mu_x/\mu_y = 5/10 = 1/2$
  – Suppose we observe a 10 in the numerator and 0.001 in the denominator
  – r = 10/0.001 = 10,000
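The probabilities in this example can be checked with PROBNORM, the standard normal CDF. A minimal sketch; the data set name PROBS is arbitrary. Each two-standard-deviation interval has probability of about 0.954, so with independence both hold together only about 91% of the time, and the denominator lands within 1 unit of zero about 2.2% of the time.

```sas
/* Normal probabilities for the example above:                 */
/* x ~ N(5, 25) and y ~ N(10, 25), so sigma_x = sigma_y = 5.    */
data probs;
   mux = 5;  muy = 10;  sd = 5;
   /* P(-5 < x < 15): within 2 SDs of the mean */
   p_x = probnorm((15 - mux)/sd) - probnorm((-5 - mux)/sd);
   /* P(0 < y < 20): within 2 SDs of the mean */
   p_y = probnorm((20 - muy)/sd) - probnorm((0 - muy)/sd);
   /* Both at once, using independence */
   p_both = p_x*p_y;
   /* P(-1 < y < 1): denominator within 1 unit of zero */
   p_near0 = probnorm((1 - muy)/sd) - probnorm((-1 - muy)/sd);
   /* The ratio quoted on the slide */
   r = 10/0.001;
run;

proc print data=probs noobs;
   var p_x p_y p_both p_near0 r;
run;
```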

12 Ratio of Normal Random Variables
• The unconditional distribution is Cauchy-like
  – Undefined mean and variance
• Consider a condition
  – A statistical test of whether the denominator differs from zero
  – Assures the denominator is bounded away from zero: $|y - z| > z_{1-\alpha/2}\,\mathrm{SE}(y - z)$
• Not unreasonable to require a (significant) treatment effect at peak blood concentrations
• Goal: determine how the ratio behaves under this condition. Simulate!
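A quick simulation makes the Cauchy-like behavior of the unconditional ratio visible. This sketch assumes the parameters of the previous example, an arbitrary seed, and 100,000 draws. The minimum and maximum are driven by the few draws with y close to zero, while the median stays near 1/2.

```sas
/* Unconditional ratio r = x/y, x ~ N(5, 25), y ~ N(10, 25), independent */
%let nsim = 100000;

data ratio;
   call streaminit(20091015);        /* arbitrary seed */
   do i = 1 to &nsim;
      x = rand('normal', 5, 5);      /* third argument of RAND is the SD */
      y = rand('normal', 10, 5);
      r = x/y;
      output;
   end;
run;

/* Extreme quantiles versus the median */
proc means data=ratio noprint;
   var r;
   output out=rstats min=rmin p1=p1 median=rmed p99=p99 max=rmax;
run;

proc print data=rstats noobs;
   var rmin p1 rmed p99 rmax;
run;
```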

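13 Simulation: Generate data
• If $z \sim N(0,1)$, then $x = \mu_x + z\sigma_x \sim N(\mu_x, \sigma_x^2)$
• Bivariate normal
  – Two normally distributed random variables that may or may not be correlated
  – Using the Cholesky decomposition: if $z_1 \sim N(0,1)$ and $z_2 \sim N(0,1)$ are independent, then $x = \mu_x + \sigma_x z_1$ and $y = \mu_y + (\sigma_{xy}/\sigma_x)\, z_1 + \sqrt{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2}\; z_2$ give $x \sim N(\mu_x, \sigma_x^2)$, $y \sim N(\mu_y, \sigma_y^2)$, and $\rho_{xy} = \sigma_{xy}/(\sigma_x \sigma_y)$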
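The construction on this slide is the Cholesky factorization of the 2 × 2 covariance matrix. Writing the lower-triangular factor $L$ explicitly and multiplying it out confirms the stated variances and covariance:

$$
\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}, \qquad
L = \begin{pmatrix} \sigma_x & 0 \\ \sigma_{xy}/\sigma_x & \sqrt{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2} \end{pmatrix},
$$
$$
LL^\top = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_{xy}^2/\sigma_x^2 + \sigma_y^2 - \sigma_{xy}^2/\sigma_x^2 \end{pmatrix} = \Sigma,
\qquad
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix} + L \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}.
$$

The square root is real only when $\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2$, that is, when $|\rho_{xy}| \le 1$.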

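14 Simulation: SAS Code
```sas
%let seed = 12345;   /* arbitrary seed */

/* Univariate normal: if z ~ N(0,1), then x = mu + z*sd ~ N(mu, sd**2) */
data x;
   call streaminit(&seed);
   mu = 0; sd = 1;
   do j = 1 to 10000;
      z = rand('normal');
      x = mu + z*sd;
      output;
   end;
run;

/* Bivariate normal via the Cholesky construction. sxy is the      */
/* covariance of x and y (sxy=0 gives independent x and y).        */
/* Data set Y holds both variables for the PROC SGPLOT DATA=Y step. */
data y;
   call streaminit(12356);
   mux = 0; muy = 2; sdx = 1; sdy = 2; sxy = 0;
   do j = 1 to 10000;
      z1 = rand('normal');
      z2 = rand('normal');
      x = mux + z1*sdx;
      y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
      output;
   end;
run;
```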
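The code on this slide uses sxy=0, so x and y are independent. To see the Cholesky step actually produce a correlation, this sketch sets an illustrative ρ = 0.9 (so sxy = ρ·sdx·sdy = 1.8) and checks the sample moments with PROC CORR. The seed, sample size, and data set names are arbitrary.

```sas
/* Cholesky construction with a non-zero covariance (illustrative values) */
data biv;
   call streaminit(4321);
   mux = 0; muy = 2; sdx = 1; sdy = 2; rho = 0.9;
   sxy = rho*sdx*sdy;                       /* covariance sigma_xy = 1.8 */
   do j = 1 to 100000;
      z1 = rand('normal');
      z2 = rand('normal');
      x = mux + z1*sdx;
      y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
      output;
   end;
   keep x y;
run;

/* OUTP= stores the means, SDs, N, and the correlation matrix */
proc corr data=biv outp=cstats noprint;
   var x y;
run;

proc print data=cstats noobs;
run;
```

The row with _TYPE_='CORR' and _NAME_='x' should show a value of y close to 0.9, and the STD row a value of y close to 2.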

15 Simulation: SAS Code
```sas
/* Assumes the libref INTDATA has already been assigned with LIBNAME. */
goptions reset=all lfactor=2;
axis1 value=none label=none minor=none;
axis2 value=(h=1.5) label=(h=2);

/* Four histograms of the same data with narrower and narrower bins; */
/* the last one overlays a fitted normal density.                    */
proc univariate data=x gout=intdata.dist1 noprint;
   var x;
   histogram / vaxis=axis2 midpoints=-5 to 5 by 1    haxis=axis1;
   histogram / vaxis=axis2 midpoints=-5 to 5 by 0.5  haxis=axis1;
   histogram / vaxis=axis2 midpoints=-5 to 5 by 0.25 haxis=axis1;
   histogram / normal(color=red mu=est sigma=est w=6)
               vaxis=axis2 midpoints=-5 to 5 by 0.1 haxis=axis1;
run;

/* Replay the four stored graphs into one JPEG with the L2R2 template */
filename got "graphs\normal_freqdist.jpeg";
goptions reset=all device=jpeg gsfname=got gsfmode=replace rotate=portrait
         xmax=2in ymax=2in fontres=presentation xpixels=950 ypixels=950;
proc greplay igout=intdata.dist1 gout=work.gseg tc=sashelp.templt
             template=l2r2 nofs;
   treplay 1=univar 2=univar2 3=univar1 4=univar3;
run;
quit;
```

16 Simulation: SAS Code
```sas
/* Overlay fitted density curves for X and Y (DENSITY defaults to TYPE=NORMAL) */
ods graphics on / reset imagefmt=JPEG imagename="compnrml" noborder
                  height=4in width=4in;
proc sgplot data=y;
   density x / lineattrs=(thickness=4) legendlabel='Mu=0, Std=1';
   density y / lineattrs=(thickness=4) legendlabel='Mu=2, Std=2';
   keylegend / location=inside position=topright across=1;
run;
quit;
ods graphics off;
```

17 Simulation
• We can simulate data fairly easily
• Smooth functions show the general behavior and allow for multiple distributions per plot
• There are various ways to obtain histograms
  – Proc Univariate
  – Proc GPlot on the histogram output dataset
  – Proc SGPlot and SGPanel
• Desire techniques that allow for multiple distributions per plot
  – Visualize the behavior of the ratio for various R, sample sizes, correlation structures, and restrictions on the denominator

18 Simulation
• Three different ratios (peak effect of 30 m)
  – 0, ½, 1
• Equal standard deviations for cfb
  – 65 m
• Three different sample sizes (2:1 randomization, drug:placebo)
  – 60:30
  – 112:56
  – 184:92
• Two (within-subject) correlation structures
  – High (≈0.9)
  – Low (≈0.1)
• Two different levels of alpha for testing the denominator
  – 0.05, 0.01

19 Simulation
• Methods for organization should be well thought out
  – That being said, tasks often evolve…
• Many conditions to vary (3×3×2×2 = 36 combinations)
  – Separate programs and/or folders
  – Macros
  – Permanent data
  – Graphs in permanent catalogs to be replayed later
  – Naming conventions
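One way to combine macros with a naming convention is to loop over the 3 × 3 × 2 × 2 grid and encode each condition in the output data set name. In this sketch %SIM_ONE is a hypothetical stub that only records the settings (a real version would run the simulation), the SCN_ prefix and tag scheme are made up for illustration, and the data sets go to WORK rather than to a permanent library.

```sas
/* Hypothetical stub: records one scenario's settings. A real version */
/* would simulate the trials for this scenario instead.               */
%macro sim_one(ratio=, ndrug=, npbo=, rho=, alpha=, out=);
   data &out;
      ratio = &ratio;  ndrug = &ndrug;  npbo = &npbo;
      rho   = &rho;    alpha = &alpha;
   run;
%mend sim_one;

/* Loop over 3 ratios x 3 sample sizes x 2 correlations x 2 alphas.  */
/* The data set name encodes the scenario, e.g. SCN_R050_N60_L_A05.  */
%macro run_grid;
   %local i j k m ratio rtag ndrug npbo rho ctag alpha atag;
   %do i = 1 %to 3;
      /* blank-only delimiter so that %SCAN does not split on '.' */
      %let ratio = %scan(0 0.5 1, &i, %str( ));
      %let rtag  = %scan(000 050 100, &i);
      %do j = 1 %to 3;
         %let ndrug = %scan(60 112 184, &j);
         %let npbo  = %scan(30 56 92, &j);
         %do k = 1 %to 2;
            %let rho  = %scan(0.1 0.9, &k, %str( ));
            %let ctag = %scan(L H, &k);
            %do m = 1 %to 2;
               %let alpha = %scan(0.05 0.01, &m, %str( ));
               %let atag  = %scan(05 01, &m);
               %sim_one(ratio=&ratio, ndrug=&ndrug, npbo=&npbo, rho=&rho,
                        alpha=&alpha, out=scn_R&rtag._N&ndrug._&ctag._A&atag)
            %end;
         %end;
      %end;
   %end;
%mend run_grid;

%run_grid
```

Encoding the conditions in the name keeps later steps (collecting histograms, replaying graphs) driven by the same tags instead of by hand-typed names.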

20 Simulation
• Process
  – Define the set of conditions (ratio, sample size, etc.)
  – Derive the distributions of the numerator and denominator
  – Simulate normal random variables
    ◦ Outcomes of a clinical trial
  – Hypothesis test on the denominator (at Type I error rate α)
  – Exclude the observation if the denominator is not significantly different from zero
  – Plot
    ◦ Multiple curves per plot
    ◦ Multiple plots per page

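21 Simulation
• Generate x, y, and z
• Check for significance in the denominator
• Determine r
```sas
/* Statements inside the simulation DO loop (one simulated trial per pass) */
x = mux + z1*sdx;                                         /* drug at trough */
y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2; /* drug at peak, correlated with x */
z = muz + z3*sdz;                                         /* placebo */
sp = sqrt(sdy**2 + sdz**2);      /* standard error of y - z (y and z independent) */
zstat = probit(1 - &alpha/2);    /* two-sided critical value */
zcrit = (y - z)/sp;              /* test statistic for the denominator */
sig = abs(zcrit) > zstat;        /* 1 if the denominator is significant */
r = (x - z)/(y - z);             /* observed ratio */
```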
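The statements above can be assembled into one runnable DATA step. This sketch simulates a single scenario from the design slide (R = 0.5, 60:30, low correlation, α = 0.05). Assumptions not stated in the slides: x, y, z are arm means whose SDs are standard errors 65/√n, the placebo mean is 0, each drug subject contributes both a trough and a peak measurement (so the correlation of the two means equals the within-subject correlation), and the seed is arbitrary. For these inputs a normal calculation puts the probability of passing the test at about 0.54, which the first summary should be close to.

```sas
/* One scenario of the conditional simulation (illustrative values) */
%let nsim  = 10000;   /* simulated trials                                */
%let alpha = 0.05;    /* level of the two-sided test on the denominator  */
%let ratio = 0.5;     /* true R = (mu_x - mu_z)/(mu_y - mu_z)            */
%let peak  = 30;      /* true peak effect mu_y - mu_z (m)                */
%let sd    = 65;      /* subject-level SD of change from baseline (m)    */
%let ndrug = 60;      /* drug arm (2:1 randomization)                    */
%let npbo  = 30;      /* placebo arm                                     */
%let rho   = 0.1;     /* assumed correlation of the trough and peak means */

data sim;
   call streaminit(6030);
   /* Standard errors of the arm means */
   sdx = &sd/sqrt(&ndrug);
   sdy = &sd/sqrt(&ndrug);
   sdz = &sd/sqrt(&npbo);
   sxy = &rho*sdx*sdy;              /* Cov(x, y) */
   muz = 0;                         /* placebo mean; only differences matter */
   muy = muz + &peak;               /* drug at peak   */
   mux = muz + &ratio*&peak;        /* drug at trough */
   sp    = sqrt(sdy**2 + sdz**2);   /* SE of y - z    */
   zstat = probit(1 - &alpha/2);    /* critical value */
   do trial = 1 to &nsim;
      z1 = rand('normal');
      z2 = rand('normal');
      z3 = rand('normal');
      x = mux + z1*sdx;
      y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
      z = muz + z3*sdz;
      zcrit = (y - z)/sp;           /* test statistic */
      sig = abs(zcrit) > zstat;     /* 1 = denominator significant */
      r = (x - z)/(y - z);
      output;
   end;
   keep trial x y z r sig;
run;

/* Proportion of trials that pass the test on the denominator */
proc means data=sim noprint;
   var sig;
   output out=kept mean=p_kept;
run;

/* Conditional distribution of r among trials that pass */
proc univariate data=sim(where=(sig=1)) noprint;
   var r;
   output out=rcond n=n_kept pctlpts=5 25 50 75 95 pctlpre=p_;
run;

proc print data=kept noobs;
   var p_kept;
run;

proc print data=rcond noobs;
run;
```

Changing the macro variables (or wrapping the step in a macro) covers the other cells of the design.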

22 Visualization: One curve
• GPlot using the histogram output dataset
• SMxx interpolation in the SYMBOL statement
  – Fits a smooth line to jagged (noisy) data
  – xx = degree of smoothing, 0-99 (higher = smoother)
```sas
symbol1 value=none i=SM55 color=red l=3 width=5;
```

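23 Visualization
• Multiple curves
  – Set the histogram datasets together and create a by variable
  – Add another SYMBOL statement
  – Update the PLOT statement and add a legend
```sas
/* _OBSPCT_ and _MINPT_ come from the PROC UNIVARIATE OUTHISTOGRAM= data   */
/* set (_MINPT_ is present when ENDPOINTS= is used); BYGRP gives one curve */
/* per group. AXIS1, AXIS2, and LEGEND1 are defined elsewhere.             */
proc gplot data=forplot;
   plot _obspct_*_minpt_=bygrp / href=(0.5) chref=("red") whref=(2)
        haxis=axis1 vaxis=axis2 legend=legend1;
run;
```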
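The same "stack the histogram data with a group variable" idea can be sketched with PROC FREQ and PROC SGPLOT instead of PROC UNIVARIATE and GPLOT. The two normal groups are placeholders standing in for two simulation scenarios; the bin width of 0.25, the seed, and the group labels are arbitrary choices. SERIES with GROUP= draws one curve per group, and REFLINE plays the role of HREF= on the slide.

```sas
/* Placeholder data: two groups standing in for two simulation scenarios */
data draws;
   call streaminit(1015);
   length bygrp $12;
   do i = 1 to 10000;
      bygrp = 'Mu=0, Std=1';  v = rand('normal', 0, 1);  output;
      bygrp = 'Mu=2, Std=2';  v = rand('normal', 2, 2);  output;
   end;
run;

/* Bin each value at the midpoint of a 0.25-wide interval */
data binned;
   set draws;
   mid = 0.25*floor(v/0.25) + 0.125;
run;

proc sort data=binned;
   by bygrp;
run;

/* PERCENT in the OUT= data set is computed within each BY group */
proc freq data=binned noprint;
   by bygrp;
   tables mid / out=hist;
run;

/* One curve per group; the reference line mirrors HREF=(0.5) */
proc sgplot data=hist;
   series x=mid y=percent / group=bygrp lineattrs=(thickness=3);
   refline 0.5 / axis=x lineattrs=(color=red);
   xaxis label='Value';
   yaxis label='Percent';
   keylegend / location=inside position=topright across=1;
run;
```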

24 Visualization
• Add the remaining 4 lines
• Store it in a permanent catalog to be replayed later
  – gout=intdata.dist_6030L
• Update titles and labels as needed
• For PROC GREPLAY
  – Suppress some labels
  – Increase font sizes and line thickness

25 Visualization
• Proc GReplay
  – TDEF statement to create a 2×3 panel of plots
  – x,y coordinates defined by percentages
  – TREPLAY to insert the graphs created in the simulation exercise

26 Visualization
```sas
/* Define a 2 x 3 template in WORK.TEMPLT; coordinates are percentages */
/* of the graphics area (top row: panels 1-3, bottom row: panels 4-6). */
proc greplay nofs tc=work.templt;
   tdef Box2x3 des='2 by 3 boxes'
      1 / llx=0    lly=50 ulx=0    uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
      2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
      3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100  ury=100 lrx=100  lry=50
      4 / llx=0    lly=0  ulx=0    uly=50  urx=33.3 ury=50  lrx=33.3 lry=0
      5 / llx=33.3 lly=0  ulx=33.3 uly=50  urx=66.6 ury=50  lrx=66.6 lry=0
      6 / llx=66.6 lly=0  ulx=66.6 uly=50  urx=100  ury=50  lrx=100  lry=0
   ;
run;
quit;
```

27 Visualization
• Prepare for replaying graphs
• Copy the graphs to the work graphics catalog
• Change the names
```sas
/* Copy each stored graph into WORK.GSEG and rename its GPLOT entry */
/* so that all six graphs can live in one catalog.                   */
proc catalog c=work.gseg;
   copy in=intdata.dist_6030L  out=work.gseg;
   change gplot=g6030L  / entrytype=grseg;
   copy in=intdata.dist_6030H  out=work.gseg;
   change gplot=g6030H  / entrytype=grseg;
   copy in=intdata.dist_11256L out=work.gseg;
   change gplot=g11256L / entrytype=grseg;
   copy in=intdata.dist_11256H out=work.gseg;
   change gplot=g11256H / entrytype=grseg;
   copy in=intdata.dist_18492L out=work.gseg;
   change gplot=g18492L / entrytype=grseg;
   copy in=intdata.dist_18492H out=work.gseg;
   change gplot=g18492H / entrytype=grseg;
quit;
```
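The TEMPLATE and TREPLAY statements that place the six graphs into the 2 × 3 template are not shown on these slides. A minimal, self-contained sketch of that last step: it makes six placeholder GPLOT graphs with hypothetical entry names P1 to P6 (standing in for G6030L, G6030H, and so on), defines the same 2 × 3 template, selects it with TEMPLATE, and fills the panels with TREPLAY. Which graph goes into which panel is an arbitrary choice here.

```sas
/* Six placeholder graphs in WORK.GSEG (hypothetical names P1-P6) */
goptions reset=all;

data demo;
   do x = 1 to 10;
      y = x**2;
      output;
   end;
run;

%macro mkplot(name);
   proc gplot data=demo gout=work.gseg;
      plot y*x / name="&name";
   run;
   quit;
%mend mkplot;

%mkplot(p1) %mkplot(p2) %mkplot(p3)
%mkplot(p4) %mkplot(p5) %mkplot(p6)

/* Define the template, select it, and replay one graph per panel; */
/* the combined page is written to WORK.PANEL.                     */
proc greplay nofs igout=work.gseg gout=work.panel tc=work.templt;
   tdef Box2x3 des='2 by 3 boxes'
      1 / llx=0    lly=50 ulx=0    uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
      2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
      3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100  ury=100 lrx=100  lry=50
      4 / llx=0    lly=0  ulx=0    uly=50  urx=33.3 ury=50  lrx=33.3 lry=0
      5 / llx=33.3 lly=0  ulx=33.3 uly=50  urx=66.6 ury=50  lrx=66.6 lry=0
      6 / llx=66.6 lly=0  ulx=66.6 uly=50  urx=100  ury=50  lrx=100  lry=0
   ;
   template Box2x3;
   treplay 1:p1 2:p2 3:p3 4:p4 5:p5 6:p6;
run;
quit;
```

With the real catalogs, the TREPLAY pairs would name the renamed entries from the PROC CATALOG step instead of P1 to P6.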

28 Visualization
[Figure not reproduced]

29 Visualization
• “Almost” complete
  – Adjust graphics options to optimize the display
  – Graphics options can be powerful for enhancing the display
    ◦ Time versus information trade-off
• Other ways to do this?
  – Keep intermediate simulated data or histogram datasets
  – Proc SGPlot or SGPanel
  – How much processing is needed?

30 Application to Design
• Scenario 1: Data from one 16-week clinical trial
  – At week 16, half of the subjects have 6MWD (6-minute walk distance) measured at peak blood concentrations and the other half at trough concentrations
  – Observations that measure peak and trough are independent
  – Similar to the low-correlation scenario
• Scenario 2: Schedule an extra visit at week 12 in one clinical trial
  – At week 12, some subjects are tested at peak and the others at trough
  – Switch and repeat at week 16
  – Similar to the high-correlation scenario

31 Application to Design
[Figure not reproduced]

32 Application to Design
• Higher variation in the low-correlation scenario
• If the true ratio were near 1:
  – Scenario 1: there is a non-negligible probability of observing a value below ½
  – Scenario 2: much less likely to see a value near ½
• Costs associated with the extra visit?
• Consequences of decisions made about the true ratio?
  – Asked to consider a different dosing schedule?

33 Conclusions
• Simulations in SAS allowed us to understand the conditional distributions
  – Beneficial in planning
• Many ways to use graphics to visualize this simulation
  – Available procedures
  – Organizational structure to simulate data and graphs
  – Obtain a lot of information in each panel
  – Even more when panels are combined
• Graphics options can be powerful for enhancing the display
  – Time versus information trade-off

Questions?