SAS Beyond TFLs: An Application as a Statistician’s Tool William Coar Date: 10/15/2009 Denver SAS User’s Group.


1 SAS Beyond TFLs: An Application as a Statistician’s Tool William Coar bill.coar@gmail.com Date: 10/15/2009 Denver SAS User’s Group

2 Outline
• Motivation
• Normal Distribution
• Ratio of normally distributed random variables
• Simulation
• Visualization
• Brief Application

3 Motivation
• SAS is the standard for analysis and reporting in the pharmaceutical industry
  – Generation of Tables, Figures, and Listings (TFLs)
• Data generally reside in SAS format
• The statistician is a link between the data and the report
  – Use of TFLs
  – Ad hoc analyses
  – Etc.
• SAS as a tool beyond reporting and analysis

4 Motivation
• Drug to improve exercise capacity in subjects with pulmonary arterial hypertension
• The treatment effect is the change from baseline (cfb) in distance walked in 6 minutes
• The Food and Drug Administration approves the drug but mandates additional information
  – Is the treatment effect greater when the blood concentration of drug is higher?
• Suppose less than ½ of the effect is retained when concentrations are low
  – Consider changing the dosing schedule?

5 Motivation
• Consider the ratio of two treatment effects
  – Effect at peak concentration
  – Effect at trough concentration
• Allow for a placebo correction
• It is not uncommon to assume the treatment effect (change from baseline) is approximately normally distributed
• What happens when we have a ratio of two normally distributed random variables?

6 Motivation
• Consider the following random variables: x, y, z [equation image not captured in transcript]
• x, y, z can be individual observations or sample means; x and y may be correlated
• Possible correlation
  – Peak and trough for the same treated subject
  – Successive measurements in subjects on placebo

7 Motivation
• Wish to estimate $R = (\mu_x - \mu_z)/(\mu_y - \mu_z)$ with $r = (x - z)/(y - z)$
• By assumption, we expect $0 \le R \le 1$
• $(x - z)$ and $(y - z)$ are both normal random variables
• r is a random variable from some distribution
  – Possibility of zero in the denominator

8 Normal Distribution
• Normal distribution:
  – Bell-shaped continuous probability distribution
  – (Theoretical) frequency distribution is symmetric and bell shaped
    Frequency distribution: a set of intervals, usually adjacent and of equal width, each associated with a frequency indicating the number of measurements in that interval.
  – Also called the Gaussian distribution
  – Many desirable statistical properties

9 Normal Distribution

10 Normal Distribution
• Denoted $N(\mu, \sigma^2)$
  – Mean ($\mu$)
  – Variance ($\sigma^2$)
• Continuous range over $(-\infty, +\infty)$
  – A small neighborhood around 0 is always possible, though it may not be probable
  – Likelihood around zero depends on $\mu$ and $\sigma$
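As a rough illustration of how the likelihood near zero depends on the mean and standard deviation, the probability that a normal variable falls within a small band around zero can be written with the standard normal CDF $\Phi$. The band half-width $\varepsilon = 1$ and the values $\mu = 10$, $\sigma = 5$ below are illustrative assumptions, not figures from this slide.

```latex
\[
P(|y| < \varepsilon)
  = \Phi\!\left(\frac{\varepsilon - \mu}{\sigma}\right)
  - \Phi\!\left(\frac{-\varepsilon - \mu}{\sigma}\right)
\]
% Illustrative values: mu = 10, sigma = 5, epsilon = 1
\[
P(|y| < 1) = \Phi(-1.8) - \Phi(-2.2) \approx 0.0359 - 0.0139 = 0.022
\]
```

Moving the mean further from zero, or shrinking the standard deviation, makes this probability smaller, but it is never exactly zero.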

11 Ratio of Normal Random Variables
• Problems arise when it is possible for the denominator to be zero
• Example
  – Let $x \sim N(\mu_x = 5, \sigma_x^2 = 25)$ and $y \sim N(\mu_y = 10, \sigma_y^2 = 25)$, where x and y are independent
  – About a 95% chance we observe x in (-5, 15), and likewise y in (0, 20)
  – Expect to see a ratio around $\mu_x/\mu_y = 5/10 = 1/2$
  – Suppose we observe 10 in the numerator and 0.001 in the denominator
  – r = 10/0.001 = 10,000
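The example above can be explored with a short simulation. This is a minimal sketch, not the author's program: the dataset name `ratio`, the seed, and the number of replicates are assumptions chosen for illustration, and it uses the RAND function rather than the NORMAL function shown later in the talk.

```sas
/* Sketch only: dataset name, seed, and replicate count are illustrative assumptions */
data ratio;
  call streaminit(2009);         /* arbitrary seed for reproducibility */
  do j = 1 to 10000;
    x = rand('normal', 5, 5);    /* mean 5, standard deviation 5 (variance 25) */
    y = rand('normal', 10, 5);   /* mean 10, standard deviation 5 (variance 25) */
    r = x / y;                   /* unstable whenever y lands near zero */
    output;
  end;
run;

/* Compare the mean with the median and look at the extremes */
proc means data=ratio n mean median min max;
  var r;
run;
```

With a heavy-tailed ratio, the sample mean and the extremes can move noticeably from one seed to the next, while the median tends to be much more stable.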

12 Ratio of Normal Random Variables
• Unconditional distribution is a Cauchy-like distribution
  – Undefined mean and variance
• Consider a condition
  – Statistical test to see if the denominator is unlikely to be zero
  – Assures the denominator is not near zero
• Not unreasonable to require a (significant) treatment effect at peak blood concentrations
• Goal is to determine how the ratio behaves under this condition
• Simulate!

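13 Simulation: Generate data
• If $z \sim N(0,1)$, then $x = \mu_x + z\,\sigma_x \sim N(\mu_x, \sigma_x^2)$
• Bivariate Normal
  – Two normally distributed random variables that may or may not be correlated
  – Using the Cholesky decomposition, if $z_1 \sim N(0,1)$ and $z_2 \sim N(0,1)$ are independent, then
    $x = \mu_x + \sigma_x z_1$
    $y = \mu_y + (\sigma_{xy}/\sigma_x)\, z_1 + \sqrt{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2}\; z_2$
    and $x \sim N(\mu_x, \sigma_x^2)$, $y \sim N(\mu_y, \sigma_y^2)$, $\rho_{xy} = \sigma_{xy}/(\sigma_x \sigma_y)$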
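A short check of the bivariate construction, written to match the expression for y used in the SAS code on the next slide. It assumes $z_1$ and $z_2$ are independent standard normal variables, and writes $\sigma_{xy}$ for the covariance of x and y.

```latex
\begin{align*}
x &= \mu_x + \sigma_x z_1, \qquad
y = \mu_y + \frac{\sigma_{xy}}{\sigma_x}\, z_1
      + \sqrt{\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}}\; z_2 \\[4pt]
\operatorname{Var}(y)
  &= \frac{\sigma_{xy}^2}{\sigma_x^2}
   + \left(\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}\right)
   = \sigma_y^2 \\[4pt]
\operatorname{Cov}(x, y)
  &= \operatorname{Cov}\!\left(\sigma_x z_1,\ \frac{\sigma_{xy}}{\sigma_x}\, z_1\right)
   = \sigma_{xy}
\end{align*}
```

The square root is real whenever $\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2$, that is, whenever the implied correlation lies between -1 and 1.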

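14 Simulation: SAS Code
/* Univariate normal. The macro variable &seed must be defined beforehand (for example with %let). */
data x;
  mu=0; sd=1;
  do j=1 to 10000;
    z=normal(&seed);
    x=mu + z*sd;
    output;
  end;
run;

/* Bivariate normal via the Cholesky construction; sxy is the covariance of x and y. */
/* Named y here so that it matches the data=y used by the later PROC SGPLOT step.    */
data y;
  mux=0; muy=2; sdx=1; sdy=2; sxy=0;
  do j=1 to 10000;
    z1=normal(12356);
    z2=normal(6789);
    x=mux + z1*sdx;
    y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
    output;
  end;
run;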

15 Simulation: SAS Code
goptions reset=all lfactor=2;
axis1 value=none label=none minor=none;
axis2 value=(h=1.5) label=(h=2);

/* Four histograms of x with progressively narrower bins; the last overlays a fitted normal curve */
proc univariate data=x gout=intdata.dist1 noprint;
  var x;
  histogram / vaxis=axis2 midpoints=-5 to 5 by 1 haxis=axis1;
  histogram / vaxis=axis2 midpoints=-5 to 5 by .5 haxis=axis1;
  histogram / vaxis=axis2 midpoints=-5 to 5 by .25 haxis=axis1;
  histogram / normal(color=red mu=est sigma=est w=6) vaxis=axis2 midpoints=-5 to 5 by .1 haxis=axis1;
run;

/* Replay the four histograms in a 2x2 panel and write the result to a JPEG file */
filename got "graphs\normal_freqdist.jpeg";
goptions reset=all device=jpeg gsfname=got gsfmode=replace rotate=portrait
         xmax=2in ymax=2in fontres=presentation xpixels=950 ypixels=950;
proc greplay igout=intdata.dist1 gout=work.gseg tc=sashelp.templt template=l2r2 nofs;
  treplay 1=univar 2=univar2 3=univar1 4=univar3;
run;
quit;

16 Simulation: SAS Code
ods graphics on / reset imagefmt=JPEG imagename="compnrml" noborder height=4in width=4in;

/* Overlay the estimated densities of x and y on one plot */
proc sgplot data=y;
  density x / lineattrs=(thickness=4) legendlabel='Mu=0, Std=1';
  density y / lineattrs=(thickness=4) legendlabel='Mu=2, Std=2';
  keylegend / location=inside position=topright across=1;
run;
quit;

ods graphics off;

17 Simulation
• We can simulate data fairly easily
• Smooth functions show the general behavior and will allow for multiple distributions per plot
• There are various ways to obtain histograms
  – Proc Univariate
  – Proc GPlot on the histogram output dataset
  – Proc SGPlot and SGPanel
• Desire techniques that allow for multiple distributions per plot
  – Visualize behavior of the ratio for various R, sample sizes, correlation structures, and restrictions on the denominator

18 Simulation
• Three different ratios (peak effect of 30 m)
  – 0, ½, 1
• Equal standard deviations for cfb
  – 65 m
• Three different sample sizes (2:1 randomization, drug:placebo)
  – 60:30
  – 112:56
  – 184:92
• Two (within-subject) correlation structures
  – High (≈ 0.9)
  – Low (≈ 0.1)
• Two different levels of alpha for testing the denominator
  – 0.05, 0.01

19 Simulation
• Methods for organization should be well thought out
  – That being said, tasks often evolve…
• Many conditions to vary (3x3x2x2)
  – Separate programs and/or folders
  – Macros
  – Permanent data
  – Graphs in permanent catalogs to be replayed later
  – Naming conventions

20 Simulation
• Process
  – Define the set of conditions (ratio, sample size, etc.)
  – Derive distributions of the numerator and denominator
  – Simulate normal random variables
    Outcomes of a clinical trial
  – Hypothesis test on the denominator (Type 1 Error)
  – Exclude the observation if the denominator is not significantly different from zero
  – Plot
    Multiple curves per plot
    Multiple plots per page

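21 Simulation
• Generate x, y, and z
• Check for significance in denominator
• Determine r

/* Generate x, y (correlated) and z (independent) */
x=mux + z1*sdx;
y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
z=muz + z3*sdz;
/* Two-sided z-test of the denominator (y-z) against zero */
sp=sqrt(sdy**2 + sdz**2);
zstat=probit(1-&alpha/2);   /* critical value */
zcrit=(y-z)/sp;             /* observed test statistic */
sig=abs(zcrit)>zstat;
/* Placebo-corrected ratio */
r=(x-z)/(y-z);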
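The fragment above can be wrapped in a complete data step for one set of conditions. This is a sketch under stated assumptions, not the author's program: it assumes the simulated outcomes are sample means (so each standard deviation is 65/sqrt(n)), a placebo mean of 0, a 60:30 sample size, a true ratio of ½, high correlation, and an arbitrary seed. The dataset name `sim` and the use of RAND instead of NORMAL are also choices made here.

```sas
%let alpha = 0.05;                 /* one of the two alpha levels listed in the talk */

/* Sketch only: the parameterization below is an assumption, not taken from the slides */
data sim;
  call streaminit(2009);           /* arbitrary seed */
  muz = 0;                         /* assumed placebo change from baseline */
  muy = 30;                        /* peak effect of 30 m */
  mux = 15;                        /* true ratio of 1/2 */
  sdx = 65/sqrt(60);               /* assumed: standard error of a mean, n=60 on drug */
  sdy = 65/sqrt(60);
  sdz = 65/sqrt(30);               /* n=30 on placebo */
  sxy = 0.9*sdx*sdy;               /* covariance implied by a correlation of 0.9 */
  zstat = probit(1 - &alpha/2);    /* two-sided critical value */
  sp = sqrt(sdy**2 + sdz**2);      /* standard deviation of the denominator y-z */
  do j = 1 to 10000;
    z1 = rand('normal'); z2 = rand('normal'); z3 = rand('normal');
    x = mux + z1*sdx;
    y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
    z = muz + z3*sdz;
    zcrit = (y - z)/sp;
    sig = abs(zcrit) > zstat;
    if sig then do;                /* keep only trials with a significant denominator */
      r = (x - z)/(y - z);
      output;
    end;
  end;
  keep j x y z sig r;
run;
```

Turning the fixed values into macro parameters would let the same step be called once per combination of ratio, sample size, correlation, and alpha.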

22 Visualization: One curve
• GPlot using the histogram output dataset
• SMxx interpolation in the symbol statement
  – Fits a smooth line to jagged (noisy) data
  – xx = degree of smoothing, 0-99
    Higher = smoother

symbol1 value=none i=SM55 color=red l=3 width=5;

23 Visualization
• Multiple curves
  – Set histogram datasets together and create a by variable
  – Add another symbol statement
  – Update plot statement and add legend

proc gplot data=forplot;
  plot _obspct_*_minpt_=bygrp / href=(0.5) chref=("red") whref=(2)
       haxis=axis1 vaxis=axis2 legend=legend1;
run;
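The first two steps in the list above (stacking the histogram datasets and adding a symbol statement per curve) might look like the following. This is a sketch: the input dataset names `hist_r0`, `hist_r50`, and `hist_r100`, the group labels, and the colors and line types are hypothetical. Only `forplot`, `bygrp`, and the SM55 interpolation come from the slides.

```sas
/* Sketch only: hist_r0, hist_r50, hist_r100 are hypothetical histogram output datasets */
data forplot;
  length bygrp $8;
  set hist_r0   (in=in0)
      hist_r50  (in=in50)
      hist_r100 (in=in100);
  /* The by variable records which simulated ratio each row came from */
  if in0 then bygrp = 'R=0';
  else if in50 then bygrp = 'R=1/2';
  else if in100 then bygrp = 'R=1';
run;

/* One symbol statement per value of bygrp, each with smoothed interpolation */
symbol1 value=none i=SM55 color=red   l=1 width=5;
symbol2 value=none i=SM55 color=blue  l=3 width=5;
symbol3 value=none i=SM55 color=green l=5 width=5;
```

Because GPLOT assigns symbol statements to by-group values in sorted order, the mapping of colors to ratios depends on how the labels sort.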

24 Visualization
• Add the remaining 4 lines
• Store it in a permanent catalog to be replayed later
  – gout=intdata.dist_6030L
• Update titles and labels as needed
• For Proc GReplay
  – Suppress some labels
  – Increase font sizes and line thickness

25 Visualization
• Proc GReplay
  – TDEF statement to create a 2x3 panel of plots
  – x,y coordinates defined by percentages
  – TREPLAY to insert the graphs created in the simulation exercise
[Figure: 2x3 panel layout, panels numbered 1 2 3 (top row) and 4 5 6 (bottom row)]

26 Visualization
/* Define a template with six panels: 1-3 across the top half, 4-6 across the bottom half. */
/* Coordinates are percentages of the graphics area.                                       */
proc greplay nofs tc=work.templt;
  tdef Box2x3 des='2 by 3 Boxes'
    1 / llx=0    lly=50 ulx=0    uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
    2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
    3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100  ury=100 lrx=100  lry=50
    4 / llx=0    lly=0  ulx=0    uly=50  urx=33.3 ury=50  lrx=33.3 lry=0
    5 / llx=33.3 lly=0  ulx=33.3 uly=50  urx=66.6 ury=50  lrx=66.6 lry=0
    6 / llx=66.6 lly=0  ulx=66.6 uly=50  urx=100  ury=50  lrx=100  lry=0
  ;
run;
quit;

27 Visualization
• Prepare for replaying graphs
• Copy the graphs to the work graphics catalog
• Change the names

proc catalog c=work.gseg;
  copy in=intdata.dist_6030L out=work.gseg;
  change gplot=g6030L / entrytype=grseg;
  copy in=intdata.dist_6030H out=work.gseg;
  change gplot=g6030H / entrytype=grseg;
  copy in=intdata.dist_11256L out=work.gseg;
  change gplot=g11256L / entrytype=grseg;
  copy in=intdata.dist_11256H out=work.gseg;
  change gplot=g11256H / entrytype=grseg;
  copy in=intdata.dist_18492L out=work.gseg;
  change gplot=g18492L / entrytype=grseg;
  copy in=intdata.dist_18492H out=work.gseg;
  change gplot=g18492H / entrytype=grseg;
quit;
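Once the renamed entries are in work.gseg, the replay step could look like the sketch below. The slides do not show this step, so the assignment of graphs to panels (low correlation on the top row, high correlation on the bottom row, sample size increasing left to right) is an assumption. The entry names, the template name Box2x3, and the catalog work.templt are taken from the slides.

```sas
/* Sketch only: the panel-to-graph assignment is an assumed layout */
proc greplay igout=work.gseg tc=work.templt template=Box2x3 nofs;
  treplay 1=g6030L 2=g11256L 3=g18492L
          4=g6030H 5=g11256H 6=g18492H;
run;
quit;
```

Putting one correlation structure per row makes the effect of sample size readable across a row and the effect of correlation readable down a column.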

28 Visualization
[Figure not captured in transcript]

29 Visualization
• “Almost” complete
  – Adjust graphics options to optimize display
  – Graphic options can be powerful to enhance display
    Time versus information trade-off
• Other ways to do this?
  – Keep intermediate simulated data or histogram datasets
  – Proc SGPlot or SGPanel
  – How much processing is needed?

30 Application to Design
• Scenario 1: Data from one 16-week clinical trial
  – At week 16, ½ of the subjects have the 6-minute walk distance (6mwd) measured at peak blood concentrations; the others have 6mwd measured at trough concentrations
  – Observations that measure peak and trough are independent
  – Similar to the low correlation scenario
• Scenario 2: Schedule an extra visit at week 12 in one clinical trial
  – At week 12, some subjects are tested at peak and some at trough
  – Switch and repeat at week 16
  – Similar to the high correlation scenario

31 Application to Design
[Figure not captured in transcript]

32 Application to Design
• Higher variation in the low correlation scenario
• If the true ratio were near 1:
  – For Scenario 1: there is a non-negligible probability of observing something less than ½
  – For Scenario 2: much less likely to see something near ½
• Costs associated with the extra visit?
• Consequences of decisions made about the true ratio?
  – Asked to consider a different dosing schedule?

33 Conclusions
• Simulations in SAS allowed us to understand the conditional distributions
  – Beneficial in planning
• Many ways to use graphics to visualize in this simulation
  – Available procedures
• Organizational structure to simulate data and graphs
  – Obtain a lot of information in each panel
  – Even more when panels are combined
• Graphic options can be powerful to enhance display
  – Time versus information trade-off

34 Questions?

