Published by Hester Cobb. Modified over 8 years ago.
SAS Beyond TFLs: An Application as a Statistician’s Tool
  William Coar
  bill.coar@gmail.com
  Date: 10/15/2009
  Denver SAS User’s Group
2 Outline
  Motivation
  Normal Distribution
  Ratio of normally distributed random variables
  Simulation
  Visualization
  Brief Application
3 Motivation
  SAS is the standard for analysis and reporting in the pharmaceutical industry
    – Generation of Tables, Figures, and Listings (TFLs)
  Data generally reside in SAS format
  Statistician is a link between data and the report
    – Use of TFLs
    – Ad hoc analyses
    – Etc.
  SAS as a tool beyond reporting and analysis
4 Motivation
  Drug to improve exercise capacity in subjects with pulmonary arterial hypertension
  The treatment effect is the change from baseline (cfb) in distance walked in 6 minutes
  The Food and Drug Administration approves the drug but mandates additional information
    – Is the treatment effect greater when the blood concentration of drug is higher?
  Suppose less than ½ of the effect is retained when concentrations are low
    – Consider changing the dosing schedule?
5 Motivation
  Consider the ratio of two treatment effects
    – Effect at peak concentration
    – Effect at trough concentration
  Allow for a placebo correction
  It is not uncommon to assume the treatment effect (change from baseline) is approximately normally distributed
  What happens when we have a ratio of two normally distributed random variables?
6 Motivation
  Consider the following random variables:
    $x \sim N(\mu_x, \sigma_x^2)$, $y \sim N(\mu_y, \sigma_y^2)$, $z \sim N(\mu_z, \sigma_z^2)$
  x, y, and z can be individual observations or sample means; x and y may be correlated
  Possible correlation
    – Peak and trough for the same treated subject
    – Successive measurements in subjects on placebo
7 Motivation
  Wish to estimate $R = (\mu_x - \mu_z)/(\mu_y - \mu_z)$ with $r = (x - z)/(y - z)$
  By assumption, we expect 0 ≤ R ≤ 1, and $x - z$ and $y - z$ are both normal random variables
  r is a random variable from some distribution
    – Possibility of zero in the denominator
8 Normal Distribution
  Normal distribution:
    – Bell-shaped continuous probability distribution
    – (Theoretical) frequency distribution is symmetric and bell-shaped
        Frequency distribution: a set of intervals, usually adjacent and of equal width, each associated with a frequency indicating the number of measurements in that interval.
    – Also called the Gaussian distribution
    – Many desirable statistical properties
9 Normal Distribution
10 Normal Distribution
  Denoted $N(\mu, \sigma^2)$
    – Mean ($\mu$)
    – Variance ($\sigma^2$)
  Continuous range over (-∞, +∞)
    – A small neighborhood around 0 is always possible, though it may not be probable
    – Likelihood around zero depends on $\mu$ and $\sigma^2$
11 Ratio of Normal Random Variables
  Problems arise when it is possible for the denominator to be zero
  Example
    – Let $x \sim N(\mu_x = 5, \sigma_x^2 = 25)$ and $y \sim N(\mu_y = 10, \sigma_y^2 = 25)$, where x and y are independent
    – About a 95% chance of observing x in (-5, 15), and likewise y in (0, 20)
    – Expect to see a ratio around $\mu_x/\mu_y = 5/10 = 1/2$
    – Suppose we observe a 10 in the numerator and a 0.001 in the denominator
    – r = 10/0.001 = 10,000
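
The example on this slide can be tried out directly. The following is a minimal sketch, not the presenter's code: it draws the two independent normals from the example and summarizes their ratio. The dataset name `ratio`, the seed, and the number of draws are arbitrary assumptions, and it uses `RAND('NORMAL', mean, sd)` with `CALL STREAMINIT` rather than the `NORMAL` function used elsewhere in the deck.

```sas
/* Sketch: ratio of two independent normals, x~N(5,25) and y~N(10,25). */
/* Seed (2009) and number of draws (10000) are arbitrary assumptions.  */
data ratio;
  call streaminit(2009);
  do j = 1 to 10000;
    x = rand('NORMAL', 5, 5);    /* mean 5, standard deviation 5  */
    y = rand('NORMAL', 10, 5);   /* mean 10, standard deviation 5 */
    r = x / y;                   /* unconditional ratio           */
    output;
  end;
run;

/* Compare the median of r with its mean and its extremes. */
proc means data=ratio n mean median min max;
  var r;
run;
```

Because y falls near zero in a small share of draws, the minimum and maximum of r can be far from ½ even when the median is not, which is the behavior the slide describes.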
12 Ratio of Normal Random Variables
  Unconditional distribution is a Cauchy-like distribution
    – Undefined mean and variance
  Consider a condition
    – Statistical test to see if the denominator is unlikely to be zero
    – Assures the denominator is not near zero
  Not unreasonable to require a (significant) treatment effect at peak blood concentrations
  Goal is to determine how the ratio behaves under this condition
  Simulate!
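13 Simulation: Generate data
  If $z \sim N(0,1)$, then $x = \mu_x + z\,\sigma_x \sim N(\mu_x, \sigma_x^2)$
  Bivariate Normal
    – Two normally distributed random variables that may or may not be correlated
    – Using the Cholesky decomposition, if $z_1 \sim N(0,1)$ and $z_2 \sim N(0,1)$ are independent, then
        $x = \mu_x + \sigma_x z_1$
        $y = \mu_y + (\sigma_{xy}/\sigma_x)\, z_1 + \sqrt{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2}\; z_2$
      and $x \sim N(\mu_x, \sigma_x^2)$ and $y \sim N(\mu_y, \sigma_y^2)$, with $\rho_{xy} = \sigma_{xy}/(\sigma_x \sigma_y)$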
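
As a check on the construction on this slide, the short derivation below verifies that it yields the intended variance and covariance. It assumes $z_1$ and $z_2$ are independent standard normals and takes the form of $y$ from the SAS code on the following slide; $L$ is notation introduced here for the Cholesky factor.

```latex
\Sigma =
\begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}
= L L^{\top},
\qquad
L =
\begin{pmatrix}
\sigma_x & 0 \\
\sigma_{xy}/\sigma_x & \sqrt{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2}
\end{pmatrix}

\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}
+ L \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}

\operatorname{Var}(y)
= \frac{\sigma_{xy}^2}{\sigma_x^2}
+ \left(\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}\right)
= \sigma_y^2,
\qquad
\operatorname{Cov}(x, y)
= \sigma_x \cdot \frac{\sigma_{xy}}{\sigma_x}
= \sigma_{xy}
```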
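14 Simulation: SAS Code
  /* &seed was not defined on the slide; 12345 is a placeholder value */
  %let seed=12345;

  /* One normal variable: x = mu + z*sd */
  data x;
    mu=0; sd=1;
    do j=1 to 10000;
      z=normal(&seed);
      x=mu + z*sd;
      output;
    end;
  run;

  /* Two normal variables with covariance sxy (Cholesky construction). */
  /* Named y here because the later PROC SGPLOT step reads data=y.     */
  data y;
    mux=0; muy=2; sdx=1; sdy=2; sxy=0;
    do j=1 to 10000;
      z1=normal(12356);
      z2=normal(6789);
      x=mux + z1*sdx;
      y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
      output;
    end;
  run;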
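
The slide's second data step sets `sxy=0`, so x and y are uncorrelated there. The sketch below repeats the same construction with a nonzero covariance and checks the result. The value `sxy=1.8` (giving a correlation of 1.8/(1*2) = 0.9), the dataset name `xy`, and the seed are assumptions for illustration, and `RAND('NORMAL')` with `CALL STREAMINIT` is swapped in for the slide's `NORMAL` function.

```sas
/* Sketch: correlated bivariate normal via the Cholesky construction. */
/* sxy=1.8 is an assumed value; with sdx=1 and sdy=2 it gives rho=0.9. */
data xy;
  call streaminit(12356);              /* arbitrary seed */
  mux=0; muy=2; sdx=1; sdy=2; sxy=1.8;
  do j = 1 to 10000;
    z1 = rand('NORMAL');               /* independent N(0,1) draws */
    z2 = rand('NORMAL');
    x = mux + z1*sdx;
    y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
    output;
  end;
  keep x y;
run;

/* Sample means, standard deviations, covariance, and correlation of x and y. */
proc corr data=xy cov;
  var x y;
run;
```

The sample statistics can then be compared with the targets set in the data step (means 0 and 2, standard deviations 1 and 2, covariance 1.8).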
15 Simulation: SAS Code
  goptions reset=all lfactor=2;
  axis1 value=none label=none minor=none;
  axis2 value=(h=1.5) label=(h=2);

  /* Four histograms of x with progressively narrower intervals, */
  /* stored in the graphics catalog intdata.dist1                */
  proc univariate data=x gout=intdata.dist1 noprint;
    var x;
    histogram / vaxis=axis2 midpoints=-5 to 5 by 1 haxis=axis1;
    histogram / vaxis=axis2 midpoints=-5 to 5 by .5 haxis=axis1;
    histogram / vaxis=axis2 midpoints=-5 to 5 by .25 haxis=axis1;
    histogram / normal(color=red mu=est sigma=est w=6) vaxis=axis2 midpoints=-5 to 5 by .1 haxis=axis1;
  run;

  /* Replay the four histograms into one 2x2 JPEG */
  filename got "graphs\normal_freqdist.jpeg";
  goptions reset=all device=jpeg gsfname=got gsfmode=replace rotate=portrait xmax=2in ymax=2in fontres=presentation xpixels=950 ypixels=950;
  proc greplay igout=intdata.dist1 gout=work.gseg tc=sashelp.templt template=l2r2 nofs;
    treplay 1=univar 2=univar2 3=univar1 4=univar3;
  run;
  quit;
16 Simulation: SAS Code
  ods graphics on / reset imagefmt=JPEG imagename="compnrml" noborder height=4in width=4in;

  /* Overlay the fitted normal densities of x and y in one plot */
  proc sgplot data=y;
    density x / lineattrs=(thickness=4) legendlabel='Mu=0, Std=1';
    density y / lineattrs=(thickness=4) legendlabel='Mu=2, Std=2';
    keylegend / location=inside position=topright across=1;
  run;
  quit;

  ods graphics off;
17 Simulation
  We can simulate data fairly easily
  Smooth functions show the general behavior and will allow for multiple distributions per plot
  There are various ways to obtain histograms
    – Proc Univariate
    – Proc GPlot on the histogram output dataset
    – Proc SGPlot and SGPanel
  Desire techniques that allow for multiple distributions per plot
    – Visualize behavior of the ratio for various R, sample sizes, correlation structures, restrictions on the denominator
18 Simulation
  Three different ratios (peak effect of 30 m)
    – 0, ½, 1
  Equal standard deviations for cfb
    – 65 m
  Three different sample sizes (2:1 randomization drug:placebo)
    – 60:30
    – 112:56
    – 184:92
  Two (within-subject) correlation structures
    – High (≈ 0.9)
    – Low (≈ 0.1)
  Two different levels of alpha for testing the denominator
    – 0.05, 0.01
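
The conditions listed on this slide form a 3x3x2x2 grid of 36 combinations. One possible way to enumerate them is sketched below. The dataset name `conditions` and all variable names are hypothetical. The derived columns rest on assumptions not stated on the slide: that the 30 m peak effect is already placebo-corrected, that the trough effect equals the ratio times the peak effect, and that x, y, and z are treated as sample means with standard error 65/sqrt(n).

```sas
/* Sketch: one row per simulation condition (3 x 3 x 2 x 2 = 36 rows). */
data conditions;
  sd   = 65;                          /* SD of change from baseline (m) */
  peak = 30;                          /* peak effect (m)                */
  do ratio = 0, 0.5, 1;
    do n_drug = 60, 112, 184;
      n_placebo = n_drug / 2;         /* 2:1 randomization drug:placebo */
      do rho = 0.1, 0.9;              /* low and high correlation       */
        do alpha = 0.05, 0.01;
          trough     = ratio * peak;          /* assumed trough effect  */
          se_drug    = sd / sqrt(n_drug);     /* assumed SE, drug arm   */
          se_placebo = sd / sqrt(n_placebo);  /* assumed SE, placebo    */
          output;
        end;
      end;
    end;
  end;
run;

proc print data=conditions;
run;
```

A table like this could drive a macro loop so that each row produces one simulated distribution.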
19 Simulation
  Methods for organization should be well thought out
    – That being said, tasks often evolve…
  Many conditions to vary (3x3x2x2)
    – Separate programs and/or folders
    – Macros
    – Permanent data
    – Graphs in permanent catalogs to be replayed later
    – Naming conventions
20 Simulation
  Process
    – Define the set of conditions (ratio, sample size, etc.)
    – Derive distributions of the numerator and denominator
    – Simulate normal random variables
        Outcomes of a clinical trial
    – Hypothesis test on the denominator (Type 1 Error)
    – Exclude the observation if the denominator is not significantly different from zero
    – Plot
        Multiple curves per plot
        Multiple plots per page
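
The step "derive distributions of the numerator and denominator" can be written out as follows. This is a sketch under one assumption that the slides do not state explicitly: that the placebo quantity $z$ is independent of both $x$ and $y$, with $\operatorname{Cov}(x, y) = \sigma_{xy}$.

```latex
x - z \sim N\!\left(\mu_x - \mu_z,\; \sigma_x^2 + \sigma_z^2\right),
\qquad
y - z \sim N\!\left(\mu_y - \mu_z,\; \sigma_y^2 + \sigma_z^2\right)

\operatorname{Cov}(x - z,\; y - z)
= \operatorname{Cov}(x, y) - \operatorname{Cov}(x, z) - \operatorname{Cov}(z, y) + \operatorname{Var}(z)
= \sigma_{xy} + \sigma_z^2
```

Note that the numerator and denominator stay positively correlated even when $\sigma_{xy} = 0$, because both subtract the same $z$.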
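21 Simulation
  Generate x, y, and z
  Check for significance in denominator
  Determine r

  x=mux + z1*sdx;
  y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
  z=muz + z3*sdz;
  sp=sqrt(sdy**2 + sdz**2);     /* standard deviation of the denominator y-z   */
  zstat=probit(1-&alpha/2);     /* two-sided critical value                    */
  zcrit=(y-z)/sp;               /* test statistic for the denominator          */
  sig=abs(zcrit)>zstat;         /* 1 if the denominator differs from zero      */
  r=(x-z)/(y-z);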
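
The statements on this slide are a fragment of a data step. A minimal, self-contained sketch of how they might sit inside one is given below for a single condition. The means, standard deviations, correlation of 0.9, seed, and dataset name `sim` are all assumed values chosen for illustration, and `RAND('NORMAL')` is swapped in for the `NORMAL` function.

```sas
%let alpha = 0.05;                    /* assumed significance level */

/* Sketch: simulate x, y, z, test the denominator, and compute r. */
data sim;
  call streaminit(2009);              /* arbitrary seed */
  /* Assumed means and standard deviations, for illustration only */
  mux=15; muy=30; muz=0;
  sdx=8; sdy=8; sdz=12;
  sxy = 0.9*sdx*sdy;                  /* assumed correlation of 0.9 */
  zstat = probit(1 - &alpha/2);       /* two-sided critical value   */
  do j = 1 to 10000;
    z1 = rand('NORMAL'); z2 = rand('NORMAL'); z3 = rand('NORMAL');
    x = mux + z1*sdx;
    y = muy + sxy*z1/sdx + sqrt(sdy**2 - (sxy**2)/sdx**2)*z2;
    z = muz + z3*sdz;
    sp = sqrt(sdy**2 + sdz**2);       /* SD of the denominator y-z  */
    zcrit = (y - z)/sp;               /* test statistic             */
    sig = abs(zcrit) > zstat;         /* 1 if significant, else 0   */
    r = (x - z)/(y - z);
    output;
  end;
run;

/* Summarize r only where the denominator test was significant. */
proc means data=sim n mean median p5 p95;
  where sig = 1;
  var r;
run;
```

Dropping the `where` statement gives the unconditional summary for comparison.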
22 Visualization: One curve
  GPlot using histogram output dataset
  SMxx interpolation in the symbol statement
    – Fits a smooth line to jagged (noisy) data
    – xx = degree of smoothing, 0-99
        Higher = smoother

  symbol1 value=none i=SM55 color=red l=3 width=5;
23 Visualization
  Multiple curves
    – Set histogram datasets together and create a by variable
    – Add another symbol statement
    – Update plot statement and add legend

  proc gplot data=forplot;
    plot _obspct_*_minpt_=bygrp / href=(0.5) chref=("red") whref=(2) haxis=axis1 vaxis=axis2 legend=legend1;
  run;
24 Visualization
  Add the remaining 4 lines
  Store it in a permanent catalog to be replayed later
    – gout=intdata.dist_6030L
  Update titles and labels as needed
  For Proc GReplay
    – Suppress some labels
    – Increase font sizes and line thickness
25 Visualization
  Proc GReplay
    – TDEF statement to create a 2x3 panel of plots
    – x,y coordinates defined by percentages
    – TREPLAY to insert the graphs created in the simulation exercise
  [Diagram: 2x3 grid of panels, numbered 1 2 3 across the top row and 4 5 6 across the bottom row]
26 Visualization
  proc greplay nofs;
    tc=work.templt;
    tdef Box2x3 des='2 by 3 Boxes'
      /* top row, left to right */
      1 / llx=0    lly=50 ulx=0    uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
      2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
      3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100  ury=100 lrx=100  lry=50
      /* bottom row, left to right */
      4 / llx=0    lly=0  ulx=0    uly=50  urx=33.3 ury=50  lrx=33.3 lry=0
      5 / llx=33.3 lly=0  ulx=33.3 uly=50  urx=66.6 ury=50  lrx=66.6 lry=0
      6 / llx=66.6 lly=0  ulx=66.6 uly=50  urx=100  ury=50  lrx=100  lry=0
    ;
  run;
27 Visualization
  Prepare for replaying graphs
  Copy the graphs to the work graphics catalog
  Change the names

  /* Each copy brings in an entry named GPLOT, which is renamed before the next copy */
  proc catalog c=work.gseg;
    copy in=intdata.dist_6030L out=work.gseg;
    change gplot=g6030L / entrytype=grseg;
    copy in=intdata.dist_6030H out=work.gseg;
    change gplot=g6030H / entrytype=grseg;
    copy in=intdata.dist_11256L out=work.gseg;
    change gplot=g11256L / entrytype=grseg;
    copy in=intdata.dist_11256H out=work.gseg;
    change gplot=g11256H / entrytype=grseg;
    copy in=intdata.dist_18492L out=work.gseg;
    change gplot=g18492L / entrytype=grseg;
    copy in=intdata.dist_18492H out=work.gseg;
    change gplot=g18492H / entrytype=grseg;
  quit;
28 Visualization
  [Figure not reproduced]
29 Visualization
  “Almost” complete
    – Adjust graphics options to optimize display
    – Graphic options can be powerful to enhance display
        Time versus information trade-off
  Other ways to do this?
    – Keep intermediate simulated data or histogram datasets
    – Proc SGPlot or SGPanel
    – How much processing is needed?
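
As one illustration of the Proc SGPlot alternative raised on this slide, the sketch below overlays kernel density estimates of the conditional ratio under a low and a high correlation, using simulated data kept in wide form. It is not the presenter's method. The parameter values (a true ratio of 1, standard deviations of 8 and 12, correlations of 0.1 and 0.9), the seed, the axis limits, and the names `wide`, `r_low`, and `r_high` are all assumptions.

```sas
%let alpha = 0.05;                    /* assumed significance level */

/* Sketch: conditional ratio under two assumed correlations, wide format. */
data wide;
  call streaminit(2009);              /* arbitrary seed */
  mux=30; muy=30; muz=0;              /* assumed means; true ratio = 1 */
  sdx=8; sdy=8; sdz=12;               /* assumed standard deviations   */
  zstat = probit(1 - &alpha/2);
  sp = sqrt(sdy**2 + sdz**2);         /* SD of the denominator y-z     */
  do j = 1 to 10000;
    z1 = rand('NORMAL'); z2 = rand('NORMAL'); z3 = rand('NORMAL');
    x = mux + z1*sdx;
    z = muz + z3*sdz;
    /* Same draws, two assumed correlations between x and y */
    y_low  = muy + 0.1*sdy*z1 + sqrt(1 - 0.1**2)*sdy*z2;
    y_high = muy + 0.9*sdy*z1 + sqrt(1 - 0.9**2)*sdy*z2;
    /* Keep r only when the denominator is significantly different from zero */
    if abs((y_low - z)/sp) > zstat then r_low = (x - z)/(y_low - z);
    else r_low = .;
    if abs((y_high - z)/sp) > zstat then r_high = (x - z)/(y_high - z);
    else r_high = .;
    output;
  end;
  keep r_low r_high;
run;

/* Two kernel density curves in one plot, with a reference line at 1/2. */
proc sgplot data=wide;
  density r_low  / type=kernel legendlabel='Low correlation';
  density r_high / type=kernel legendlabel='High correlation';
  refline 0.5 / axis=x;
  xaxis min=-1 max=3;
  keylegend / location=inside position=topright across=1;
run;
```

This keeps the simulated data rather than histogram datasets, which trades storage for fewer intermediate steps.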
30 Application to Design
  Scenario 1: Data from one 16-week clinical trial
    – At week 16, ½ of the subjects have 6mwd measured at peak blood concentrations; the other ½ have 6mwd measured at trough concentrations
    – Observations that measure peak and trough are independent
    – Similar to the low correlation scenario
  Scenario 2: Schedule an extra visit at week 12 in one clinical trial
    – At week 12, some test at peak, some test at trough
    – Switch and repeat at week 16
    – Similar to the high correlation scenario
31 Application to Design
  [Figure not reproduced]
32 Application to Design
  Higher variation in the low correlation scenario
  If the true ratio were near 1:
    – For Scenario 1: there is a non-negligible probability of observing something less than ½
    – For Scenario 2: much less likely to see something near ½
  Costs associated with the extra visit?
  Consequences of decisions made about the true ratio?
    – Asked to consider a different dosing schedule?
33 Conclusions
  Simulations in SAS allowed us to understand the conditional distributions
    – Beneficial in planning
  Many ways to use graphics to visualize in this simulation
    – Available procedures
  Organizational structure to simulate data and graphs
    – Obtain a lot of information in each panel
    – Even more when panels are combined
  Graphic options can be powerful to enhance display
    – Time versus information trade-off
Questions?