Case studies in biostatistics Bonnie LaFleur Department of Biostatistics


1 Case studies in biostatistics Bonnie LaFleur Department of Biostatistics bonnie.lafleur@vanderbilt.edu

2 Outline Miscellaneous review of graphics and data collection/display. Paper 1: Enhanced tumor formation in cyclin D1 x transforming growth factor beta1 double transgenic mice with characterization by magnetic resonance imaging. Cancer Res. 2004 Feb 15;64(4):1315-22. Paper 2: Neuroblastomas of infancy exhibit a characteristic ganglioside pattern. Cancer 2001 Feb. 15; 91(4): 785-793. Paper 3: MeCP2 mutations in children with and without the phenotype of Rett syndrome. Neurology 2001; 56: 1486-1495.

3 Bar graphs Useful for counts or proportions, not for means. If a standard error is shown, make sure it is the correct standard error for a proportion, and decide whether the standard error or the standard deviation is what you actually want to show.

4 Example Percent of type 1 in each group is 25, 27.4, 73. What is the standard error? By definition, the standard error expresses how close an estimate from a random sample is likely to be to the real value in the whole population.

5 Standard error For a proportion p estimated from N observations, $se = \sqrt{p(1-p)/N}$. We can see that this depends on N. What does this mean for our example?

6 Back to our example For our example (p = 20.4%) we calculate the standard error for two different sample sizes. Our estimate of the true percentage has less sampling fluctuation at larger sample sizes.

7 So what does this mean? The main use of standard errors, in a statistical sense, is to calculate 95% confidence intervals for our estimate: p ± 1.96(se). For n=4: (-19%, 60%). For n=20: (3%, 38%).
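
The arithmetic behind these two intervals can be sketched in a few lines of Python. This is only an illustration, assuming the usual normal-approximation standard error for a proportion, sqrt(p(1 - p)/n); the function names are made up for this example.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion p based on n observations."""
    return math.sqrt(p * (1 - p) / n)

def proportion_ci(p, n, z=1.96):
    """Normal-approximation 95% interval: p +/- z * se."""
    se = proportion_se(p, n)
    return p - z * se, p + z * se

p = 0.204  # the 20.4% from the example
for n in (4, 20):
    low, high = proportion_ci(p, n)
    print(f"n={n}: se={proportion_se(p, n):.3f}, CI=({low:.0%}, {high:.0%})")
# n=4: se=0.201, CI=(-19%, 60%)
# n=20: se=0.090, CI=(3%, 38%)
```

The same p gives a much narrower interval at n=20, which is why reporting the sample size matters so much.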

8 Why did I show this? Bar charts should be used for proportions (or percentages) or counts … not means. If error bars are shown at all, they must be the correct standard error (or show the standard deviation instead). It is MUCH more important to include sample sizes with bar charts than either standard error or standard deviation, since once p and the sample size are given, the standard deviation and/or standard error are easily calculated.

9 Like this

10 Box plots - for continuous data The dot here is the median (the mean can also be included as a bar). The ends of the “box” are the 1st and 3rd quartiles, so the box spans the interquartile range. The whiskers extend up to 1.5 x the interquartile range beyond the quartiles (and never exceed the data).
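
As a rough sketch of what a box plot draws, the summary numbers can be computed with Python's standard library. The data values below are made up, and the quartile method ("inclusive") is an assumption; statistical packages use slightly different quartile conventions.

```python
import statistics

def box_plot_summary(values):
    """Median, quartiles, and whisker ends using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median, "q1": q1, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

summary = box_plot_summary([2, 4, 5, 7, 8, 9, 11, 30])  # hypothetical data
print(summary)
```

Here the value 30 falls outside the upper fence, so it would be drawn as a separate point and the upper whisker would stop at 11.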

11 What can sometimes happen

12 Alternative type of plot

13 Plots to display multiple events over time

14 Dot Plots

15 Data: Things to avoid when creating a dataset to be used in statistical packages. Character variables must be in the same case and consistent. Don't mix characters with data that should be numeric. Date formats should be consistent. No summary computations in the middle of the spreadsheet. Differentiate between missing values and “zeros”, “below detection”, etc.
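
One way to keep "missing", "zero", and "below detection" apart when reading such a spreadsheet is to store a numeric value and a status separately. The sketch below is illustrative only: the status labels are invented for this example, and treating the code "ND" as "not done" is an assumption.

```python
from typing import NamedTuple, Optional

class LabValue(NamedTuple):
    value: Optional[float]  # number, or the detection limit when censored
    status: str             # hypothetical labels, see parse_lab_value

def parse_lab_value(cell: str) -> LabValue:
    """Split a raw spreadsheet cell into a number and a status flag."""
    text = cell.strip().replace(",", "")  # "26,310" -> "26310"
    if text == "":
        return LabValue(None, "missing")
    if text.upper() == "ND":              # assumed to mean "not done"
        return LabValue(None, "not_done")
    if text.startswith("<"):
        return LabValue(float(text[1:]), "below_limit")
    if text.startswith(">"):
        return LabValue(float(text[1:]), "above_limit")
    # Free text such as "no therapy" raises ValueError here on purpose,
    # so it cannot slip silently into a numeric column.
    return LabValue(float(text), "observed")

for raw in ["<400", "26,310", "0", "", "ND", ">8"]:
    print(repr(raw), parse_lab_value(raw))
```

A true zero stays an observed 0.0, an empty cell stays missing, and "<400" keeps both the limit and the fact that it is censored.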

16 Date of Blood | CD4 | CD4 % | HIV VL | HAART start date | VZV ser | CMV ser | VZV RCF | CMV RCF
1/14/199862728<10000pos1.3ND
10/16/1998not CHIP Patientpos<1ND
7/15/1997950282067003/03/98pos.pos3.8<1
10/14/1997942278656903/03/98pos.posND1.5
3/3/1998196639<40003/03/98pos 6.21
12/14/1998199742<2003/03/98pos <1
8/17/19999203420-5003/03/98pos 5.18<1
2/7/2000146235<2003/04/98pos >81.85
1/23/199700133585pos<1ND
6/16/199775931.9<4009/15/98pos >8ND
1/20/1998829364529/15/98pos 5.5ND
2/3/1998829364529/15/98pos >8ND
9/15/19985893244719/15/98pos 5.4<1
5/16/2000430252885no therapypos <1
11/25/1997173653640012/21/98posneg>8ND
4/28/19981061424050012/21/98pos.neg.3.7ND
11/24/19988973226,31012/21/98posneg.<1ND
4/14/19998413972,61712/21/98posneg.ND
1/10/200084247<2012/21/99posneg6.2ND
3/26/1997431371120no HAARTpos>8<1
10-6-98 (lab date)21315703 (9-16-98)

17 Hep B: birth PT | 6 M PT | 7 M PT; DTaP: birth PT | 6 M PT | 7 M PT
118827194857
29161429810
3163268391123
43511104203031
51434305112764
681715642925
7529527789
86336889710
95181595118
1049141910 32
11634611172932
12444
13379
1421727
Mean: 20.72727 | 19.63636 | 29.45455 | 8.5 | 17.57143 | 24.35714
vs. DTaP: 0.069793 | 0.665387 | 0.552718 | 0.142732

18 Paper #1 The basic question was whether cyclin D1/TGF-β1 double transgenic mice differ from cyclin D1 single transgenic mice on a variety of outcomes: tumor incidence, tumor multiplicity, tumor burden, and cellular and molecular changes.

19 For tests regarding histologic/cellular changes The data were categorical, plus there were some zero (and very small) cell counts so we had to use nonparametric tests.

20 1='Wild Type' 2='Alb-TGFB' 3='LFABP-Cyclin D1' 4='Double Transgenic'

21

22 Statistics for Table of type by cytomegaly. Fisher's Exact Test: Table Probability (P) = 0.0024; Pr <= P = 0.4390
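
The two numbers in this output can be illustrated with a small Python sketch. It is a simplification: the slide's table has four genotypes, while the code below handles only a 2 x 2 table, and the counts are hypothetical. "Table Probability (P)" is the hypergeometric probability of the observed table given its margins, and "Pr <= P" is the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (table_probability, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so that ties with the observed probability are counted.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= observed * (1 + 1e-9))
    return observed, p_value

# Hypothetical counts: rows are two genotypes, columns are cytomegaly yes/no.
table_prob, p = fisher_exact_2x2(3, 1, 1, 3)
print(f"Table probability = {table_prob:.4f}, Pr <= P = {p:.4f}")
# Table probability = 0.2286, Pr <= P = 0.4857
```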

23 Examine tumor volume First, we graphically examined the data. Note that the times are not equal for the two groups (or for any two samples). We grouped into time intervals for the analysis: 0-20 days, 21-40 days, 41-60 days, > 60 days.
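
The grouping step can be written as a small helper. This is a sketch only; the function name is hypothetical, and it assumes measurement days are recorded as non-negative numbers.

```python
def day_group(day):
    """Map a measurement day to one of the four analysis intervals."""
    if day < 0:
        raise ValueError("day must be non-negative")
    if day <= 20:
        return "0-20 days"
    if day <= 40:
        return "21-40 days"
    if day <= 60:
        return "41-60 days"
    return "> 60 days"

# Hypothetical measurement days for one mouse.
print([day_group(d) for d in (3, 20, 21, 47, 75)])
```

Grouping this way lets mice that were measured on different days be compared within the same interval.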

24

25 Examine tumor volume First, we graphically examined the data. Note that the times are not equal for the two groups (or for any two samples). We grouped into time intervals for the analysis: 0-20 days, 21-40 days, 41-60 days, > 60 days.

26

27 Examine tumor volume First, we graphically examined the data. Note that the times are not equal for the two groups (or for any two samples). We grouped into time intervals for the analysis: 0-20 days, 21-40 days, 41-60 days, > 60 days.

28

29 Statistics We then used an analysis that accounts for repeated measures on a single mouse, and looked at the difference over time.
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
daygp         3        12       0.42      0.7422
group         1        7        1.36      0.2812
group*daygp   3        39       0.96      0.4233

30 Findings (based on the specific analyses I show here) There is no significant association between genotype and cytomegaly (though if we use a sum of all the tumor histopathology variable scores, we do see a difference between the double transgenic group and all the other genotypes). There was no significant difference in tumor volume between the double transgenic and the cyclin D1 genotypes.

31 Paper 2, ganglioside pattern In typical embryonic development, ganglioside expression shifts from the fetal b pathway to the adult a pathway. Neuroblastomas in infants are different (biologically and clinically) from those found in older children. The main question is whether the ganglioside pathway is different between these two types of neuroblastomas.

32 Data 68 confirmed neuroblastoma samples, either diagnosed by urinary HVA and VMA at 3 weeks or 6 months of age (n=25) or presenting clinically during the study period (n=43). Information was collected on age at sample, time until disease progression, stage, and some other clinical information that was not discussed in this paper.

33 First, let's look at the plot of the data, which shows the % of b pathway gangliosides.

34 Why nonparametric? We probably could have used a t-test (comparing two normal means) or analysis of variance (comparing more than two normal means). But there was some indication that the distributions of these % b gangliosides were non-normal, so we decided to use the Wilcoxon rank-sum test.
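
To show what the rank-sum test does, here is a minimal Python sketch using the large-sample normal approximation without a tie correction. The percentages are invented for illustration, and for samples this small a statistical package would normally report an exact p-value instead.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation, no tie correction.

    Returns (rank sum of x, z statistic, two-sided p-value).
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Tied values share the average of the ranks they occupy.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w, z, p

# Hypothetical % b pathway values for two groups.
screened = [12, 18, 25, 31, 40]
clinical = [35, 52, 61, 70, 78]
w, z, p = rank_sum_test(screened, clinical)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```

Because only the ranks enter the calculation, the result does not depend on the values being normally distributed.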

35 Event free survival Survival analysis is used to compare “time-to-event” between groups. In this case we are looking at time until some clinical adverse event. We need to use specialized statistical tests because we have “censoring” in the data. Censoring is when you have incomplete data due to loss to follow-up or no event up until the end of the study.
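
How censored observations enter a survival estimate can be seen in a short Kaplan-Meier sketch. The slides do not name the estimator used, so this is an assumption chosen for illustration, and the follow-up times are invented. A censored subject counts in the number at risk until the time they leave, but never counts as an event.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(time, survival)] at each distinct event time.

    events[i] is True if the event was observed at times[i] and False if
    the subject was censored at that time.
    """
    subjects = sorted(zip(times, events))
    at_risk = len(subjects)
    survival = 1.0
    curve = []
    i = 0
    while i < len(subjects):
        t = subjects[i][0]
        n_events = sum(1 for tt, e in subjects if tt == t and e)
        n_leaving = sum(1 for tt, _ in subjects if tt == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored subjects both leave
        i += n_leaving
    return curve

# Hypothetical follow-up in months; False marks a censored subject.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.2f}")
# t=2: S(t)=0.80
# t=3: S(t)=0.60
# t=5: S(t)=0.30
```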

36

37 Results

38 Findings The distribution of % b pathway ganglioside production is different in children ≥ 1 year of age who present clinically compared with the group that is screened (at 3 weeks or 6 months of age). This fits their paradigm that neuroblastomas in older children are different from those in younger children.

39 Findings (continued) There is a difference in the event free survival distributions between those patients with ≥ 60% b pathway gangliosides and those with < 60 % b pathway gangliosides. The group with ≥ 60% b pathway gangliosides had longer event free survival.

40 Paper 3: MeCP2 mutations in Rett syndrome This study examined the association between MeCP2 gene mutations and Rett syndrome (a neurodevelopmental disorder). More specifically, it asked whether the pattern of X-inactivation, along with clinical features, differs among mutation types.

41 Type of mutation by clinical severity Here we are looking at 5 mutations: MBD nonsense; nonsense between MBD and TRD; TRD nonsense; TRD missense; C-terminal deletions. And scores of 5 clinical parameters (head growth, seizures, scoliosis and motor skills/ability to walk). The scores were all measured on an ordinal scale.

42

43

44 How we analyzed these data Although the severity scores were ordinal, we treated them as continuous (and normally distributed). We used ANOVA and looked at differences between the mean scores for each of the mutation groups.
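
The F statistic that one-way ANOVA compares across groups can be computed by hand, as in the sketch below. The severity scores are invented, and turning F into a p-value needs the F distribution, which is left to a statistical package here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical severity scores for three mutation groups.
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
# F(2, 6) = 13.00
```

A large F means the group means differ by more than the within-group scatter would suggest.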

45

46

47

48 Analysis of covariance Is a combination of analysis of variance and regression. The main aim is to see if the regression lines in two or more groups are different. In this study, we wanted to see if two mutation groups differed in their regression of clinical severity on X-inactivation (% of one allele active); this can be stated as the effect of mutation group on the regression of clinical severity on X-inactivation.

49 Main questions for analysis of covariance Is the straight-line relationship between clinical severity score and X-inactivation the same for the two mutation groups (missense in MBD and nonsense between MBD and TRD versus TRD missense and nonsense and C-terminal deletions)? Do the clinical severity scores for the two mutation groups differ after adjusting for X-inactivation pattern?
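
The two questions correspond to comparing the slopes and the intercepts of two fitted lines. The sketch below only fits a least-squares line in each group, using invented data; the formal tests of equal slopes and equal intercepts require a statistical package and are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: X-inactivation (% of one allele active) and severity score.
x_inactivation = [50, 60, 70, 80]
severity_group_a = [4, 5, 6, 7]
severity_group_b = [9, 8, 7, 6]

for name, scores in (("A", severity_group_a), ("B", severity_group_b)):
    intercept, slope = fit_line(x_inactivation, scores)
    print(f"group {name}: severity = {intercept:.1f} + {slope:.2f} * X-inactivation")
```

If the slopes differ, the effect of X-inactivation on severity depends on the mutation group; if the slopes are similar, the gap between the lines measures the group difference after adjusting for X-inactivation.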

50 P-value for slope = 0.006 P-value for intercepts < 0.0001

51 What we found There was a statistically significant difference among many of the mutations with respect to head circumference, as well as when a summary of all clinical features was used. There was a statistically significant difference in clinical severity score between the two mutation groups, as well as a difference in the slopes relating severity score to X-inactivation between the two mutation groups. Both of these findings confirmed, and helped describe, the causative role of MeCP2 mutations in Rett syndrome.

52 Thank you for your time Suggested readings: Creating More Effective Graphs by Naomi B. Robbins; Statistical Analysis and Data Display by Heiberger and Holland; Introduction to Biostatistics by Bernard Rosner.

