Case studies in biostatistics Bonnie LaFleur Department of Biostatistics


Outline
Miscellaneous review of graphics and data collection/display.
Paper 1: Enhanced tumor formation in cyclin D1 x transforming growth factor beta1 double transgenic mice with characterization by magnetic resonance imaging. Cancer Res Feb 15;64(4):
Paper 2: Neuroblastomas of infancy exhibit a characteristic ganglioside pattern. Cancer 2001 Feb. 15; 91(4):
Paper 3: MeCP2 mutations in children with and without the phenotype of Rett syndrome. Neurology 2001; 56:

Bar graphs
Bar graphs are useful for counts or proportions, not for means. If standard errors are shown, make sure they are the correct standard errors for proportions, and decide whether the standard error or the standard deviation is what you want to show.

Example
The percent of type 1 in each group is 25, 27.4, and 73. What is the standard error? The standard error describes how close an estimate from a random sample is likely to be to the true population value, that is, how much the estimate would vary from one sample to the next.

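Standard error
For a sample proportion p based on n observations, se = √(p(1 - p)/n), so the standard error depends on n: it shrinks as the sample size grows. What does this mean for our example?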

Back to our example
For our example (p = 20.4%), we calculate the standard error for two different sample sizes: se ≈ 20.1% for n = 4 and se ≈ 9.0% for n = 20. Our estimate of the true percentage therefore has less sampling fluctuation with larger sample sizes.

So what does this mean?
From a statistical standpoint, the main use of a standard error is to calculate a 95% confidence interval for our estimate: p ± 1.96 × se. For n = 4: (-19%, 60%). For n = 20: (3%, 38%).
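The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch that assumes the usual normal-approximation (Wald) interval p ± 1.96 × se with se = √(p(1 - p)/n); the function names are illustrative, not from any particular package.

```python
import math
from typing import Tuple


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion p (0-1 scale) based on n observations."""
    return math.sqrt(p * (1 - p) / n)


def wald_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval: p +/- z * se."""
    se = proportion_se(p, n)
    return p - z * se, p + z * se


# The slide's example: p = 20.4% with two different sample sizes.
for n in (4, 20):
    se = proportion_se(0.204, n)
    lo, hi = wald_ci(0.204, n)
    print(f"n={n}: se={se:.1%}, 95% CI=({lo:.0%}, {hi:.0%})")
# n=4: se=20.1%, 95% CI=(-19%, 60%)
# n=20: se=9.0%, 95% CI=(3%, 38%)
```

The negative lower bound for n = 4 is not a real percentage; it is a symptom of the normal approximation breaking down at very small n, where an interval such as Wilson's, which stays within 0% to 100%, is usually preferred.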

Why did I show this?
Bar charts should be used for proportions (or percentages) or counts, not means. If error bars are shown at all, they must be the correct standard errors (or show the standard deviation instead). It is MUCH more important to include sample sizes with bar charts than either the standard error or the standard deviation, since once p and the sample size are given, the standard deviation and standard error are easily calculated.

Like this

Box plots (for continuous data)
The dot here is the median (the mean can also be included, as a bar). The ends of the box are the 1st and 3rd quartiles. The whiskers extend to the most extreme observations that lie within 1.5 × the interquartile range (IQR) of the quartiles, so they never extend beyond the data.
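The quantities drawn in a box plot can be computed directly. This sketch assumes the common Tukey convention (fences at 1.5 × IQR beyond the quartiles) and Python's `statistics.quantiles` with `method="inclusive"`; other packages, SAS included, may use a different quartile definition and so give slightly different boxes for small samples. The data are hypothetical.

```python
import statistics


def box_plot_stats(data):
    """Tukey-style box plot summary: quartiles, whisker ends, and outliers."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences,
    # so they never extend beyond the data.
    whisker_low = min(x for x in xs if x >= lower_fence)
    whisker_high = max(x for x in xs if x <= upper_fence)
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (whisker_low, whisker_high),
        "outliers": outliers,
    }


# Hypothetical measurements with one extreme value.
summary = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the upper whisker stops at 9, the largest observation within the fence, and 100 is drawn as a separate point rather than stretching the whisker.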

What can sometimes happen

Alternative type of plot

Plots to display multiple events over time

Dot Plots

Data: things to avoid when creating a dataset to be used in statistical packages
Character variables must use the same case and be coded consistently. Don't mix characters with data that should be numeric. Date formats should be consistent. Don't put summary computations in the middle of the spreadsheet. Differentiate between missing values and "zeros", "below detection", etc.
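These rules can also be enforced when a spreadsheet is read in. The following is a minimal sketch that assumes text cells like the lab values in the example spreadsheet (for instance "<10000", ">8", "ND", "." for missing). The helper names `parse_lab_value`, `normalize_code`, and `parse_date` are hypothetical, and keeping "ND" as its own category is an assumption: whether it means "not done" or "not detected" has to come from the lab's codebook.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class LabValue:
    value: Optional[float]  # measured value, or the detection limit when qualifier is "<" or ">"
    qualifier: str          # "=", "<", ">", "ND", or "missing"


def parse_lab_value(raw: str) -> LabValue:
    """Split a lab cell into a number and a qualifier, keeping zero, missing,
    and below/above-detection values distinct."""
    s = raw.strip().replace(",", "")  # drop thousands separators
    if s in ("", "."):
        return LabValue(None, "missing")
    if s.upper() == "ND":
        return LabValue(None, "ND")  # assumption: kept separate from missing
    if s[0] in "<>":
        return LabValue(float(s[1:]), s[0])
    return LabValue(float(s), "=")


def normalize_code(raw: str) -> str:
    """Put character codes such as 'Pos' or 'POS ' into one consistent form."""
    return raw.strip().lower()


def parse_date(raw: str) -> date:
    """Accept month/day/year with a 4-digit or 2-digit year; reject anything else."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


print(parse_lab_value("<400"), parse_lab_value("0"), parse_lab_value("."))
print(normalize_code(" Pos "), parse_date("3/03/98"))
```

Raising an error on an unrecognized date, rather than guessing, is a deliberate choice: it forces inconsistent entries to be fixed at the source.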

[Example spreadsheet: columns Date of Blood (lab date), CD4, CD4 %, HIV VL, HAART start date, VZV ser, CMV ser, VZV RCF, CMV RCF; entries mix numbers with text such as "<10000", "<20", ">8", "ND", "pos", "neg", "no therapy", "no HAART", and "not CHIP Patient", and dates appear in more than one format.]

[Figure: Hep B and DTaP at birth PT, 6 M PT, and 7 M PT; panel "Mean vs. DTaP"]

Paper #1
The basic question was whether cyclin D1/TGF-β1 double transgenic mice differ from cyclin D1 single transgenic mice on a variety of outcomes: tumor incidence, tumor multiplicity, tumor burden, and cellular and molecular changes.

For tests regarding histologic/cellular changes
The data were categorical, and some cell counts were zero (or very small), so large-sample chi-square tests were not appropriate and we used exact nonparametric tests such as Fisher's exact test.
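Fisher's exact test avoids large-sample approximations by conditioning on the table margins, which is why it copes with zero cells. The sketch below covers only the 2×2 case; the genotype-by-cytomegaly table has four genotype rows, for which SAS PROC FREQ, which appears to have produced the output on the next slide, uses the r×c generalization. The counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left count follows a hypergeometric
    distribution. The two-sided p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights C(row1, x) * C(row2, col1 - x); dividing by C(n, col1)
    # gives the hypergeometric probability. Integers avoid rounding ties.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / comb(n, col1)


# Hypothetical counts: rows = two genotypes, columns = change present / absent.
print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70, about 0.486
print(fisher_exact_2x2(0, 5, 5, 0))  # a zero-cell table is handled exactly
```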

1='Wild Type' 2='Alb-TGFB' 3='LFABP-Cyclin D1' 4='Double Transgenic'

Statistics for Table of type by cytomegaly: Fisher's Exact Test, reporting Table Probability (P) and Pr <= P

Examine tumor volume
First, we examined the data graphically. Note that the measurement times are not equal for the two groups (or for any two samples), so we grouped them into time intervals for the analysis, from 0-20 days up to > 60 days.
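Grouping unequal measurement times into intervals can be done with a simple lookup. Only the first (0-20 days) and last (> 60 days) intervals are legible on the slide, so the 40-day boundary below is a hypothetical assumption, as are the records; the sketch also assumes whole-number days.

```python
from bisect import bisect_left

# HYPOTHETICAL cut points: only "0-20 days" and "> 60 days" are known,
# so the 40-day boundary between the middle intervals is an assumption.
CUTS = (20, 40, 60)
LABELS = ("0-20", "21-40", "41-60", ">60")


def day_group(day: int, cuts=CUTS, labels=LABELS) -> str:
    """Map an integer measurement day to its time-interval label."""
    if day < 0:
        raise ValueError("day must be non-negative")
    # bisect_left counts the cut points strictly below `day`,
    # so a day equal to a cut point stays in the lower interval.
    return labels[bisect_left(cuts, day)]


# Hypothetical (mouse_id, genotype, day, tumor_volume) records.
records = [("m1", "double", 12, 3.1), ("m1", "double", 45, 8.4), ("m2", "cyclinD1", 70, 6.0)]
binned = [(mouse, geno, day_group(day), vol) for mouse, geno, day, vol in records]
print(binned)
```

The mouse identifier is kept on every binned record because the analysis that follows has to know which measurements come from the same animal.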

Statistics
We then used an analysis that accounts for repeated measures on a single mouse, and looked at the difference between groups over time.
Type 3 Tests of Fixed Effects (columns Num DF, Den DF, F Value, Pr > F) for the effects daygp, group, and group*daygp
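One common way to write a repeated-measures model of this kind is sketched below. It assumes a random intercept for each mouse; the slides do not state the within-mouse covariance structure, and an unstructured or autoregressive structure would be equally plausible. Here $y_{ijk}$ is a tumor volume measurement on mouse $k$ (nested in genotype group $i$) in day group $j$. The three rows of the Type 3 table test the group effect $\alpha_i$, the day-group effect $\tau_j$, and their interaction $(\alpha\tau)_{ij}$; the interaction is the term that asks whether the difference between groups changes over time.

```latex
\[
y_{ijk} = \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij} + b_{k(i)} + \varepsilon_{ijk},
\qquad b_{k(i)} \sim N(0, \sigma_b^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)
\]
```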

Findings (based on the specific analyses shown here)
There is no statistically significant association between genotype and cytomegaly (although when we use the sum of all the tumor histopathology variable scores, we do see a difference between the double transgenic group and all the other genotypes). There was no statistically significant difference in tumor volume between the double transgenic and the cyclin D1 genotypes.

Paper 2: ganglioside pattern
In typical embryonic development, ganglioside expression shifts from the fetal b pathway to the adult a pathway. Neuroblastomas in infants are different (biologically and clinically) from those found in older children. The main question is whether the ganglioside pathway differs between these two types of neuroblastoma.

Data
There were 68 confirmed neuroblastoma samples, which were either detected by urinary HVA and VMA screening at 3 weeks or 6 months of age (n = 25) or presented clinically during the study period (n = 43). Information was collected on age at sample, time until disease progression, stage, and some other clinical information that was not discussed in this paper.

First, let's look at a plot of the % of b pathway gangliosides.

Why nonparametric?
We probably could have used a t-test (comparing two normal means) or analysis of variance (comparing more than two normal means). But there was some indication that the distributions of % b pathway gangliosides were non-normal, so we decided to use the Wilcoxon rank-sum test.
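The Wilcoxon rank-sum test replaces the data by their ranks in the pooled sample, so it does not rely on normality. This pure-Python sketch uses midranks for ties and the large-sample normal approximation (tie-corrected variance, no continuity correction); for small samples an exact p-value is preferable, and a library routine such as SciPy's `scipy.stats.mannwhitneyu` (the equivalent Mann-Whitney U form) can compute one. The percentages are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Rank-sum W for sample x, its z-score, and a two-sided normal-approximation p-value."""
    combined = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(midranks(combined)[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return w, z, p_value


# Hypothetical % b pathway gangliosides in two groups.
screened = [35, 42, 48, 51, 55]
clinical = [58, 63, 66, 71, 80]
print(wilcoxon_rank_sum(screened, clinical))
```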

Event-free survival
Survival analysis is used to compare "time-to-event" between groups; in this case, the time until a clinical adverse event. We need specialized statistical methods because the data contain "censoring". Censoring occurs when a patient's event time is only partly observed, because the patient was lost to follow-up or had no event by the end of the study.
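The slides do not say how the event-free survival curves were estimated; the Kaplan-Meier estimator is the standard choice for censored data, and this sketch assumes it. At each event time the running estimate is multiplied by (1 - events / number at risk), while censored patients simply leave the risk set without causing a drop. The follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, estimated survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical months to adverse event; event 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

The patient censored at month 2 still contributes to the risk set at month 2 and only drops out afterwards, which is the usual convention when an event and a censoring share a time.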

Results

Findings
The distribution of % b pathway ganglioside production differs between children ≥ 1 year of age who presented clinically and the group detected by screening (3 weeks or 6 months of age). This fits their paradigm that neuroblastomas in older children differ from those in younger children.

Findings (continued)
There is a difference in the event-free survival distributions between patients with ≥ 60% b pathway gangliosides and those with < 60% b pathway gangliosides. The group with ≥ 60% b pathway gangliosides had longer event-free survival.
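The slides do not name the test used to compare the ≥ 60% and < 60% groups; the log-rank test is a common choice for comparing survival distributions, and this sketch assumes it. At each distinct event time it compares the observed number of events in group 1 with the number expected if both groups shared one survival curve, then sums these differences and their hypergeometric variances into a chi-square statistic with 1 degree of freedom. The data are hypothetical.

```python
import math


def log_rank_test(times1, events1, times2, events2):
    """Two-group log-rank test.

    events are 1 for an observed event and 0 for a censored time.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)] + \
           [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)              # total at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # group 1 at risk
        d = sum(e for tt, e, _ in data if tt == t)              # total events at t
        d1 = sum(e for tt, e, g in data if tt == t and g == 1)  # group 1 events at t
        # Observed minus expected events in group 1 at this time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Chi-square (1 df) upper tail equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical months to event: < 60% group vs >= 60% group.
print(log_rank_test([2, 4, 5, 7], [1, 1, 1, 0], [6, 9, 12, 15], [1, 0, 1, 0]))
```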

Paper 3: MeCP2 mutations in Rett syndrome
This study examined the association between MeCP2 gene mutations and Rett syndrome (a neurodevelopmental disorder); more specifically, whether the pattern of X-inactivation, along with clinical features, differs among mutation types.

Type of mutation by clinical severity
Here we are looking at 5 mutation types: MBD nonsense, nonsense between MBD and TRD, TRD nonsense, TRD missense, and C-terminal deletions. We also have scores on 5 clinical parameters (head growth, seizures, scoliosis, and motor skills/ability to walk). The scores were all measured on an ordinal scale.

How we analyzed these data
Although the severity scores were ordinal, we treated them as continuous (and normally distributed). We used ANOVA to look at differences between the mean scores of the mutation groups.
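One-way ANOVA compares the spread of the group means around the grand mean with the spread of scores within groups. This sketch computes only the F statistic and its degrees of freedom; the p-value comes from the F distribution, for example via SciPy's `scipy.stats.f_oneway`, which returns both. The severity scores are hypothetical.

```python
def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical severity scores for three mutation types.
print(one_way_anova_f([2, 3, 3, 4], [4, 5, 5, 6], [3, 4, 6, 5]))
```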

Analysis of covariance
Analysis of covariance (ANCOVA) is a combination of analysis of variance and regression. Its main aim is to see whether the regression lines in two or more groups are different. In this study, we wanted to see whether two mutation groups differed in the regression of clinical severity on X-inactivation (% of one allele active), that is, whether the relationship between clinical severity and X-inactivation depends on the mutation group.

Main questions for analysis of covariance
Is the straight-line relationship between clinical severity score and X-inactivation the same for the two mutation groups (missense in MBD and nonsense between MBD and TRD, versus TRD missense and nonsense and C-terminal deletions)? Do the clinical severity scores for the two mutation groups differ after adjusting for X-inactivation pattern?
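The two questions map onto two nested regression models. This is a sketch of the standard ANCOVA parameterization, not necessarily the exact model the authors fit: $y$ is the clinical severity score, $x$ is X-inactivation (% of one allele active), and $g$ is a hypothetical indicator equal to 1 for one mutation group and 0 for the other. The first test addresses the slopes; the second, fitted with a common slope, addresses the intercepts, which under that model is the same as comparing the groups' adjusted mean scores.

```latex
\[
\text{Separate slopes: } y = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 (g \cdot x) + \varepsilon,
\qquad H_0\colon \beta_3 = 0 \ \text{(equal slopes)}
\]
\[
\text{Common slope: } y = \beta_0 + \beta_1 x + \beta_2 g + \varepsilon,
\qquad H_0\colon \beta_2 = 0 \ \text{(equal intercepts)}
\]
```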

P-value for slope = P-value for intercepts <

What we found
There were statistically significant differences among many of the mutation types in head circumference, as well as in a summary of all clinical features. There was a statistically significant difference in clinical severity score between the two mutation groups, as well as a difference between the two groups in the slope of severity score on X-inactivation. Both of these findings confirmed, and described, MeCP2 mutations as causative in Rett syndrome.

Thank you for your time
Suggested readings:
Creating More Effective Graphs, by Naomi B. Robbins
Statistical Analysis and Data Display, by Heiberger and Holland
Introduction to Biostatistics, by Bernard Rosner