Published by Violet Daniel. Modified over 9 years ago.
1
Biostatistics in Practice Youngju Pak Biostatistician Peter D. Christenson http://research.LABioMed.org/Biostat Session 1: Quantitative and Inferential Issues I
2
Why Statistics?
"For Today’s Graduate, Just One Word: Statistics," NY Times, Aug 5, 2009
"I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google.
"I am not much given to regret, so I puzzled over this one a while. Should have taken much more statistics in college, I think. :)" (Max Levchin, PayPal co-founder, Slide founder)
3
Who am I? Dr. Youngju Pak
Originally from South Korea.
PhD in Biostatistics, MS in Statistics, BA in Statistics.
Assistant Professor of Biostatistics at MU until 2012.
Joined LA BioMed in March 2013.
Practicing biostatistics since 2000.
4
Who are you? Name Career Aspirations
5
Class Webpage & Session Schedule
Class webpage: select "Courses" at http://research.LABioMed.org/biostat (use Internet Explorer; Chrome does not work well with this website).
All class materials are posted and will be updated on the class webpage.
There will be some pop-up quizzes.
There will be some HW assignments. The TOP THREE will be announced and rewarded at the last session.
6
Session 1 Objectives General quantitative needs in biological research Overview of statistical issues using a published paper How to run Statistical software, MYSTAT
7
General Quantitative Needs Descriptive: Appropriate summarization to meet scientific questions: e.g., changes, or % changes, or reaching threshold? mean, or minimum, or range of response? average time to death, or chances of dying by a fixed time?
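The three summaries named above (change, % change, reaching a threshold) can answer different scientific questions from the same measurements. A minimal sketch with hypothetical baseline and follow-up values and a hypothetical threshold of 110:

```python
# Hypothetical baseline and follow-up measurements for three subjects.
baseline = [100.0, 80.0, 120.0]
followup = [110.0, 60.0, 120.0]
THRESHOLD = 110.0  # hypothetical cut-off, chosen only for illustration

# Absolute change: follow-up minus baseline.
changes = [f - b for b, f in zip(baseline, followup)]

# Percent change relative to each subject's own baseline.
pct_changes = [100.0 * (f - b) / b for b, f in zip(baseline, followup)]

# Binary outcome: did the follow-up value reach the threshold?
reached = [f >= THRESHOLD for f in followup]

for i, (c, p, r) in enumerate(zip(changes, pct_changes, reached), start=1):
    print(f"subject {i}: change={c:+.1f}, pct change={p:+.1f}%, reached={r}")
```

Which summary is appropriate depends on the scientific question, not on the data alone.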
8
General Quantitative Needs, Cont’d Inferential: Could results be spurious, a fluke, due to “natural” variations or chance? Inferential statistics: 95% confidence intervals, p-values, etc. Sensitivity/Power: How many subjects are needed?
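As an illustration of a 95% confidence interval, here is a minimal sketch for a mean using the large-sample normal approximation. The data are simulated and the function name `mean_ci_normal` is hypothetical; for small samples a t-based interval, as produced by statistical packages, would be wider.

```python
import math
import random
import statistics

def mean_ci_normal(values, level=0.95):
    """Normal-approximation confidence interval for a mean (large samples)."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Simulated (hypothetical) scores: 200 draws from a normal distribution.
rng = random.Random(1)
scores = [rng.gauss(0.0, 1.0) for _ in range(200)]

low, high = mean_ci_normal(scores)
print(f"mean = {statistics.fmean(scores):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```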
9
Professional Statistics Software Package
[Screenshot labels: Output. Enter code; syntax. Stored data; accessible.]
10
Microsoft Excel for Statistics Primarily for descriptive statistics. Limited output. No analyses for %s.
11
Free Statistics Software: Mystat www.systat.com
12
Free Study Size Software www.stat.uiowa.edu/~rlenth/Power
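To give a feel for what study size software computes, here is a minimal sketch of one textbook formula: subjects per group for comparing two means, using the normal approximation. The function name and the inputs (SD of 1, difference of 0.5, 80% power) are illustrative assumptions, not taken from the software above, which handles many more designs.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole subjects

# Hypothetical inputs: SD of 1 unit, smallest difference of interest 0.5 units.
n = n_per_group(sd=1.0, difference=0.5)
print(n)
```

Halving the difference to be detected roughly quadruples the required number of subjects.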
13
Session 1 Objectives General quantitative needs in biological research Overview of statistical issues using a published paper How to run Statistical software, MYSTAT
14
Statistical Issues Subject selection Randomization Efficiency from study design Summarizing study results
15
Paper with Common Statistical Issues Case Study:
16
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.
Objective: test whether intake of artificial food color and additives (AFCA) affects childhood behavior
Target population: children aged 3-4 and 8-9 years
Study design: randomized, double-blinded, placebo-controlled, crossover trial
Sample size: 153 (3 years), 144 (8-9 years) in Southampton, UK
Sampling: stratified sampling based on SES
Baseline measure: 24h recall by the parent of the child’s pretrial diet
Groups: three (mix A, mix B, placebo)
Outcomes: ADHD rating scale IV by teachers, WWP hyperactivity score by parents, classroom observation code, Conners continuous performance test II (CPTII), GHA score
17
Statistical Issues Subject selection Randomization Efficiency from study design Summarizing study results
18
Representative or Random Samples How were the children to be studied selected (second column on the first page)? The authors purposely selected "representative" social classes. Is this better than a "randomly" chosen sample that ignores social class? Often hear: Non-random = Non-scientific.
19
Case Study: Participant Selection No mention of random samples.
20
Case Study: Participant Selection It may be that only a few schools are needed to get sufficient individuals. If, among all possible schools, there are few that are lower SES, none of these schools may be chosen. So, a random sample of schools is chosen from the lower SES schools, and another random sample from the higher SES schools.
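The selection scheme described above, a separate random sample within each SES stratum, can be sketched as follows. The school names, stratum sizes, and the function `stratified_sample` are hypothetical and only illustrate the idea.

```python
import random

def stratified_sample(units_by_stratum, n_per_stratum, seed=None):
    """Draw a simple random sample, without replacement, inside each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(units, n_per_stratum[stratum])
        for stratum, units in units_by_stratum.items()
    }

# Hypothetical sampling frame: few lower-SES schools, many higher-SES schools.
schools = {
    "lower_ses": ["L1", "L2", "L3"],
    "higher_ses": [f"H{i}" for i in range(1, 21)],
}

chosen = stratified_sample(schools, {"lower_ses": 2, "higher_ses": 2}, seed=42)
print(chosen)
```

A simple random sample of 4 schools from all 23 would often contain no lower-SES school at all; sampling within strata guarantees both are represented.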
21
Selection by Over-Sampling
It is not necessary that the % lower SES in the study is the same as in the population. There may still be too few subjects in a rare subgroup to get reliable data.
Can “over-sample” a rare subgroup, and then weight overall results by proportions of subgroups in the population.
The CDC NHANES (http://www.cdc.gov/nchs/nhanes.htm) studies do this.
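The weighting step can be sketched with a small worked example. All numbers are hypothetical: the lower-SES subgroup is 10% of the population but was over-sampled to 50% of the study. Real surveys such as NHANES use more elaborate weights.

```python
def weighted_overall_mean(subgroup_means, population_shares):
    """Combine subgroup means using each subgroup's share of the population."""
    total = sum(population_shares.values())
    return sum(subgroup_means[g] * population_shares[g] for g in subgroup_means) / total

# Hypothetical study: equal sample sizes, so the rare subgroup is over-sampled.
means = {"lower_ses": 12.0, "higher_ses": 8.0}
sample_sizes = {"lower_ses": 100, "higher_ses": 100}
population_shares = {"lower_ses": 0.10, "higher_ses": 0.90}

# Naive mean over all sampled subjects over-represents the rare subgroup.
unweighted = sum(means[g] * sample_sizes[g] for g in means) / sum(sample_sizes.values())

# Weighted mean restores the population proportions.
weighted = weighted_overall_mean(means, population_shares)

print(f"unweighted = {unweighted:.1f}, weighted = {weighted:.1f}")
```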
22
Statistical Issues Subject selection Randomization Efficiency from study design Summarizing study results
23
Basic Study Designs
1. Prospective (longitudinal): Risk Factor (2014) → Disease status (2020)
2. Retrospective (Case-Control): Disease status (2014) → Risk Factor (2000)
3. Cross-sectional: Disease status (2014), Risk Factor (2014)
4. Experimental or Randomized-Control: Risk Factor (2014) → Disease status (2020), with assignment of Risk Factor
24
Random Samples vs. Randomization We have been discussing the selection of subjects to study, often a random sample. An observational study would, well, just observe them. An interventional study assigns each subject to one or more treatments in order to compare treatments. Randomization refers to making these assignments in a random way.
25
Why Randomize?
So that groups will be similar except for the intervention.
So that, when enrolling, we will not unconsciously choose an “appropriate” treatment for a particular subject.
Minimizes the chances of introducing bias when attempting to systematically remove it, as in the plant yield example.
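Random assignment is easy to leave to the computer, which removes any conscious or unconscious choice by the investigator. A minimal sketch, assuming each subject receives all three diets of the case study in a random order; the subject IDs and the function `randomize_orders` are hypothetical and this is not the paper's actual procedure.

```python
import itertools
import random

DIETS = ("mix A", "mix B", "placebo")

def randomize_orders(subject_ids, seed=None):
    """Assign each subject one of the 6 possible diet orders at random."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    possible_orders = list(itertools.permutations(DIETS))
    return {sid: rng.choice(possible_orders) for sid in subject_ids}

subject_ids = ["S01", "S02", "S03", "S04", "S05", "S06"]
orders = randomize_orders(subject_ids, seed=2014)

for sid, order in orders.items():
    print(sid, " -> ".join(order))
```

In practice the assignment list is usually generated before enrollment and concealed from the people enrolling subjects.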
26
Case Study: Crossover Design Each child is studied on 3 occasions under different diets. Is this better than three separate groups of children? Why, intuitively? How could you scientifically prove your intuition?
27
Statistical Issues Subject selection Randomization Efficiency from study design Summarizing study results
28
Blocked vs. Unblocked Studies
AKA matched vs. unmatched. AKA paired vs. unpaired.
Block = Pair = Set receiving all treatments.
Set could be an individual at multiple times (pre and post), or left and right arms for sunscreen comparison; twins or family; centers in multi-center study, etc.
Block ↔ Homogeneous.
Blocking is efficient because treatment differences are usually more consistent among subjects than each separate treatment is.
29
Potential Efficiency Due to Pairing
[Figure: dot plots of A and B as separate groups (unpaired) compared with A and B in a paired set, where each pair is summarized by its difference Δ = B − A.]
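The efficiency gain from pairing can be checked numerically. A minimal sketch with hypothetical data: subjects differ a lot from each other, but the B − A difference within each subject is fairly consistent, so the standard error of the mean difference is much smaller when the pairing is used.

```python
import math
import statistics

# Hypothetical responses of the same 5 subjects under treatments A and B.
a = [10.0, 20.0, 30.0, 40.0, 50.0]
b = [12.0, 23.0, 31.0, 44.0, 53.0]
n = len(a)

# Paired analysis: work with the within-subject differences.
diffs = [y - x for x, y in zip(a, b)]
mean_diff = statistics.fmean(diffs)
paired_se = statistics.stdev(diffs) / math.sqrt(n)

# Unpaired analysis: treat A and B as two separate groups of subjects.
unpaired_se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)

print(f"mean difference = {mean_diff:.2f}")
print(f"SE if paired    = {paired_se:.2f}")
print(f"SE if unpaired  = {unpaired_se:.2f}")
```

The estimated difference is the same either way; only its precision changes, which is why a crossover design can need far fewer subjects than separate groups.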
30
Statistical Issues Subject selection Randomization Efficiency from study design Summarizing study results
31
Outcome Measures
Generally, how were the outcome measures defined (third page)? They are more complicated here than for most studies.
What are the units (e.g., kg, mmol, $, years)?
Outcome measures are specific and pre-defined. Aims and goals may be more general.
32
Summarization of Data with Descriptive Statistics
33
What is the difference between Table 1 and Table 2 in terms of methods used to summarize the data?
34
Types of Data
Variable: Categorical (Qualitative) or Numerical (Quantitative)
Categorical (Qualitative):
–Nominal: Categories are mutually exclusive and unordered. Examples: Gender, Blood group, Eye colour, Marital status
–Ordinal: Categories are mutually exclusive and ordered. Examples: Disease stage, Education level, 5-point Likert scale
Numerical (Quantitative):
–Counts: Integer values. Examples: Days sick per year, Number of pregnancies, Number of hospital visits
–Measured (continuous): Takes any value in a range of values. Examples: weight in kg, height in feet, age (in years)
35
It is critical to identify the type of data since the choice of an appropriate statistical test as well as how to summarize the data depend on the type of the data.
36
Describing categorical & quantitative data
Categorical Data
–Binary, Nominal, or Ordinal data: Disease status (yes, no), Education level, The assignment of the treatment, Cancer stage, Marital status
–Frequency tables (one, two, or multi-way tables) are usually used
Quantitative Data
–Counts or Continuous Data: Weight, Blood pressure, Age, Length of hospital stay in days, The total number of ER visits per year
–Means or medians are used for the measure of central tendency.
–Standard deviations or percentiles are used for the measure of variability.
–When data are skewed, medians & percentiles are better summary statistics
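The point about skewed data can be seen in a small example. A minimal sketch with hypothetical lengths of hospital stay, where one long stay pulls the mean and the standard deviation upward while the median and quartiles barely notice it:

```python
import statistics

# Hypothetical lengths of hospital stay in days; one patient stayed 30 days.
stay_days = [1, 2, 2, 3, 3, 4, 5, 30]

mean = statistics.fmean(stay_days)
median = statistics.median(stay_days)
sd = statistics.stdev(stay_days)
q1, q2, q3 = statistics.quantiles(stay_days, n=4)  # quartiles

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"median = {median}, quartiles = ({q1}, {q2}, {q3})")
```

Here the mean is about twice the median, so "a typical stay" is better described by the median and the interquartile range.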
37
How to Display Data
A picture is worth a thousand words! Use displays to get a ‘feel’ for the data.
Categorical data
–Frequency tables, Contingency tables (cross tables), Bar charts, Pie charts
Quantitative data
–Dot plots, Histograms, Box-Whisker plots*, Scatter plots
38
Frequency Tables
39
Contingency Tables (Crosstabulations)
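A contingency table simply counts how many subjects fall into each combination of two categorical variables. A minimal sketch with hypothetical records; the variable names and categories are invented for illustration and are not data from the case study.

```python
from collections import Counter

# Hypothetical records: (age group, parent-reported response).
records = [
    ("3 years", "more active"),
    ("3 years", "no change"),
    ("3 years", "more active"),
    ("8-9 years", "no change"),
    ("8-9 years", "no change"),
    ("8-9 years", "more active"),
]

cell_counts = Counter(records)                        # two-way table cells
row_totals = Counter(age for age, _ in records)       # one-way frequency table
col_totals = Counter(resp for _, resp in records)

rows, cols = sorted(row_totals), sorted(col_totals)
print("columns:", cols)
for row in rows:
    print(row, [cell_counts[(row, col)] for col in cols], "total:", row_totals[row])
```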
40
Bar Charts
41
Pie Charts
42
Histograms To catch the patterns of the data Divide up the data points into several mutually exclusive intervals –Categorize the data points.
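The binning step behind a histogram can be sketched directly: each data point is assigned to exactly one interval, and the counts per interval are the bar heights. The ages, the bin width of 5, and the function `histogram_counts` are hypothetical choices for illustration.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count values in n_bins mutually exclusive intervals [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts

# Hypothetical ages in years.
ages = [3, 4, 8, 9, 9, 12, 15, 18, 19]
counts = histogram_counts(ages, lower=0, width=5, n_bins=4)

# Text histogram: one row per interval.
for k, c in enumerate(counts):
    print(f"[{k * 5:2d}, {(k + 1) * 5:2d}): {'*' * c}")
```

The choice of bin width changes the picture: too few bins hide the pattern, too many leave mostly empty intervals.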
43
Scatter plots
Usually used to illustrate a relationship between two variables.
44
Box-Whisker Plots
45
What have we learned today?
46
Assignments
HW #1 is posted on the course website.
Pre-step for HW #1:
–Install MYSTAT on your laptop or on a computer in your school computer lab, with permission from your school (ask Ms. Aberle for help)
–Download Survey.sav (SPSS data file) from the course website (under Session 1)
Submit the hard copy of the completed HW at the next session.
Read the article, focusing on the contents of Tables 3 & 4 and Figure 4.