Biostatistics in Practice Session 2: Quantitative and Inferential Issues II
Youngju Pak, Biostatistician



What did we learn in Session 1?
- Basic study design
- Parallel vs. crossover designs
- Categorical vs. quantitative data: why is the distinction important?
- Summarizing the data with tables and graphs: contingency tables, box plots, histograms, etc.
- How to run MYSTAT

Today's topics
- Article: McCann, et al., Lancet 2007 Nov 3;370(9598):
- Descriptive statistics vs. inferential statistics
- Normal distributions
- Confidence intervals and p-values
- Correlations

McCann, et al., Lancet 2007 Nov 3;370(9598):
Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.
- Target population: children aged 3-4 and 8-9 years
- Study design: randomized, double-blinded, controlled, crossover trial
- Sample size: 153 (3-year-olds) and 144 (8/9-year-olds) in Southampton, UK
- Objective: test whether intake of artificial food colors and additives (AFCA) affects childhood behavior

McCann, et al., Lancet 2007 Nov 3;370(9598):
- Sampling: stratified sampling based on SES in Southampton, UK
- Baseline measure: 24-hour recall by the parent of the child's pretrial diet
- Groups: three groups; for the 3-year-olds:
  - Mix A: 20 mg of food colorings + 45 mg sodium benzoate, which is a widely used food preservative
  - Mix B: 30 mg of food colorings + 45 mg sodium benzoate (current average daily consumption)
  - Placebo
  - For the 8/9-year-olds: multiply these amounts by 1.25
- Crossover design: each participant receives one of 6 possible sequences, assigned at random.
- Blinding check: in a separate study with N=20, no significant difference in the look and taste of the drinks was found among the three groups. When people were asked which drink type they had received, the reported percentages were ordered placebo (65%) > mix B (52%) > mix A (40%).
[Figure: study timeline from T0 (baseline) through Week 1 to Week 6, with Randomize, Typical Diet, and Washout periods marked]
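The count of 6 sequences follows from ordering three drinks (3! = 6). A minimal sketch that enumerates them; the label strings are illustrative, and the slide does not describe how the trial actually assigned sequences to children:

```python
from itertools import permutations

# Drink labels taken from the slide; the exact strings are illustrative.
treatments = ("mix A", "mix B", "placebo")

# Each ordering of the three drinks is one possible crossover sequence.
sequences = list(permutations(treatments))

for number, sequence in enumerate(sequences, start=1):
    print(number, " -> ".join(sequence))
```

Every child receives all three drinks, so each child can serve as his or her own control; only the order differs between sequences.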

McCann, et al., Lancet 2007 Nov 3;370(9598):
Outcomes: Global Hyperactivity (GHA) score
- Attention-Deficit Hyperactivity Disorder (ADHD) rating scale IV, completed by teachers; scaled 1-5, where a higher number means more hyperactive
- Weiss-Werry-Peters (WWP) hyperactivity scale, completed by parents
- Classroom observation code
- Conners continuous performance test II (CPT II)
- The GHA is aggregated from these four scores

Non-Completing or Non-Adhering Subjects
- Non-response bias?
- Societal effect vs. scientific effect?
- Efficacy vs. effectiveness?

Describing the sample

Describing the findings w/ descriptive statistics
What was your research question? Did you get an answer to that research question from this table? Why or why not?
GHA = (post - pre) / standard deviation (SD) of the pre-scores
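The standardized change score can be sketched as follows. This follows the formula exactly as the slide states it; the data are made up, the function name is hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption:

```python
from statistics import stdev

def standardized_change(pre, post):
    """Return (post - pre) / SD(pre) for each subject.

    Assumption: SD(pre) is the sample standard deviation of all pre-scores.
    """
    sd_pre = stdev(pre)
    return [(after - before) / sd_pre for before, after in zip(pre, post)]

# Hypothetical pre- and post-challenge scores for four children.
pre = [10.0, 12.0, 14.0, 16.0]
post = [11.0, 12.0, 17.0, 15.0]

changes = standardized_change(pre, post)
print([round(c, 2) for c in changes])
```

Dividing by the SD of the pre-scores puts every change on a common scale, so a value of 1 means a change of one baseline standard deviation.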

Describing the findings w/ inferential statistics

Describing the findings w/ Graphs using confidence intervals

The Life Cycle of a Research Study With Statistical Applications
[Diagram labels: Population; Sample; Sampling mechanism: random sample or convenience sample; Population parameter; Sample estimate of population parameter; Confidence interval for population parameter]

So why use a sample?
- Often the population is too large to obtain data from every member
- Saves time and money
- All members of the population may be difficult to contact
Parameter vs. Statistic
- A parameter is a numerical description of a population characteristic, e.g., μ ("mu"): population mean; σ² ("sigma squared"): population variance
- A statistic is a numerical description of a sample characteristic, e.g., m: sample mean; S²: sample variance

Branches of Statistics
- Descriptive statistics involves the organization, summarization, and presentation of the sample, e.g., sample means, sample standard deviations, histograms, box plots, etc.
- Inferential statistics involves using a sample to draw conclusions about a population, e.g., confidence intervals, p-values, etc.

3 questions that statisticians attempt to answer
- How should I collect my data? Study design, sample size, statistical power.
- How should I analyze and summarize the data that I've collected? Displaying the data, descriptive statistics, statistical tests.
- How accurate are my data summaries? Inferences: confidence intervals, p-values.

Mean vs. Median (measuring the central tendency)
Mean
- What most people think of as "average"
- Easy to calculate
- Easily distorted by extreme values
- Be cautious with SKEWED data
- Calculate: sum of data / number of data points
Median
- Relatively easy to obtain
- Not affected by extreme values, so it is considered a "ROBUST" statistic
- Calculate: sort the data. If there is an odd number of points, the middle one is the median; otherwise, the median is the average of the middle two numbers
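A small illustration of why the median is called robust, using made-up numbers: adding one extreme value moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

print(mean(data), median(data))                  # 3.4 and 3
print(mean(with_outlier), median(with_outlier))  # 19.5 and 3.5
```

With six points (an even number), the median is the average of the middle two sorted values, here 3 and 4.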

Standard Deviation (SD) & Inter-Quartile Range (IQR) (measuring the variability of the data)
- Inter-Quartile Range (IQR) = 75th percentile (Q3) - 25th percentile (Q1), where 25% of the data < Q1 and 75% of the data < Q3
- SD is usually used for normally distributed data (bell-shaped, symmetric around the mean)
- IQR is usually used when the data distribution is skewed
- Range = Max - Min
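A sketch of the three spread measures on made-up, skewed data. Quartile conventions differ between software packages; this uses the default method of Python's `statistics.quantiles`, so MYSTAT or another package may report slightly different Q1 and Q3.

```python
from statistics import quantiles, stdev

# Hypothetical skewed data: one very large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 90]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1
sd = stdev(data)
value_range = max(data) - min(data)

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"SD={sd:.1f}, range={value_range}")
```

The single extreme value inflates the SD and the range, while the IQR depends only on the middle half of the data.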

Checking for normality
- Symmetric.
- One peak.
- Roughly bell-shaped.
- No outliers.
Many statistical tests assume that the outcome variable follows a normal distribution.

Other properties of the normal distribution
For bell-shaped distributions of data ("normally" distributed):
- ~68% of values are within mean ± 1 SD
- ~95% of values are within mean ± 2 SD: the "(normal) reference range"
- ~99.7% of values are within mean ± 3 SD
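These percentages can be checked against the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within mean ± {k} SD: {probability:.1%}")
```

The exact multiplier for 95% coverage is about 1.96 rather than 2, which is why "mean ± 2 SD" is described as covering roughly 95%.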

Histograms: Not OK for Typical Analyses
- Skewed: need to transform intensity to another scale, e.g., log(intensity)
- Multi-peak: need to summarize with percentiles, not the mean

Summary Statistics: Two Quantitative Variables (Correlation)
- Always look at the scatter plot.
- Correlation, r, ranges from -1 (perfect inverse relation) to +1 (perfect direct relation); zero = no linear relation.
- Specific to the ranges of the two variables. Typically, cannot extrapolate to populations with other ranges.
- Measures association, not causation.

Correlation Depends on Range of Data
- Graph B contains only the points from graph A that are in the ellipse.
- Correlation is reduced in graph B. Thus, the correlation between two quantities may be quite different in different study populations.
- Do not extrapolate.
[Figure: scatter plots A and B]
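The effect of restricting the range can be reproduced with made-up data. In this sketch, y follows x with a fixed alternating error of ±2; the Pearson correlation is computed by hand so that no external library is needed. The data and the cut-offs (x from 8 to 11) are illustrative assumptions, not values from the slide's graphs.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# "Graph A": x from 0 to 19, y = x plus an alternating error of +2 / -2.
xs = list(range(20))
ys = [x + (2 if x % 2 == 0 else -2) for x in xs]

# "Graph B": only the points with x between 8 and 11.
subset = [(x, y) for x, y in zip(xs, ys) if 8 <= x <= 11]
xs_b = [x for x, _ in subset]
ys_b = [y for _, y in subset]

r_full = pearson_r(xs, ys)
r_restricted = pearson_r(xs_b, ys_b)
print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The relation between x and y is identical in both sets; only the spread of x changed, and the correlation dropped sharply with it.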

Confidence Interval (CI)
How well does your sample mean (m) reflect the true (or population) mean?
- How confident? 95%?
A confidence interval (CI) is an inferential statistic that estimates the true, unknown parameter with an interval.

Confidence Interval for Population Mean
The 95% reference range, or "normal range", is
  sample mean ± 2(SD)

The 95% confidence interval (CI) for the (true, but unknown) mean of the entire population is
  sample mean ± 2(SD/√N)
SD/√N is called the "standard error of the mean" (SEM).
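The two formulas differ only in whether the SD is divided by √N. A minimal sketch with made-up measurements; the function names are hypothetical, and the multiplier 2 is the slide's rounded value:

```python
from math import sqrt
from statistics import mean, stdev

def reference_range(values, multiplier=2.0):
    """Sample mean ± multiplier * SD: where individual values tend to fall."""
    m, sd = mean(values), stdev(values)
    return (m - multiplier * sd, m + multiplier * sd)

def confidence_interval(values, multiplier=2.0):
    """Sample mean ± multiplier * SEM: where the population mean plausibly lies."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))
    return (m - multiplier * sem, m + multiplier * sem)

# Hypothetical measurements from N = 9 subjects.
values = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0]

rr = reference_range(values)
ci = confidence_interval(values)
print("reference range:", rr)
print("confidence interval:", ci)
```

With N = 9 the confidence interval is √9 = 3 times narrower than the reference range. Collecting more subjects shrinks the CI but does not shrink the reference range.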

Confidence Interval: Case Study
Table 2
Confidence interval: ± 1.99(1.04/√73) = ± 0.24 → to 0.10
Normal range: ± 1.99(1.04) = ± 2.07 → to
Adjusted CI close to
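The two half-widths on this slide can be reproduced from the three numbers it gives (multiplier 1.99, SD 1.04, N = 73). The variable names below are illustrative; the sample mean is not legible in the transcript, so only the ± terms are computed.

```python
from math import sqrt

multiplier = 1.99  # multiplier shown on the slide
sd = 1.04          # standard deviation from the slide
n = 73             # sample size from the slide

ci_half_width = multiplier * sd / sqrt(n)  # ± term of the confidence interval
range_half_width = multiplier * sd         # ± term of the normal range

print(round(ci_half_width, 2))     # 0.24
print(round(range_half_width, 2))  # 2.07
```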

P-values
- Used as a measure of the evidence against your null hypothesis (H0), e.g., H0: no difference in mean GHA scores among the three diets.
- Based on a statistical test, e.g., t-test statistic = signal / noise; if signal >> noise, the result is statistically significant.
- Usually p < 0.05 is called "statistically significant", in favor of the alternative hypothesis (Ha).
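The "signal / noise" idea can be sketched for the simplest case, a one-sample (or paired) t statistic: the signal is the observed mean difference and the noise is its standard error. The data and the function name are hypothetical, and this is a simplification; it is not the analysis used in the article, which compared three diets.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(differences, null_mean=0.0):
    """One-sample t statistic: (mean - null value) / standard error."""
    signal = mean(differences) - null_mean
    noise = stdev(differences) / sqrt(len(differences))
    return signal / noise

# Hypothetical within-child differences (active drink minus placebo).
differences = [1.0, 2.0, 3.0, 2.0, 2.0]

t = t_statistic(differences)
print(f"t = {t:.2f} on {len(differences) - 1} degrees of freedom")
```

Converting t to a p-value requires the t distribution with N - 1 degrees of freedom, which statistical packages such as MYSTAT provide.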

Units and Independence
- Experiments may be designed such that each measurement does not give additional independent information.
- Many basic statistical methods require that measurements are "independent" for the analysis to be valid.
- In mathematics, two events are independent if and only if the occurrence of one event makes it neither more nor less probable that the other occurs.

Experimental Units in Case Study
What is the experimental unit in this study?
1. School
2. Child
3. Parent
4. GHA score (results from three diets)
Are all GHA scores (e.g., 153 x 3 groups = 459 GHA scores for the 3-4-year-old children) independent?
The analysis MUST incorporate this possible correlation (clustering) if it exists, e.g., a mixed model allowing for clustering due to schools.

Announcements
- Keys for HW 1 and HW 2 will be posted on the class website by Wednesday.
- The next session will be held on Oct 15 at RB-1.