Biostatistics in Practice Peter D. Christenson Biostatistician Session 2: Summarization of Quantitative Information.

Readings for Session 2 from StatisticalPractice.com:
- Units of Analysis
- Look at the Data
- Summary Statistics
- Location and Spread
- Correlation
- Normal Distribution
- Confidence Intervals

Units of Analysis

Go over this entire reading. The author states that some students are “more similar” to each other than to other students, or that some students are “independent”. What does this mean? “Independent” really refers to the measurement that is made, not to the “units”. If knowing the value for one student does not change the likelihood of another student’s value, then the students are independent for this measurement. Would students from the same class likely be independent on height? How about on knowing what a case-control study is?

Look at the Data: I

Statistical methods depend on the “form” of a set of data, which can be assessed with some common, useful graphics:

Graph name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot plot      Continuous      Category
Box plot      Percentiles     Category
Line plot     Mean or value   Category

Look at the Data: II

What do we look for?
Histograms: Ideal: symmetric, bell-shaped. Skewness? Multiple peaks? Many values at, say, 0, and bell-shaped otherwise? Outliers?
Scatterplots: Ideal: narrow ellipse. Outliers? Funnel shape? Gaps with no values for one or both variables?
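Any plotting package can draw these graphs. As a dependency-free sketch (not from the reading), the hypothetical helpers below bin values into equal-width intervals and print one row of asterisks per bin, a histogram rotated 90°. The made-up data are chosen to show a long right tail.

```python
def bin_counts(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning [min, max].

    The maximum value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return lo, width, counts


def text_histogram(values, n_bins=5):
    """Print a sideways histogram: one row of '*' per bin."""
    lo, width, counts = bin_counts(values, n_bins)
    for i, c in enumerate(counts):
        left = lo + i * width
        print(f"{left:6.1f} to {left + width:6.1f} | {'*' * c}")
    return counts


# Hypothetical right-skewed data: most values small, a few large.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15, 30]
counts = text_histogram(data)
```

Most of the stars pile up in the first row with a thin tail below it, which is the skewness the histogram checklist asks about.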

Summary Statistics: I

Location: Mean for symmetric data. Median for skewed data. Geometric mean for some skewed data (see next slide).
Spread (standard deviation = SD): The standard, conventional measure, though its values are not intuitive. SD of what? E.g., the SD of individual values, or of group means. A fundamental, critical measure for most statistical methods.
See graphs in the reading for how the mean and SD change if the units of measurement change, e.g., nmoles to mg.
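A minimal sketch with Python's standard `statistics` module, using hypothetical right-skewed values (not the reading's data). It shows the mean pulled toward the long tail while the median and geometric mean stay nearer the bulk of the data, and it checks that a change of units that multiplies every value by a constant (a hypothetical factor `c`, as in converting nmoles to mg) multiplies both the mean and the SD by that constant.

```python
import math
import statistics

# Hypothetical right-skewed values (not the reading's data).
values = [1.2, 1.5, 1.8, 2.0, 2.4, 3.1, 4.0, 6.5, 12.0, 25.0]

mean = statistics.mean(values)
median = statistics.median(values)
geo_mean = statistics.geometric_mean(values)  # Python 3.8+
sd = statistics.stdev(values)  # sample SD, N - 1 denominator

# The long right tail pulls the mean well above the median and geometric mean.
print(f"mean={mean:.2f}  median={median:.2f}  GM={geo_mean:.2f}  SD={sd:.2f}")

# A change of units that multiplies every value by c (hypothetical factor)
# multiplies both the mean and the SD by c.
c = 1000.0
scaled = [c * v for v in values]
assert math.isclose(statistics.mean(scaled), c * mean)
assert math.isclose(statistics.stdev(scaled), c * sd)
```

A unit change that also adds a constant (e.g., a temperature scale) would shift the mean but leave the SD scaled only by the multiplier.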

Summary Statistics: II

Rule of thumb for bell-shaped (“normally” distributed) data:
~68% of values are within mean ± 1 SD
~95% of values are within mean ± 2 SD
~99.7% of values are within mean ± 3 SD

Geometric means (see next slide) are used for some skewed data:
1. Take logs of the individual values.
2. Find, say, mean ± 2 SD of the logged values; call the results mean, low, and up.
3. Take antilogs of mean, low, and up; call them GM, low2, and up2 (back on the original scale).
4. GM is the “geometric mean”. The interval (low2, up2) is skewed about GM (corresponds to the graph).
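The rule-of-thumb percentages come from the normal distribution, and the four geometric-mean steps translate directly into code. A sketch under these assumptions: natural logs (any base gives the same GM once the matching antilog is used), the sample SD, hypothetical data, and a hypothetical helper name `geometric_mean_interval`.

```python
import math
import statistics
from statistics import NormalDist

# Rule of thumb: fraction of a normal distribution within mean ± k SD.
z = NormalDist()  # standard normal
for k in (1, 2, 3):
    print(k, round(z.cdf(k) - z.cdf(-k), 4))  # 1 0.6827 / 2 0.9545 / 3 0.9973


def geometric_mean_interval(values, k=2.0):
    """Geometric mean and a back-transformed mean ± k SD interval."""
    logs = [math.log(v) for v in values]  # 1. take logs
    m = statistics.mean(logs)             # 2. mean ± k SD on the log scale
    s = statistics.stdev(logs)
    low, up = m - k * s, m + k * s
    # 3. antilogs bring mean, low, up back to the original scale
    gm, low2, up2 = math.exp(m), math.exp(low), math.exp(up)
    return gm, low2, up2                  # 4. GM and its skewed interval


# Hypothetical right-skewed measurements.
levels = [1.2, 1.5, 1.8, 2.0, 2.4, 3.1, 4.0, 6.5, 12.0, 25.0]
gm, low2, up2 = geometric_mean_interval(levels)
print(f"GM={gm:.2f}, interval {low2:.2f} to {up2:.2f}")
```

The interval is symmetric on the log scale, so on the original scale it is symmetric in ratio (GM/low2 equals up2/GM) and therefore longer above GM than below it.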

Geometric Means

[Figure: histograms rotated 90°, and box plots.] Note how the log transformation gives a symmetric distribution.

Summary Statistics: III (Correlation)

Always look at the scatterplot. See graphs in the reading for values ranging from -1 (perfectly inverse relation) to +1 (perfectly direct relation); 0 = no linear relation.
Measures linear association. Very sensitive to outliers.
Specific to the ranges of the two variables; typically, we cannot extrapolate to populations with other ranges.
Subgroups may not have the same correlation; in fact, they could have the opposite association (ecological fallacy).
Special (e.g., rank-based) correlations are used for non-symmetric data.
Measures association, not causation.
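A self-contained sketch of how one extreme point affects the usual (Pearson) correlation. It assumes Spearman's rank correlation as the example of a special correlation for non-symmetric data (the slide does not name one); Spearman's coefficient is Pearson's formula applied to ranks. The data are made up.

```python
import statistics


def pearson(x, y):
    """Pearson correlation: sum of cross-products over the root of sums of squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(v):
    """Ranks 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = list(range(1, 11))
y = list(range(1, 10)) + [1000]  # same ordering as x, one extreme value
print(round(pearson(x, y), 2), spearman(x, y))  # 0.53 1.0
```

The relation is perfectly monotone, so the rank correlation is 1, while the single extreme value drags the linear (Pearson) correlation down to about 0.53. Python 3.10+ also offers `statistics.correlation`; the hand-written version keeps the formula visible.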

Confidence Intervals: I

See the beginning of the reading for the goal of confidence intervals (CIs). CIs are not about individuals, but about populations, i.e., groups of individuals. A mean from a sample estimates the mean of the entire population. A 95% CI for the mean is a range of values we’re 95% sure contains the unknown population mean.
Reading example: N = 40 non-smokers. Vitamin C mean ± 2 SD is 90 ± 2×35 = 20 to 160, the “normal range”. Our estimate of the unknown mean for all non-smokers is 90, but how confident are we about that estimate? We need a ± range around it that we are 95% confident contains the unknown mean.

Confidence Intervals: II

A CI can be calculated for any unknown parameter. A typical 95% CI for a mean is roughly mean ± 2SD/√N.
Larger SD → wider CI. Larger N → narrower CI. More confidence → wider CI.
For the reading example, about 90 ± 2×35/√40 ≈ 79 to 101.
I am being sloppy with terminology. In “CI for a mean”, that mean is the always-to-be-unknown mean for the population (everyone). The other mean, before the ±, is the mean calculated from the sample of N, which estimates the unknown mean.
Note the explicit use of N; the correct unit of analysis is critical. What if we measured vitamin C on 10 days for each subject?
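The slide's arithmetic as a small sketch; `approx_ci` is a hypothetical helper that uses the slide's multiplier of 2 (1.96, or a t-distribution value for small N, is more exact). The second call illustrates the unit-of-analysis question under an assumption: if the 10 daily measurements on each of the 40 subjects were wrongly analyzed as N = 400 independent values (with, hypothetically, the same SD of 35), the interval would shrink by a factor of √10 and overstate the precision.

```python
import math


def approx_ci(mean, sd, n, multiplier=2.0):
    """Rough 95% CI for a population mean: mean ± multiplier * SD / sqrt(N)."""
    half_width = multiplier * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Reading example: N = 40 non-smokers, sample mean 90, SD 35.
low, up = approx_ci(90, 35, 40)
print(f"N=40:  {low:.1f} to {up:.1f}")  # N=40:  78.9 to 101.1

# Hypothetical mistake: 10 days x 40 subjects analyzed as N = 400.
low_bad, up_bad = approx_ci(90, 35, 400)
print(f"N=400: {low_bad:.1f} to {up_bad:.1f}")  # N=400: 86.5 to 93.5
```

One way to keep the subject as the unit of analysis is to average each subject's 10 values and compute the CI from those 40 averages.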

Confidence vs. Prediction Intervals

A typical 95% CI for a mean is roughly mean ± 2SD/√N. Recall that this CI is the range of values we’re 95% sure contains the unknown mean for “everyone”.
What about (normal) ranges for individuals? Such a range is often called a prediction interval (PI): 95% of individuals fall in a 95% PI, i.e., there is a 95% chance that an individual falls in a 95% PI. A typical 95% PI for an individual is roughly mean ± 2SD.
With large N (N > 30 is an often-used rule of thumb), the CI does not need a bell-shaped data distribution, but the PI does, regardless of N. Otherwise, we use percentiles for normal ranges.
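A sketch contrasting the two intervals for the reading's summary statistics, plus a percentile-based normal range for data that are not bell-shaped. The helper names are hypothetical, the percentile function uses linear interpolation between sorted values (one of several common definitions), and the skewed data are made up.

```python
import math
import statistics


def approx_pi(mean, sd, multiplier=2.0):
    """Rough 95% prediction (normal) range for individuals: mean ± 2 SD."""
    return mean - multiplier * sd, mean + multiplier * sd


def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between sorted values."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_range(values, lower=2.5, upper=97.5):
    """Distribution-free normal range from the 2.5th and 97.5th percentiles."""
    s = sorted(values)
    return percentile(s, lower), percentile(s, upper)


# Reading example: the PI for individuals is far wider than the CI for the mean.
half_ci = 2 * 35 / math.sqrt(40)
print(approx_pi(90, 35))             # (20.0, 160.0): the reading's normal range
print((90 - half_ci, 90 + half_ci))  # about (78.9, 101.1)

# Hypothetical skewed, positive-only data: mean ± 2 SD goes below zero,
# while the percentile range stays within the observed values.
skewed = [1.2, 1.5, 1.8, 2.0, 2.4, 3.1, 4.0, 6.5, 12.0, 25.0]
mean_sd_range = approx_pi(statistics.mean(skewed), statistics.stdev(skewed))
pct_range = percentile_range(skewed)
print(mean_sd_range, pct_range)
```

With only 10 values the 2.5th and 97.5th percentiles are poorly estimated; percentile-based normal ranges need many more observations in practice.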