Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.1 Descriptive Statistics, The Normal Distribution, and Standardization.

Slides:



Advertisements
Similar presentations
Statistics Review.
Advertisements

Statistics.
Descriptive (Univariate) Statistics Percentages (frequencies) Ratios and Rates Measures of Central Tendency Measures of Variability Descriptive statistics.
QUANTITATIVE DATA ANALYSIS
Descriptive Statistics, The Normal Distribution,
Calculating & Reporting Healthcare Statistics
Descriptive Statistics
Introduction to Educational Statistics
Chapter 2 Describing distributions with numbers. Chapter Outline 1. Measuring center: the mean 2. Measuring center: the median 3. Comparing the mean and.
Descriptive Statistics Healey Chapters 3 and 4 (1e) or Ch. 3 (2/3e)
1. Homework #2 2. Inferential Statistics 3. Review for Exam.
Lecture 4 Dustin Lueker.  The population distribution for a continuous variable is usually represented by a smooth curve ◦ Like a histogram that gets.
Describing Data: Numerical
Statistics for Linguistics Students Michaelmas 2004 Week 1 Bettina Braun.
Describing distributions with numbers
STATISTIC & INFORMATION THEORY (CSNB134) MODULE 2 NUMERICAL DATA REPRESENTATION.
Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.
Objectives 1.2 Describing distributions with numbers
Statistics and Research methods Wiskunde voor HMI Betsy van Dijk.
6.1 What is Statistics? Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively.
Summary statistics Using a single value to summarize some characteristic of a dataset. For example, the arithmetic mean (or average) is a summary statistic.
STA Lecture 161 STA 291 Lecture 16 Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately)
© 2006 McGraw-Hill Higher Education. All rights reserved. Numbers Numbers mean different things in different situations. Consider three answers that appear.
Variability The goal for variability is to obtain a measure of how spread out the scores are in a distribution. A measure of variability usually accompanies.
Measures of Variability In addition to knowing where the center of the distribution is, it is often helpful to know the degree to which individual values.
Thinking About Psychology: The Science of Mind and Behavior 2e Charles T. Blair-Broeker Randal M. Ernst.
Describing Behavior Chapter 4. Data Analysis Two basic types  Descriptive Summarizes and describes the nature and properties of the data  Inferential.
© 2006 McGraw-Hill Higher Education. All rights reserved. Numbers Numbers mean different things in different situations. Consider three answers that appear.
Describing distributions with numbers
Biostatistics Class 1 1/25/2000 Introduction Descriptive Statistics.
Statistics PSY302 Quiz One Spring A _____ places an individual into one of several groups or categories. (p. 4) a. normal curve b. spread c.
An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.
Statistics 11 The mean The arithmetic average: The “balance point” of the distribution: X=2 -3 X=6+1 X= An error or deviation is the distance from.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
INVESTIGATION 1.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Chapter 3 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 Chapter 3: Measures of Central Tendency and Variability Imagine that a researcher.
Measures of Central Tendency: The Mean, Median, and Mode
Basic Statistical Terms: Statistics: refers to the sample A means by which a set of data may be described and interpreted in a meaningful way. A method.
Introduction to Statistics Santosh Kumar Director (iCISA)
Unit 2 (F): Statistics in Psychological Research: Measures of Central Tendency Mr. Debes A.P. Psychology.
Lecture 4 Dustin Lueker.  The population distribution for a continuous variable is usually represented by a smooth curve ◦ Like a histogram that gets.
Summarizing Risk Analysis Results To quantify the risk of an output variable, 3 properties must be estimated: A measure of central tendency (e.g. µ ) A.
Introduction to statistics I Sophia King Rm. P24 HWB
Describing Samples Based on Chapter 3 of Gotelli & Ellison (2004) and Chapter 4 of D. Heath (1995). An Introduction to Experimental Design and Statistics.
Numerical descriptions of distributions
Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4.
Descriptive Statistics for one variable. Statistics has two major chapters: Descriptive Statistics Inferential statistics.
CHAPTER 2: Basic Summary Statistics
Descriptive Statistics(Summary and Variability measures)
Psychology’s Statistics Appendix. Statistics Are a means to make data more meaningful Provide a method of organizing information so that it can be understood.
Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
Data analysis and basic statistics KSU Fellowship in Clinical Pathology Clinical Biochemistry Unit
Lecture 8 Data Analysis: Univariate Analysis and Data Description Research Methods and Statistics 1.
Outline Sampling Measurement Descriptive Statistics:
Confidence Intervals and Sample Size
Doc.RNDr.Iveta Bedáňová, Ph.D.
Description of Data (Summary and Variability measures)
Summary descriptive statistics: means and standard deviations:
Central Tendency.
Descriptive and inferential statistics. Confidence interval
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Numerical Descriptive Measures
Summary descriptive statistics: means and standard deviations:
Data analysis and basic statistics
Statistics PSY302 Review Quiz One Spring 2017
CHAPTER 2: Basic Summary Statistics
Numerical Descriptive Measures
Presentation transcript:

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.1 Descriptive Statistics, The Normal Distribution, and Standardization

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.2 The Big Picture Populations and Samples Sample / Statistics x, s, s 2 Population Parameters , ,  2

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.3 Describe the Sample  Descriptive Statistics  Measures of Central Tendency  Measures of Variability  Other Descriptive Measures  Beware that poor samples may provide a distorted view of the population  This can happen due to biased sampling or due to chance  In general, larger samples a more representative of the population.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.4 Measures of Central Tendency  Measures the “center” of the data  Examples  Mean  Median  Mode  The choice of which to use, depends…  It is okay to report more than one.  They are simply descriptive (not inferential)

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.5 Mean  The “average”.  If the data are made up of n observations x 1, x 2,…, x n, the mean is given by the sum of the observations divided by the number of observations. For example, if the data are x 1 =1, x 2 =2, x 3 =3, then the mean is (1+2+3)/3=2.  We can calculate the sample mean from the data. It is often denoted as

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.6 Mean  The population mean is often denoted by . This is usually unknown (although we try to make inferences about this).  The sample mean is an unbiased estimator of the population mean.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.7 Median  The “middle observation” according to its rank in the data.  The median is:  The observation with rank (n+1)/2 if n is odd  The average of observations with rank n/2 and (n+2)/2 if n is even

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.8 Median  Example: Income level with Bill Gates in the room.  The median is more robust than the mean to extreme observations.  If data are skewed to the right, then the mean > median (in general)  If data are skewed to the left, then mean < median (in general)  If data are symmetric, then mean  median

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.9 Mode  The value that occurs the most often  Good for ordinal or nominal data in which there are a limited number of categories  Not very useful for continuous data

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.10 Measures of Variability  Measure the “spread” in the data  Example: Age distribution in the Extension School vs. FAS college  Some important measures  Variance  Standard Deviation  Range  Interquartile Range

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.11 Variance  The sample variance (s 2 ) may be calculated from the data. It is the average of the square deviations of the observations from the mean.  The population variance is often denoted by  2. This is usually unknown.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.12 Variance  The deviations are squared because we are only interested in the size of the deviation rather than the direction (larger or smaller than the mean).  Note  Why?

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.13 Variance  The reason that we divide by n-1 instead of n has to do with the number of “information units” in the variance. After estimating the sample mean, there are only n-1 observations that are a priori unknown (degrees of freedom).  This also makes s 2 an unbiased estimator of  2.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.14 Standard Deviation  Square root of the variance  s = sqrt(s 2 ) = sample SD  Calculate from the data (see formula for s 2)  = sqrt(  2 ) = population SD  Usually unknown  Expressed in the same units as the mean (instead of squared units like the variance)  But, s is not an unbiased estimator of .

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.15 Range  Maximum-Minimum  Very sensitive to extreme observations (outliers)

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.16 Interquartile Range  Q3-Q1  More robust than the range to extreme observations

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.17 Other Descriptive Measures  Minimum and Maximum  Very sensitive to extreme observations  Sample size (N)  Percentiles  Examples:  Median = 50 th percentile  Q1, Q3

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.18 Small Samples  For very small samples (e.g., <5 observations), summary statistics are not very meaningful (and frankly can be misleading). I recommend simply list the data.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.19 Example – AACTG A5087  Insert Examples_descriptive_stats

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.20 Random Variables  Variable  A characteristic that can be measured, categorized, quantified, or qualified.  Random variable  A variable whose value is determined by a random phenomena (I.e., not determined by study design)  Continuous random variable  Can take on any value within a specified interval or continuum

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.21 Probability Distributions  Every random variable has a corresponding probability distribution  A probability distribution describes the behavior of the random variable  It identifies possible values of the random variable and provides information about the probability that these values (or ranges of values) will occur.  A particularly important probability distribution is the normal distribution

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.22 Normal Distribution  “Bell-shaped curve”  Symmetric about its mean (  )  The closer that an observation is to the mean, the more frequently it occurs.  Notation: X~N( ,  )

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.23 Normal Distribution  Insert normal_distribution

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.24 Normal Distribution  The normal distribution, N( ,  ) can be described by the following “density function”:  The area under this curve (function) is one. Probabilities may be calculated as the area under the curve (above the x- axis). Integration (calculus) can help quantify these areas (probabilities).

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.25 Standard Normal Distribution  A special normal distribution: N(0,1)  Values from this distribution represent the number of SDs away from the mean (0).  Known properties of this distribution  Can make probabilistic statements using Table A.3  Insert standard_normal_table  See the handout: reading the standard normal table

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.26 Standardization  For any variable X (with mean  and SD =  ), then Z=(X-  )/  has mean 0 and SD=1.  This is called standardization.  This creates a variable Z, such that values of this variable represent the number of SDs away from the mean (0).

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.27 Standardization  Common Mistake:  X has mean  and SD = , then Z=(X-  )/  ~N(0,1). This is NOT true.  It is true that Z has mean 0 and SD=1 (standardization). However, Z is only normal if X was also normal.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.28 Standardization  However, if X~N( ,  ), then Z=(X-  )/  ~N(0,1). We can then make probabilistic statements about X.  Thus we can make probabilistic statements about any variable with any normal distribution.

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.29 Example - IQ  IQ~N(100,15)  Z=(IQ-100)/15~N(0,1)  What’s the probability that a person chosen at random has an IQ>135?  Z=( )/15=2.33  P(Z>2.33)  0.01 (from Table A.3)

Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.30 Example – IQ (cont.)  What’s the probability that a person chosen at random has an IQ<90?  Z=(90-100)/15=-0.67  P(Z 0.67)   Remember symmetry  Probabilities that a person chosen at random has an IQ between two values may also be obtained.