

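Statistics 1, slide 1
The mean
The arithmetic average: $\mu = \dfrac{\sum X}{N}$.
The “balance point” of the distribution: on the slide's scale, X = 2 has a deviation of −3, X = 6 a deviation of +1, and X = 10 a deviation of +5, and across all scores the deviations sum to 0.
An error or deviation is the distance from a score to the mean: $X - \mu$.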

Statistics 1, slide 2
The sum of the errors or deviations around the mean is always 0: $\sum (X - \mu) = 0$.
Advantage: more informative than the median and mode. The mean takes all of the observations/scores into account, and it takes both the distance and the direction of each deviation into account.
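Here is a quick numerical check of the zero-sum property, written as a minimal Python sketch. The scores are hypothetical. They include the three scores named on slide 1 (2, 6, and 10, with deviations −3, +1, and +5), plus two assumed scores (1 and a second 6) chosen so that the mean is 5.

```python
# Hypothetical scores: 2, 6, and 10 are the scores named on slide 1;
# 1 and the second 6 are assumed so that the mean comes out to 5.
scores = [1, 2, 6, 6, 10]

N = len(scores)
mu = sum(scores) / N                   # arithmetic average: sum(X) / N
deviations = [x - mu for x in scores]  # X - mu for each score

print(mu)               # 5.0
print(deviations)       # [-4.0, -3.0, 1.0, 1.0, 5.0]
print(sum(deviations))  # 0.0
```

The positive deviations (+1, +1, +5) exactly offset the negative ones (−4, −3). This is what makes the mean the balance point.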

Statistics 1, slide 3
Advantage: more uses than the median and mode. The mean is necessary for calculating many inferential statistics.
Limitation (scale): it is not always possible to calculate a mean. The mean can only be calculated for interval/ratio-level data, so a different measure is needed for nominal- or ordinal-level data.

Statistics 1, slide 4
Limitation (distribution shape): it is not always appropriate to use the mean to describe the middle of a distribution. The mean is sensitive to extreme values, or “outliers,” and it does not always reflect where the scores “pile up,” so a different measure is needed for asymmetrical distributions.
Use the mean for unimodal, symmetrical distributions of interval/ratio-level data.

Statistics 1, slide 5
The median divides the distribution exactly in half; it is the 50th percentile.
Odd number of scores, with no “pileup” or ties at the middle: median = the middle score.

Statistics 1, slide 6
Even number of scores, with no “pileup” or ties at the middle: median = the average of the 2 middle scores.
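The two median rules above can be sketched in Python as follows. The `median` helper is a hypothetical name. Like the rules on these slides, it does not treat a “pileup” of tied scores at the middle specially. The standard library's `statistics.median` applies the same odd/even rule.

```python
def median(scores):
    """Median via the odd/even rules (no special handling of ties at the middle)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle score
        return ordered[mid]
    # Even number of scores: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 3, 5, 7, 5, 6, 8]))  # 6   (odd N)
print(median([1, 4, 2, 3]))           # 2.5 (even N)
```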

Statistics 1, slide 7
Advantage: insensitive to extreme values. The median can be used when extreme values distort the mean:
3, 5, 5, 6, 7, 8, 9: median = 6, mean ≈ 6.1
3, 5, 5, 6, 7, 8, 50: median = 6, mean = 12
The median is the most central, representative value in skewed distributions.
Advantage: it can be calculated when the mean cannot. The median can be used with ranks (as well as with interval/ratio data) and with open-ended distributions, e.g., number of siblings when the top category is “5 or more.”
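The slide's outlier example can be reproduced with Python's `statistics` module. This is a sketch that uses the slide's own data.

```python
from statistics import mean, median

without_outlier = [3, 5, 5, 6, 7, 8, 9]
with_outlier = [3, 5, 5, 6, 7, 8, 50]  # the 9 replaced by an extreme value

for scores in (without_outlier, with_outlier):
    print(f"median = {median(scores)}, mean = {mean(scores):.1f}")
# median = 6, mean = 6.1
# median = 6, mean = 12.0
```

Changing a single score from 9 to 50 nearly doubles the mean but leaves the median where it was.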

Statistics 1, slide 8
Limitation: not as informative as the mean. The median takes into account only the observations/scores around the 50th percentile and provides no information about the distances between observations.
Limitation: fewer uses than the mean. The median is purely descriptive.

Statistics 1, slide 9
Limitation (scale): it is not always possible to calculate a median. The median can only be calculated for ordinal and interval/ratio data.
Use the median when you cannot calculate a mean, or when distributions of interval/ratio data are skewed by extreme values.

Statistics 1, slide 10
The mode is the most frequently occurring score(s).
Advantage: simple to find, e.g., in the slide's example, mode = “coke.”

Statistics 1, slide 11
Advantage: the mode can be used with any scale of measurement. By contrast, the median can only be calculated for ordinal and interval/ratio data, and the mean only for interval/ratio data.
Advantage: the mode can indicate more than one most-frequent value, so it can be used to describe bimodality or multimodality.

Statistics 1, slide 12
In the slide's bimodal distribution, M = 5 and median = 5, but neither reflects where the scores actually “pile up.” Modes = 2 and 8.
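The following sketch shows the same pattern with Python's `statistics.multimode` (Python 3.8+). The slide's own scores are not in the transcript. The list below is hypothetical, constructed so that the mean and median are both 5 while the scores pile up at 2 and 8. The soft-drink responses are also made up, to show that the mode works for nominal data.

```python
from statistics import mean, median, mode, multimode

# Hypothetical bimodal scores: mean = median = 5, but the piles are at 2 and 8
scores = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]
print(mean(scores), median(scores), multimode(scores))

# Hypothetical nominal responses: no mean or median exists, but a mode does
drinks = ["coke", "pepsi", "coke", "sprite", "coke", "pepsi"]
print(mode(drinks))  # coke
```

`multimode` returns every value tied for the highest frequency, here `[2, 8]`. `mode` returns a single most frequent value.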

Statistics 1, slide 13
The mode can also indicate major and minor modes, e.g., major mode = 6 am, minor mode = 6 pm.

Statistics 1, slide 14
Limitation: not as informative as the mean or median. The mode takes only the most frequently observed X values into account and provides no information about the distances between observations or the number of observations above/below the mode.
Limitation: fewer uses than the mean. The mode is purely descriptive; you need to calculate a mean to use with inferential statistics.
Use the mode when you cannot compute a mean or median, or together with the mean/median to describe a bimodal/multimodal distribution.

Statistics 1, slide 15
Describing distributions: measures of variability or dispersion
To describe or summarize a distribution of scores efficiently, you need a measure of central tendency plus a measure of variability. Which measure of central tendency is most appropriate? Why?

Statistics 1, slide 16
Central tendency and variability measures are “partners”:
Mode ↔ range
Median ↔ interquartile range, semi-interquartile range
Mean ↔ SS, variance (σ² or s²), standard deviation (σ or s)
These measures describe distributions and indicate how well individual scores or samples of scores represent the population.

Statistics 1, slide 17
Variability measures used with the mode and median
Range: based on the distance between the highest and lowest observations on the X scale. It takes only the 2 most extreme observations into account.

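Statistics 1, slide 18
For interval/ratio data, the range can be found from the scores themselves or from their real limits:
Highest score: $X_{max} = 11$; upper real limit: URL $X_{max} = 11.5$
Lowest score: $X_{min} = 2$; lower real limit: LRL $X_{min} = 1.5$
$\text{range} = X_{max} - X_{min} + 1 = 11 - 2 + 1 = 10$
or
$\text{range} = \text{URL}\,X_{max} - \text{LRL}\,X_{min} = 11.5 - 1.5 = 10$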
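The sketch below computes both range calculations. It assumes scores measured in whole units, so that each score's real limits lie half a unit above and below it. The `score_range` helper and its `unit` parameter are hypothetical names. The scores are made up, but they share the slide's $X_{max} = 11$ and $X_{min} = 2$.

```python
def score_range(scores, unit=1):
    """Return (X_max - X_min + unit, URL - LRL); the two should agree.

    Hypothetical helper: assumes scores are measured to the nearest `unit`.
    """
    x_max, x_min = max(scores), min(scores)
    inclusive = x_max - x_min + unit
    # Real limits extend half a measurement unit beyond each score
    upper_real_limit = x_max + unit / 2
    lower_real_limit = x_min - unit / 2
    return inclusive, upper_real_limit - lower_real_limit

print(score_range([2, 4, 5, 7, 11]))  # (10, 10.0)
```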

Statistics 1, slide 19
The range can also be used for ordered categories, e.g., range = from “agree” to “disagree strongly,” with modal response = “disagree.”
The range is typically used with the mode, when the mean and median are inappropriate or impossible to calculate (but it may be reported along with a median or a mean).

Statistics 1, slide 20
Interquartile range (IQR) and semi-interquartile range (SIQR)
These are based on the distances between scores corresponding to percentiles on the X scale. They take only the middle 50% of the distribution into account. Use them only with interval/ratio data.
Interquartile range = distance between the 1st and 3rd quartiles: $IQR = Q_3 - Q_1$
1st quartile ($Q_1$): the score at the 25th percentile
2nd quartile ($Q_2$): the score at the 50th percentile, i.e., the median
3rd quartile ($Q_3$): the score at the 75th percentile

Statistics 1, slide 21
The IQR tells you how much distance on the X scale contains the middle 50% of the distribution.

Statistics 1, slide 22
Worked example (figure): N, $Q_1$, $Q_3$, IQR, and SIQR for a sample distribution.
Semi-interquartile range = half the interquartile range: $SIQR = \dfrac{Q_3 - Q_1}{2}$
For a symmetrical distribution, the SIQR tells you the distance from the median up to $Q_3$ or down to $Q_1$, i.e., the distance covering the 25% of the distribution on each side of the median.
The interquartile and semi-interquartile ranges are typically used with the median (but may be reported along with a mean).
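This sketch computes quartiles, IQR, and SIQR for a small hypothetical, symmetrical set of scores using `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software. The default `method="exclusive"` used here interpolates between scores, so hand methods based on real limits may give slightly different values.

```python
from statistics import quantiles

# Hypothetical, symmetrical interval/ratio scores
scores = [1, 2, 3, 4, 5, 6, 7, 8]

# quantiles(n=4) returns the three cut points [Q1, Q2, Q3]
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1
siqr = iqr / 2

print(q1, q2, q3)  # 2.25 4.5 6.75
print(iqr, siqr)   # 4.5 2.25
```

Because these scores are symmetrical, the SIQR (2.25) equals both the distance from the median up to $Q_3$ and the distance down to $Q_1$.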

Statistics 1, slide 23
Variability measures used with the mean
SS, variance, and standard deviation are based on the distances between each score and the mean on the X scale. As with the mean, all scores are taken into account. Use them only with interval/ratio data. They are most useful for symmetrical distributions, when the mean is the best measure of central tendency.

Statistics 1, slide 24
Notation (population parameter / sample statistic):
Sum of squared deviations: SS / SS
Variance: σ² / s²
Standard deviation (s.d.): σ / s
For now, we will be working with the population values.

Statistics 1, slide 25
Suppose you want to summarize how far the scores in a distribution typically deviate from the mean. Averages are a convenient way to summarize information, BUT remember:
An error or deviation is the distance from a score to the mean: $X - \mu$.
The sum of the deviations around the mean is always 0.

Statistics 1, slide 26
You can’t sum the deviations and divide by the number of scores to get a useful average amount of deviation: 0/N will always equal 0. What can you do to summarize the deviations?
SS = the sum of the squared deviations or errors around the mean (definitional formula, for understanding the concept):
$SS = \sum (X - \mu)^2$
Squaring the deviations first makes every term non-negative, so positive and negative deviations no longer cancel out when you sum them.


Statistics 1, slide 28
Computing, squaring, and summing all N deviations is tedious, so there is a “shortcut.” Plug these values into the computational formula (use this one):
$SS = \sum X^2 - \dfrac{(\sum X)^2}{N}$
For the distribution of N = 4, $\sum X = 8$ and $\sum X^2 = 38$, so
$SS = 38 - \dfrac{8^2}{4} = 38 - 16 = 22$
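This sketch checks that the definitional and computational formulas agree. The slide gives only the sums, not the scores. The list 1, 0, 6, 1 is an assumed set of four scores that matches $\sum X = 8$ and $\sum X^2 = 38$.

```python
# Assumed scores (the slide's actual scores are not in the transcript),
# chosen to match sum(X) = 8 and sum(X^2) = 38 with N = 4.
scores = [1, 0, 6, 1]
N = len(scores)
mu = sum(scores) / N

# Definitional formula: SS = sum of (X - mu)^2
ss_definitional = sum((x - mu) ** 2 for x in scores)

# Computational formula: SS = sum(X^2) - (sum(X))^2 / N
sum_x = sum(scores)
sum_x2 = sum(x ** 2 for x in scores)
ss_computational = sum_x2 - sum_x ** 2 / N

print(sum_x, sum_x2)                      # 8 38
print(ss_definitional, ss_computational)  # 22.0 22.0
```

The computational version needs only two running totals, $\sum X$ and $\sum X^2$. That is why it is the practical choice when calculating by hand.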

Statistics 1, slide 29
So, what does this value of SS tell you about the variability of the distribution of scores? By itself, not much. SS summarizes the amount of deviation and is useful for further analyses. In general, we CAN say that:

Statistics 1, slide 30
As variability increases (more differences between scores, larger deviations), SS gets larger. Extreme scores far from μ contribute disproportionately to SS, because their deviations are large and are then squared.
As N increases (more squared deviations to sum), SS gets larger.
Because SS increases with N, SS is NOT a good descriptive statistic: you can’t compare SS between groups of different sizes.


Statistics 1, slide 32
How can you use SS to create a measure that will allow you to compare different-sized groups?
Variance = the average squared deviation, or “mean squared deviation.”
Variance does not grow simply because N grows, because it is an average: the mean squared deviation. Variance summarizes the amount of deviation, allows comparisons between different-sized groups, and is useful for further analyses.
Because variance is a mean of squared deviations, it is not on the same scale as the original variable.
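This small sketch illustrates why SS cannot compare groups of different sizes but variance can. The scores are hypothetical. The second group is the first group repeated, so it has identical spread but twice the N. The helper names `ss` and `population_variance` are illustrative.

```python
def ss(scores):
    """Sum of squared deviations around the mean (definitional formula)."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def population_variance(scores):
    """sigma^2 = SS / N, the mean squared deviation."""
    return ss(scores) / len(scores)

group_a = [1, 0, 6, 1]  # hypothetical scores
group_b = group_a * 2   # same scores, twice as many observations

print(ss(group_a), ss(group_b))                                    # 22.0 44.0
print(population_variance(group_a), population_variance(group_b))  # 5.5 5.5
```

Doubling N doubles SS even though the spread is unchanged, while the variance stays at 5.5.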

Statistics 1, slide 33
Because variance does NOT allow you to describe typical variation among scores in terms of the original scale, it is still NOT a good descriptive statistic. The relationship between variance and distances or units on the I/E scale is difficult to visualize or understand.
How can you use σ² to create a descriptive measure of variability on the same scale as the original scores?
$SS = \sum (X - \mu)^2$, the sum of the squared deviations
$\sigma^2 = \dfrac{SS}{N}$, the average squared deviation
What we’d really like to have is a measure of the typical or average deviation from the mean that is NOT based on squared quantities.

Statistics 1, slide 34
Standard deviation = the typical or expected deviation: the typical, average, or “expected” distance that scores deviate from the mean.
Population s.d.: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{SS}{N}}$
Taking the square root of the variance “returns” the measure of variability to the original units of measurement. This allows you to represent the standard deviation as a distance on the X axis. It also allows you to make statements about how extreme or unusual an observation is.
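A minimal sketch of σ as the square root of SS/N, cross-checked against `statistics.pstdev`. The scores are hypothetical. The last line expresses one score's distance from the mean in s.d. units, to illustrate the “how extreme is this observation” use described above.

```python
import math
from statistics import pstdev

scores = [1, 0, 6, 1]  # hypothetical scores (SS = 22, N = 4)

N = len(scores)
mu = sum(scores) / N
ss = sum((x - mu) ** 2 for x in scores)
sigma = math.sqrt(ss / N)  # population s.d., back in the original units

# Cross-check against the standard library's population standard deviation
assert math.isclose(sigma, pstdev(scores))

# Express one score's distance from the mean in s.d. units
x = 6
print(f"sigma = {sigma:.3f}; X = {x} lies {(x - mu) / sigma:.2f} s.d. above the mean")
```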

Statistics 1, slide 35
Note: as with the mean, the standard deviation is most useful for describing symmetrical distributions. The standard deviation is the best descriptive measure of variability around a mean; SS and variance are important concepts for understanding and for use in further analyses.

Statistics 1, slide 36
Sample variance and standard deviation
We often want to make statements about population parameters, e.g., “How extroverted are male U.S. citizens, on average?” The parameter of interest is μ.
BUT, much of the time we only have access to sample statistics, e.g., “How extroverted are males from the PSY 1 subject pool, on average?” Our best estimate of μ is M, calculated from the sample data.

Statistics 1, slide 37
Notation (population parameter / sample statistic):
Mean: μ / M
Number of observations: N / n
Sum of squared deviations: SS / SS
Variance: σ² / s²
Standard deviation (s.d.): σ / s
We use the statistics s and s² as estimates of σ and σ² when the population parameters are unknown.

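Statistics 1, slide 38
Formulae (population; sample; difference):
SS: $\sum X^2 - \dfrac{(\sum X)^2}{N}$ or $\sum (X - \mu)^2$; $\sum X^2 - \dfrac{(\sum X)^2}{n}$ or $\sum (X - M)^2$; M instead of μ, “N” → “n”
Variance: $\sigma^2 = \dfrac{SS}{N}$; $s^2 = \dfrac{SS}{n-1}$; “N” → “n” AND n − 1 instead of N
Standard deviation: $\sigma = \sqrt{\dfrac{SS}{N}}$; $s = \sqrt{\dfrac{SS}{n-1}}$; “N” → “n” AND n − 1 instead of N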

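Statistics 1, slide 39
Sample and population SS are the same: the calculations do NOT change from population to sample.
Population: $SS = \sum X^2 - \dfrac{(\sum X)^2}{N}$   Sample: $SS = \sum X^2 - \dfrac{(\sum X)^2}{n}$
“N” has just been relabeled as “n.” If you use the definitional formula, use the correct mean:
Population: $SS = \sum (X - \mu)^2$   Sample: $SS = \sum (X - M)^2$
This will matter later on…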

Statistics 1, slide 40
Comparing sample and population variances: the calculations DO change from population to sample.
Population: $\sigma^2 = \dfrac{SS}{N}$   Sample: $s^2 = \dfrac{SS}{n-1}$
“N” has been relabeled as “n,” AND n − 1 is used instead of N in the denominator. For the same SS (greater than 0), the sample formula will always yield a larger value.
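A sketch of the two denominators applied to the same hypothetical scores. Python's `statistics` module exposes both conventions: `pvariance`/`pstdev` divide by N, and `variance`/`stdev` divide by n − 1.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

scores = [1, 0, 6, 1]  # hypothetical data
n = len(scores)
m = sum(scores) / n
ss = sum((x - m) ** 2 for x in scores)  # SS is the same either way

pop_var = ss / n           # sigma^2 = SS / N (data treated as a population)
sample_var = ss / (n - 1)  # s^2 = SS / (n - 1) (data treated as a sample)

# The statistics module implements the same two denominators
assert math.isclose(pop_var, pvariance(scores))
assert math.isclose(math.sqrt(pop_var), pstdev(scores))
assert math.isclose(sample_var, variance(scores))
assert math.isclose(math.sqrt(sample_var), stdev(scores))

print(f"sigma^2 = {pop_var:.3f}, sigma = {math.sqrt(pop_var):.3f}")
print(f"s^2 = {sample_var:.3f}, s = {math.sqrt(sample_var):.3f}")
```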

Statistics 1, slide 41
Why n − 1 instead of N? Because using n − 1 instead of N corrects for bias in calculating s and s².
Remember: sample statistics are only useful to the extent that they provide unbiased estimates of population parameters. What is an unbiased statistic?

Statistics 1, slide 42
An unbiased statistic is one that, on average, equals the population parameter. M is an unbiased estimate of μ: the average of many sample means equals the population mean. (In the slide's figure, each box is a sample mean.)
What is a biased statistic? One that systematically overestimates or underestimates the parameter.

Statistics 1, slide 43
SS/N tends to underestimate the population variance when it is computed from sample data. Why doesn’t the SS/N formula work with sample data? Samples usually contain less variability than the populations they come from: samples tend to contain observations from the center of the population distribution. Because these samples do not reflect the extremes of the population, we underestimate the true variability.

Statistics 1, slide 44
Dividing SS by a smaller number corrects for the tendency to underestimate the true population variability. The n − 1 correction makes s² an unbiased estimator of σ². Taking the square root reintroduces a slight downward bias, so s still underestimates σ a little on average, but much less than $\sqrt{SS/n}$ does.
n − 1 is also referred to as “degrees of freedom.” Sample variances have n − 1 degrees of freedom: they are calculated from n − 1 independent scores, because the last score is determined by the other scores and by M.
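The bias of SS/n versus SS/(n − 1) can be checked exactly rather than by drawing random samples. This sketch swaps simulation for exhaustive enumeration. It assumes a small hypothetical population and samples drawn with replacement, so every one of the $N^n$ ordered samples is equally likely. Averaging an estimator over all of them therefore gives its exact expected value. `fractions.Fraction` keeps the arithmetic free of rounding.

```python
from fractions import Fraction
from itertools import product

def ss(xs):
    """Sum of squared deviations around the mean, computed exactly."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs)

# Hypothetical population; sigma^2 = SS / N
population = [1, 0, 6, 1]
sigma2 = ss(population) / len(population)

# Every ordered sample of size n drawn WITH replacement is equally likely,
# so averaging over all of them gives each estimator's exact expected value.
n = 2
samples = list(product(population, repeat=n))
mean_biased = sum(ss(s) / n for s in samples) / len(samples)          # SS / n
mean_unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)  # SS / (n - 1)

print(sigma2, mean_biased, mean_unbiased)  # 11/2 11/4 11/2
```

Averaged over every possible sample, SS/(n − 1) lands exactly on σ² = 11/2. SS/n lands on σ²(n − 1)/n = 11/4, so it underestimates by the factor (n − 1)/n.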

Statistics 1, slide 45
Exercises:
For the following set of data, compute the value of SS. Scores: 5, 2, 2, 7, 9. ANS: SS =
Calculate the variance and the standard deviation for the following data (population and sample). Scores: 2, 3, 2, 4, 7, 5, 3, 6, 4

Statistics 1, slide 46
Data table columns: X, f, fX, X², f(X²)
SS =   σ² =   σ =   s² =   s =
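To check hand calculations, here is a sketch of the frequency-table layout above, applied to the second exercise's scores. `collections.Counter` builds the f column. SS then comes from the computational formula, with ΣfX and ΣfX² standing in for ΣX and ΣX².

```python
import math
from collections import Counter

# Scores from the second exercise
scores = [2, 3, 2, 4, 7, 5, 3, 6, 4]

# Frequency table: one row per distinct X value
freq = Counter(scores)
print(" X  f  fX  X^2  fX^2")
for x in sorted(freq):
    f = freq[x]
    print(f"{x:2d} {f:2d} {f * x:3d} {x * x:4d} {f * x * x:5d}")

n = sum(freq.values())                             # N (or n) = sum of f
sum_fx = sum(f * x for x, f in freq.items())       # equals sum of X
sum_fx2 = sum(f * x * x for x, f in freq.items())  # equals sum of X^2
ss = sum_fx2 - sum_fx ** 2 / n                     # computational formula

pop_var = ss / n          # population: SS / N
sample_var = ss / (n - 1)  # sample: SS / (n - 1)
print(f"SS = {ss:g}")
print(f"sigma^2 = {pop_var:.3f}, sigma = {math.sqrt(pop_var):.3f}")
print(f"s^2 = {sample_var:.3f}, s = {math.sqrt(sample_var):.3f}")
```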