VARIABILITY Distributions Measuring dispersion


VARIABILITY Distributions Measuring dispersion Variance and standard deviation

Review: Distribution
Distribution: An arrangement of cases according to their score or value on one or more variables
[Table: 31 cases listed by case no., age, height and M/F. Age and height are continuous variables; M/F is a categorical variable.]
Summary statistics: mean age = 24; mean height = 67; 39% M, 61% F

Dispersion
How do cases “disperse” (arrange themselves) around the mean?
[Figure: distribution of officers around the mean]

Three statistics that measure dispersion
Measure how cases “disperse” (arrange themselves) around the mean

Average deviation: $\frac{\sum |x - \bar{x}|}{n}$
  Average distance between the mean and the values (scores) for each case
  Uses absolute distances (no + or -)
  Affected by extreme scores
  We’ll never use it in class

Variance (s²): A sample’s cumulative dispersion
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
  In place of n we always use n-1 (our sample sizes are always small)

Standard deviation (s): A standardized form of variance, comparable between samples
  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
  In place of n we always use n-1 (our sample sizes are always small)
  Square root of the variance
  Expresses dispersion in units of equal size for that particular distribution
  Less affected by extreme scores

Variability exercise
Sample 1 (n=10): Random sample of patrol officers, each scored 1-5 on a cynicism scale

Officer   Score   Mean   Diff.   Sq.
   1        3     2.9     .1     .01
   2        3     2.9     .1     .01
   3        3     2.9     .1     .01
   4        3     2.9     .1     .01
   5        3     2.9     .1     .01
   6        3     2.9     .1     .01
   7        3     2.9     .1     .01
   8        1     2.9   -1.9    3.61
   9        2     2.9    -.9     .81
  10        5     2.9    2.1    4.41
____________________________________________________
Sum                             8.90
Variance (sum of squares / n-1)             s² = .99
Standard deviation (sq. root of variance)   s  = .99

[Figure: scores plotted around the mean. This is not an acceptable graph; it’s only to illustrate dispersion.]
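The Sample 1 calculation can be checked with a short script. This is only a sketch of the manual procedure (mean, difference, squared difference, sum, divide by n-1, square root); the function name `dispersion_table` is made up for illustration and is not part of the slides.

```python
import math

def dispersion_table(scores):
    """Follow the manual procedure: mean, differences, squares, n-1 variance, SD."""
    n = len(scores)
    mean = sum(scores) / n
    # One row per case: (score, difference from mean, squared difference)
    rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
    sum_sq = sum(sq for _, _, sq in rows)
    variance = sum_sq / (n - 1)   # n-1 because the sample is small
    sd = math.sqrt(variance)      # standard deviation = square root of variance
    return mean, rows, sum_sq, variance, sd

# Cynicism scores for officers 1-10 in Sample 1
sample_1 = [3, 3, 3, 3, 3, 3, 3, 1, 2, 5]
mean, rows, sum_sq, variance, sd = dispersion_table(sample_1)

for officer, (score, diff, sq) in enumerate(rows, start=1):
    print(f"{officer:>2}  {score}  {mean:.1f}  {diff:+.1f}  {sq:.2f}")
print(f"Sum of squares: {sum_sq:.2f}")
print(f"Variance (s2):  {variance:.2f}")
print(f"Std. dev. (s):  {sd:.2f}")
```

Variance and standard deviation both round to .99 here only because the variance happens to be close to 1; in general the two values differ.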

Sample 2 (n=10): Another random sample of patrol officers, each scored 1-5 on a cynicism scale

Officer   Score   Mean   Diff.   Sq.
   1        2     ___    ___     ___
   2        1     ___    ___     ___
   3        1     ___    ___     ___
   4        2     ___    ___     ___
   5        3     ___    ___     ___
   6        3     ___    ___     ___
   7        3     ___    ___     ___
   8        3     ___    ___     ___
   9        4     ___    ___     ___
  10        2     ___    ___     ___
Sum                              ____
Variance              s² = ____
Standard deviation    s  = ____

Compute ...

VARIABILITY Shape of distributions Flat, peaked, normal

“Flat” distributions
Dispersion (aka, “variability”): How scores or values arrange themselves around the mean
When scores are more dispersed (i.e., “variability” is greater) a distribution’s shape gets flatter
  Greater distance between most scores and the mean
  Many scores are at a considerable distance from the mean
  The mean loses value as a “summary statistic”
[Figure: distribution of arrests. Mean = 3.65, a poor descriptor]

“Peaked” and “normal” distributions
Dispersion (aka, “variability”): How scores or values arrange themselves around the mean
Peaked: If most scores cluster about a certain value the shape of the distribution is called “peaked”
Normal: If the clustering of scores is around the mean the distribution is called “normal”
In social science research it turns out that scores or values for many variables are normally or near-normally distributed
This allows use of the mean to describe the underlying datasets
That’s why means are called a “summary statistic”: they can “summarize” the values of samples or populations
[Figure: distribution of arrests. Mean = 2.3, not a good descriptor. Peaked distribution (but not “normal”)]
[Figure: distribution of arrests. Mean = 3.0, a good descriptor. Peaked and “normal” distribution]

Characteristics of normal distributions
Unimodal and symmetrical: shapes on both sides of the mean are identical
68.26 percent of the area “under” the curve, meaning 68.26 percent of the cases, falls within one “standard deviation” (+/- 1 SD) of the mean
The fact that a distribution is “normal” or “near-normal” does NOT imply that the mean is of any particular value. All it implies is that scores distribute themselves around the mean “normally”.
Means depend on the data. In this distribution the mean could be any value.
By definition, the mean of a normal distribution, whatever its value might be, lies zero standard deviations from itself. On the standard deviation scale the mean is always at 0.
[Figure: normal curve. Mean (whatever it is); standard deviation scale (always 0 at the mean)]
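The 68.26 percent figure can be reproduced from the normal curve itself. The sketch below assumes the standard result that the share of a normal distribution within k standard deviations of the mean equals erf(k / √2); the function name `share_within` is hypothetical. The exact value is about 68.27 percent; 68.26 is what printed tables give when the rounded one-sided area .3413 is doubled.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

share_1sd = share_within(1)
print(f"Within +/- 1 SD: {share_1sd * 100:.2f} percent")
```

The result does not depend on the mean or on the size of the standard deviation, which is why the slide can say the mean “could be any value.”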

How well do means represent (summarize) a sample?
13 officers scored on numbers of tickets written in one week

Officer A: 1 ticket
Officers B & C: 2 tickets each
Officers D & E: 3 tickets each
Officers F & G: 4 tickets each
Officers H & I: 5 tickets each
Officer J: 6 tickets
Officers K & L: 7 tickets each
Officer M: 9 tickets
Mean = 4.46   SD = 2.33

[Figure: frequency of officers by number of tickets, with a bell-shaped curve overlaid. -1 SD = 2.13, mean = 4.46, +1 SD = 6.79]

If variable “no. of tickets” was “normally” distributed most cases would fall inside a bell-shaped curve. Here they don’t.
In a normal distribution about 68% of cases would fall within 1 SD of the mean. 13 x .68 = about 9 cases
But here only 7 cases (Officers D-J) do, while nearly as many (6) don’t.
Scores are very dispersed, making the distribution mostly flat. So here the mean is NOT a good shortcut for describing how officers performed.

13 officers scored on numbers of tickets written in one week

Officer A: 1 ticket
Officer B: 2 tickets
Officer C: 3 tickets
Officers D, E, F: 4 tickets each
Officers G, H, I: 5 tickets each
Officers J & K: 6 tickets each
Officer L: 7 tickets
Officer M: 9 tickets
Mean = 4.69   SD = 2.1

[Figure: frequency of officers by number of tickets, with a bell-shaped curve overlaid. -1 SD = 2.59, mean = 4.69, +1 SD = 6.79]

Here most cases do fall inside the bell-shaped curve. Variable “no. of tickets” seems near-normally distributed.
Here, 9 of 13 cases (Officers C-K) do fall within 1 SD of the mean.
The distribution is near-normal because most officers wrote close to the same number of tickets. The cases “cluster” around the mean.
So, for this sample the mean is a decent summary statistic: a good shortcut for describing officer performance.
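The counts for the two ticket samples can be verified with a few lines of code. This is a minimal sketch that uses the n-1 standard deviation from the slides; the function name `within_one_sd` and the list names are made up for illustration.

```python
import math

def within_one_sd(scores):
    """Return (mean, SD using n-1, number of cases within 1 SD of the mean)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    inside = [x for x in scores if mean - sd <= x <= mean + sd]
    return mean, sd, len(inside)

# Tickets written by officers A-M in each sample
dispersed = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 9]   # first sample
clustered = [1, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 9]   # second sample

for label, scores in (("First sample", dispersed), ("Second sample", clustered)):
    mean, sd, count = within_one_sd(scores)
    print(f"{label}: mean = {mean:.2f}, SD = {sd:.2f}, "
          f"{count} of {len(scores)} cases within 1 SD")
```

Comparing each count with the roughly 9 cases a normal distribution would predict for 13 officers shows why the mean summarizes the second sample better than the first.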

Going beyond description…
When variables are normally or near-normally distributed, the mean, variance and standard deviation can help describe datasets.
But they are also useful in explaining why things change; that is, in testing hypotheses.
You want to test the hypothesis that college-educated cops are more effective: college → greater effectiveness
  Independent variable: college (Y/N)
  Dependent variable: effectiveness (scale 1-5)
You go to the XYZ police dept., draw two samples of patrol officers - one of college grads, the other of non-college grads - and test each officer for effectiveness. On a scale of 1 (ineffective) to 5 (highly effective) this is how they scored:
  10 college grads (mean 3.7)
  10 non-college grads (mean 2.8)
The difference between means is in the hypothesized direction. But does that “prove” that college grads are more effective?
To determine whether the difference in means is “statistically significant,” meaning too large to be plausibly explained by chance differences between the samples, we need to know each sample’s variance. Don’t worry - we’ll cover this later!
[Figure: Are college-educated cops more effective? Score distributions for college grads and non-college grads]

Exam information
You must bring a regular, non-scientific calculator with no functions beyond a square root key.
You will be asked to apply concepts including research question, hypothesis and variables to the “college education and police job performance” article.
You will be given data and asked to create graph(s) depicting the distribution of a single variable.
You will compute basic statistics, including mean, median, mode and standard deviation. All computations must be shown on the answer sheet.
You will be given the formula for variance (s²). You must use and display the procedure described in the slides and practiced in class for manually calculating variance (s²) and its square root, known as standard deviation (s).
This is a relatively brief exam. You will have one hour to complete it. We will then take a break and move on to the next topic.