Descriptive Statistics

Presentation transcript:

Descriptive Statistics

The everyday notions of central tendency: usual, customary, most, standard, expected, normal, ordinary, medium, commonplace. (John Allen Paulos, "Stories vs. Statistics," NY Times, 10/24/2010)

Overview: What are descriptive statistics? A bit of terminology/notation. Measures of Central Tendency: Mean, Mode, Median. Measures of Variability: Ranges, Standard Deviations. The Normal Curve.

Terminology/Notation: A data distribution = a set of data/scores (the whole thing), e.g., 1, 2, 4, 7. X = a raw, single score (e.g., 2 from above). ∑ = summation (added up); ∑X = 14 (each individual score added up). n = sample size (distribution size, or number of scores); n = 4 (from above).
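The notation above maps directly onto two built-in operations. A minimal sketch, assuming Python; the variable names `scores`, `sum_x`, and `n` are illustrative and not part of the slides:

```python
# The example distribution from the slide.
scores = [1, 2, 4, 7]

sum_x = sum(scores)  # ∑X: each individual score added up
n = len(scores)      # n: the number of scores in the distribution

print("sum of X =", sum_x)
print("n =", n)
```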

Descriptive Statistics Descriptive statistics are the side of statistics we most often use in our everyday lives Realize that most observations/data are too “large” for a human to take in and comprehend – we must “reduce” them How can we summarize what we see? Example – Grades/Registrar

Descriptive Statistics Descriptive statistics = describing the data n = 50, a test score of 83% Where does it fit in the class?? Making sense out of chaos

Descriptive Statistics Transform a set of numbers or observations into indices that describe or characterize the data “Summary statistics” A large group of statistics that are used in all research manuscripts Even the most complex statistical tests and studies start with descriptive statistics

Descriptive Statistics (concept map): Measurement scales: nominal, ordinal, interval, ratio. Graphic portrayals: frequencies, histograms, bar graphs, normal distribution. Central tendency: mean, median, mode. Variability: range, standard deviation, standardized scores. Relationship: scatterplot, correlation, regression.

Descriptive Statistics Descriptive statistics usually accomplish two major goals: 1) Describe the central location of the data 2) Describe how the data are dispersed about that point In other words, they provide: 1) Measures of Central Tendency 2) Measures of Variability

Measure of Central Tendency What SINGLE summary value best describes the CENTRAL location of an entire distribution? Mode: which value occurs most often Median: the value above and below which 50% of the cases fall (the middle; 50th percentile) Mean: mathematical balance point; arithmetic/mathematical average

Mode: the most frequently occurring value. What if the data were 17, 19, 20, 20, 22, 23, 25, 28 (mode = 20) or 17, 19, 20, 20, 22, 23, 23, 28 (modes = 20 and 23)? Problem: a set of numbers can be bimodal, or trimodal, depending on the scores. Not a stable measure. Ex.: 17, 19, 20, 22, 23, 28, 28 (mode = 28).
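The three data sets on this slide can be checked with a few lines of code. A minimal sketch, assuming Python 3.8 or later for `statistics.multimode`; the list names are illustrative:

```python
from statistics import multimode

# The three data sets from the slide.
one_mode = [17, 19, 20, 20, 22, 23, 25, 28]
two_modes = [17, 19, 20, 20, 22, 23, 23, 28]
changed = [17, 19, 20, 22, 23, 28, 28]

# multimode returns every value tied for the highest frequency.
for data in (one_mode, two_modes, changed):
    print(data, "->", multimode(data))
```

Small changes to the scores move the mode from 20 to 28, which is why the slide calls it an unstable measure.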

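Median: rank the numbers and pick the middle one. What if the data were 17, 19, 20, 23, 23, 28? With an even number of scores there is no single middle score. Solution: add up the two middle scores and divide by 2 ((20 + 23) / 2 = 21.5). Best measure in an asymmetrical (i.e., skewed) distribution; not sensitive to extreme scores. Ex.: 17, 19, 20, 23, 23, 428 (the median is still 21.5).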
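To see why the median resists extreme scores, compare it with the mean on both versions of the data. A minimal sketch, assuming Python's standard `statistics` module; the list names are illustrative:

```python
from statistics import mean, median

scores = [17, 19, 20, 23, 23, 28]
with_extreme = [17, 19, 20, 23, 23, 428]  # last score replaced by an extreme value

# With an even number of scores, median() averages the two middle ones.
print("median:", median(scores), "vs", median(with_extreme))

# The mean, by contrast, is pulled strongly toward the extreme score.
print("mean:", round(mean(scores), 2), "vs", round(mean(with_extreme), 2))
```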

Mean = X̄ (X-bar): add up the numbers and divide by the sample size (the number of numbers!), i.e., X̄ = ∑X / n. Try this one: 2, 3, 5, 6, 9. 2 + 3 + 5 + 6 + 9 = 25, and 25 / 5 = 5. (Usually) the best measure of the three: it uses the most information (all values from the distribution contribute).

Characteristics of the Mean: Balance point, the point around which deviations sum to zero. Deviation = X – X̄. For instance, if the scores are 2, 3, 5, 6, 9, the mean is 5. Sum of deviations: (-3) + (-2) + 0 + 1 + 4 = 0, so ∑(X – X̄) = 0.
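The balance-point property is easy to verify numerically. A minimal sketch, assuming Python; the variable names are illustrative:

```python
scores = [2, 3, 5, 6, 9]

mean = sum(scores) / len(scores)          # 25 / 5
deviations = [x - mean for x in scores]   # X minus X-bar for each score

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

The negative deviations exactly cancel the positive ones, which is what "balance point" means here.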

Characteristics of the Mean: Affected by extreme scores. Example 1: scores 7, 11, 11, 14, 17; mean = 12, mode and median = 11. Example 2: scores 7, 11, 11, 14, 170; mean = 42.6, mode and median = 11.
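Both examples can be reproduced in a few lines. A minimal sketch, assuming Python's standard `statistics` module; `example_1` and `example_2` are illustrative names:

```python
from statistics import mean, median, mode

example_1 = [7, 11, 11, 14, 17]
example_2 = [7, 11, 11, 14, 170]  # same scores, but the last one is extreme

for scores in (example_1, example_2):
    print(scores,
          "mean:", mean(scores),
          "median:", median(scores),
          "mode:", mode(scores))
```

Only the mean changes between the two examples; the median and mode stay at 11.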

Characteristics of the Mean Balance point Affected by extreme scores Appropriate for use with interval or ratio scales of measurement More stable than Median or Mode when multiple samples drawn from the same population Basis for inferential stats

Guidelines to Choose Measure of Central Tendency Mean is preferred because it is the basis of inferential statistics Median may be better for skewed data Distribution of wealth in the US – ex. annual household income in Washington state for 2000: mean=$76,818; median=$42,024 Mode to describe average of nominal data (eye color, hair color, etc…)

Normal Distribution [figure: y-axis = frequency (how often a score occurs); x-axis = scores]

MLB batting averages over 3-year span (min. 100 AB) Mean = 0.267 n = 1291

Normal Distribution: a "normal" distribution indicates the data are perfectly symmetrical, so the mode, median, and mean all fall at the same central point. [figure: x-axis = scores]

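Positively skewed distribution: from left to right along the scores axis, the mode comes first, then the median, then the mean (the mean is pulled toward the long right tail).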

NFL Salaries 2011

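Negatively skewed distribution: from left to right along the scores axis, the mean comes first, then the median, then the mode (the mean is pulled toward the long left tail).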

Relationship among the MCT & shape of distribution

Alaska’s average elevation of 1900 feet is less than that of Kansas. Nothing in that average suggests the 16 highest mountains in the United States are in Alaska. Averages mislead, don’t they? Grab Bag, Pantagraph, 08/03/2000

Variability: Measures of dispersion or spread. "The only thing constant is variation."

The notions of variability: unusual, peculiar, strange, original, extreme, special, unlike, deviant, dissimilar, different. (John Allen Paulos, "Stories vs. Statistics," NY Times, 10/24/2010)

Variability defined: Measures of central tendency provide a summary-level view of the data. Variability recognizes that scores vary across individual cases; i.e., the mean or median may not be an actual score in your distribution. Variability quantifies the spread of performance: how scores vary around the mean/mode/median.

To describe a distribution 1) Measure of Central Tendency Mean, Mode, Median 2) Measure of Variability Multiple measures Range, Interquartile range, Semi-Interquartile Range Standard Deviation

Range: the difference between the low and high scores. Ex.: number of hours spent watching TV per week: 2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20. Range = max score - min score = 20 - 2 = 18. Very susceptible to outliers. Doesn’t indicate anything about variability around the mean/central point.
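The range calculation, and its sensitivity to a single outlier, can be sketched as follows. This assumes Python; the extra score of 60 hours is a hypothetical value added only for illustration and does not come from the slides:

```python
# Hours of TV watched per week, from the slide.
hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

score_range = max(hours) - min(hours)
print("range:", score_range)

# Hypothetical: one additional viewer who watches 60 hours per week.
with_outlier = hours + [60]
outlier_range = max(with_outlier) - min(with_outlier)
print("range with one outlier:", outlier_range)
```

One added score more than triples the range, even though the other twelve scores are unchanged.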

Semi-Interquartile range: What is a quartile? Divide the sample into 4 parts of equal size; Q1, Q2, Q3 = quartile points. Interquartile range (IQR) = Q3 - Q1, the difference between the highest and lowest quartile points. SIQR = IQR / 2. Related to the median (Q2 is the median); prevents outliers from overly skewing the measure. For ordinal data or skewed interval/ratio data.
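A minimal sketch of the IQR and SIQR, assuming Python 3.8 or later for `statistics.quantiles`. The slides give no worked quartile example, so this reuses the TV-hours data from the Range slide as an assumption. Textbooks and software use several slightly different quartile conventions; the values below follow the library's default ("exclusive") method and may differ from a hand calculation:

```python
from statistics import median, quantiles

# Assumption: reuse the TV-hours data from the Range slide.
hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(hours, n=4)

iqr = q3 - q1    # interquartile range
siqr = iqr / 2   # semi-interquartile range

print("Q1, Q2, Q3:", q1, q2, q3)
print("Q2 equals the median:", q2 == median(hours))
print("IQR:", iqr, "SIQR:", siqr)
```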

BMD and walking: quartiles based on miles walked/week. Krall et al., 1994, Walking is related to bone density and rates of bone loss. AJSM, 96:20-26

Notes: Skewed Distribution? 95th Percentile? 50th Percentile vs Median?

Standard Deviation: the most commonly accepted measure of spread. "Variation itself is nature's only irreducible essence." (Stephen Jay Gould) To compute it: compute the deviations of all numbers from the mean; square each deviation and THEN sum the squares; divide by the number of deviations; finally, take the square root.

Standard Deviation: Distribution = 1, 3, 5, 7. X̄ = 16 / 4 = 4. 1) Compute deviations = -3, -1, 1, 3 2) Square deviations = 9, 1, 1, 9 3) Sum squared deviations = 20 4) Divide by n = 20 / 4 = 5 5) Take square root = √5 ≈ 2.2
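The five steps above can be sketched in code. This is a minimal illustration, assuming Python; the variable names are not from the slides. Because the slide divides by n, the result matches the library's population formula `pstdev`; the sample formula `stdev`, which divides by n - 1, is shown only for comparison:

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [1, 3, 5, 7]
n = len(scores)
mean = sum(scores) / n                      # 16 / 4

deviations = [x - mean for x in scores]     # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]      # step 2: square each deviation
sum_of_squares = sum(squared)               # step 3: sum the squared deviations
variance = sum_of_squares / n               # step 4: divide by n
sd = sqrt(variance)                         # step 5: take the square root

print("SD by hand:", round(sd, 2))
print("pstdev (divides by n):", round(pstdev(scores), 2))
print("stdev (divides by n - 1):", round(stdev(scores), 2))
```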

Key points about SD: SD small → data clustered round the mean; SD large → data scattered from the mean. Affected by extreme scores (just like the mean), which are oftentimes called "outliers". Consistent (more stable) across samples from the same population, just like the mean, so it works well with inferential stats (where repeated samples are taken).

SD Example Three NFL quarterbacks with similar QB ratings in 2006: Matt Hasselbeck (SEA) = 76.0 Rex Grossman (CHI) = 73.9 Brett Favre (GB) = 72.7 Note: QB rating involves a complex formula accounting for passing attempts, completions, yards, touchdowns, and interceptions…100+ is considered outstanding & 70-80 is average All appear to have had very similar, somewhat mediocre seasons as QB’s

SD Example Let’s look at the SD of their game-by-game QB ratings: Matt Hasselbeck (SEA) = 29.97 Rex Grossman (CHI) = 47.60 Brett Favre (GB) = 27.81 Grossman had, by far, the most variability (i.e. inconsistency) in his game-by-game performances…is this good or bad?

Clinical Use of SD

SD and the normal curve The following concepts are critical to your understanding of how descriptive statistics works Remember – a “normal” curve is perfectly symmetrical. This is not typical, but usually data are almost normal…

SD and the normal curve: X̄ = 70, SD = 10. About 68% of scores fall within 1 SD of the mean (34.1% on each side). [figure: normal curve; x-axis marked 60, 70, 80]

The standard deviation and the normal curve: with X̄ = 70 and SD = 10, about 68% of scores fall between 60 and 80 (34% on each side of the mean).

The standard deviation and the normal curve: X̄ = 70, SD = 10. About 95% of scores fall within 2 SD of the mean. [figure: normal curve with bands labeled 34.1% and 13.6% on each side of the mean; x-axis marked 50, 60, 70, 80, 90]

The standard deviation and the normal curve: with X̄ = 70 and SD = 10, about 95% of scores fall between 50 and 90.

The standard deviation and the normal curve: X̄ = 70, SD = 10. About 99.7% of scores fall within 3 SD of the mean. [figure: normal curve with bands labeled 34.1%, 13.6%, and 2.3% on each side of the mean; x-axis marked 40, 50, 60, 70, 80, 90, 100]

The standard deviation and the normal curve: with X̄ = 70 and SD = 10, about 99.7% of scores fall between 40 and 100.

What about X̄ = 70, SD = 5? What approximate percentage of scores fall between 65 and 75? 1 SD below + 1 SD above = 68%. What range includes about 99.7% of all scores? 3 SD below to 3 SD above = 55 to 85.
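These answers can be checked against an idealized normal curve. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper `share_between` is a hypothetical name introduced for this example:

```python
from statistics import NormalDist

# An idealized normal distribution with mean 70 and SD 5.
dist = NormalDist(mu=70, sigma=5)

def share_between(d, low, high):
    """Proportion of scores between low and high under distribution d."""
    return d.cdf(high) - d.cdf(low)

within_1_sd = share_between(dist, 65, 75)

low_3 = dist.mean - 3 * dist.stdev
high_3 = dist.mean + 3 * dist.stdev
within_3_sd = share_between(dist, low_3, high_3)

print(f"between 65 and 75: {within_1_sd:.1%}")
print(f"3 SD range: {low_3} to {high_3}, covering {within_3_sd:.1%}")
```

Real data are only approximately normal, so observed percentages will differ somewhat from these idealized values.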

Interpreting the Normal Table: Area under the normal curve. Specific SD values (z) include certain percentages of the scores. Values of special interest: 1.96 SD = 47.5% of scores on one side of the mean (47.5 + 47.5 = 95%); 2.58 SD = 49.5% of scores on one side of the mean (49.5 + 49.5 = 99%). I.e., 95% of scores fall within 1.96 standard deviations of the mean (1.96 above and 1.96 below).
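The table values of special interest can be recovered from the standard normal distribution. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper `area_from_mean` is a hypothetical name introduced for this example:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1, so scores are already z values

def area_from_mean(z):
    """Proportion of scores between the mean and z SDs above it."""
    return standard_normal.cdf(z) - 0.5

for z in (1.96, 2.58):
    one_side = area_from_mean(z)
    print(f"z = {z}: {one_side:.1%} on one side, {2 * one_side:.1%} on both")

# Going the other way: the z value that leaves 2.5% in the upper tail.
z_for_95 = standard_normal.inv_cdf(0.975)
print(f"z for the middle 95%: {z_for_95:.2f}")
```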

IQ: X̄ = 100, SD = 15. 68% have an IQ between 85 and 115. [figure: normal curve with bands labeled 34.1%, 13.6%, and 2.3% on each side of the mean; x-axis marked 55, 70, 85, 100, 115, 130, 145]

MLB players’ batting averages over a 3-year span (min. 100 at bats) ~95% of players have an average between 0.196 and 0.337

Next Week… We will utilize our understanding of descriptive statistics concepts, including central tendency, variability, and the normal curve, to examine standardized scores Homework = Cronk 3.1 – 3.4 Bring calculator to class In-class activity 2…