Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 3: Central Tendency And Dispersion.

Presentation transcript:

Measures of Central Tendency: Numerical values that refer to the center of a distribution. Used to provide a “best descriptor” of the score for a sample. The usefulness or quality of each measure depends on the shape of the distribution. The three measures are the mode, the median, and the mean.

The Mode: Defined as the most common or frequent score, i.e., the value at the highest point of a variable's frequency distribution. Example: 3,4,1,5,7,1,2,3,1,1,6,1,7,2. The mode = 1.

The Mode: If two adjacent points occur with equal and greatest frequency, the mode can be considered the average of these two. In the slide's example, mode = 3.5.

The Mode: If the two most frequent points are equally frequent but not adjacent, the distribution is bimodal. Of course, binning might result in a single mode by eliminating error/noise. Bimodal usually means the two modes are substantially separated.
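
A minimal sketch of the mode rules above, in Python. The helper name `modes` and the second data set are illustrative assumptions, not part of the lecture; only the first list comes from the slide.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the greatest frequency, sorted."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, n in counts.items() if n == top)

# Data from the slide: 1 occurs five times, more than any other score.
slide_scores = [3, 4, 1, 5, 7, 1, 2, 3, 1, 1, 6, 1, 7, 2]
print(modes(slide_scores))                   # [1]

# Hypothetical data (not from the lecture): 3 and 4 tie and are adjacent,
# so the slide's convention reports their average as the mode.
tied = [1, 2, 3, 3, 3, 4, 4, 4, 5]
tied_modes = modes(tied)
print(sum(tied_modes) / len(tied_modes))     # 3.5
```

If the tied values were far apart instead of adjacent, the list returned by `modes` would be read as the two modes of a bimodal distribution rather than averaged.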

The Median: The score that corresponds to the point at or below which 50% of scores fall; the “middle” number in a ranking of the data. Median location: Mdn location = (N+1)/2. If we have 11 numbers, the mdn location is (11+1)/2 = 6. For 1,1,2,3,3,3,4,4,5,5,6 the sixth score is 3, so Mdn = 3.

The Median: What about 1,1,2,3,3,3,4,4,5,5,6,6? Mdn location = (12+1)/2 = 6.5, so Mdn = 3.5. When the median location falls between points, the median is defined as the average of those two points.
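
The median-location rule can be sketched in Python as follows. The function name `median_by_location` is an assumption made for illustration, and the sketch assumes a non-empty list of scores.

```python
def median_by_location(scores):
    """Return (median location, median) using Mdn location = (N + 1) / 2."""
    ordered = sorted(scores)
    location = (len(ordered) + 1) / 2        # 1-based position in the ranking
    lower = ordered[int(location) - 1]
    if location.is_integer():
        return location, lower
    # Location falls between two points: average the neighbours.
    upper = ordered[int(location)]
    return location, (lower + upper) / 2

eleven = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6]
twelve = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6]
print(median_by_location(eleven))   # (6.0, 3)
print(median_by_location(twelve))   # (6.5, 3.5)
```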

Median: Histogram vs. Stem-and-Leaf Plot [figure: histogram and stem-and-leaf plot of the same data; stem width 1.00, each leaf 1 case]

The Mean: The average value, i.e., the sum of the scores divided by the number of scores. Example: 2,4,5,9,11. (2+4+5+9+11) = 31; 31/5 = 6.2.

Relations Among Measures of Central Tendency: When the distributions are symmetric, the three measures will generally correspond. When the distributions are asymmetric, they will often diverge.

The Mode: Advantages & Disadvantages: The mode is the most commonly occurring score. It always appears in the data; the mean and median may not. It is the most likely score to occur. It is useful for nominal data; the mean and median are not. When might the mode be useful?

Loaded Dice: The mode is your best bet. The median is not the highest-probability outcome. The mean does not even occur in the sample.

Disadvantages of the Mode: The mode can vary depending on how data are grouped/binned. It may not be representative of the entire distribution (e.g., the loaded dice example). With rare events (e.g., the most frequent score is zero), it tells us nothing about the cause of nonzero events.

Advantages & Disadvantages of the Mean and Median: Let me tell you a story... better known as “ALWAYS look at your data distributions.”

Men, Women, Evolution, & Sex: Is there a gender difference in the number of desired partners? Evolutionary psychologists say “yes” due to an asymmetry in minimum parental investment needs. Data appeared to support this.

Men, Women, Evolution, & Sex: Mean # partners in next 30 years: Men = 7.69; Women = 2.78. You can’t blame men; it’s in their nature! Yes? No? Any ideas?

Means versus Medians: These folks never considered the form of their data (or did they?). Without winsorization, men’s mean = 64.

Means: Men = 7.69; Women = 2.78. Medians and modes = 1.

Advantages & Disadvantages of the Mean and Median: Mean: Subject to bias by extreme values. May provide a value for central tendency that does not exist in the data set. Its major benefits are historical use and the ability to be manipulated algebraically; most mathematical equations depend on it. When assumptions are met, it is quite valid. Median: Not influenced by extreme values (e.g., salaries, home values). Not as amenable to algebraic manipulation and use.

Measures of Variability/Dispersion: The degree to which individual data points are distributed around the mean. They provide a measure of how representative the mean is of the scores. [figure: distributions ordered by how representative the mean is]

Several Measures: Range: the distance from the lowest to the highest value. For 1,2,3,4,4,5,6,7 the range = 7-1 = 6. It suffers from sensitivity to extremes: for 1,2,3,4,4,5,6,7,80 the range = 80-1 = 79. Interquartile Range: the range of the middle 50% of scores. It is less dependent on extreme values. Related: trimmed samples and statistics.
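
To illustrate the sensitivity to extremes, here is a small Python sketch using the two data sets from the slide. The helper `value_range` is a name chosen for this example; `mean` and `median` come from the standard `statistics` module.

```python
from statistics import mean, median

def value_range(scores):
    """Distance from the lowest to the highest value."""
    return max(scores) - min(scores)

base = [1, 2, 3, 4, 4, 5, 6, 7]
with_extreme = base + [80]           # same data plus one extreme score

print(value_range(base))             # 6
print(value_range(with_extreme))     # 79

# The single extreme score pulls the mean far more than the median.
print(mean(base), median(base))
print(mean(with_extreme), median(with_extreme))
```

The median stays at 4 in both cases, while the mean moves from 4 to roughly 12.4, which is the same pattern as in the partners example earlier in the lecture.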

Average Deviation: Conceptually clear: how far individual scores deviate from the mean on average. The problem is that the average deviation from the mean is, by definition, zero. Example: 1,2,3,3,4,5 (mean = 3). Deviations: -2,-1,0,0,1,2. Average deviation = 0.
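
A quick check of this point in Python, using the slide's data. The variable names are illustrative only.

```python
scores = [1, 2, 3, 3, 4, 5]
m = sum(scores) / len(scores)                 # mean = 3.0

deviations = [x - m for x in scores]
print(deviations)                             # [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
print(sum(deviations) / len(deviations))      # 0.0

# Squaring removes the signs, so the deviations no longer cancel.
squared = [d ** 2 for d in deviations]
print(sum(squared))                           # 10.0
```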

The Variance: Solves the problem that deviations sum to zero, because squares of negative numbers are positive. Variance is defined as the average of the squared deviations about the mean: the sum of squared deviations divided by N-1, not N. The sample variance is used to estimate the population variance.

The Variance: Data: 1,2,3,3,4,4,4,5,6. Volunteer?
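
One way to work this example, sketched in Python straight from the definition (sum of squared deviations divided by N-1). Variable names are assumptions for illustration; the data are the slide's.

```python
import math

data = [1, 2, 3, 3, 4, 4, 4, 5, 6]
n = len(data)
m = sum(data) / n                                  # mean = 32/9, about 3.556

sum_sq_dev = sum((x - m) ** 2 for x in data)       # sum of squared deviations
variance = sum_sq_dev / (n - 1)                    # divide by N-1, not N
sd = math.sqrt(variance)                           # standard deviation

print(round(variance, 3))                          # 2.278
print(round(sd, 3))                                # 1.509
```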

Standard Deviation: The square root of the variance. It can be read as roughly the typical deviation of scores from the mean. Taking the root gets rid of the squared metric, so the result is in the original units.

Computational Formulae: Algebraic manipulations are less clear conceptually but easy to use.
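
The slide's formulae did not survive in this transcript. As a sketch, the standard computational form of the sample variance, (sum of X^2 - (sum of X)^2 / N) / (N-1), can be checked against the definitional form in Python. This is assumed to be the formula the slide showed; the data are reused from the variance example.

```python
import math

data = [1, 2, 3, 3, 4, 4, 4, 5, 6]
n = len(data)

# Definitional form: squared deviations about the mean.
m = sum(data) / n
definitional = sum((x - m) ** 2 for x in data) / (n - 1)

# Computational form: only needs the sum of X and the sum of X squared.
sum_x = sum(data)                        # 32
sum_x_sq = sum(x ** 2 for x in data)     # 132
computational = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

print(math.isclose(definitional, computational))   # True
```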

Mean and Variance as Estimators: These descriptive statistics are used to estimate parameters.

Bias in Sample Variance: If we calculated the average squared deviation of the sample (dividing by N as opposed to N-1), the variance would be a biased estimate of the population variance. Bias: a property of a statistic whose long-range average is not equal to the parameter it estimates.

Bias in Sample Variance: Why does using N produce bias? Expected value is the long-range average of a statistic over repeated samples. Because deviations are taken from the sample mean rather than the population mean, the squared deviations are on average too small, so the expected value of the N-divisor variance falls below the population variance.

Applet Example

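Correcting the bias: multiply the N-divisor variance by the constant N/(N-1), which is the same as dividing the sum of squared deviations by N-1.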
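
The applet is not reproduced in this transcript. As a stand-in, the following Python sketch computes exact long-range averages by enumerating every possible sample. The tiny population [1, 2, 3], the sample size of 2, and sampling with replacement are all assumptions made for this illustration, not the lecture's setup.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]     # hypothetical population, not from the lecture
N = 2                      # sample size

def variance(scores, divisor):
    """Sum of squared deviations about the mean of `scores`, over `divisor`."""
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores) / divisor

pop_var = variance(population, len(population))         # 2/3

# Every equally likely sample of size N drawn with replacement.
samples = list(product(population, repeat=N))
avg_n = sum(variance(s, N) for s in samples) / len(samples)
avg_n_minus_1 = sum(variance(s, N - 1) for s in samples) / len(samples)

print(pop_var)                        # 2/3
print(avg_n)                          # 1/3  (too small: biased)
print(avg_n_minus_1)                  # 2/3  (matches the population variance)
print(avg_n * Fraction(N, N - 1))     # 2/3  (bias removed by N/(N-1))
```

Exact fractions are used instead of random sampling so that the averages are the expected values themselves rather than approximations.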

Box-and-Whisker Plots: Graphical representations of dispersion. Quite useful for quickly visualizing the nature of variability and extreme scores.

Box-and-Whisker Plots: First find the median location and the median. Then find the quartile locations, which give the medians of the upper and lower halves of the distribution: quartile location = (mdn location + 1) / 2, dropping any fractional part of the mdn location first. The quartiles are termed the “hinges”. The hinges bracket the interquartile range (IQR) and serve as the top and bottom of the box.

Box-and-Whisker Plots: Find the H-spread: the range between the two quartiles, which is simply the IQR and is the area inside the box in the plot. Draw the whiskers: lines from the hinges to the farthest points not more than 1.5 x H-spread away. Outliers: points beyond the whiskers, denoted with asterisks.
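
The box-plot steps can be sketched in Python as below. The function names are assumptions made for this example, and the data are borrowed from the range slide (1,2,3,4,4,5,6,7,80) rather than from the lecture's own box-plot example.

```python
def value_at(ordered, location):
    """Score at a 1-based location; average the neighbours if fractional."""
    lower = ordered[int(location) - 1]
    if float(location).is_integer():
        return lower
    return (lower + ordered[int(location)]) / 2

def hinges(scores):
    """Lower and upper hinges via quartile location = (mdn location + 1) / 2."""
    ordered = sorted(scores)
    mdn_location = (len(ordered) + 1) / 2
    q_location = (int(mdn_location) + 1) / 2     # fractional part dropped first
    lower = value_at(ordered, q_location)        # count up from the bottom
    upper = value_at(ordered[::-1], q_location)  # count down from the top
    return lower, upper

data = [1, 2, 3, 4, 4, 5, 6, 7, 80]
lower_hinge, upper_hinge = hinges(data)          # 3 and 6
h_spread = upper_hinge - lower_hinge             # 3

# Whiskers reach the farthest points within 1.5 x H-spread of each hinge.
low_limit = lower_hinge - 1.5 * h_spread         # -1.5
high_limit = upper_hinge + 1.5 * h_spread        # 10.5
inside = [x for x in data if low_limit <= x <= high_limit]
whiskers = (min(inside), max(inside))            # (1, 7)
outliers = [x for x in data if x < low_limit or x > high_limit]   # [80]

print(lower_hinge, upper_hinge, h_spread, whiskers, outliers)
```

Statistical packages use several slightly different quartile conventions, so their box edges may not match these hinges exactly.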

Box-and-Whisker Plots [figure: stem-and-leaf plot shown alongside the box plot; extremes (>=15) listed separately, each leaf 1 case]

Example