Where are we? Measure of central tendency FETP India.

Competency to be gained from this lecture
- Calculate a measure of central tendency that is appropriate for the sample studied

Key issues
- Measures of central tendency
  - Mode
  - Median
  - Mean
  - Geometric mean
- Appropriate applications

Summary statistics
- A single value that summarizes the observed values of a variable
  - Part of the data reduction process
- Two types:
  - Measures of location / central tendency / average
  - Measures of dispersion / variability / spread
- Describe the shape of the distribution of a set of observations
- Necessary for precise and efficient comparisons of different sets of data
  - The location (average) and shape (variability) of different distributions may be different

Different variability, same location

Different location, same variability

Quick definitions of measures of central tendency
- Mode: the most frequently occurring observation
- Median: the mid-point of a set of ordered observations
- Arithmetic mean: the sum of the observations divided by the number of observations

The mode
- Definition
  - The mode of a distribution is the value that is observed most frequently in a given set of data
- How to obtain it?
  - Arrange the data in sequence from low to high
  - Count the number of times each value occurs
  - The most frequently occurring value is the mode

The mode
[Figure: frequency distribution (N) with the mode marked]

Examples of mode (1/2): Annual salary (in 100,000 rupees)
- 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
- Arranging the values in order:
  - 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
  - The mode is 3, which occurs four times

Examples of mode (2/2): Incubation period (in days) among persons with hepatitis
- 29, 31, 24, 29, 30, 25
- Arranging the values in order:
  - 24, 25, 29, 29, 30, 31
  - The mode is 29

The mode is the only location statistic that can be used when the characteristic itself cannot be measured numerically
Colour preference of people for their cars
  Colour preference    Number of people
  Green                354
  Blue                 852
  Gray                 310
  Red                  474
The mode is Blue (852 people)

Specific features of the mode
- There may be no mode
  - When each value is unique
- There may be more than one mode
  - When more than one peak occurs
  - Bimodal distribution
- The mode can be misinterpreted
  - Is a distribution skewed or bimodal?
- The mode is not amenable to statistical tests
- The mode is not based upon all observations
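
The counting procedure and the special cases above (no mode, more than one mode) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `modes` and the choice to return an empty list when every value is unique are assumptions made here, not part of the lecture.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency.

    Assumption for this sketch: if there are several values and each
    one occurs exactly once, we report "no mode" as an empty list.
    """
    counts = Counter(values)          # count how often each value occurs
    top = max(counts.values())        # highest frequency observed
    if top == 1 and len(counts) > 1:  # every value is unique: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Data from the two examples in the slides
salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]   # in 100,000 rupees
incubation_days = [29, 31, 24, 29, 30, 25]

print(modes(salaries))          # mode of the salary data
print(modes(incubation_days))   # mode of the incubation data
print(modes([1, 1, 2, 2, 5]))   # two modes (bimodal)
print(modes([1, 2, 3]))         # each value unique: no mode
```

The same function also works for categorical data such as car colours, because it only counts occurrences and never does arithmetic on the values.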

The median
- The median is literally the middle value of the data
- It is defined as the value above and below which half (50%) of the observations fall

Computing the median
- Arrange the observations in order from smallest to largest (ascending order) or vice versa
- Count the number of observations "n"
  - If "n" is an odd number: Median = value of the ((n + 1) / 2)th observation
  - If "n" is an even number: Median = average of the (n / 2)th and ((n / 2) + 1)th observations

Computing the median: example
- What is the median of the following values?
  - 10, 20, 12, 3, 18, 16, 14, 25, 2
  - Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
  - Median = 14
- Suppose there is one more observation (8)
  - 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
  - Median = mean of 12 and 14 = 13
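
The odd/even rule for the median can be checked with a short Python sketch. The function name `median_by_rank` is a hypothetical name chosen for this illustration; the data are the two series from the example above.

```python
def median_by_rank(values):
    """Median using the rank rule: middle value for odd n,
    average of the two middle values for even n."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)                  # count the observations
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # (n + 1) / 2-th observation (1-based) is index mid (0-based)
        return ordered[mid]
    # average of the (n / 2)-th and (n / 2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


nine_values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
ten_values = nine_values + [8]        # one more observation (8)

print(median_by_rank(nine_values))    # odd n: the 5th ordered value
print(median_by_rank(ten_values))     # even n: mean of 5th and 6th values
```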

Advantages and disadvantages of the median
- Advantages
  - The median is unaffected by extreme values
- Disadvantages
  - The median does not contain information on the other values of the distribution
    - It is selected only by its rank
    - You can change 50% of the values without affecting the median
  - The median is less amenable to statistical tests

The median is not sensitive to extreme values
[Figure label: Same median]

Mean (arithmetic mean / average)
- Most commonly used measure of location
- Definition
  - Calculated by adding all observed values and dividing by the total number of observations
- Notation
  - Each observation is denoted as $x_1, x_2, \dots, x_n$
  - The total number of observations: $n$
  - Summation process = sigma: $\sum$
  - The mean: $\bar{X} = \frac{\sum x_i}{n}$

Computation of the mean
- Duration of stay in days in a hospital
  - 8, 25, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9)
  - Sum of all observations = 87
  - Mean duration of stay = 87 / 9 = 9.67
- Incubation period in days of a disease
  - 8, 45, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9)
  - Sum of all observations = 107
  - Mean incubation period = 107 / 9 = 11.89
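
The two calculations above can be reproduced with a minimal Python sketch. The function name `arithmetic_mean` and the variable names are chosen here for illustration only.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]        # hospital stay
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]  # incubation period

for label, data in [("stay", stay_days), ("incubation", incubation_days)]:
    # print n, the sum, and the mean rounded to two decimals
    print(label, len(data), sum(data), round(arithmetic_mean(data), 2))
```

The two series differ in a single observation (25 versus 45), yet the mean moves by more than two days, which illustrates how one large value pulls the mean.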

Advantages and disadvantages of the mean
- Advantages
  - Has a lot of good theoretical properties
  - Used as the basis of many statistical tests
  - Good summary statistic for a symmetrical distribution
- Disadvantages
  - Less useful for an asymmetric distribution
    - Can be distorted by outliers, therefore giving a less "typical" value

Mean of several groups combined
- Mean of all groups = 2000 / 50 = 40
- Crude average = 39.7
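
The table behind this slide is not available in the transcript, so the sketch below uses made-up group sizes and group means (clearly hypothetical, not the lecture's data). It assumes that "crude average" means the unweighted average of the group means, and contrasts it with the combined mean, which divides the total of all observations by the total number of observations.

```python
# Hypothetical groups: (number of observations, group mean).
# These numbers are illustrative only and do not come from the slide.
groups = [(10, 30.0), (40, 45.0)]

total_n = sum(n for n, _ in groups)
total_sum = sum(n * m for n, m in groups)   # n * mean recovers each group's sum

combined_mean = total_sum / total_n                    # weights each group by its size
crude_average = sum(m for _, m in groups) / len(groups)  # ignores group sizes

print(combined_mean)
print(crude_average)
```

With unequal group sizes the two figures differ, because the crude average gives the small group the same weight as the large one.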

The geometric mean
- Background
  - Some distributions appear symmetric after log transformation (e.g., neutrophil counts)
  - A log transformation may help in describing the central tendency
- Definition
  - The geometric mean is the antilog of the mean of the log values

Calculating a geometric mean
- Observe the set of observations
  - 5, 10, 20, 25, 40
- Take the logarithm (base 10) of these values
  - 0.70, 1.00, 1.30, 1.40 and 1.60
- Calculate the mean of the log values
  - 0.70 + 1.00 + 1.30 + 1.40 + 1.60 = 6.00
  - 6.00 / 5 = 1.20
- Take the antilog of the mean of the log values
  - Antilog (1.20) = 15.85
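
The four steps above map directly onto a short Python sketch. The function name `geometric_mean_log10` is hypothetical, and base-10 logarithms are assumed because the log values quoted on the slide (0.70, 1.00, 1.30, ...) are base 10.

```python
import math


def geometric_mean_log10(values):
    """Antilog (base 10) of the mean of the base-10 logs.

    Only defined for strictly positive values.
    """
    logs = [math.log10(v) for v in values]   # take the logarithms
    mean_log = sum(logs) / len(logs)         # mean of the log values
    return 10 ** mean_log                    # antilog of that mean


observations = [5, 10, 20, 25, 40]
log_values = [round(math.log10(v), 2) for v in observations]

print(log_values)                                   # logs rounded to 2 decimals
print(round(geometric_mean_log10(observations), 2)) # geometric mean
```

Any logarithm base gives the same geometric mean, as long as the antilog uses the same base as the log.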

Geometric mean of several groups combined
- Overall GM = antilog of ( / 50) = antilog ( ) = 9.3

[Figure labels: N; Mean = Median = 10; Mode = 13.5]

What measure of location to use?
- Consider the duration (days) of absence from work of 21 labourers owing to sickness
  - 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80
- Mean = 11 days
  - Not typical of the series, as 19 of the 21 labourers were absent for less than 11 days
  - Distorted by extreme values
- Median = 5 days
  - Better measure
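
The comparison above can be verified with Python's standard `statistics` module. This is a minimal sketch using the absence data from the slide; the variable names are chosen here for illustration.

```python
from statistics import mean, median

# Days of absence for the 21 labourers, as listed on the slide
absence_days = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5,
                6, 6, 6, 7, 8, 9, 10, 10, 59, 80]

mean_days = mean(absence_days)
median_days = median(absence_days)
below_mean = sum(1 for d in absence_days if d < mean_days)

print(round(mean_days, 1))   # pulled upward by the two extreme values
print(median_days)           # middle (11th) ordered value
print(below_mean, "of", len(absence_days), "observations lie below the mean")
```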

Choice of measure of central tendency for symmetric distributions
- Any one of the central/location measures can be used
- The mean has definite advantages if subsequent computations are needed

Choice of measure of central tendency for asymmetric distributions
- For skewed distributions, the mean is not suitable
  - Positively skewed: the mean gives a higher value
  - Negatively skewed: the mean gives a lower value
- If some observations deviate much more than others in the series, then the median is the appropriate measure
- If the log-transformed distribution is symmetric, the geometric mean may be used

Key messages
- The mode is the most common value
- The median is appropriate when there are extreme values
- The mean is appropriate for symmetric distributions
- The geometric mean may be useful when log-transformed data are symmetric
- The type of distribution determines the measure of central tendency to use