Theme 3. Group description

Slides:

Advertisements

Similar presentations

Population vs. Sample Population: A large group of people to which we are interested in generalizing. parameter Sample: A smaller group drawn from a population.

Advertisements

Descriptive Measures MARE 250 Dr. Jason Turner.

Chapter 3: Descriptive Statistics

Agricultural and Biological Statistics

Measures of Dispersion

Descriptive Statistics

Descriptive Statistics – Central Tendency & Variability Chapter 3 (Part 2) MSIS 111 Prof. Nick Dedeke.

Analysis of Economic Data

Descriptive Statistics

Measures of Dispersion

Data observation and Descriptive Statistics

Describing Data: Numerical

BIOSTATISTICS II. RECAP ROLE OF BIOSATTISTICS IN PUBLIC HEALTH SOURCES AND FUNCTIONS OF VITAL STATISTICS RATES/ RATIOS/PROPORTIONS TYPES OF DATA CATEGORICAL.

Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.

EPE/EDP 557 Key Concepts / Terms –Empirical vs. Normative Questions Empirical Questions Normative Questions –Statistics Descriptive Statistics Inferential.

Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.

© Copyright McGraw-Hill CHAPTER 3 Data Description.

Measures of Central Tendency and Dispersion Preferred measures of central location & dispersion DispersionCentral locationType of Distribution SDMeanNormal.

Skewness & Kurtosis: Reference

An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.

Measures of Dispersion

Categorical vs. Quantitative…

Chapter 3 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 Chapter 3: Measures of Central Tendency and Variability Imagine that a researcher.

Descriptive Statistics The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.

Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.

Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons. 3-1 Business Statistics, 4e by Ken Black Chapter 3 Descriptive Statistics.

1 Day 1 Quantitative Methods for Investment Management by Binam Ghimire.

1 STAT 500 – Statistics for Managers STAT 500 Statistics for Managers.

Descriptive Statistics(Summary and Variability measures)

Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.

Introduction Dispersion 1 Central Tendency alone does not explain the observations fully as it does reveal the degree of spread or variability of individual.

Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.

COMPLETE BUSINESS STATISTICS

Descriptive Statistics ( )

Exploratory Data Analysis

Theme 4. Measures of individual position

Business and Economics 6th Edition

Numerical descriptions of distributions

Descriptive Statistics

Chapter 3 Describing Data Using Numerical Measures

Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.

Numerical Descriptive Measures

Measures of Position & Exploratory Data Analysis

Descriptive Statistics (Part 2)

CENTRAL MOMENTS, SKEWNESS AND KURTOSIS

Descriptive Statistics: Overview

Measures of Central Tendency

Chapter 4 Fundamental statistical characteristics II: Dispersion and form measurements.

Describing, Exploring and Comparing Data

Chapter 3 Created by Bethany Stubbe and Stephan Kogitz.

CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.

NUMERICAL DESCRIPTIVE MEASURES

Descriptive Statistics

Description of Data (Summary and Variability measures)

Science of Psychology AP Psychology

Chapter 3 Describing Data Using Numerical Measures

Numerical Descriptive Measures

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

Numerical Descriptive Measures

Numerical Descriptive Measures

Numerical Descriptive Statistics

CENTRAL MOMENTS, SKEWNESS AND KURTOSIS

Honors Statistics Review Chapters 4 - 5

MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.

Measures of Dispersion

Business and Economics 7th Edition

Biostatistics Lecture (2).

Presentation transcript:

Theme 3. Group description 1. Introduction. 2. Central tendency: mode, median, arithmetic mean and other measures. Definitions, calculations, characteristics and criteria of use. 3. Variability: Range, Variance, Standard Deviation (sample and population) and other measures (interquartile range, and coefficient of variation). Definitions, calculations, characteristics and criteria of use. 4. Asymmetry: Definition, calculation and interpretation. 5. Kurtosis: Definition, calculation and interpretation. 6. Graphical representation: box plots and error bars.

Central Tendency Measures of central tenden indicate a value representative of the bulk of the data: Example: the data 4,7,5,6,5,4,5,5,5,6,5,4,4, it is clear that (by eye) the center is around five, which could be taken as an index of central tendency. We will see 3 most common indices central tendency (mode, mean and median) first. Then we will see other indices.

(Arithmetic) Mean Formula: It is simply adding all values, and then that amount is divided by the number of values. If we have the data: 4,6,5,3,7 The mean is (4+6+5+3+7)/5=4 Note: You can use weighted means. Consider that there are 2 data, one (5) weighs 0'6 and the other (6) weighs 0.4. Then, the average will be (5 * 6 * 0.6 + 0.4) / (0.6 + 0.4) = 5'4

Properties of the mean -The Sum of differences (all values) relative to the mean is always 0 -If we add a constant to each of the values, the new arithmetic average result will be the original more the constant. If we multiply each value by a constant, the new arithmetic mean is the original mean multiplied by the constant.

Median The median (MDN or Md) is defined as the “middle” value in a sortered data set. For example, in the sequence (ordinate) 3,4,5,6,7,8,9 the median is 6 In the sequence (ordinate) 2,3,4,6,7,9 the median is 5 (the arithmetic mean between the two central values, observe that n is even, in the above example it was odd)

Properties of the median It does not use all the elements It can be calculated with ordinal data It is less affected by atypical data than the arithmetic mean.

The mode The Mode (Mo) is defined as that value of the variable corresponding to the higher frequency. In the data set: 4,5,6,6,3,6,4,5 Mo = 6 properties: --It's not necessarily unique (there may be several fashions) --It is possible to compute mode with a nominal scale --Its calculation does not involve all elements

Which one shall we use? Mean Mode Median

Resistence and robustness Resitent statistics: Those who are not influenced (or only slightly) by small changes in the data. Obviously, the average is a little resistant to changes in statistical data, since it is influenced by each and every one of them. The median, however, is a highly resistant statistician.

Robust (resistent) measures of central tendency Trimmed means They consists in calculating the arithmetic mean of a central subset of the data, not considered a specific ratio p at each end. (P is usually expressed as a percentage). For example, an average 40% cut in a sequence of 10 data involves not taking into account either the 4 higher or 4 lower values. Note that the 0% trimmed mean is the arithmetic mean.

Robust measures of central tendency Trimmed means Calculate the trimmed mean 5% of the following: 3, 4, 4, 5, 5, 6, 7, 8, 9, 11 The value must be 6.11

Robust measures of central tendency Other measures thatalso trim the data One can use a minimum and a maximum value beyond which the data that exceeds such values are removed. For example, in experiments measuring reaction time, it is common to remove data below some lower cutoff (e.g., 200 ms) and beyond some upper cutoff (e.g., 1500 ms is.) Thus, if all the data are in the range 200-1500 ms no data would be deleted

Robust measures of central tendency Other robust stats SPSS offers some robust measures of central tendency (M-estimators) They apply someweighting applied to the data—they are not common in psychology Calculation: SPSS.

3. Variability In the previous section we studied several measures (mean, median, etc.) central tendency. Clearly, to know how representative is the value of such a measure of central tendency, it is also necessary to have a measure of variability. For example, someone may have an average of 5 with the following data (5, 4, 6, 5, 5) and another one having an average of 5 to data (10, 0, 5, 9, 1). Obviously the first subject shows less variability.

How can we measure the variability? A first strategy would be to use the formula But it is always zero A second strategy is to use absolute values But it is tricky to use absolute values… What is left then? Employ the sum of squared differences .... It is the first step for the variance

Variance Formula As we will see in the second half (inferential statistics), the variance is a biased estimator of the population variance; therefore the use of "quasivariance" which is the same except that the variance is divided by n-1 is preferred; the quasivariance is an unbiased estimator of population variance:

Standard deviation Fórmulae An obvious advantage of the standard deviation of the variance is the standard deviation is given in the same units as the original data (in the variance units are squared). NOTE: SPSS always offers the n-1 option (which is the usual one, as it is un unbiased estimator).

Properties of the variance and stand.dev. Variance and SD are essentially positive values. (Notice that the differences on the mean are squared) 2. Neither the variance or desv.típica are altered when we add a constant. Then we know that

Then we know This applies to the SD as well.

If each data point is multiplied by a constant, the new SD is the original SD multiplied by the absolute value of this constant, and new variance is the original multiplied by the square of that constant

Other measures of variability 1. Amplitude It is the difference between the extreme values Its advantage is the simplicity of calculation; the only problem is that it is too sensitive to extreme values 2. Desviación media (DM) DM is rare to find in practice--it is not easy to work with absolute values.

Other measures of variability 3. Semi-interquartile range (Q) It is based on the first and third quartile—it is a robust statistics It may be used when the mean is the best option for central tendency, and it is relatively common. 4. Variation Coefficient (CV) Ratio scale Indicates the number of times the deviation average contains mean: the higher the CV greater variability. There are no units, so it allows the comparison between different variables.

4. Asymmetry In the above two points we have seen measures of central tendency and variability. While obtaining such measures is key to describe a sample and make inferences about the population of origin, it is also essential to know the form of a distribution to obtain an adequate characterization of the shape of the data distribution.

Asymmetry While it is easy to have an idea of whether the distribution is symmetrical or not after seeing the graphical representation (e.g., a histogram or a box-and-whisker diagram), it is important to quantify the asymmetry of a distribution. Recall that when data distribution is symmetric, the mean, median and mode match. (And the distribution is the same shape on the left and right of center) While many psychological distributions is assumed that tend to be symmetrical and unimodal distribution in many cases we find it is asymmetric (e.g., distributions of reaction times on almost any task is positive asymmetric).

Positive asymmetry Negative asymmetry Difficult test Saleries Response times Moda Media Mediana Negative asymmetry Easy test Media Moda Mediana

Index of asymmetry 1. Asymmetry index based on the Mode It is very simple to calculate. It is based on the relationship between the media and mode in symmetric and asymmetric distributions (see previous slide): If the distribution is symmetrical, As will be 0 If the distribution is positively skewed, As will be greater than 0 If the distribution is negatively skewed, As will be less than 0

Índices de asimetría 2. Index of asymmetry based on moments (SPSS) It is based on the difference of the data on the mean and variance, although this time the coefficients rise are cubed If the distribution is symmetrical As will be 0 If the distribution is positively skewed, As will be greater than 0 If the distribution is negatively skewed, As will be less than 0 Disadvantage: Very influenced by atypical-scores

Kurtosis It refers to the shape of distribution in relation to a standard, which is the normal distribution. This standard is the normal distribution: mesokurtic distribution. If the distribution is more peaked than the normal distribution we have a leptokurtic distribution. If the distribution is flatter than the normal distribution we have a platykurtic distribution.

Kurtosis IMPORTANT: Kurtosis is independent of the variability (in the sense of "variance"). A leptokurtic distribution is higly peaked in the center (more than the normal distribution ), it decays very rapidly at first, but in the end is somewhat higher than the normal distribution. That means a leptokurtic distribution is more likely to yield more extreme values than the normal distribution.

Example (Mesokurtic distribution-normal)

index of kurtosis For a normal distribution (mesokurtic) we know that And this will be the reference for the index that will employ kurtosis If the distribution is normal (mesokurtic), the index is 0 If the distribution is leptokurtic, the index is greater than 0 If the distribution is platykurtic, the index is less than 0

More examples

6. Exploring the central tendency, variability and asymmetry in a graph While it is possible to use different graphics to assess the variability (and central tendency, asymmetry, etc.), it is interesting to use box and whisker diagrams . The box is defined by the first quartile and third quartile, with the median within the box. This will be discussed in detail in practice. Here goes an example from Ratcliff, Perea, Colangelo and Buchanan (2004, Brain & Cognition) (see next slide)

6. Viewing the trend, variability and asymmetry in a graph The median is the thick line inside the boxes (between first and third quartiles). "Atypical" scores are presented individually (see there are two types of outliers). Notice that the controls are clearly different from patients in a "boundary separation" and the "non-decision component", while there is much more overlap in the "quality of information".

6. Viewing the trend, variability and asymmetry in a graph In the case of "drift rate" (patients), the distance between P75 and P50 is much smaller than the one between P50 and P25, thus suggesting that there are positive asymmetry. P25 P50 P75 In the case of "non-decision component" (patients), the distance between P75 and P50 is much smaller than that between P50 and P25, suggesting that there is negative asymmetry.