Central tendency and spread


Stats Club 7
Marnie Brennan

References
Petrie and Sabin, Medical Statistics at a Glance: Chapters 5, 6, 10, 35 (Good)
Petrie and Watson, Statistics for Veterinary and Animal Science: Chapters 2, 4 (Good)
Thrusfield, Veterinary Epidemiology: Chapter 12
Kirkwood and Sterne, Essential Medical Statistics: Chapter 4

Terminology!
As in previous Stats Clubs, we are talking about ways of describing your numerical data.
These measures give you basic calculations to explore your data (get a feel for it).
They also enable you to compare your data with data collected by other researchers.

Central tendency
Central tendency = a measure of the location or position of the data, i.e. the 'average'.
This basically means calculating things like: the mean (arithmetic mean), the median and the mode.
Others exist, e.g. the geometric mean (useful for distributions skewed to the right) and the weighted mean.
Petrie and Sabin (Chapter 5) has a nice table summarising the advantages and disadvantages of each measure.
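
The two "other" averages can be sketched with Python's standard-library `statistics` module. This is a minimal illustration only: the measurements, group means and weights are hypothetical values, not data from the source. The geometric mean is the exponential of the mean of the logs, so it requires all values to be positive and is pulled far less by a long right tail than the arithmetic mean is.

```python
import math
import statistics

# Hypothetical right-skewed measurements (illustrative only)
values = [1.2, 2.5, 3.1, 4.0, 48.0]

arithmetic = statistics.mean(values)
# Geometric mean: exp of the mean of the natural logs (all values must be > 0)
geometric = statistics.geometric_mean(values)
geometric_by_hand = math.exp(statistics.mean(math.log(v) for v in values))

# Weighted mean: each group mean counts in proportion to its weight
# (hypothetical group means and weights, e.g. number of animals per group)
group_means = [10.0, 12.0, 15.0]
weights = [5, 20, 10]
weighted = sum(m * w for m, w in zip(group_means, weights)) / sum(weights)

print(f"arithmetic mean = {arithmetic:.2f}")
print(f"geometric mean  = {geometric:.2f}")
print(f"weighted mean   = {weighted:.2f}")
```

With the single large value of 48.0 in the data, the arithmetic mean is dragged well above most observations, while the geometric mean stays much closer to the bulk of the data.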

Central tendency: mean, median
Mean = the sum of your data divided by the total number of measurements.
It is algebraically defined (so it is mathematically manageable), but it is affected by skewed data, THEREFORE it is good to use for normally distributed variables.
Median = the midpoint of your values, i.e. the 'halfway' value in your data.
If the observations are arranged in increasing order, the median is the middle value (with an even number of observations, it is the average of the two middle values).
It is not algebraically defined, but it is not affected by skewed data, THEREFORE it is good to use for non-normally distributed variables.
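
A quick way to see why the mean suits symmetric data and the median suits skewed data is to compute both before and after adding one extreme value. This sketch assumes Python's `statistics` module and uses hypothetical numbers chosen for illustration.

```python
import statistics

# Hypothetical roughly symmetric sample
symmetric = [22, 23, 24, 25, 25, 26, 27, 28]
# Same sample with the largest value replaced by an extreme value (right skew)
skewed = symmetric[:-1] + [120]

for label, data in [("symmetric", symmetric), ("skewed", skewed)]:
    mean = statistics.mean(data)      # sum / number of observations
    median = statistics.median(data)  # average of the two middle values here, since n is even
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

The mean jumps from 25 to 36.5 because of one observation, while the median stays at 25.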

Distributions [figure: example distributions with the median and mean marked; in one of them the mean and median are the same]

Central tendency: mode
Mode = the value that occurs most frequently in a data set.
It is generally more meaningful if you have discrete data, e.g. the most common litter size of bearded collie dogs is 7.
It is not often used.
What is the mode?
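
For discrete data the mode is just a frequency count. A minimal sketch with Python's `statistics` and `collections` modules; the litter sizes below are hypothetical, not real bearded collie data. `statistics.multimode` (Python 3.8+) is used because a data set can have more than one mode.

```python
import statistics
from collections import Counter

# Hypothetical litter sizes (illustrative only)
litter_sizes = [5, 6, 7, 7, 8, 7, 6, 9, 7, 5, 6]

print(statistics.mode(litter_sizes))        # single most frequent value
print(statistics.multimode(litter_sizes))   # every value tied for most frequent
print(Counter(litter_sizes).most_common(3)) # (value, count) pairs, most frequent first

# A bimodal example: two values share the highest count
print(statistics.multimode([1, 1, 2, 2, 3]))
```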

Spread
Spread = a measure of the dispersion or variability (variation) of the data.
This basically means calculating things like: the range, percentiles (quartiles, interquartile range), the variance and the standard deviation.
Others exist, e.g. the coefficient of variation.
Petrie and Sabin (Chapter 6) has a nice table summarising the main points about these measures.

Range and percentiles
Range = the difference between the minimum and maximum values of your data (often reported as "minimum to maximum").
It gives an indication of spread at a very basic level, but it is distorted by outliers (a single extreme value gives a large range).
Percentiles = if the data are ordered from lowest to highest, percentiles divide the data into 'compartments'.
E.g. the 5th percentile is the point below which 5% of the data lie; the 20th percentile is the point below which 20% of the data lie.
Special percentiles called 'quartiles' divide the data into 4 equal parts; they are the 25th, 50th and 75th percentiles.
The 50th percentile is the median.
From the quartiles you get the 'interquartile range' (IQR): the range of values between the 25th and 75th percentiles, which contains the middle 50% of the data.
The IQR is not distorted by outliers.
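
The contrast between the range and the IQR can be sketched with Python's `statistics.quantiles`. The helper name `describe_spread` and the data are hypothetical. Note that there are several conventions for computing percentiles; `method="inclusive"` is one of them, and other software may give slightly different quartiles for the same data.

```python
import statistics

def describe_spread(data):
    """Return range and quartile-based summaries for a list of numbers."""
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]
    # n=4 returns the three cut points Q1, Q2 (the median) and Q3
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "max": ordered[-1], "range": data_range,
            "Q1": q1, "median": q2, "Q3": q3, "IQR": q3 - q1}

# Hypothetical data, then the same data with the largest value replaced by an outlier
sample = [22, 23, 24, 24, 25, 25, 26, 26, 27, 28]
with_outlier = sample[:-1] + [134]

print(describe_spread(sample))        # range 6, IQR 2
print(describe_spread(with_outlier))  # range 112, IQR still 2
```

One extreme value multiplies the range almost twentyfold but leaves the quartiles, and therefore the IQR, unchanged.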

Example 1: Range = 22 to 28 (6); Q1 (25th percentile) = 24; Q3 (75th percentile) = 26; IQR = 24 to 26 (2)
Example 2: Range = 0.12 to 134 (133.9); Q1 (25th percentile) = 6; Q3 (75th percentile) = 36; IQR = 6 to 36 (30)
What conclusions can we draw about what to use when?

Rule of thumb Mean and range = good to use for normally distributed variables Median and interquartile range = good to use for non-normally distributed variables

Variance
Variance = a measure of how far the data values deviate from the mean, e.g. if the data are bunched around the mean, the variance is small; if the data are spread out, the variance is large.
It is calculated by squaring each distance between an observation and the mean, then taking the 'mean' of these squared distances (add all the values together and divide by the total number of observations minus 1).
Reason for squaring: otherwise the deviations above and below the mean would cancel each other out (the raw deviations always sum to zero).
DON'T WORRY ABOUT HOW TO DO THIS! This is what computers are for!
It is measured in the same units as the observations, but squared, e.g. if the units are grams, the variance will be in grams squared.
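
The calculation described above can be spelled out step by step, mainly to show why the squaring is needed. This is a sketch in Python with hypothetical observations (assumed to be in grams); the last line checks the manual result against the standard-library `statistics.variance`, which also divides by n - 1.

```python
import statistics

# Hypothetical observations in grams (illustrative only)
data = [4.0, 7.0, 9.0, 10.0, 15.0]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]
# The raw deviations sum to zero, so on their own they cannot measure spread
print(sum(deviations))

squared = [d ** 2 for d in deviations]
sample_variance = sum(squared) / (n - 1)  # divide by n - 1 for a sample

# Same answer from the library; units are grams squared
print(sample_variance, statistics.variance(data))
```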

Example 1: Mean = 26, Variance = 430
Example 2: Mean = 23, Variance = 11090

Example
If we had 6 observations, 15, 18, -14, -17, -3 and 2 (mean = 1/6 ≈ 0.17), what is the variance?
$$s^2 = \frac{(15-0.17)^2 + (18-0.17)^2 + (-14-0.17)^2 + (-17-0.17)^2 + (-3-0.17)^2 + (2-0.17)^2}{6-1} \approx 209.37$$
We divide by n - 1 rather than n to reduce bias, because we have a sample rather than the whole population (again, don't worry too much!).
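
The worked example can be checked with Python's `statistics` module, which provides both the sample variance (dividing by n - 1) and the population variance (dividing by n). Comparing the two shows how much the n - 1 correction matters for a sample this small.

```python
import statistics

obs = [15, 18, -14, -17, -3, 2]

mean = statistics.mean(obs)                 # exactly 1/6, about 0.17
var_sample = statistics.variance(obs)       # divides by n - 1
var_population = statistics.pvariance(obs)  # divides by n, for comparison

print(round(mean, 2), round(var_sample, 2), round(var_population, 2))
# 0.17 209.37 174.47
```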

Standard deviation (SD)
Standard deviation = the square root of the variance.
Like the variance, it relates to the deviations of the observations from the mean (roughly, the 'average' deviation).
Therefore its units are the same as those of the observations (more convenient than the variance?).
If a dataset is normally distributed, the mean ± 2 standard deviations approximately encompasses the central 95% of observations.
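
Both points on this slide can be checked in a few lines of Python: taking the square root of the slide's example variances, and the "mean ± 2 SD covers about 95%" rule. The exact coverage comes from `statistics.NormalDist` (Python 3.8+); the simulated data are hypothetical draws from a normal distribution with an assumed mean of 50 and SD of 10, seeded so the run is repeatable.

```python
import math
import random
import statistics
from statistics import NormalDist

# SD is the square root of the variance (variances from the slide examples)
for variance in (430, 11090):
    print(variance, round(math.sqrt(variance)))  # 21 and 105

# Exact proportion of a normal distribution within mean +/- 2 SD
z = NormalDist()
print(round(z.cdf(2) - z.cdf(-2), 4))  # 0.9545
# The exact multiplier for the central 95% is about 1.96
print(round(z.inv_cdf(0.975), 2))

# Simulation check with hypothetical normally distributed data
random.seed(1)
sample = [random.gauss(50, 10) for _ in range(10_000)]
m, sd = statistics.mean(sample), statistics.stdev(sample)
inside = sum(m - 2 * sd <= x <= m + 2 * sd for x in sample) / len(sample)
print(round(inside, 3))  # close to 0.95
```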

Example 1: Mean = 26, Variance = 430, Standard deviation = 21
Example 2: Mean = 23, Variance = 11090, Standard deviation = 105

http://www.stark-labs.com/help/blog/files/StackingMethods.php

What about the standard error of the mean (SE or SEM)?
The SEM relates to the precision of the sample mean as an estimate of the population mean, whereas the standard deviation describes the variation of the data values around the mean.
The SEM can be used to construct confidence intervals.
This will be covered in greater detail in another session.
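
The SEM is usually estimated as the sample standard deviation divided by the square root of the sample size. This Python sketch (the helper name `sem` and the simulated data are hypothetical, assuming a population mean of 100 and SD of 15) shows the difference from the SD: as the sample grows, the SD stays around the same value while the SEM shrinks, because the mean of a larger sample is a more precise estimate.

```python
import math
import random
import statistics

def sem(data):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

# Hypothetical normally distributed measurements, seeded for repeatability
random.seed(2)
draws = [random.gauss(100, 15) for _ in range(1000)]

for n in (10, 100, 1000):
    subset = draws[:n]
    print(n, round(statistics.stdev(subset), 1), round(sem(subset), 2))
```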

General rule
Standard deviation, variance and SEM are for normally distributed variables.
For non-normally distributed variables, stick with the interquartile range.

Examples in papers

Equal variances?
Comparing groups of numerical values: some of the tests used to compare groups of numerical data (e.g. t-tests, ANOVA) assume that the variances in the groups being compared are equal (homogeneity of variance).
You need to know whether your groups meet this assumption. If they do not, you can:
use non-parametric tests instead, or
transform your data to fit the assumptions.
Again, DON'T WORRY: we'll cover this in another session.

When we start again…
The bunfight that is: P-values! Type I and Type II errors