Summary Statistics When analysing practical sets of data, it is useful to be able to define a small number of values that summarise the main features present.

Presentation transcript:

Summary Statistics

When analysing practical sets of data, it is useful to be able to define a small number of values that summarise the main features present. We will derive (i) representative values, (ii) measures of spread and (iii) measures of skewness and other characteristics.

Representative Values

These are sometimes called measures of location or measures of central tendency.

1. Random Value
Given a set of data $S = \{x_1, x_2, \ldots, x_n\}$, we select a random number, say $k$, in the range 1 to $n$ and return the value $x_k$. This method of generating a representative value is straightforward, but it suffers from the fact that extreme values can occur and successive values could vary considerably from one another.

2. Arithmetic Mean
This is also known as the average. For the set $S$ above the average is $\bar{x} = (x_1 + x_2 + \cdots + x_n) / n$. If $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times and so on, we get the formula $\bar{x} = (f_1 x_1 + f_2 x_2 + \cdots + f_n x_n) / (f_1 + f_2 + \cdots + f_n)$, written $\bar{x} = \sum f x / \sum f$, where $\Sigma$ (sigma) denotes a sum.
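The two representative values defined so far can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `random_value` and `weighted_mean` and the sample data are assumptions made for the example.

```python
import random


def random_value(data, rng=random):
    """Return x_k for a randomly chosen index k (0-based in Python)."""
    if not data:
        raise ValueError("data must not be empty")
    k = rng.randrange(len(data))
    return data[k]


def weighted_mean(values, freqs):
    """Arithmetic mean of values x_i occurring f_i times: sum(f x) / sum(f)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Hypothetical data: the value 1 once, 2 once and 3 twice.
print(weighted_mean([1, 2, 3], [1, 1, 2]))  # (1 + 2 + 6) / 4 = 2.25

# Successive random values can differ considerably from one another.
sample = [4, 8, 15, 16, 23, 42]
print([random_value(sample) for _ in range(5)])
```

Running the last line several times shows why a random value is a poor summary: nothing stops it from returning an extreme element of the set.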

Example 1. The data refers to the marks that students in a class obtained in an examination. Find the average mark for the class.

The first point to note is that the marks are presented as ranges, so we must be careful in our interpretation of the ranges. All the intervals must be of equal rank and there must be no gaps in the classification. In our case, we interpret the first range to contain marks greater than 0 and less than or equal to 20. Thus, its mid-point is 10. The other intervals are interpreted accordingly.

Table columns: Mark Range | Mid-Point of Range $x_i$ | Number of Students $f_i$ | $f_i x_i$. Sums: $\sum f_i = 50$, $\sum f_i x_i = 3000$.

The arithmetic mean is $\bar{x} = 3000 / 50 = 60$ marks.

Note that if weights of size $f_i$ are suspended from a metre stick at the points $x_i$, then the average is the centre of gravity of the distribution. Consequently, it is very sensitive to outlying values. Equally, the population should be homogeneous for the average to be meaningful. For example, if we assume that the typical height of girls in a class is less than that of boys, then the average height of all students is representative of neither the girls nor the boys.
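The calculation in Example 1, and the centre-of-gravity remark, can be checked with a short Python sketch. The class frequencies and the upper mark ranges used here are illustrative assumptions, chosen only so that the totals agree with the figures quoted in the example (50 students, $\sum f_i x_i = 3000$); they are not taken from the original table.

```python
# ASSUMPTION: five classes of width 20 covering marks 0 to 100.
midpoints = [10, 30, 50, 70, 90]
# ASSUMPTION: illustrative frequencies consistent with the quoted totals.
freqs = [2, 6, 12, 25, 5]

n = sum(freqs)                                        # number of students
total = sum(f * x for x, f in zip(midpoints, freqs))  # sum of f_i * x_i
mean = total / n
print(n, total, mean)

# Centre of gravity: the moments f_i * (x_i - mean) balance out to zero.
balance = sum(f * (x - mean) for x, f in zip(midpoints, freqs))
print(balance)

# Sensitivity to outliers: one extreme value drags the mean a long way.
marks = [58, 60, 62]                 # hypothetical raw marks, mean 60
with_outlier = marks + [0]           # add a single outlying mark
print(sum(marks) / len(marks), sum(with_outlier) / len(with_outlier))
```

The last line illustrates the warning in the text: adding a single mark of 0 to three marks near 60 pulls the mean down to 45.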

3. The Mode
This is the value in the distribution that occurs most frequently. By common agreement, it is calculated from the histogram using linear interpolation on the modal class. The various similar triangles in the diagram generate the common ratios. In our case, the interpolation gives a mode of 67.8 marks.

4. The Median
This is the middle point of the distribution. It is used heavily in educational applications. If $\{x_1, x_2, \ldots, x_n\}$ are the marks of students in a class, arranged in non-decreasing order, then the median is the mark of the $(n + 1)/2$-th student. It is often calculated from the ogive or cumulative frequency diagram. In our case, the interpolation gives a median of 64.4 marks.

[Figures: histogram (Frequency) and ogive (Cumulative Frequency) of the marks]
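Both interpolations can be sketched in Python. The mode formula below is the usual similar-triangles construction on the modal class, and the median interpolates linearly along the ogive to the $(n + 1)/2$-th student. The helper names and the class frequencies are assumptions for illustration; the frequencies are the same illustrative values that reproduce the totals of Example 1.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Interpolate the mode within the modal class of a histogram."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # modal class index
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - before        # excess over the class to the left
    d2 = freqs[m] - after         # excess over the class to the right
    if d1 + d2 == 0:
        raise ValueError("mode is not defined for these frequencies")
    return lower_bounds[m] + d1 / (d1 + d2) * width


def grouped_median(lower_bounds, width, freqs):
    """Interpolate the mark of the (n + 1) / 2-th student on the ogive."""
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("no data")


# ASSUMPTION: classes of width 20 with illustrative frequencies.
lower_bounds = [0, 20, 40, 60, 80]
freqs = [2, 6, 12, 25, 5]
mode = grouped_mode(lower_bounds, 20, freqs)
median = grouped_median(lower_bounds, 20, freqs)
print(round(mode, 2), round(median, 2))
```

Under these assumed frequencies the median comes out at 64.4 and the mode at just under 67.9, in line with the values quoted above.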

Measures of Dispersion or Scattering

Example 2. The following distribution has the same arithmetic mean as example 1, but the values are more dispersed. This illustrates the point that an average value on its own may not adequately describe a statistical distribution.

Table columns: Marks $x$ | Frequency $f$ | $f x$, with a Sums row.

To devise a formula that captures the degree to which a distribution is concentrated about the average, we consider the deviations of the values from the average. If the distribution is concentrated around the mean, then the deviations will be small, while if the distribution is very scattered, then the deviations will be large. The average of the squares of the deviations is called the variance and this is used as a measure of dispersion. The square root of the variance is called the standard deviation. It has the same units of measurement as the original values and is the preferred measure of dispersion in many applications.

[Figure: the values $x_1, \ldots, x_6$ marked on a line about the mean $\bar{x}$]

Variance & Standard Deviation

VAR[X] = Average of the Squared Deviations

$\mathrm{VAR}[X] = \frac{\sum f \{\text{Squared Deviations}\}}{\sum f} = \frac{\sum f (x_i - \bar{x})^2}{\sum f} = \frac{\sum f x_i^2}{\sum f} - \bar{x}^2$, called the product moment formula.

Standard Deviation $= \sqrt{\text{Variance}}$

Table columns for Example 1 and Example 2: $f$ | $x$ | $f x$ | $f x^2$.

Example 1: $\mathrm{VAR}[X] = \sum f x^2 / 50 - (60)^2 = 372$ marks$^2$
Example 2: $\mathrm{VAR}[X] = \sum f x^2 / 50 - (60)^2 = 756$ marks$^2$
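The equivalence of the two variance formulas can be checked numerically. The sketch below computes the variance both as the average squared deviation and by the product moment formula. The function names and the frequency table are assumptions: the frequencies are illustrative values consistent with the totals quoted for Example 1, not the original table.

```python
import math


def variance_by_deviations(values, freqs):
    """Average of the squared deviations from the mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n


def variance_product_moment(values, freqs):
    """Product moment formula: sum(f x^2) / sum(f) - mean^2."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2


# ASSUMPTION: class mid-points and illustrative frequencies (50 students, mean 60).
midpoints = [10, 30, 50, 70, 90]
freqs = [2, 6, 12, 25, 5]

var_dev = variance_by_deviations(midpoints, freqs)
var_pm = variance_product_moment(midpoints, freqs)
sd = math.sqrt(var_pm)
print(var_dev, var_pm, round(sd, 2))
```

With these assumed frequencies both formulas give 372 marks squared, so the standard deviation is about 19.3 marks. The product moment form needs only the running sums of $f x$ and $f x^2$, which is why it suits tabular calculation.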

Other Summary Statistics

Skewness
An important attribute of a statistical distribution relates to its degree of symmetry. The word “skew” means a tail, so that distributions that have a large tail of outlying values on the right-hand side are called positively skewed or skewed to the right. The notion of negative skewness is defined similarly. A simple formula for skewness is

Skewness = (Mean - Mode) / Standard Deviation

which in the case of example 1 is: Skewness $= (60 - 67.8) / \sqrt{372} \approx -0.40$.

Coefficient of Variation
This formula was devised to express the standard deviation relative to the arithmetic mean, so that comparisons can be drawn between different distributions. However, it has not won universal acceptance.

Coefficient of Variation = Standard Deviation / Mean.

Semi-Interquartile Range
Just as the median corresponds to the 0.50 point in a distribution, the quartiles $Q_1$, $Q_2$, $Q_3$ correspond to the 0.25, 0.50 and 0.75 points. An alternative measure of dispersion is

Semi-Interquartile Range $= (Q_3 - Q_1) / 2$.

Geometric Mean
For data that is growing geometrically, such as economic data with a high inflation effect, an alternative to the arithmetic mean is preferred. It involves taking the $N$-th root, where $N = \sum f$, of a product of terms:

Geometric Mean $= \sqrt[N]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}}$
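These remaining statistics are one-line formulas, sketched here in Python. The function names are assumptions made for the example, the coefficient of variation is taken in its usual form as standard deviation divided by mean, and the geometric mean is computed through logarithms to avoid overflow in the product. The inputs for Example 1 are the mean (60), mode (67.8) and variance (372) quoted in the text; the remaining inputs are hypothetical.

```python
import math


def skewness(mean, mode, sd):
    """Simple skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed relative to the mean."""
    return sd / mean


def semi_interquartile_range(q1, q3):
    """Half the distance between the lower and upper quartiles."""
    return (q3 - q1) / 2


def geometric_mean(values, freqs):
    """N-th root of x_1^f_1 * ... * x_k^f_k, where N = sum(f); values must be > 0."""
    n = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / n)


sd = math.sqrt(372)                                  # Example 1 standard deviation
print(round(skewness(60, 67.8, sd), 2))              # negative: tail to the left
print(round(coefficient_of_variation(60, sd), 2))
print(semi_interquartile_range(50, 70))              # hypothetical quartiles
print(geometric_mean([2, 8], [1, 1]))                # hypothetical data, close to 4
```

The negative skewness for Example 1 reflects a mode that lies above the mean, so the longer tail is on the left.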