DESCRIPTIVE MEASURES.


DESCRIPTIVE MEASURES

DESCRIPTIVE MEASURES
Frequency distributions and graphs may be considered a first kind of summarization. However, they are not helpful when we need to describe verbally the main features of a data set. Numerical summaries are extremely useful for understanding and communicating the most important characteristics of a data set.

DESCRIPTIVE MEASURES
Example: these techniques can help us to graph data on family incomes. However, we may want to know the income of a "typical" family, the spread of the distribution of incomes, or the location of a family with a particular income.
[Figure: distribution of family incomes. Labels: Spread, Position of a particular family, $56,260, Center. Horizontal axis: Income.]

TYPES OF DESCRIPTIVE MEASURES
Such questions can be answered using summary measures. Included among these are:
- Measures of Central Tendency
- Measures of Position
- Measures of Dispersion

MEASURES OF CENTRAL TENDENCIES

Measures of Central Tendencies
The simplest and most extreme kind of summary is to reduce the entire group of observations to a single value that best represents the data set. This single-value summary should be a value that is typical of the observations of the group (a population or a sample). Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.

Measures of Central Tendencies
- Mode: for every kind of qualitative variable ("nominal" and "ordinal") and for quantitative variables.
- Median: for qualitative "ordinal" variables and for quantitative variables.
- Mean: only for quantitative variables.
Nominal variables: the values do not have any quantitative meaning and there is no ordering relationship between them. E.g.: gender.
Ordinal variables: there is an order relationship between values, but the difference between two successive categories is not quantifiable. E.g.: a three-point rating scale measuring customer satisfaction ("Not Satisfied", "Satisfied", "Very Satisfied").

Mode
Definition: Mode is a French word that means fashion. In statistics, the mode is the most common value in a data set, that is, the value that occurs with the highest frequency.

Mode: examples

Stress on Job    Frequency (ni)
Very             10
Somewhat         14   (highest frequency)
None              6

Mode: "Somewhat".

Vehicles Owned   Number of Households (ni)
0                 2
1                18   (highest frequency)
2                11
3                 4
4                 3
5                 2

Mode: 1 vehicle.

Mode - Grouped data: example
When quantitative variables are grouped in classes, the mode is defined as the class interval where most observations lie. This is called the modal-class interval.
MODAL-CLASS INTERVAL: the class interval that occurs with the highest frequency in a data set.

Weekly Earnings (dollars)   Number of Employees (ni)
 400 -|  600                14
 600 -|  800                22
 800 -| 1000                49   (highest frequency)
1000 -| 1200                20
1200 -| 1400                 9
1400 -| 1600                 6

Modal-class interval: 800 -| 1000.

Mode: other features
One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables. For categorical variables, however, the mode is more useful because the mean and median do not make sense.
A data set may have no mode, one mode, or many modes:
- A data set with only one mode is called unimodal.
- A data set with two modes is called bimodal.
- A data set with more than two modes is called multimodal.
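As an illustration of the mode and of the unimodal/bimodal/multimodal distinction, here is a minimal Python sketch. The function names `modes_from_counts` and `modes` are hypothetical, and treating a data set in which every value occurs exactly once as having no mode is a common convention assumed here, not something the slides state.

```python
from collections import Counter


def modes_from_counts(freq):
    """Return all values with the highest frequency.

    `freq` maps each value to its frequency (a frequency table).
    Assumption: if there are several values and each occurs only once,
    the data set is treated as having no mode.
    """
    if not freq:
        return []
    top = max(freq.values())
    if top == 1 and len(freq) > 1:
        return []
    return [value for value, count in freq.items() if count == top]


def modes(values):
    """Return the mode(s) of raw (ungrouped) observations."""
    return modes_from_counts(Counter(values))


# Frequency table from the "Stress on Job" example.
stress = {"Very": 10, "Somewhat": 14, "None": 6}
print(modes_from_counts(stress))   # one mode: unimodal
print(modes([1, 1, 2, 2, 3]))      # two modes: bimodal
print(modes([1, 2, 3]))            # no mode under the assumed convention
```

Returning a list rather than a single value keeps the "none or many modes" case explicit: the length of the result tells whether the data set is unimodal, bimodal, or multimodal.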

Median
Definition: the median is the value of the middle term in a data set that has been ranked in increasing order. In other words, the median divides a ranked data set into two equal parts. The calculation of the median consists of the following two steps:
1. Rank the data set in increasing order.
2. Find the middle term in a data set with n values. The value of this term is the median.

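Median
The position of the middle term in a data set with n values is obtained as follows:
Position of the middle term = $(n+1)/2$
If n is odd, this position is a whole number and the median is the value at that position. If n is even, the position falls halfway between two terms and the median is the average of the two middle values.
Thus, we can redefine the median as follows:
Median = value of the $\frac{n+1}{2}$th term in a ranked data set, if n is odd.
Median = average of the values of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th terms in a ranked data set, if n is even.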
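The two-step procedure (rank the data, then take the middle term) can be sketched in Python as follows. This is a minimal illustration; the function name `median` is chosen here for convenience, and the sample values are the weight-loss data used in the deck's first median example.

```python
def median(values):
    """Median of numeric observations: rank, then take the middle term."""
    ranked = sorted(values)          # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th term; index `mid` in 0-based terms
        return ranked[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th terms
    return (ranked[mid - 1] + ranked[mid]) / 2


weight_lost = [10, 5, 19, 8, 3]
print(median(weight_lost))  # 8
```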

Median: example 1
The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:
10 5 19 8 3
We rank the given data in increasing order as follows:
3 5 8 10 19
We find the position of the middle term (n = 5 is odd): $(5+1)/2 = 3$.

Median: example 1
Therefore, the median is the value of the third term in the ranked data:
3 5 8 10 19 (median: 8)
The median weight loss for this sample of five members of this health club is 8 pounds.

Median: example 2
The table lists the total revenue for the 12 top-grossing North American concert tours of all time.

Tour                       Artist                  Total Revenue (millions of dollars)
Steel Wheels, 1989         The Rolling Stones       98.0
Magic Summer, 1990         New Kids on the Block    74.1
Voodoo Lounge, 1994        The Rolling Stones      121.2
The Division Bell, 1994    Pink Floyd              103.5
Hell Freezes Over, 1994    The Eagles               79.4
Bridges to Babylon, 1997   The Rolling Stones       89.3
Popmart, 1997              U2                       79.9
Twenty-Four Seven, 2000    Tina Turner              80.2
No Strings Attached, 2000  'N-Sync                  76.4
Elevation, 2001            U2                      109.7
Popodyssey, 2001           'N-Sync                  86.8
Black and Blue, 2001       The Backstreet Boys      82.1

Median: example 2
We rank the given data in increasing order as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
We find the position of the middle term (n = 12 is even): $(12+1)/2 = 6.5$.
The median is therefore the mean of the sixth and seventh values in the ranked data: Median = $(82.1 + 86.8)/2 = 84.45$, that is, $84.45 million.

Median: example 3
The following data come from a questionnaire item on the time involvement of 11 scientists in the "perception and identification of research problems":
Great, Very low or nil, Very great, Very great, Great, Very great, Medium, Low, Great, Medium, Medium.
We rank the given data in increasing order as follows:
Very low or nil, Low, Medium, Medium, Medium, Great, Great, Great, Very great, Very great, Very great.
With n = 11, the middle term is at position $(11+1)/2 = 6$, so the median is "Great".

Median: frequency distribution
To find the median from a frequency distribution, first calculate the cumulative frequency distribution. The first value with a cumulative frequency greater than or equal to the position of the middle value is the median. If the position of the middle value is exactly 0.5 more than the cumulative frequency of the previous value, then the median is the midpoint between the two values.

Median: example 4

Stress on Job   Frequency (ni)   Cumulative frequency
None             6                6
Somewhat        14               20   (first value with cumulative frequency >= 15.5)
Very            10               30

With n = 30, the position of the middle value is $(30+1)/2 = 15.5$. Median: "Somewhat".

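Median: example 5

Vehicles Owned   Number of Households (ni)   Cumulative frequency
0                 2                           2
1                18                           2 + 18 = 20
2                11                           2 + 18 + 11 = 31   (first value with cumulative frequency >= 20.5)
3                 4                           2 + 18 + 11 + 4 = 35
4                 3                           2 + 18 + 11 + 4 + 3 = 38
5                 2                           2 + 18 + 11 + 4 + 3 + 2 = 40

With n = 40, the position of the middle value is $(40+1)/2 = 20.5$, which is exactly 0.5 more than the previous cumulative frequency (20). The median is therefore the midpoint between 1 and 2 vehicles: Median = 1.5.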
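The cumulative-frequency rule can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `median_from_frequencies` is hypothetical, the table is passed as (value, frequency) pairs already ordered by value, and every frequency is positive. The vehicle frequencies (2, 18, 11, 4, 3, 2 for 0 to 5 vehicles) are the ones implied by the cumulative sums in example 5.

```python
def median_from_frequencies(table):
    """Median from an ordered frequency table of (value, frequency) pairs."""
    n = sum(freq for _, freq in table)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2           # position of the middle value
    cumulative = 0
    for i, (value, freq) in enumerate(table):
        previous = cumulative        # cumulative frequency of the previous value
        cumulative += freq
        if cumulative >= position:
            if i > 0 and position - previous == 0.5:
                # middle position falls exactly between two different values
                return (table[i - 1][0] + value) / 2
            return value


stress = [("None", 6), ("Somewhat", 14), ("Very", 10)]
vehicles = [(0, 2), (1, 18), (2, 11), (3, 4), (4, 3), (5, 2)]
print(median_from_frequencies(stress))    # Somewhat
print(median_from_frequencies(vehicles))  # 1.5
```

The midpoint branch only makes sense for quantitative values; for an ordinal variable whose middle position falls between two different categories, the midpoint is not defined and this sketch would raise a `TypeError`.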

Mean
Definition: the mean, also called the arithmetic mean, is the most frequently used measure of central tendency. It is obtained by dividing the sum of all values by the number of values in the data set. The mean calculated for sample data is denoted by $\bar{x}$ and the mean calculated for the population is denoted by $\mu$:
$\bar{x} = \frac{\sum x}{n}$ (sample of n values), $\mu = \frac{\sum x}{N}$ (population of N values).

Mean: example 1
The following data are the 2002 total payrolls of 5 Major League Baseball (MLB) teams.

MLB Team                2002 Total Payroll (millions of dollars)
Anaheim Angels           62
Atlanta Braves           93
New York Yankees        126
St. Louis Cardinals      75
Tampa Bay Devil Rays     34

$\bar{x} = (62 + 93 + 126 + 75 + 34)/5 = 390/5 = 78$, that is, $78 million.

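Mean: example 2
The following are the ages of all 8 employees of a small company:
53 32 61 27 39 44 49 57
Because the data cover all employees, this is a population mean: $\mu = (53 + 32 + 61 + 27 + 39 + 44 + 49 + 57)/8 = 362/8 = 45.25$ years.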
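The definition translates directly into code. The sketch below is illustrative only; the function name `mean` is chosen here, and the same computation serves for a sample mean and a population mean, since only the interpretation differs.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


payrolls = [62, 93, 126, 75, 34]          # sample of 5 MLB teams (millions of dollars)
ages = [53, 32, 61, 27, 39, 44, 49, 57]   # all 8 employees (a population)
print(mean(payrolls))  # 78.0
print(mean(ages))      # 45.25
```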

Mean: frequency distribution
To find the mean of a frequency distribution, multiply each value $x_i$ by its frequency $n_i$ and add the products. Then divide by the total number of elements n in the data set:
$\bar{x} = \frac{\sum x_i n_i}{n}$, where $n = \sum n_i$.

Mean: example 3

Vehicles Owned (xi)   Number of Households (ni)   xi*ni
0                      2                          0*2 = 0
1                     18                          1*18 = 18
2                     11                          2*11 = 22
3                      4                          3*4 = 12
4                      3                          4*3 = 12
5                      2                          5*2 = 10
Sum                   40                          74

$\bar{x} = 74/40 = 1.85$ vehicles per household.
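A short Python sketch of the frequency-weighted mean, using the vehicles table. The function name `mean_from_frequencies` and the (value, frequency) pair layout are illustrative choices, not part of the original slides.

```python
def mean_from_frequencies(table):
    """Mean of a frequency distribution given as (value, frequency) pairs."""
    n = sum(freq for _, freq in table)
    if n == 0:
        raise ValueError("mean of an empty data set is undefined")
    total = sum(value * freq for value, freq in table)  # sum of xi * ni
    return total / n


vehicles = [(0, 2), (1, 18), (2, 11), (3, 4), (4, 3), (5, 2)]
print(mean_from_frequencies(vehicles))  # 74 / 40 = 1.85
```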

Mean: frequency distribution with classes
When data are organized in classes, we do not know the values of the individual observations. In these cases the mean is approximated as follows:
$\bar{x} = \frac{\sum m_i n_i}{n}$
where $m_i$ is the midpoint of each class interval and $n_i$ is its frequency.

Mean: example 4
The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.

Number of orders (x)   Number of days (ni)   mi                 mi*ni
10-12                   4                    (10+12)/2 = 11      44
13-15                  12                    14                 168
16-18                  20                    17                 340
19-21                  14                    20                 280
Sum                    n = 50                                   832

$\bar{x} = 832/50 = 16.64$ orders per day.
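The class-midpoint calculation can be sketched as follows. This is a minimal illustration; the function name `grouped_mean` and the (lower, upper, frequency) tuple layout are assumptions made for the example.

```python
def grouped_mean(classes):
    """Approximate mean for data grouped in classes.

    `classes` is a list of (lower, upper, frequency) tuples; each class
    is represented by its midpoint mi = (lower + upper) / 2.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("mean of an empty data set is undefined")
    total = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return total / n


orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
print(grouped_mean(orders))  # 832 / 50 = 16.64
```

Because each observation is replaced by its class midpoint, the result is an approximation of the mean of the underlying raw data.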

Mode, Median, Mean: relationships
- Mode: one advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables.
- Median: the median can be computed for data that are at least ordinal. Its advantage is that it is not influenced by outliers, so it is preferred over the mean when outliers are present.
- Mean: the mean can be calculated only for quantitative data. The mean is influenced by outliers.
OUTLIERS or EXTREME VALUES: values that are very small or very large relative to the majority of the values in a data set.

Median, Mean, outlier: example 1
The following table shows the 2000 populations (in thousands) of the five Pacific states.

State        Population (thousands)
Washington    5,894
Oregon        3,421
Alaska          627
Hawaii        1,212
California   33,872   (outlier)

Mean without California = $(5894 + 3421 + 627 + 1212)/4 = 2788.5$ thousand
Mean with California = $(5894 + 3421 + 627 + 1212 + 33872)/5 = 9005.2$ thousand
Median = 3421 thousand
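To see how a single outlier moves the mean much more than the median, the comparison can be run with Python's standard `statistics` module. This is a sketch under one stated assumption: California's population is taken as 33,872 thousand (the figure printed as "33.872" in the transcript).

```python
import statistics

# 2000 populations of the five Pacific states, in thousands.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # assumed reading of "33.872"; the outlier
}

with_outlier = list(populations.values())
without_outlier = [v for state, v in populations.items() if state != "California"]

print("mean with California:     ", statistics.mean(with_outlier))
print("mean without California:  ", statistics.mean(without_outlier))
print("median with California:   ", statistics.median(with_outlier))
print("median without California:", statistics.median(without_outlier))
```

Dropping California changes the mean from about 9005 to about 2789 thousand, while the median only moves from 3421 to 2316.5 thousand.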

Mode, Median, Mean: relationships
For a symmetric histogram and frequency curve with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution. For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two. For a histogram and a frequency curve skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two.

MEASURES OF POSITION

MEASURES OF POSITION
Definition: a measure of position determines the position of a single value in relation to other values in a sample or a population data set. There are many measures of position; here we consider quartiles and percentiles.

QUARTILES
Quartiles are three summary measures that divide a ranked data set into four equal parts.
- FIRST QUARTILE (Q1): the value of the middle term among the observations that are less than the median.
- SECOND QUARTILE (Q2): the same as the median of the data set.
- THIRD QUARTILE (Q3): the value of the middle term among the observations that are greater than the median.

QUARTILES
[Figure: a ranked data set split into four portions of 25% each by Q1, Q2, and Q3.]
Each of these portions contains 25% of the observations of a data set arranged in increasing order.
- Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are greater than Q1.
- The second quartile, Q2, divides a ranked data set into two equal parts (it is the median).
- Approximately 75% of the values in a ranked data set are less than Q3 and about 25% are greater than Q3.

QUARTILES: example 1
We use the total revenue (in millions of dollars) for the 12 top-grossing North American concert tours of all time, listed by tour and artist in Median: example 2:
98.0 74.1 121.2 103.5 79.4 89.3 79.9 80.2 76.4 109.7 86.8 82.1

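QUARTILES: example 1
Ranked data:
74.1 76.4 79.4 79.9 80.2 82.1 | 86.8 89.3 98.0 103.5 109.7 121.2
Values less than the median: 74.1 76.4 79.4 79.9 80.2 82.1
Values greater than the median: 86.8 89.3 98.0 103.5 109.7 121.2
Q2 (also the median) = $(82.1 + 86.8)/2 = 84.45$
Q1 = $(79.4 + 79.9)/2 = 79.65$
Q3 = $(98.0 + 103.5)/2 = 100.75$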

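QUARTILES: example 2
The following are the ages of nine employees of an insurance company:
47 28 39 51 33 37 59 24 33
Ranked data:
24 28 33 33 37 39 47 51 59
Q2 (the median) is the fifth value: 37.
Values less than the median: 24 28 33 33, so Q1 = $(28 + 33)/2 = 30.5$.
Values greater than the median: 39 47 51 59, so Q3 = $(47 + 51)/2 = 49$.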
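The quartile definition used in this deck (the median of the lower half and the median of the upper half) can be sketched in Python as below. The function names are hypothetical, and the split is positional: with an odd number of observations the middle term is left out of both halves. Statistical software often uses other interpolation rules, so results may differ slightly from library functions.

```python
def _median_of_ranked(ranked):
    """Median of a list that is already sorted in increasing order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves definition."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ranked[:half]                                  # terms below the median position
    upper = ranked[half + 1:] if n % 2 else ranked[half:]  # terms above the median position
    return _median_of_ranked(lower), _median_of_ranked(ranked), _median_of_ranked(upper)


ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
print(quartiles(ages))  # (30.5, 37, 49.0)
```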

PERCENTILES
Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The kth percentile is denoted by Pk.
[Figure: a ranked data set split into portions of 1% each.]
Each of these portions contains 1% of the observations of a data set arranged in increasing order.

PERCENTILES
Calculating percentiles: the (approximate) value of the kth percentile, denoted by Pk, is
$P_k$ = value of the $\frac{k\,n}{100}$th term in a ranked data set
where k denotes the number of the percentile and n represents the sample size.

PERCENTILES: example
Refer to the data on total revenue (in millions of dollars) for the 12 top-grossing North American concert tours of all time, listed by tour and artist in Median: example 2:
98.0 74.1 121.2 103.5 79.4 89.3 79.9 80.2 76.4 109.7 86.8 82.1

PERCENTILES: example
Find the value of the 42nd percentile.
The data arranged in increasing order are as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
The position of the 42nd percentile is $\frac{k\,n}{100} = \frac{42 \times 12}{100} = 5.04$, which we approximate by the 5th term.
$P_{42}$ = 42nd percentile = 80.2 = $80.2 million
Thus, approximately 42% of the revenues in the given data are equal to or less than $80.2 million and 58% are greater than $80.2 million.
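A minimal Python sketch of this approximate percentile rule follows. The function name `approx_percentile` is hypothetical, and rounding the position k·n/100 to the nearest whole term (then clamping it to the range 1..n) is an assumption made here to turn a fractional position such as 5.04 into a term number; other textbooks and libraries interpolate instead.

```python
def approx_percentile(values, k):
    """Approximate kth percentile: value of the (k * n / 100)th ranked term."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    position = k * n / 100                   # e.g. 42 * 12 / 100 = 5.04
    term = min(max(round(position), 1), n)   # assumed: nearest whole term, kept within 1..n
    return ranked[term - 1]                  # terms are 1-based, list indices 0-based


revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
print(approx_percentile(revenues, 42))  # 80.2
```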