Data description.

Statistics A statistic is a number calculated from the values of one or more variables in a sample. Various statistics are routinely used to describe samples. The following data refer to the total cost of drugs (in Burundi francs) received by 84 adults aged 20-29 visiting five different health centres in the Muyinga province of Burundi in 1991-2.

… The data

… There are many statistics that one could calculate from these data - the values of some of the more common ones are listed in the following table.

Medians The median is the value that halves the distribution: 50% of the values lie below it and 50% lie above it. For example, in the class of 15 children below, the median height is 121 cm.

… The median by itself is of limited use, so we also find the upper (Qu) and lower (Ql) quartiles, which together with the median (the middle quartile) split the data into four equal parts. An idea of the spread is given by the inter-quartile range, IQR = Qu - Ql. For the child height data, the upper quartile is 134 cm, the lower quartile is 111 cm and the IQR is 23 cm.
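As a rough check on these figures, the quartiles can be reproduced in a few lines of Python. This is only a sketch: the list of heights is taken from the mean calculation in this deck, with 111 assumed to occur three times (as the mode slide states) so that there are 15 values. The standard library's default quartile convention happens to give the values quoted above; other conventions can give slightly different quartiles.

```python
import statistics

# Heights (cm) of the 15 children. The third 111 is an assumption,
# inferred from the class size of 15 and the stated mode count.
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

median = statistics.median(heights)

# quantiles(..., n=4) returns the three cut points that split the
# sorted data into four parts: lower quartile, median, upper quartile.
q_lower, q_middle, q_upper = statistics.quantiles(heights, n=4)

iqr = q_upper - q_lower
print(median, q_lower, q_upper, iqr)
```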

Means The arithmetic mean is the most commonly used measure of the central value of a distribution. It is the sum of the observations divided by N (the number of observations).

… In the example of childhood height, what is the mean? (103+104+107+111+111+111+119+121+124+127+133+134+137+140+150)/15 = 1832/15 = 122.13. This value is very close to the median (121 cm), as will generally be the case when the data are distributed roughly symmetrically around the central value.

… When, however, we have a few extreme values, the mean and the median can be very different. Normal practice is then to use the median, as it is far more robust to extreme values. The mean, however, uses all the information that has been collected, possibly at great time and expense, and so is extensively used. It is possible to transform the data in order to introduce symmetry and thus use the mean.
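The robustness of the median can be illustrated with a small Python sketch. The heights are those of the child height example, with 111 assumed to occur three times so that there are 15 values. The extreme value of 1500 is purely hypothetical (for instance a height recorded in millimetres by mistake) and is not part of the original data.

```python
import statistics

# Child heights (cm); the third 111 is an assumption (see lead-in).
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

# Hypothetical data-entry error: the tallest child recorded as 1500.
with_outlier = heights[:-1] + [1500]

mean_before = statistics.mean(heights)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(heights)
median_after = statistics.median(with_outlier)

# The mean moves a long way; the median does not move at all.
print(round(mean_before, 2), round(mean_after, 2))
print(median_before, median_after)
```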

Mode The mode is the ‘most frequent’ observation. For example, in the drug cost example it is 45.4 (occurs 9 times). In the child height example, it is 111 (occurs 3 times).
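A minimal Python sketch of finding the mode for the child height example follows. The list assumes 15 heights with 111 occurring three times, as stated above. `multimode` is used as well because a data set can have more than one mode.

```python
import statistics
from collections import Counter

# Child heights (cm), assuming 111 occurs three times as stated.
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

mode = statistics.mode(heights)            # single most frequent value
all_modes = statistics.multimode(heights)  # every value tied for most frequent
mode_count = Counter(heights)[mode]        # how often the mode occurs

print(mode, all_modes, mode_count)
```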

In Excel Suppose we have the number of clients placed by an employment agency over a period of 11 working days. The mean can be found using the AVERAGE function, =AVERAGE(B2:N2), which is 27. The median can be found using =MEDIAN(B2:N2), which is 23. The interquartile range is =QUARTILE(B2:N2,3) - QUARTILE(B2:N2,1), which is 20. The mode is =MODE(B2:N2), which is 15.

Weighted averages Suppose that 60% and 70% were obtained in two assignments for this course (well done!). The average mark would be =(60+70)/2 = 65%. However, if the second assignment was deemed to be more important, it might have a higher ‘weight’ than the first. Assume that the second assignment is given a weight of 0.7; the first must then have a weight of 0.3 (as the weights must sum to 1).

… To calculate the overall average we multiply each mark by its weight and then add the weighted marks together: (0.3*60%)+(0.7*70%) = 18%+49% = 67%. This is 2 percentage points higher than the simple average: it is better to get the higher marks in the more heavily weighted assignments!
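The same calculation can be sketched in Python. The marks and weights are the ones from the example above; the variable names are illustrative only.

```python
# Marks (%) for the two assignments and their weights.
marks = [60, 70]
weights = [0.3, 0.7]

# The weights must sum to 1 for the result to be a weighted average.
weight_total = sum(weights)

simple_average = sum(marks) / len(marks)
weighted_average = sum(w * m for w, m in zip(weights, marks))

print(round(simple_average, 2), round(weighted_average, 2))
```

If the weights did not sum to 1, dividing the weighted sum by `weight_total` would give the equivalent weighted average.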

In Excel Note that wa and wb are named cells

COUNTIF Now suppose we wanted to see how many of the students passed the course. The pass mark is 40% (put into cell D2 and named passmark). We can then use IF to see whether a student passed: =IF(C4>passmark, "Pass", "Fail"). Finally, we can add up the number of passes and fails using COUNTIF. Passes: =COUNTIF(D4:D212, "=Pass"). Fails: =COUNTIF(D4:D212, "=Fail"). The pass and fail rates will then be =E2/(E2+F2) and =F2/(E2+F2).
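For comparison, the IF and COUNTIF logic can be sketched in Python. The marks below are hypothetical, since the spreadsheet data are not reproduced here; only the pass mark of 40 and the strict "greater than" comparison come from the formulas above.

```python
# Hypothetical marks (%) for five students; not the spreadsheet data.
marks = [35, 40, 55, 62, 78]
passmark = 40

# Mirrors =IF(C4>passmark, "Pass", "Fail"). The comparison is strict,
# so a mark exactly equal to the pass mark is labelled "Fail".
results = ["Pass" if mark > passmark else "Fail" for mark in marks]

# Mirrors the two COUNTIF formulas.
passes = results.count("Pass")
fails = results.count("Fail")

# Mirrors =E2/(E2+F2) and =F2/(E2+F2).
pass_rate = passes / (passes + fails)
fail_rate = fails / (passes + fails)

print(results, passes, fails, pass_rate, fail_rate)
```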

… This is Example 8.6 from Whigham (p. 143, W8_2.xls), which you might like to try for yourselves.