Data description.

Statistics A statistic is a number calculated from the values of variable(s) in a sample. Various statistics are routinely used to describe samples. The following data refer to the total cost of drugs (in Burundi francs) received by 84 adults aged 20-29 visiting five different health centres in the Myinga province of Burundi in 1991-2.

… The data

… There are many statistics that one could calculate from these data - the values of some of the more common ones are listed in the following table.

Medians The median is the value that halves the distribution: 50% of the values lie below it and 50% lie above. So, for example, in the class of 15 children below, the median height is 121 cm.

… The median by itself is of limited use, so we also find the upper (Qu) and lower (Ql) quartiles, which together with the median (the middle quartile) split the data into four equal-sized groups. An idea of the spread is given by calculating the inter-quartile range, IQR = Qu - Ql. For the child height data, the upper quartile is 134 cm, the lower quartile is 111 cm and the IQR is 23 cm.
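The median and quartile figures quoted for the child heights can be checked with a short Python sketch. The list of heights is an assumption: it is reconstructed from the values quoted elsewhere in this presentation, with 111 taken to occur three times so that there are 15 children. Quartile conventions differ between packages; the default "exclusive" method of `statistics.quantiles` is used here only because it matches the quoted values for this data.

```python
import statistics

# Assumed data: heights (cm) of the 15 children, reconstructed from the
# values quoted in this presentation (111 is taken to occur three times).
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

median = statistics.median(heights)

# quantiles(n=4) returns the three cut points: Ql, the median and Qu.
# The default method is "exclusive"; other conventions can give slightly
# different quartiles for the same data.
q_lower, q_middle, q_upper = statistics.quantiles(heights, n=4)

# Inter-quartile range: IQR = Qu - Ql
iqr = q_upper - q_lower

print(median, q_lower, q_upper, iqr)
```

Spreadsheet and statistics packages may interpolate between neighbouring values differently, so small discrepancies in the quartiles are not necessarily errors.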

Means The arithmetic mean is the most commonly used measure of the central value of a distribution. It is the sum of the observations divided by N (the number of observations).

… In the example of childhood height, what is the mean? (103+104+107+111+111+111+119+121+124+127+133+134+137+140+150)/15 = 1832/15 = 122.13 cm. This value is very close to the median; this will generally be the case when the data are distributed roughly symmetrically around the central value.

… When, however, we have a few extreme values, the mean and the median can be very different. Normal practice would be to use the median, as it is far more robust to these extreme values. The mean, however, uses all the information that has been collected, possibly at great time and expense, and so is extensively used. It is possible to perform transformations on the data in order to introduce symmetry and thus use the mean.
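To illustrate how differently the mean and the median react to an extreme value, here is a minimal Python sketch. The heights are the same assumed reconstruction of the 15 child heights (111 occurring three times), and the mis-recorded value of 1500 cm is purely hypothetical.

```python
import statistics

# Assumed data: reconstructed child heights (cm), 15 values.
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

# Hypothetical extreme value: the tallest child (150 cm) is
# mis-recorded as 1500 cm.
with_outlier = heights[:-1] + [1500]

mean_clean = statistics.mean(heights)
mean_outlier = statistics.mean(with_outlier)

median_clean = statistics.median(heights)
median_outlier = statistics.median(with_outlier)

# The mean moves a long way; the median does not move at all.
print(f"mean:   {mean_clean:.2f} -> {mean_outlier:.2f}")
print(f"median: {median_clean} -> {median_outlier}")
```

A single bad value shifts the mean by many centimetres, while the median stays at the middle observation.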

Mode The mode is the ‘most frequent’ observation. For example, in the drug cost example it is 45.4 (occurs 9 times). In the child height example, it is 111 (occurs 3 times).
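The mode of the child heights can be found in Python as sketched below. The data are again the assumed reconstruction of the 15 heights; `statistics.multimode` is included because a data set can have more than one mode.

```python
import statistics
from collections import Counter

# Assumed data: reconstructed child heights (cm), 15 values.
heights = [103, 104, 107, 111, 111, 111, 119, 121,
           124, 127, 133, 134, 137, 140, 150]

# Most frequent value (the first one encountered if there is a tie).
mode_value = statistics.mode(heights)

# How often each value occurs.
counts = Counter(heights)
mode_count = counts[mode_value]

# Every value tied for the highest count, as a list.
all_modes = statistics.multimode(heights)

print(mode_value, mode_count, all_modes)
```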

In Excel Suppose we have the number of clients placed by an employment agency over a period of 11 working days. The mean can be found using the AVERAGE function, =AVERAGE(B2:N2), which is 27. The median can be found using =MEDIAN(B2:N2), which is 23. The interquartile range is =QUARTILE(B2:N2,3) - QUARTILE(B2:N2,1), which is 20. And the mode is =MODE(B2:N2), which is 15.

Weighted averages Suppose that 60% and 70% were obtained in two assignments for this course (well done!). The average mark would be (60+70)/2 = 65%. However, if the second assignment was deemed to be more important, it might have a higher ‘weight’ than the first. Assume that the second assignment is awarded a weight of 0.7; then the first must have 0.3 (as the weights must sum to 1).

… To calculate the overall average we multiply each mark by its weight and then add the weighted marks together: (0.3*60%)+(0.7*70%) = 18%+49% = 67%. This is 2 percentage points higher than the simple average: it pays to get higher marks in the more heavily weighted assignments!
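The weighted average calculation can be sketched in Python as follows. The function name `weighted_average` is a hypothetical helper introduced for illustration; the marks and weights are those from the example above.

```python
import math


def weighted_average(marks, weights):
    """Multiply each mark by its weight and add up the results.

    The weights are expected to sum to 1, as in the example.
    """
    if len(marks) != len(weights):
        raise ValueError("need exactly one weight per mark")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for x, w in zip(marks, weights))


marks = [60, 70]        # marks (%) for the two assignments
weights = [0.3, 0.7]    # the second assignment counts for more

simple = sum(marks) / len(marks)
weighted = weighted_average(marks, weights)

print(f"simple average:   {simple:.1f}%")
print(f"weighted average: {weighted:.1f}%")
```

Checking that the weights sum to 1 guards against the common mistake of changing one weight without adjusting the others.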

In Excel Note that wa and wb are named cells.

COUNTIF Now suppose we wanted to see how many of the students passed the course. The pass mark is 40% (put into cell D2 and named as passmark). We can then use IF to see whether a student passed: =IF(C4>passmark, "Pass", "Fail"). Finally, we can add up the number of passes and fails using COUNTIF. Passes: =COUNTIF(D4:D212, "=Pass"). Fails: =COUNTIF(D4:D212, "=Fail"). The pass and fail rates will then be =E2/(E2+F2) and =F2/(E2+F2).

… This is example 8.6 from Whigham (p143, W8_2.xls), which you might like to try for yourselves.
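For comparison, the same pass/fail count can be sketched in Python. The marks below are hypothetical and are not the data from the Whigham example; the comparison is kept strict (greater than the pass mark) to mirror the IF formula as written.

```python
PASS_MARK = 40  # pass mark in per cent, as in the spreadsheet

# Hypothetical marks (%), for illustration only.
marks = [35, 40, 41, 58, 72, 12, 90, 66]

# Mirrors =IF(C4>passmark, "Pass", "Fail"). The comparison is strict,
# so a mark of exactly 40 counts as a fail here.
results = ["Pass" if mark > PASS_MARK else "Fail" for mark in marks]

# Mirrors the two COUNTIF formulas.
passes = results.count("Pass")
fails = results.count("Fail")

# Mirrors =E2/(E2+F2) and =F2/(E2+F2).
pass_rate = passes / (passes + fails)
fail_rate = fails / (passes + fails)

print(passes, fails, pass_rate, fail_rate)
```

If a mark equal to the pass mark should count as a pass, the comparison would need to be `>=` instead of `>`, in the spreadsheet formula as well as in this sketch.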