Download presentation
Presentation is loading. Please wait.
Published byLoraine Holt Modified over 9 years ago
1
Ana Jerončić
2
about half (71+37=108)÷200 = 54% of the bills are “small”, i.e. less than 30 EUR There are only a few telephone bills in the middle range. (18+28+14=60)÷200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR [EUR] 200 participants
3
2.3 Symmetry A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size: Frequency Variable Frequency Variable Frequency Variable
4
A special type of symmetric unimodal histogram is one that is bell shaped Frequency Variable Bell Shaped Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the distribution in question.
5
2.5 Skewness (asymmetry) A skewed histogram is one with a long tail extending to either the right or the left: Frequency Variable Frequency Variable Positively SkewedNegatively Skewed
6
(left)—Serum albumin values in 248 adults FIG 2 (right)—Normal distribution with the same mean and standard deviation as the serum albumin values. Altman D G, and Bland J M BMJ 1995;310:298 ©1995 by British Medical Journal Publishing Group
11
Center of distribution Variability Shape
13
Statistics that show how different units seem similar Parameters of central tendency Mean Median Mode Statistics that show how different units differ Parameters of statistical variability Standard deviation Range Percentils
15
The average arithmetic value of set of numbers Adding all data together and then dividing them by the number of observations (sometimes referred to as n or the sample size) Observations: 3, 4, 5, 6, 7 Total sum: 3+4+5+6+7= 25 Number of observations = 5 Mean = 25/ 5 = 5
17
Calculate the mean of following data: 1, 2, 3, 3, 4, 5 1, 1, 1, 1, 2, 12 =(1+2+3+3+4+5)/6 =3 3
18
Mean is the most commonly used as the measure of central tendency. It is a central point around which the standard deviation is calculated.
19
Mean A =3 Mean B =3
20
-Not a good descriptor of dataset B -Large influence of outliers, especially in small samples (ie. number 12) A B
21
Mean is not a good descriptor of data when distribution is asymmetrical
22
Median is in the Middle
23
Median ordered Median – the middle number in a set of ordered numbers. 1, 3, 7, 10, 13 Median = 7
24
Step 1 – Arrange the numbers in order from least to greatest. 21, 18, 24, 19, 27 18, 19, 21, 24, 27
25
Step 2 – Find the middle number. 21, 18, 24, 19, 27 18, 19, 21, 24, 27 Number that separates the lowest value half and the highest-value half of a sample or a population
26
Centre of the distribution Numbers simply need to be put in order and the middle one is chosen Advantage: 1. More robust to outliers and a better representative of a group in small samples 2. Used in a skewed distribution
27
The value that has the largest number of observations. In a bell curve distribution, the mode is at the peak. Example: 2,2,2,4,5,6,7,7,7,7,8 7 is the most frequent observation (4 times) Mod is 7
28
It is not influenced by the sample size or by intensities of observations However, it may not represent values close to the mean or median Most useful in grouped or categorical data RARELY USED
32
Standard deviation Range Quantiles
33
Smallest interval which contains all the data values. Calculated by substracting smallest observation from the greatest Takes into account outliers (it depends on only two observations) and represents quantitave data well when the sample size is large
34
The interquartile range (IQR) is the range of the middle 50% of the data in a distribution.interquartile range It is computed as follows: IQR = 75th percentile - 25th percentile Data are put in numerical order and then the lower and upper quarter of the data are discarded
35
Advantage: eliminates the risk of misrepresenting data distribution due to outliers
36
The most commonly used measure of data variability. mean Measure of average distance of all data values from the mean.
37
The standard deviation is especially useful measure of data variability when the distribution is normal or approximately normal because the proportion of the distribution within a given number of standard deviations from the mean can be calculated.
38
Mean 1 standard deviation 68% of data!!
39
68% of the distribution is within 1 standard deviation of the mean and approximately 95% of the distribution is within 2 standard deviations of the mean. Example If you observe a normal distribution of your variable with a mean of 50 and a standard deviation of 10, then 68% of the distribution would be between 50 - 10 = 40 and 50 +10 =60. Similarly, about 95% of the distribution would be between 50 - 2 x 10 = 30 and 50 + 2 x 10 = 70.
40
Both distributions have means of 50. The blue distribution has a standard deviation of 5; The red distribution has a standard deviation of 10. For the blue distribution, 68% of the distribution is between 45 and 55; for the red distribution, 68% is between 40 and 60. Figure shows two normal distributions.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.