Ana Jerončić. about half (71+37=108)÷200 = 54% of the bills are “small”, i.e. less than 30 EUR There are only a few telephone bills in the middle range.

1 Ana Jerončić

2 About half of the bills, (71 + 37 = 108) ÷ 200 = 54%, are “small”, i.e. less than 30 EUR. There are only a few telephone bills in the middle range. Nearly a third of the phone bills, (18 + 28 + 14 = 60) ÷ 200 = 30%, are greater than 75 EUR. [Figure: histogram of telephone bills in EUR, 200 participants]
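The shares quoted on this slide can be re-derived from the bin counts. The following is a minimal sketch that uses only the counts given above; the variable names are illustrative, and the middle-range count is inferred as the remainder rather than stated on the slide.

```python
# Bin counts quoted on the slide (200 telephone bills in total).
total = 200
small_bins = [71, 37]        # bills below 30 EUR
large_bins = [18, 28, 14]    # bills above 75 EUR

small_share = sum(small_bins) / total
large_share = sum(large_bins) / total

# Inferred, not stated on the slide: the remaining bills fall in the middle range.
middle_count = total - sum(small_bins) - sum(large_bins)

print(f"small: {small_share:.0%}, large: {large_share:.0%}, middle: {middle_count} bills")
```

Under these counts the middle range holds 32 of the 200 bills, which is consistent with the remark that only a few bills fall there.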

3 2.3 Symmetry. A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size. [Figure: three symmetric histograms; axes: Variable (horizontal), Frequency (vertical)]

4 A special type of symmetric unimodal histogram is one that is bell shaped. Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the distribution in question. [Figure: bell-shaped histogram; axes: Variable (horizontal), Frequency (vertical)]

5 2.5 Skewness (asymmetry). A skewed histogram is one with a long tail extending to either the right or the left. [Figure: two histograms, one positively skewed and one negatively skewed; axes: Variable (horizontal), Frequency (vertical)]

6 (left): Serum albumin values in 248 adults. FIG 2 (right): Normal distribution with the same mean and standard deviation as the serum albumin values. Altman D G, and Bland J M. BMJ 1995;310:298. ©1995 by British Medical Journal Publishing Group

7

8

9

10

11 Center of distribution Variability Shape

12

13 Statistics that show how different units seem similar: parameters of central tendency
• Mean
• Median
• Mode
Statistics that show how different units differ: parameters of statistical variability
• Standard deviation
• Range
• Percentiles

14

15 The mean is the arithmetic average of a set of numbers. It is found by adding all the data values together and dividing the sum by the number of observations (sometimes referred to as n, or the sample size).
Observations: 3, 4, 5, 6, 7
Total sum: 3 + 4 + 5 + 6 + 7 = 25
Number of observations = 5
Mean = 25 / 5 = 5
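The calculation on this slide can be sketched in a few lines of Python. The function name `mean` and the empty-input check are illustrative choices, not part of the slides.

```python
def mean(observations):
    """Arithmetic mean: the sum of the observations divided by their number n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / n


observations = [3, 4, 5, 6, 7]
print(sum(observations), len(observations), mean(observations))  # 25 5 5.0
```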

16

17 Calculate the mean of the following data:
A: 1, 2, 3, 3, 4, 5; mean = (1 + 2 + 3 + 3 + 4 + 5) / 6 = 3
B: 1, 1, 1, 1, 2, 12; mean = (1 + 1 + 1 + 1 + 2 + 12) / 6 = 3

18 The mean is the most commonly used measure of central tendency. It is the central point around which the standard deviation is calculated.

19 Mean A = 3, Mean B = 3

20 The mean is not a good descriptor of data set B: outliers (here the value 12) have a large influence on it, especially in small samples. [Figure: data sets A and B]
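To see how much a single outlier moves the mean, the two exercise data sets (1, 2, 3, 3, 4, 5 and 1, 1, 1, 1, 2, 12) can be compared with and without the value 12. This is only an illustrative sketch using Python's standard `statistics` module; dropping the outlier is done here to show its influence, not as a recommended way to treat data.

```python
from statistics import mean

a = [1, 2, 3, 3, 4, 5]
b = [1, 1, 1, 1, 2, 12]

# Illustration only: remove the single outlier to see how far it pulled the mean.
b_without_outlier = [x for x in b if x != 12]

print(mean(a), mean(b))         # both means equal 3
print(mean(b_without_outlier))  # 1.2
```

Five of the six values in the second data set are 1 or 2, yet its mean is 3; without the outlier the mean drops to 1.2.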

21 The mean is not a good descriptor of the data when the distribution is asymmetrical.

22 Median is in the Middle

23 Median: the middle number in a set of ordered numbers. Example: 1, 3, 7, 10, 13; median = 7

24 Step 1: Arrange the numbers in order from least to greatest. Unordered: 21, 18, 24, 19, 27. Ordered: 18, 19, 21, 24, 27

25 Step 2: Find the middle number. In the ordered set 18, 19, 21, 24, 27 the middle number is 21. The median is the number that separates the lower half from the upper half of a sample or a population.
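The two steps can be sketched in Python as follows. The slides only cover an odd number of values; averaging the two middle values for an even count is a common convention and is an assumption added here.

```python
def median(values):
    """Median by the two steps on the slides: sort, then take the middle value."""
    ordered = sorted(values)              # Step 1: least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # Step 2: the single middle number
    # Assumption (not on the slides): for an even count, average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([21, 18, 24, 19, 27]))  # 21
```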

26 The median is the centre of the distribution: the numbers simply need to be put in order and the middle one is chosen.
Advantages:
1. More robust to outliers, and a better representative of a group in small samples
2. Can be used when the distribution is skewed
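The robustness claim can be checked with a small experiment: make the outlier in the data set 1, 1, 1, 1, 2, 12 a hundred times larger and compare how the mean and the median react. The value 1200 is a made-up number used purely for illustration.

```python
from statistics import mean, median

b = [1, 1, 1, 1, 2, 12]
b_extreme = [1, 1, 1, 1, 2, 1200]   # hypothetical: same data with a far larger outlier

print(mean(b), median(b))                   # mean 3, median 1.0
print(mean(b_extreme), median(b_extreme))   # mean 201, median 1.0
```

The mean follows the outlier, while the median stays at the value that most observations share.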

27 The mode is the value that has the largest number of observations. In a bell curve distribution, the mode is at the peak.
Example: 2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8
7 is the most frequent observation (it occurs 4 times), so the mode is 7.
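A short sketch of finding the mode by counting, using `collections.Counter` from the Python standard library. When several values tie for the highest count, `most_common(1)` returns only one of them, so this sketch assumes a single mode as in the example.

```python
from collections import Counter

observations = [2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8]

counts = Counter(observations)                    # value -> number of occurrences
mode_value, frequency = counts.most_common(1)[0]  # the most frequent value and its count

print(mode_value, frequency)  # 7 4
```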

28 The mode:
• Is not influenced by the sample size or by the magnitudes of the observations
• However, it may not be close to the mean or median
• Is most useful for grouped or categorical data
• Is rarely used

29

30

31

32
• Standard deviation
• Range
• Quantiles

33 Range:
• The smallest interval that contains all the data values
• Calculated by subtracting the smallest observation from the greatest
• Is sensitive to outliers, since it depends on only two observations, and represents quantitative data well when the sample size is large
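As a sketch, the range of the two earlier exercise data sets (1, 2, 3, 3, 4, 5 and 1, 1, 1, 1, 2, 12) can be computed directly from the smallest and greatest observation. The function name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range: greatest observation minus smallest observation."""
    return max(values) - min(values)


a = [1, 2, 3, 3, 4, 5]
b = [1, 1, 1, 1, 2, 12]

print(data_range(a), data_range(b))  # 4 11
```

Both data sets have the same mean, but the single value 12 almost triples the range of the second one.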

34 The interquartile range (IQR) is the range of the middle 50% of the data in a distribution.
• It is computed as follows: IQR = 75th percentile - 25th percentile
• Equivalently, the data are put in numerical order, the lower and upper quarters of the data are discarded, and the range of the remaining values is taken

35 Advantage: greatly reduces the risk of misrepresenting the data distribution because of outliers
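The following sketch compares the range and the IQR on the data from the mode example (2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8), before and after the largest value is replaced by a made-up outlier of 80. It uses `statistics.quantiles` (Python 3.8+); the slides do not say which percentile convention to use, so the `"inclusive"` method is an assumption, and other conventions can give slightly different quartiles.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    # Assumption: the "inclusive" interpolation method; the slides do not fix a convention.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8]
with_outlier = data[:-1] + [80]   # hypothetical: largest value replaced by an outlier

print(iqr(data), max(data) - min(data))                          # 4.0 6
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))  # 4.0 78
```

The range jumps from 6 to 78, while the IQR is unchanged because the outlier lies in the discarded upper quarter.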

36 Standard deviation:
• The most commonly used measure of data variability
• Roughly, a measure of the average distance of all data values from the mean
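A minimal sketch of the calculation, applied to the two earlier exercise data sets that share a mean of 3. The slides do not give a formula; the sample version (dividing the summed squared deviations by n - 1) is assumed here, and the function name `sample_sd` is illustrative.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation: root of the summed squared deviations over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    # Assumption: n - 1 in the denominator (sample formula), not stated on the slides.
    return sqrt(squared_deviations / (n - 1))


a = [1, 2, 3, 3, 4, 5]
b = [1, 1, 1, 1, 2, 12]

print(f"{sample_sd(a):.2f} {sample_sd(b):.2f}")  # 1.41 4.43
```

The same mean hides very different spreads: the values in the second data set lie, on average, much farther from 3.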

37 The standard deviation is an especially useful measure of data variability when the distribution is normal or approximately normal, because the proportion of the distribution within a given number of standard deviations from the mean can be calculated.

38 [Figure: normal curve; about 68% of the data lie within 1 standard deviation of the mean]

39 In a normal distribution:
• About 68% of the distribution is within 1 standard deviation of the mean
• Approximately 95% of the distribution is within 2 standard deviations of the mean
Example: if you observe a normal distribution of your variable with a mean of 50 and a standard deviation of 10, then about 68% of the distribution would be between 50 - 10 = 40 and 50 + 10 = 60. Similarly, about 95% of the distribution would be between 50 - 2 x 10 = 30 and 50 + 2 x 10 = 70.
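These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) with the mean of 50 and standard deviation of 10 from the example; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=10)


def share_within(k):
    """Interval mean ± k standard deviations and the share of the distribution inside it."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)


for k in (1, 2):
    low, high, share = share_within(k)
    print(f"within {k} SD: {low:.0f} to {high:.0f}, share = {share:.4f}")
# within 1 SD: 40 to 60, share = 0.6827
# within 2 SD: 30 to 70, share = 0.9545
```

The shares depend only on the number of standard deviations, not on the particular mean or standard deviation chosen.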

40 The figure shows two normal distributions.
• Both distributions have means of 50.
• The blue distribution has a standard deviation of 5; the red distribution has a standard deviation of 10.
• For the blue distribution, about 68% of the distribution is between 45 and 55; for the red distribution, about 68% is between 40 and 60.

