Chapter 3 Fundamental statistical characteristics I: Measures of central tendency
2 Fundamental statistical characteristics
Group indexes: Central tendency; Variability (dispersion); Skewness (asymmetry); Kurtosis (peakedness)
Individual indexes: Position (centiles C_i, percentiles P_i, quartiles Q_i); Raw scores (X_i); Deviation scores (x_i); Standard scores (Z_i)
3 Which value represents the whole? Around which value do most of the data lie? Central tendency indexes: Mode, Mean, Median
4 How are the data arranged with respect to the center of the distribution? How far apart or close together are the data? Variability or dispersion indexes: A_t, S², C.V., Q, S
5 How are the data arranged with respect to the rest? Are the data piled up at one end? Skewness (asymmetry) indexes: g1
6 What shape does the distribution have? Is it flat or peaked? Kurtosis indexes: g2
7 Central tendency indexes
8 To describe a data distribution we need at least two statistics: 1. One that reflects the central tendency: the value that represents the whole, around which most of the data lie. 2. Another that reflects the dispersion around this value: whether the data are far apart or close together with respect to the central value.
9 Central tendency measure: a brief description of a mass of data, usually obtained from a sample. It serves to describe, indirectly, the population from which the sample was drawn. If the sample is representative, the average of its values tells us a lot about the average we would obtain in the population it represents.
10 1. Which is the value most often repeated? Mo
11 Mode (Mo): “The most often repeated value”. “The value most frequently observed in a sample or population”. “The value of the variable with the highest absolute frequency”. It is symbolized by Mo (Fechner and Pearson).
12 Type I distributions: Small data set a) Unimodal distribution: Data: [8 – 8 – 11 – 11 – 15 – 15 – 15 – 15 – 15 – 17 – 17 – 17 – ] Mo = 15
13 b) Amodal distribution: Data: [8 – 8 – 8 – 11 – 11 – 11 – 15 – 15 – 15 – 17 – 17 – 17 – 19 – 19 – 19] No mode (every value has the same frequency)
14 c) Bimodal distribution: Data: [8 – 9 – 9 – 10 – 10 – 10 – 10 – 11 – 11 – 13 – 13 – 13 – 13 – 15] Mo 1 = 10 Mo 2 = 13
15 d) Multimodal distribution: Data: [8 – 8 – 9 – 9 – 9 – 10 – 11 – 11 – 11 – 12 – 12 – 13 – 13 – 13 – 14 – ] Mo 1 = 9, Mo 2 = 11, Mo 3 = 13
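The four small-data cases above can be checked with a short sketch. The function name `modes` is illustrative and not part of the slides; it assumes the convention shown in case b), where a data set in which every value has the same frequency is treated as having no mode. The data are copied as listed on the slides.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes, or [] if the data set is amodal."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumed convention (slide 13): all values equally frequent -> no mode.
    if len(counts) > 1 and all(f == highest for f in counts.values()):
        return []
    return sorted(x for x, f in counts.items() if f == highest)

unimodal = [8, 8, 11, 11, 15, 15, 15, 15, 15, 17, 17, 17]
amodal = [8, 8, 8, 11, 11, 11, 15, 15, 15, 17, 17, 17, 19, 19, 19]
bimodal = [8, 9, 9, 10, 10, 10, 10, 11, 11, 13, 13, 13, 13, 15]
multimodal = [8, 8, 9, 9, 9, 10, 11, 11, 11, 12, 12, 13, 13, 13, 14]

print(modes(unimodal))    # [15]
print(modes(amodal))      # []
print(modes(bimodal))     # [10, 13]
print(modes(multimodal))  # [9, 11, 13]
```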
16 Type II distributions: Large data set. Frequency table (columns: X_i, f_i). a) Unimodal distribution: Mo = 14 (the most often repeated value)
17 b) Bimodal distribution (frequency table, columns: X_i, f_i): Mo 1 = 2 and Mo 2 = 6 (the most often repeated values)
18 Complete the table, knowing that the modes are -2, -1 and 5 and that f_3 = f_4 (table columns: X_i, f_i, fr_i, %_i)
19 2. What is the average score in motivation?
20 Arithmetic mean: the most commonly used index of central tendency. Definition: “It is the sum of all observed values divided by the total number of them”.
21 Type I distributions: Small data set. Example: the following are the 10 numbers remembered by 10 children in an immediate memory task: 6 – 5 – 4 – 7 – 5 – 7 – 8 – 6 – 7 – 8
22 $\bar{X} = \frac{\sum X_i}{n} = \frac{6 + 5 + 4 + 7 + 5 + 7 + 8 + 6 + 7 + 8}{10} = \frac{63}{10} = 6.3$
23 In the following series, the “center of gravity” is: 3 – 10 – 8 – 4 – 7 – 6 – 9 – 12 – 2 – 4
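As a quick check of the definition, the sketch below computes the mean of the two small series from the slides and illustrates why the mean is called the "center of gravity": the deviations from it add up to zero. The helper name `mean` and the variable names are illustrative.

```python
def mean(values):
    """Sum of all observed values divided by the number of values."""
    return sum(values) / len(values)

memory_task = [6, 5, 4, 7, 5, 7, 8, 6, 7, 8]   # slide 21
series = [3, 10, 8, 4, 7, 6, 9, 12, 2, 4]      # slide 23

print(mean(memory_task))  # 6.3
print(mean(series))       # 6.5

# Deviations from the mean balance out, like weights around a fulcrum.
deviations = [x - mean(series) for x in series]
print(sum(deviations))    # 0.0
```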
25 Type II distributions: Large data set. Mean from a frequency table. Possibility 1: $\bar{X} = \frac{\sum f_i X_i}{n}$
26 Frequency tables (columns: X_i, f_i, f_i X_i)
30 Solving for unknowns
31 Possibility 2 (derived from possibility 1): $\bar{X} = \sum fr_i X_i$
X_i   f_i   fr_i   fr_i X_i
0     3     0.15   0.00
1     6     0.30   0.30
2     7     0.35   0.70
3     3     0.15   0.45
4     1     0.05   0.20
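Both possibilities can be compared on the table from this slide (X_i = 0 to 4 with f_i = 3, 6, 7, 3, 1). The following is a minimal sketch; the variable names are illustrative, and the rounding in the printout is only there to hide floating-point noise.

```python
values = [0, 1, 2, 3, 4]   # X_i
freqs = [3, 6, 7, 3, 1]    # f_i
n = sum(freqs)             # 20

# Possibility 1: sum of f_i * X_i divided by n.
mean_1 = sum(f * x for x, f in zip(values, freqs)) / n

# Possibility 2: sum of fr_i * X_i, with fr_i = f_i / n.
rel_freqs = [f / n for f in freqs]
mean_2 = sum(fr * x for x, fr in zip(values, rel_freqs))

print(round(mean_1, 2), round(mean_2, 2))  # 1.65 1.65
```

Possibility 2 is the same sum with the division by n moved inside, which is why both give the same result.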
32 3. Which is the value exceeded by half of the subjects? Mdn
33 Median (Mdn). Definitions: It is the point of the distribution that divides it into 2 equal parts. It is the value such that the number of observations smaller than it equals the number of observations greater than it. It is the value that occupies the central position in an ordered series of data: 50% of the values are above it and the other 50% are below it.
34 Graphic representation. The median is defined as a point (a value), not as a particular datum or measurement: a point whose value does not necessarily match any observed value.
35 Type I distributions: Small data set. ODD data set: [7 – 11 – 6 – 5 – 7 – 12 – 9 – 8 – 10 – 6 – 9] 1) The data are sorted from lowest to highest: [5 – 6 – 6 – 7 – 7 – 8 – 9 – 9 – 10 – 11 – 12] 2) The central value is obtained:
36 Mdn = 8
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  11th
Value:     5    6    6    7    7    8    9    9    10   11    12
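37 EVEN data set: [23 – 35 – 43 – 29 – 34 – 41 – 33 – 38 – 38 – 32] 1) The data are sorted from lowest to highest: [23 – 29 – 32 – 33 – 34 – 35 – 38 – 38 – 41 – 43] 2) The central point is obtained: it lies between the two central values (positions 5 and 6), so Mdn = (34 + 35) / 2 = 34.5
38
Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
Value:     23   29   32   33   34   35   38   38   41   43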
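The two small-data cases can be sketched as follows. The function name `median` is illustrative; for an even number of data it assumes the usual convention of taking the midpoint of the two central values.

```python
def median(data):
    """Central value of the sorted data (midpoint of the two central values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]            # single central value
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_set = [7, 11, 6, 5, 7, 12, 9, 8, 10, 6, 9]
even_set = [23, 35, 43, 29, 34, 41, 33, 38, 38, 32]

print(median(odd_set))   # 8
print(median(even_set))  # 34.5
```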
39 Type II distributions: Large data set. Frequency tables (columns: X_i, f_i, F_i). Example: n = 36. Because n is even, there are 2 central data: 36/2 = 18, so the central point lies between positions 18 and 19 (position 18.5). x_18 = x_19 = 10, so Mdn = x_18.5 = 10.
40 Comparison between measures of central tendency
Unless there are arguments against it, we always prefer the mean: other statistics are based on the mean; it is the best estimator of its parameter.
We prefer the median: when the variable is ordinal; when there are very extreme data; when there are open intervals.
We prefer the mode: when the variable is qualitative or nominal; when the median falls in the open interval.
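The preference for the median when there are very extreme data can be illustrated with a small sketch. The scores below are hypothetical and not taken from the slides; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]          # hypothetical scores
with_outlier = [4, 5, 5, 6, 70]   # same scores with one extreme value

# The mean is dragged toward the extreme value; the median is not.
print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))
```

With a single extreme score the mean more than triples, while the median stays at 5.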
41 Degree of agreement with considering "shouting" a sign of aggression (frequency table, columns: X_i, f_i)
42 Number of rituals that students perform before an exam (frequency table, columns: X_i, f_i)
43 Position measures
44 Central tendency measures: used to indicate around which particular value a given data set lies. Position measures: used to provide information about the relative position of a case within the data set it belongs to. They are used to interpret individual data.
45 Quantiles: Mdn, Quartiles, Deciles, Percentiles
46 Quantiles: they divide the distribution into k parts with the same amount of data.
Mdn: divides the distribution into 2 parts: i/k = 1/2
Quartiles (Q_k): divide the distribution into 4 parts: Q_1, Q_2, Q_3: i/k = 1/4
Deciles (D_k): divide the distribution into 10 parts: D_1, D_2, ..., D_9: i/k = 1/10
Percentiles (P_k): divide the distribution into 100 parts: P_1, P_2, ..., P_99: i/k = 1/100
47 Quantiles graphic representation
48 Calculating the value that corresponds to a particular quantile. 1. Translate the position measure into an absolute position. 2. Find the value of the datum that occupies that absolute position. The question is: what value does the datum in position ... take?
Quantile   Position       F_i                    Value
i          (i/k)(n + 1)   cumulative frequency   X_E + D(X_E+1 - X_E)
49 E.g. if the 7th decile corresponds to position 20: which value does the datum in absolute position 20 take?
50 Example: Q 3 (frequency table, columns: X_i, f_i, F_i)
51 Example: Q 3 (frequency table, columns: X_i, f_i, F_i, positions). Value = 6
52 Example: D 6 (frequency table, columns: X_i, f_i, F_i)
53 Example: D 6 (frequency table, columns: X_i, f_i, F_i, positions). Value = 6
54 Example: P 13 (frequency table, columns: X_i, f_i, F_i, positions). Value = 3.36, obtained by interpolation (see next)
55 Interpolate: $X_{E+D} = X_E + D\,(X_{E+1} - X_E)$
X_{E+D} = value that corresponds to the quantile
X_E = value at the integer part of the position
D = decimal part of the position
X_{E+1} = value at the next position
P_13 = 3 + 0.36 (4 - 3) = 3.36
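The interpolation step can be written as a one-line function. This is a minimal sketch: the name `interpolate` is illustrative, and the decimal part D = 0.36 used for P 13 is inferred from the slide's result (3.36) and the two neighbouring values (3 and 4).

```python
def interpolate(x_e, x_next, d):
    """Return X_E + D * (X_{E+1} - X_E) for a decimal part 0 <= D < 1."""
    if not 0 <= d < 1:
        raise ValueError("d must be the decimal part of a position")
    return x_e + d * (x_next - x_e)

p13 = interpolate(3, 4, 0.36)   # values at positions E and E+1, decimal part D
print(round(p13, 2))            # 3.36
```

When both neighbouring positions hold the same value, the decimal part has no effect: `interpolate(2, 2, 0.5)` is simply 2.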
56 PERCENTILE. Percentile: the value that leaves a given percentage of the subjects below it and is exceeded by the remaining percentage.
CALCULATION SUMMARY:
1) Centile: i
2) Position: (i/k)(n + 1)
3) F_i: cumulative frequency
4) Value: X_E + D(X_E+1 - X_E)
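The four steps can be sketched as a single function that works directly on a frequency table. The function name `quantile` is illustrative. The table reuses X_i = 0 to 4 with f_i = 3, 6, 7, 3, 1 (n = 20) purely as an illustration, and clamping the position to the range 1..n is an added assumption for quantiles that would otherwise fall outside the data.

```python
import math
from bisect import bisect_left
from itertools import accumulate

def quantile(values, freqs, i, k):
    """Quantile i of k from a frequency table, using position (i/k)(n + 1)."""
    cumulative = list(accumulate(freqs))      # F_i
    n = cumulative[-1]
    position = i * (n + 1) / k                # step 2: absolute position
    position = min(max(position, 1), n)       # assumption: clamp to 1..n
    e = math.floor(position)                  # integer part E
    d = position - e                          # decimal part D

    def value_at(pos):
        # Step 3: first row whose cumulative frequency reaches the position.
        return values[bisect_left(cumulative, pos)]

    x_e = value_at(e)
    x_next = value_at(min(e + 1, n))
    return x_e + d * (x_next - x_e)           # step 4: interpolate

values = [0, 1, 2, 3, 4]    # X_i (illustrative table)
freqs = [3, 6, 7, 3, 1]     # f_i, so F_i = 3, 9, 16, 19, 20

print(quantile(values, freqs, 1, 2))             # median: position 10.5 -> 2.0
print(round(quantile(values, freqs, 8, 10), 2))  # D8: position 16.8 -> 2.8
```

For D8 the position 16.8 falls between position 16 (value 2) and position 17 (value 3), so the result is 2 + 0.8(3 - 2) = 2.8.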
57 PERCENTILES. Examples (frequency table, columns: X, f_i, F_i): C 50 = P 50 = Q 2 = D 5 = Mdn
Centile   Position   Cumulative frequency   Value
50        40.5       Between 40 and 41      2 + 0.5(2 - 2) = 2
58 Calculate the following centiles (frequency table, columns: X, f_i)
59
Centile   Position   Cumulative frequency   Value
25        20.25      Between 20 and 21      1 + 0.25(1 - 1) = 1
75        60.75      Between 60 and 61      3 + 0.75(3 - 3) = 3
38        30.78      Between 30 and 31      1 + 0.78(2 - 1) = 1.78
90        72.90      Between 72 and 73      4 + 0.90(4 - 4) = 4
95        76.95      Between 76 and 77      4 + 0.95(5 - 4) = 4.95
60 A very common example. Percentiles are used to evaluate the growth of babies; the most common ones are for the baby's weight and height. Imagine the case of Juanita, our cousin. What does it mean that Juanita is at the 25th percentile for height? It means that, out of every 100 babies, 75 are taller than her (and 24 are shorter). What if she were at the 80th percentile for height? It means that, out of every 100 babies, 20 are taller than her (and 79 are shorter).
61 Complete the table, knowing that the distribution is bimodal and that the mean and the median have the same value (table columns: X_i, f_i, F_i)
62 The following tables contain the results obtained on a psychological test by a sample of women and a sample of men in the province of Seville (tables: Women, columns X_i, f_i; Men, columns X_i, f_i). Calculate the 84th percentile in the men's sample and the 3rd quartile in the women's sample.