Dr.Fatima Alkhaledy M.B.Ch.B;F.I.C.M.S/C.M
PRESENTATION OF DATA
Methods of presenting data: mathematical, tabular, graphical, and pictorial presentation.
Measures of Central Tendency
Parameter: a descriptive measure computed from the data of a population.
Statistic: a descriptive measure computed from the data of a sample.
MEAN
Arithmetic mean: the sum of all the values in a set of observations divided by the number of these observations.
Characteristics of the mean:
It is a single value.
It is simple, easy to compute, and easy to understand.
It takes all values in the set into consideration (no single value is excluded).
It is greatly affected by extreme values.
The mean is calculated by:
Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $\bar{X} = \frac{\sum X}{n}$
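The population and sample formulas above use the same arithmetic (sum of the observations divided by their count); only the data set differs. A minimal Python sketch, where the helper name `arithmetic_mean` is illustrative and not from the lecture:

```python
def arithmetic_mean(values):
    """Sum of all observations divided by their number.

    The same computation gives the population mean (divide by N)
    or the sample mean (divide by n); only the data set differs.
    """
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)


# The 1, 2, 3, 4, 5 data set used later in the variance slide
print(arithmetic_mean([1, 2, 3, 4, 5]))  # 3.0
```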
Weighted mean: the individual values in the set are weighted by their respective frequencies: $\bar{X}_w = \frac{\sum fX}{\sum f}$, where f is the frequency of each value X.
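When data are summarized as a frequency table, the weighted mean multiplies each value by its frequency before summing. A sketch under the assumption that values and frequencies are given as two parallel lists; the helper name `weighted_mean` is hypothetical. The frequencies below are counted from the ordered array of the 15 travel distances in the exercise that follows.

```python
def weighted_mean(values, frequencies):
    """Weighted mean: sum(f * x) / sum(f)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_f = sum(frequencies)
    if total_f <= 0:
        raise ValueError("the total frequency must be positive")
    return sum(f * x for x, f in zip(values, frequencies)) / total_f


# Distinct distances (miles) and how often each occurs in the exercise data
x = [3, 5, 6, 7, 9, 11, 12, 13, 15]
f = [2, 2, 1, 1, 1, 1, 3, 2, 2]
print(weighted_mean(x, f))  # 141 / 15
```

The result equals the ordinary mean of the 15 raw values, because weighting by frequency is just a compact way of adding repeated values.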
MEDIAN
After creating an ordered array (arranging the data in ascending or descending order), the median is the middle value that divides the set of observations into two equal halves.
Characteristics of the median:
It is a single value.
It is simple, easy to compute, and easy to understand.
It does not take all observations into consideration.
It is not affected by extreme values.
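The contrast between the mean's and the median's characteristics can be seen by replacing one value with an extreme one and recomputing both. This sketch uses Python's standard `statistics` module; the five values are made-up illustration data, not from the lecture.

```python
import statistics

# Hypothetical illustration data (not from the lecture)
typical = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 70]  # last value replaced by an extreme one

for data in (typical, with_outlier):
    # The mean uses every value, so it shifts with the extreme value;
    # the median depends only on the middle of the ordered array.
    print(statistics.mean(data), statistics.median(data))
```

The mean moves from 5 to 17.6, while the median stays at 5.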
Steps in computing the median:
1. Create an ordered array.
2. Find the position of the median, which depends on the number of observations (n) in the set:
If n is odd: position of the median = (n+1)/2; the median is the value at that position.
If n is even: there are two middle positions, n/2 and (n/2)+1, and the median is the mean of the two middle values.
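The steps above translate directly into code: sort, then pick the middle position (odd n) or average the two middle positions (even n). A minimal sketch; the function name `median` is illustrative, and positions in the comments are 1-based as on the slide while Python indexes from 0.

```python
def median(values):
    """Median via the ordered-array steps."""
    ordered = sorted(values)        # step 1: create the ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2     # 1-based position (n+1)/2
        return ordered[position - 1]
    # Even n: the two middle positions are n/2 and n/2 + 1 (1-based)
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([5, 1, 3]))     # odd n: 3
print(median([4, 1, 3, 2]))  # even n: (2 + 3) / 2 = 2.5
```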
MODE
The most frequently occurring value in a series of observations. A distribution with one mode is called unimodal, one with two modes bimodal, and one with more than two modes multimodal. Some data sets have no mode (nonmodal), for example when no value occurs more than once.
It can be used for both quantitative and qualitative data.
To determine the mode in a large set of observations, it may be necessary to create a frequency distribution table of the observed values; the most frequent value is the mode.
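The frequency-table approach can be sketched with `collections.Counter`, which counts how often each value occurs. Returning every value that ties for the highest count covers unimodal, bimodal, and multimodal sets, and it works for qualitative labels as well as numbers. Assumption for this sketch: a set in which no value occurs more than once is treated as nonmodal, and an empty list is returned. The function name `modes` and the blood-group labels are illustrative.

```python
from collections import Counter


def modes(values):
    """All values sharing the highest frequency (empty list if none repeats)."""
    counts = Counter(values)  # frequency distribution table
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed nonmodal: no value occurs more than once
    return [value for value, f in counts.items() if f == highest]


print(modes([3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15]))  # [12]
print(modes(["A", "B", "B", "O", "O"]))  # bimodal qualitative data: ['B', 'O']
print(modes([1, 2, 3]))  # []
```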
Exercise
A sample of 15 patients making visits to a health center traveled the following distances in miles. Calculate the measures of central tendency.
Table: distance X (mile) for patient no. 1 to 15; Total ∑X = 141.
ANSWER
Mean: $\bar{X} = \frac{\sum X}{n} = \frac{141}{15} = 9.4$ miles
Median:
1. Arrange the data in order: 3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15
2. Find the site of the median: since n = 15 is an odd number, the site of the median is (n+1)/2 = 8.
So the median is the 8th value in the ordered array = 11 miles.
Mode: create a frequency distribution table of the observations in the set:
x (mile)   f
3          2
5          2
6          1
7          1
9          1
11         1
12         3
13         2
15         2
Total      15
So the mode is 12 miles, since this value has the highest frequency (3).
Measures of dispersion & variability
They measure the variability among the values of the observations in a set. They are also called measures of variation, spread, or scatter.
If all values are the same, the dispersion is zero. If the values are homogeneous and close to each other, the dispersion is small. If the values are very different, the dispersion is large.
Measures of dispersion
Range: the difference between the largest and the smallest value.
$R = X_L - X_S$, where R = range, $X_L$ = largest value, $X_S$ = smallest value.
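The range needs only the two extreme observations. A minimal sketch; the name `value_range` is illustrative (chosen so it does not shadow Python's built-in `range`).

```python
def value_range(values):
    """Range R = X_L - X_S: largest value minus smallest value."""
    values = list(values)
    if not values:
        raise ValueError("the range of an empty set is undefined")
    return max(values) - min(values)


# The 15 travel distances (miles) from the exercise, in ascending order
distances = [3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15]
print(value_range(distances))  # 12
```

Because only `max` and `min` are used, changing any of the middle values leaves the result unchanged, which is the limitation listed in the properties of the range.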
Properties of the range:
It is simple to calculate and easy to understand.
It neglects all values in the center and depends only on the extreme values, which themselves depend on sample size.
It is not based on all observations.
It is not amenable to further mathematical treatment.
It should be used in conjunction with other measures of variability.
Variance: the mean of the sum of squares of the deviations from the mean (dividing by N for a population and by n − 1 for a sample).
For example, for the data 1, 2, 3, 4, 5 the mean is 3, and the deviations of the values from the mean are:
1 − 3 = −2
2 − 3 = −1
3 − 3 = 0
4 − 3 = 1
5 − 3 = 2
The sum of the deviations is zero, but the sum of the squared deviations is not: 4 + 1 + 0 + 1 + 4 = 10. This is why the deviations are squared.
Population variance (sigma squared):
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ or $\sigma^2 = \frac{N\sum X^2 - (\sum X)^2}{N \cdot N}$
where $\sigma^2$ = population variance, X = observation value, $\mu$ = population mean, N = population size, $\sum X^2$ = sum of the squared values, $(\sum X)^2$ = square of the sum.
Sample variance:
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ or $S^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n-1)}$
where $S^2$ = sample variance, n = sample size.
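The definitional formula (squared deviations) and the computational formula ($\sum X^2$ and $(\sum X)^2$) give the same result. This sketch implements both, with a `sample` flag switching between the n − 1 and N divisors; the function names are illustrative.

```python
def variance(values, sample=True):
    """Definitional formula: sum of squared deviations from the mean,
    divided by n - 1 (sample, S^2) or by N (population, sigma^2)."""
    values = list(values)
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1) if sample else squared_devs / n


def variance_shortcut(values, sample=True):
    """Computational formula: [n*sum(X^2) - (sum X)^2] divided by
    n(n - 1) for a sample, or by N*N for a population."""
    values = list(values)
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = n * sum_x2 - sum_x ** 2
    return numerator / (n * (n - 1)) if sample else numerator / (n * n)


data = [1, 2, 3, 4, 5]
print(variance(data, sample=False), variance_shortcut(data, sample=False))  # 2.0 2.0
print(variance(data), variance_shortcut(data))  # 2.5 2.5
```

For the 1 to 5 example, the sum of squared deviations is 10, so the population variance is 10/5 = 2 and the sample variance is 10/4 = 2.5.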
The variance can never be a negative value.
All observations are taken into consideration.
Its drawback is the squared unit (e.g., mile² when the data are in miles).
Standard deviation (SD): the square root of the variance.
$SD = \sqrt{\sigma^2} = \pm\sigma$ for a population
$SD = \sqrt{S^2} = \pm S$ for a sample
The standard deviation measures the variability of the observations in a sample or population around the mean of that sample or population. Its unit is not squared (it is the same as the unit of the data). The SD is the most widely used measure of dispersion.
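Taking the square root of the variance brings the measure back to the unit of the data. The sketch below computes the SD directly and compares it with Python's standard `statistics.stdev` (sample, n − 1) and `statistics.pstdev` (population, N); the helper name `standard_deviation` is illustrative.

```python
import math
import statistics


def standard_deviation(values, sample=True):
    """SD = square root of the variance; same unit as the data."""
    values = list(values)
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1) if sample else squared_devs / n)


# The 15 travel distances (miles) from the exercise, in ascending order
distances = [3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15]
sd = standard_deviation(distances)
print(round(sd, 2), round(statistics.stdev(distances), 2))
```

Both values print as 4.22 miles, matching the ±4.2 miles of the worked answer.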
Standard error of the mean (SE)
It measures the variability (dispersion) of sample means around the population mean. It is used to estimate the population mean and to estimate differences between population means.
$SE = \frac{SD}{\sqrt{n}}$
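The SE formula divides the sample SD by the square root of the sample size. A minimal sketch using the standard `statistics` module, assuming the sample SD (n − 1 divisor) is the SD intended; the function name `standard_error` is illustrative.

```python
import math
import statistics


def standard_error(values):
    """SE of the mean = sample SD / sqrt(n)."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("at least two observations are needed")
    return statistics.stdev(values) / math.sqrt(len(values))


# The 15 travel distances (miles) from the exercise
distances = [3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15]
print(round(standard_error(distances), 2))
```

With the unrounded SD this gives about 1.09 miles; the worked answer's 1.085 comes from rounding the SD to 4.2 before dividing.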
Coefficient of variation (CV)
It expresses the SD as a percentage of the mean:
$CV = \frac{S}{\bar{X}} \times 100$ (using the sample mean)
It has no unit. It is used to compare the dispersion of two sets of data, especially when their units are different.
It measures relative rather than absolute variation, and it takes all values in the set into consideration.
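Because the SD and the mean share the same unit, the unit cancels in the CV. The sketch below checks this by expressing the same distances in miles and in kilometres (1 mile = 1.609344 km); the function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """CV = S / mean x 100, in percent (no unit)."""
    values = list(values)
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


miles = [3, 3, 5, 5, 6, 7, 9, 11, 12, 12, 12, 13, 13, 15, 15]
km = [x * 1.609344 for x in miles]  # the same distances in another unit
print(round(coefficient_of_variation(miles), 1))
print(round(coefficient_of_variation(km), 1))  # changing the unit does not alter the CV
```

Both lines print about 44.9%; the worked answer's 44.7% uses the SD rounded to 4.2.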
EXERCISE
For the same 15 patients in the previous example, calculate the measures of dispersion.
Table: distance X (mile) and X² for each of the 15 patients; Total ∑X = 141, ∑X² = 1575.
Range
$R = X_L - X_S = 15 - 3 = 12$ miles
Variance & SD
$S^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n-1)} = \frac{(15)(1575) - (141)^2}{15 \times 14} = \frac{23625 - 19881}{210} = \frac{3744}{210} = 17.8$ mile²
$SD = \sqrt{17.8} = \pm 4.2$ miles
Standard Error
$SE = \frac{SD}{\sqrt{n}} = \frac{4.2}{\sqrt{15}} = \frac{4.2}{3.87} = 1.085$ miles
Coefficient of Variation
$CV = \frac{S}{\bar{X}} \times 100 = \frac{4.2 \text{ miles}}{9.4 \text{ miles}} \times 100 = 44.7\%$
EXERCISE
The following are the hemoglobin values (g/dl) of 10 children receiving treatment for hemolytic anemia: 9.1, 10.0, 11.4, 12.4, 9.8, 8.3, 9.9, 9.1, 7.5, 6.7. Compute the sample mean, median, variance, and standard deviation.
EXERCISE
A sample of 11 patients admitted to a psychiatric ward experienced the following lengths of stay. Calculate the measures of central tendency and dispersion.
Table: length of stay for patient no. 1 to 11, with a total row.
Measures of central tendency & dispersion of grouped data
These measures are calculated after making certain assumptions, which makes them less accurate than the same measures calculated from the raw data.
Mean of grouped data: We assume that all values within each class interval are located at the midpoint of that class interval. The midpoint of the class interval is obtained by computing the mean of the upper and the lower limits of the interval.
To find the mean, multiply each midpoint by the corresponding frequency, sum these products, and divide by the sum of the frequencies:
$\bar{X} = \frac{\sum mf}{\sum f}$, where m = midpoint of the class interval and f = its frequency.
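Under the midpoint assumption, the grouped mean is a weighted mean of the class midpoints. A sketch assuming classes are given as (lower limit, upper limit) pairs; the function name `grouped_mean` is illustrative. The data are the S.HDL-C exercise of this lecture, taking the second class as 35-39, which matches its midpoint of 37.

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data: sum(m * f) / sum(f).

    classes: (lower_limit, upper_limit) pairs; all values in a class
    are assumed to sit at its midpoint m = (lower + upper) / 2.
    """
    if len(classes) != len(frequencies):
        raise ValueError("one frequency per class interval is required")
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_f = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total_f


# S.HDL-C classes (mg/dl) and frequencies
classes = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [7, 16, 33, 21, 13]
print(round(grouped_mean(classes, freqs), 2))  # 42.94
```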
Median of grouped data:
We assume that the values within a class interval are evenly distributed throughout the width of the interval. First find the site of the median, n/2; then, using the cumulative frequencies, locate the class interval that contains it.
$\text{Median} = L + \frac{j}{f}(U - L)$
where U − L = width of the class interval (w), L = lower limit of the interval containing the median, U = upper limit of the interval containing the median, j = number of observations still lacking to reach the site of the median, f = frequency of the interval containing the median.
Note: j = n/2 − cumulative frequency of the preceding class interval.
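The median formula can be sketched as a walk through the cumulative frequencies until n/2 is reached. In this sketch the class width w is passed explicitly (5 for the S.HDL-C classes, the distance between successive lower limits), and L is the stated lower limit, as in the lecture's worked example; many textbooks use the lower class boundary (e.g. 39.5) instead. The function name `grouped_median` is illustrative.

```python
def grouped_median(classes, frequencies, width):
    """Median of grouped data: L + (j / f) * w.

    classes: (lower_limit, upper_limit) pairs in ascending order.
    L: lower limit of the median class, f: its frequency,
    j: n/2 minus the cumulative frequency of the preceding classes,
    w: class width.
    """
    n = sum(frequencies)
    site = n / 2                      # site of the median
    cumulative = 0                    # cf of the classes before the current one
    for (lower, _upper), f in zip(classes, frequencies):
        if f > 0 and cumulative + f >= site:
            j = site - cumulative     # observations still lacking
            return lower + (j / f) * width
        cumulative += f
    raise ValueError("median class not found")


classes = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [7, 16, 33, 21, 13]
print(round(grouped_median(classes, freqs, width=5), 1))  # 43.3
```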
Mode of grouped data:
We assume that all values in a class interval fall at its midpoint.
The modal class is the class interval with the highest frequency.
The modal point is the midpoint of the modal class.
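Finding the modal class and modal point amounts to locating the largest frequency and taking the midpoint of that class. In this sketch, ties are resolved by taking the first class with the highest frequency, which is an assumption not stated in the lecture; the function name `modal_class` is illustrative.

```python
def modal_class(classes, frequencies):
    """Return (modal class, modal point).

    Modal class: the class interval with the highest frequency
    (the first one if several tie). Modal point: its midpoint.
    """
    if not classes or len(classes) != len(frequencies):
        raise ValueError("one frequency per class interval is required")
    best = max(range(len(classes)), key=lambda i: frequencies[i])
    lower, upper = classes[best]
    return classes[best], (lower + upper) / 2


classes = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [7, 16, 33, 21, 13]
print(modal_class(classes, freqs))  # ((40, 44), 42.0)
```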
Variance and standard deviation of grouped data: We assume that all values falling into a particular class interval are located at the midpoint of that interval.
Sample: $S^2 = \frac{n\sum m^2 f - (\sum mf)^2}{n(n-1)}$
Population: $\sigma^2 = \frac{N\sum m^2 f - (\sum mf)^2}{N \cdot N}$
$SD = \sqrt{S^2} = \pm S$
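The grouped formulas are the computational variance formulas with each value replaced by its class midpoint and weighted by the class frequency. A sketch under that midpoint assumption; the function name `grouped_variance` is illustrative, and the data are the S.HDL-C classes from the exercise.

```python
import math


def grouped_variance(classes, frequencies, sample=True):
    """Variance of grouped data from class midpoints m:
    sample:     [n*sum(m^2 f) - (sum(m f))^2] / [n(n - 1)]
    population: [N*sum(m^2 f) - (sum(m f))^2] / (N * N)
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    sum_mf = sum(m * f for m, f in zip(mids, frequencies))
    sum_m2f = sum(m * m * f for m, f in zip(mids, frequencies))
    numerator = n * sum_m2f - sum_mf ** 2
    return numerator / (n * (n - 1)) if sample else numerator / (n * n)


classes = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [7, 16, 33, 21, 13]
s2 = grouped_variance(classes, freqs)
print(round(s2, 2), round(math.sqrt(s2), 2))  # 31.96 5.65
```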
EXERCISE
The following are the S.HDL-C values (mg/dl) of 90 individuals. Calculate the mean, median, modal class, modal point, variance, and SD.
S.HDL-C (mg/dl)   f
30-34             7
35-39             16
40-44             33
45-49             21
50-54             13
Total             90
1. Calculate the cumulative frequency (cf) and the midpoint (m)
S.HDL-C (mg/dl)   f    cf   m
30-34             7    7    32
35-39             16   23   37
40-44             33   56   42
45-49             21   77   47
50-54             13   90   52
Total             90
2. Multiply m × f
S.HDL-C (mg/dl)   f    cf   m    mf
30-34             7    7    32   224
35-39             16   23   37   592
40-44             33   56   42   1386
45-49             21   77   47   987
50-54             13   90   52   676
Total             90             3865
3. Find the mean
$\bar{X} = \frac{\sum mf}{\sum f} = \frac{3865}{90} = 42.94$ mg/dl
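4. Find the median
Site of the median = n/2 = 90/2 = 45.
The cumulative frequency is 23 at the end of class 35-39 and 56 at the end of class 40-44, so the median is located in the class interval 40-44.
j = 45 − 23 = 22, f = 33, w = 5
$\text{Median} = L + \frac{j}{f}\,w = 40 + \frac{22}{33}(5) = 43.3$ mg/dl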
5. Find the modal class and modal point
The modal class is 40-44, since it has the highest frequency (33).
The modal point is the midpoint of the modal class = 42 mg/dl.
6. Find m² and m²f
S.HDL-C (mg/dl)   f    cf   m    mf     m²     m²f
30-34             7    7    32   224    1024   7168
35-39             16   23   37   592    1369   21904
40-44             33   56   42   1386   1764   58212
45-49             21   77   47   987    2209   46389
50-54             13   90   52   676    2704   35152
Total             90             3865          168825
7. Find the variance and SD
$S^2 = \frac{n\sum m^2 f - (\sum mf)^2}{n(n-1)} = \frac{(90)(168825) - (3865)^2}{90 \times 89} = \frac{15194250 - 14938225}{8010} = 31.96$ (mg/dl)²
$SD = \sqrt{31.96} = \pm 5.65$ mg/dl
EXERCISE
The following are the ages (years) of 57 individuals. Find the mean, median, modal class, modal point, variance, and standard deviation.
Age (year)   f
10-19        5
20-29        19
30-39        10
40-49        13
50-59        4
60-69        4
70-79        2
Total        57