Describing and Displaying Quantitative Data
Summarizing continuous data
Displaying continuous data
Within-subject variability
Presentation
Summarizing continuous data
A quantitative measurement contains more information than a categorical one. The two most important questions about a quantitative measurement are: Where is it? How variable is it? They are answered by a measure of central tendency and a measure of spread (variability).
Measure of central tendency
Example: students' record (table of students by subject, with columns subj1, subj2, subj3, subj4 and Mean)
The mean (average) is efficient in a statistical sense, but it is sensitive to outliers. Outliers are single observations that have a noticeable influence on the results. Such outliers should be excluded from the final data summary.
Baby    Weight (kg)
B1      1.2
B2      1.3
B3      1.4
B4      1.5
B5      2.1
Mean    1.5
Baby    Weight (kg)
B1      1.2
B2      1.3
B3      1.4
B4      1.5
B5      21
Mean    5.28
The median is found by first ordering the data from smallest to largest and then counting upwards through half of the observations. It is the center observation in an odd-sized sample, or the average of the two middle observations in an even-sized sample.
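The ordering-and-counting rule above can be sketched in a few lines of Python. The function name and the sample values are hypothetical and chosen only for illustration.

```python
def median(values):
    """Return the median by ordering the data and taking the middle."""
    ordered = sorted(values)          # order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd sample: the center observation
        return ordered[mid]
    # even sample: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 3, 9, 1, 5]          # hypothetical values
even_sample = [7, 3, 9, 1, 5, 11]     # hypothetical values

print(median(odd_sample))
print(median(even_sample))
```

In practice the standard library's statistics.median applies the same rule.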
Example: students' record (table of students by subject, with columns subj1, subj2, subj3, subj4, Mean and Median; the final rows give the average and the median)
The median is not affected by outliers.
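As a quick check of this claim, the sketch below recomputes the mean and the median directly from the five baby weights listed earlier, once as recorded and once with B5 entered as 21. It relies only on Python's standard statistics module; the variable names are illustrative.

```python
from statistics import mean, median

weights = [1.2, 1.3, 1.4, 1.5, 2.1]         # baby weights in kg from the example
weights_outlier = [1.2, 1.3, 1.4, 1.5, 21]  # same data with B5 recorded as 21

for label, data in [("original", weights), ("with outlier", weights_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The single extreme value pulls the mean from 1.5 up to 5.28, while the median stays at 1.4 in both cases.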
Mode
The mode is the value that occurs most frequently. If the data are grouped, it is the group with the highest frequency. It is useful for categorical data, to report the most frequent category.
Example
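A minimal sketch of finding the mode, assuming hypothetical data: one categorical variable and one numeric variable with two equally frequent values. The helper name modes is illustrative and returns every value tied for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]   # hypothetical categories
ages = [24, 31, 24, 19, 31, 45]                       # hypothetical ages, two modes

print(modes(blood_groups))
print(modes(ages))
```

Returning a list rather than a single value makes ties visible instead of silently picking one of them.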
Measures of Dispersion or Variability
Range and interquartile range
The range is given by the smallest and largest observations and is the simplest measure of variability. Example: for an age variable, we would like to know the ages of the youngest and oldest participants. The presence of outliers will give a distorted impression of the variability.
The quartiles, namely the lower quartile, the median and the upper quartile, divide the data into four equal parts. First order the data, then count the appropriate number of observations from the bottom. The interquartile range is a useful measure of variability and is given by the difference between the upper and lower quartiles.
Example
Lower quartile (25th percentile): 51
Median (50th percentile):
Upper quartile (75th percentile): 67
Interquartile range: from 51 to 67
The interquartile range is not vulnerable to outliers. We know that 50% of the data lie within the interquartile range.
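The contrast between the range and the interquartile range can be illustrated with a short sketch. The ages are hypothetical, and the quartile convention (the "inclusive" method of Python's statistics.quantiles) is an assumption: other conventions give slightly different quartiles.

```python
from statistics import quantiles

def spread(values):
    """Return (range, interquartile range) for a list of numbers."""
    # Quartile convention is an assumption; the slides do not name one.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

ages = [18, 21, 23, 24, 26, 29, 31, 35, 40]   # hypothetical ages
ages_outlier = ages[:-1] + [95]               # oldest value replaced by an outlier

print(spread(ages))
print(spread(ages_outlier))
```

With these values the range grows from 22 to 77 once the outlier is introduced, while the interquartile range stays at 8.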
Standard Deviation and Variance
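For reference, the usual definitions can be written as follows. The symbols are assumptions, since the slide does not show a formula: x_i are the n observations, \bar{x} is their mean, s^2 is the sample variance and s the sample standard deviation. The n - 1 denominator is the common sample convention; dividing by n instead gives the population version.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```

The standard deviation is expressed in the same units as the data, which is why it is usually reported rather than the variance.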
Example: students' record (table of students by subject, with columns subj1, subj2, subj3, subj4 and Standard Deviation). Total standard deviation: 31.7
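A step-by-step sketch of the calculation, using the five baby weights from the earlier example rather than the students' record. The n - 1 denominator is an assumption (the sample convention), and the function name is illustrative.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator (an assumption)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)
    return sqrt(variance)

weights = [1.2, 1.3, 1.4, 1.5, 2.1]   # baby weights in kg from the earlier example
print(round(sample_sd(weights), 3))
```

The standard library function statistics.stdev uses the same n - 1 convention and can be used to cross-check the result.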
Why is the standard deviation useful?
In the figure of the normal distribution, dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68.27% of the set; two standard deviations from the mean (medium and dark blue) account for 95.45%; three standard deviations (light, medium, and dark blue) account for 99.73%; and four standard deviations account for 99.994%.
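The coverage percentages for the normal distribution can be checked numerically. The sketch below uses the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z; the function name coverage is illustrative.

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3, 4):
    print(f"within {k} SD: {100 * coverage(k):.3f}%")
```

These figures apply only when the data are approximately normally distributed; for skewed data the proportions can differ considerably.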
Example: The median age at menopause for the cases was 50.1 years and the interquartile range was 48.6 to 52.5 years. Thus we know that 50% of the women experienced the menopause within an age range of about 4 years.
Displaying Continuous Data
A picture is worth a thousand words, or numbers, so there is no better way to present data than with figures or graphs. A graph or figure should convey as much information as possible, with one constraint: the reader must not be overwhelmed by too much data.
Dot plot Example
Histogram: used with large sets of numerical data. The data are divided into non-overlapping intervals, and the number of observations in each interval is counted. Example
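The counting step behind a histogram can be sketched as follows. The ages, the bin width of 10 years and the starting point of 10 are all hypothetical choices made for illustration; each interval includes its lower limit and excludes its upper limit, so no observation falls into two intervals.

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in n_bins intervals [lo, lo + width) from start."""
    counts = [0] * n_bins
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < n_bins:       # values outside all bins are ignored
            counts[index] += 1
    return counts

ages = [15, 18, 19, 22, 24, 24, 27, 31, 36, 44, 52, 71]   # hypothetical ages
counts = histogram_counts(ages, start=10, width=10, n_bins=7)

# Print a simple text histogram, one row per age class.
for i, count in enumerate(counts):
    lo = 10 + 10 * i
    print(f"{lo:2d}-{lo + 9:2d} | {'*' * count}")
```

The choice of bin width affects the apparent shape: too few bins hide detail and too many produce a ragged picture.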
Box-and-whisker plot: a more compact way to visualize the data. The whiskers in the diagram indicate the minimum and maximum values of the variable under consideration. The median is indicated by the central horizontal line, and the lower and upper quartiles by the corresponding horizontal ends of the box. The shaded box itself represents the interquartile range.
The box-and-whisker plot is used to display the median and two measures of spread, namely the range and the interquartile range.
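The five numbers that such a plot displays can be computed directly. This is a sketch on hypothetical values, and the "inclusive" quartile method of statistics.quantiles is an assumed convention; plotting software may use a slightly different one.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

scores = [42, 51, 55, 58, 60, 63, 67, 70, 88]   # hypothetical values
low, q1, med, q3, high = five_number_summary(scores)

print(f"whiskers: {low} to {high}")
print(f"box:      {q1} to {q3} (median {med})")
```

The whiskers correspond to the first and last numbers, the box to the two quartiles, and the line inside the box to the median.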
Scatter plot: used to illustrate the relationship between two continuous variables.
Measures of Symmetry
Dot plots and histograms give an idea of the shape of the distribution of the data. Symmetric means that if you fold the shape over its central point, the two halves agree; otherwise the distribution is called skewed, either left-skewed or right-skewed. If the distribution is symmetric, the mean and the median will be close to each other.
If the distribution is skewed, the median and interquartile range are more appropriate summary measures than the mean and standard deviation, because the mean and standard deviation are sensitive to skewness. Example: If the mean is 1.31 and the median is 1.34, we can conclude that the data are reasonably symmetric.
Example: If the median is 50.1 and the first and third quartiles are 48.6 and 52.5, the median is not exactly midway between the quartiles (the midpoint is 50.55), which indicates skewness in the data distribution.
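The comparison in this example can be written as a small check, using the quartiles and median quoted in the text. The function name and the sign convention (positive when the upper half of the box is longer) are illustrative assumptions, and this is a rough indicator rather than a formal test of skewness.

```python
def quartile_skew(q1, median, q3):
    """Compare the two halves of the box; positive means a longer upper half."""
    lower_half = median - q1
    upper_half = q3 - median
    # Rounding removes floating-point noise from the subtraction.
    return round(upper_half - lower_half, 10)

# Menopause example from the text: quartiles 48.6 and 52.5, median 50.1.
difference = quartile_skew(48.6, 50.1, 52.5)
print(difference)
```

Here the median lies 1.5 years above the lower quartile but 2.4 years below the upper quartile, so the upper half is longer, which suggests a tail towards higher ages.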
Within-subject variability
When a measurement is taken once per subject (for example, the weight of a baby), the variability expressed by the standard deviation is called between-subject variability (the quantity does not change frequently within a subject). When measurements are taken repeatedly on one subject, we are assessing within-subject variability (the quantity changes frequently within a subject).
Within-subject values are unlikely to be independent: consecutive values will depend on the values preceding them. In the investigation of total variability, it is very important to distinguish within-subject from between-subject variability. The experimenter must be aware of the possible sources that contribute to the variation, decide which are of importance in the intended study, and design the study appropriately.
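To make the distinction concrete, the sketch below separates the two sources of variation for hypothetical repeated measurements on three subjects. The data and names are invented for illustration, and the plain standard deviation used here ignores the dependence between consecutive values noted above.

```python
from statistics import mean, stdev

# Hypothetical repeated measurements (e.g. blood pressure) on three subjects.
readings = {
    "subject_1": [120, 124, 118, 122],
    "subject_2": [135, 131, 137, 133],
    "subject_3": [110, 112, 108, 114],
}

# Within-subject variability: SD of each subject's own repeated values.
within_sd = {name: stdev(values) for name, values in readings.items()}

# Between-subject variability: SD of the subject means.
subject_means = [mean(values) for values in readings.values()]
between_sd = stdev(subject_means)

for name, sd in within_sd.items():
    print(f"{name}: within-subject SD = {sd:.2f}")
print(f"between-subject SD of means = {between_sd:.2f}")
```

In this invented example the subjects differ from one another far more than each subject varies from one occasion to the next.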
Exercise
The ages (in years) of a sample of 20 motorcyclists killed in road traffic accidents are given below:
Calculate the mean, median, and mode.
Calculate the range, interquartile range and standard deviation. Which of these is better to describe the variability of these data?
Draw a dot plot and a histogram. Is this distribution symmetric or skewed?
Mean = 30.9
Median = 24
Mode = 24
SD =
Age classes    Frequency
More           2
Min                    15
Max                    71
Range                  56
Quartile 1             20
Quartile 3             35
Interquartile range    15