Data Display and Summary Kwak kichang
Type of data Quantitative variables can be continuous or discrete. Categorical variables are either nominal(unordered) or ordinal(ordered)
Type of data
In general it is easier to summarize categorical variables, and so quantitative variables are often converted to categorical ones for descriptive purposes. However, categorizing a continuous variable reduces the amount of information available and statistical tests will in general be more sensitive that is they will have more power for a continuous variable than the corresponding nominal one, although more assumptions may have to be made about the data. Categorizing data is therefore useful for summarizing results, but not for statistical analysis.
Stem and leaf plots It is based on separating each observation into two part. A stem, consisting of one or more leading digit. A leaf, consisting of the remaining or trailing digit.
Stem and leaf plots
identification of a typical or representative value extent of spread about the typical value. presence of any gaps in the data. extent of symmetry in the distribution of values number and location of peaks presence of any outlying values
Median Identify the point which has the property that half the data are greater than it, and half the data are less than it. In general, if n is even, we average the n/2 th largest and the (n/2 + 1) th largest observations.
Median
Measures of variation It is informative to have some measure of the variation of observations about the median. The range is very susceptible to what are known as outliers, points well outside the main body of the data. Interquartile range
Dot plot The simplest way to show data is a dot plot
Box-whisker plot When the data sets are large, plotting individual points can be cumbersome. The box is marked by the first and third quartile, and the whiskers extend to the range
Box-whisker plot
Histograms & Bar chart A histogram is a graphical display of tabulated frequencies, shown as bars A histogram for categorical data is often called a bar chart
Histograms & Bar chart
Exercise 1.1 From the 140 children whose urinary concentration of lead were investigated 40 were chosen who were aged at least 1 year but under 5 years. The following concentrations of copper (in umol/24hr ) were found. 0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77, 0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55, 0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04, 0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88 Find the median, range and quartiles.
Exercise 1.1