Chapter 2 Organization and description of data
Box on Page 24 Describing a Data Set of Measurements
Types of data: Qualitative or categorical data. Numerical or measurement data. We will use the term numerical-valued variable, or just variable, to refer to a characteristic that is measured on a numerical scale. Two types of variables: Discrete. Continuous.
Categorical data: Each observation is recorded as a member of one of several categories. Data are organized in the form of a frequency table that shows the frequencies of the individual categories. Further, the proportion (relative frequency) of observations in each category is calculated as the frequency in the category divided by the total number of observations. An example is on the next slide.
Figure 2.1 (p. 26) Pie chart of student opinion on change in dormitory regulations.
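The frequency table and proportions described for categorical data can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table and the response values are hypothetical and are not the data behind Figure 2.1.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (freq, freq / n) for category, freq in counts.items()}


# Hypothetical survey responses, invented for illustration only.
responses = ["support"] * 5 + ["oppose"] * 3 + ["neutral"] * 2
table = frequency_table(responses)

for category, (freq, rel) in table.items():
    print(f"{category:8s} {freq:3d} {rel:.2f}")
```

The relative frequencies always sum to 1, which is what makes them suitable for a pie chart.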
Discrete data: The underlying scale is discrete and the distinct values observed are not too numerous. As in the case of categorical data, describe the data by relative frequencies. Example:
Example (cont.):
Line Diagrams and Histograms The distinct values of the variable are located on the horizontal axis. Draw a vertical rectangle (line) at each value and make the height equal to the relative frequency. Figure 2.4 (p. 29) Graphic display of the frequency distribution of data in Table 3.
Data on a continuous variable: For a small data set, a dot diagram can be used; individual measurements are plotted above a line as prominent dots. Example: Figure 2.5 (p. 30) Dot diagram for the heart transplant data. The second method is a frequency distribution on intervals, used when the data consist of a large number of measurements.
Box on Page 30 Constructing a Frequency Distribution for a Continuous Variable
Table 2.4 (p. 32) The Data of Forty Cash Register Receipts (in Dollars) at a University Bookstore
Table 2.5 (p. 33) Frequency Distribution for Bookstore Sales Data
Presenting a frequency distribution as a histogram: Mark the class intervals on a horizontal axis. On each interval, draw a vertical rectangle whose area represents the relative frequency. Height of the rectangle = relative frequency / width of interval. The total area of a histogram is 1.
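The height rule above can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function name histogram_heights, the data values, and the class boundaries are all hypothetical (they are not the bookstore data of Table 2.4), and each interval is assumed to include its left endpoint but not its right.

```python
def histogram_heights(data, boundaries):
    """Return (left, right, frequency, relative frequency, height) per class.

    Assumes each class interval is [left, right): left endpoint included,
    right endpoint excluded.
    """
    n = len(data)
    rows = []
    for left, right in zip(boundaries, boundaries[1:]):
        freq = sum(1 for x in data if left <= x < right)
        rel = freq / n
        height = rel / (right - left)  # height = relative frequency / width
        rows.append((left, right, freq, rel, height))
    return rows


# Hypothetical measurements and unequal-width classes, for illustration only.
data = [1.2, 3.5, 4.8, 6.1, 7.7, 8.0, 9.4, 12.5, 15.0, 18.3]
boundaries = [0, 5, 10, 20]
rows = histogram_heights(data, boundaries)

# Summing height * width over all classes recovers the total area.
total_area = sum(height * (right - left) for left, right, _, _, height in rows)
print(rows)
print(total_area)
```

Dividing by the interval width matters when the classes have unequal widths: the last class here is twice as wide as the others, so its rectangle is drawn at half the height that its relative frequency alone would suggest.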
Figure 2.7 (p. 34) Histogram of the bookstore sales data of Tables 4 and 5. Sample size = 40.
Figure 2.8 (p. 35) Population tree (histograms) of the male and female age distributions in the US in 2001. (Source: US Bureau of the Census.)
A stem-and-leaf display provides a more efficient variant of the histogram for displaying data, especially when the observations are two-digit numbers. Example: Table 2.6 (p. 35) Examination Scores of 50 Students
Table 2.7 (p. 35) Stem-and-Leaf Display for the Examination Scores
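A stem-and-leaf display for two-digit observations can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below assumes non-negative integer scores; the function name stem_and_leaf and the ten scores are hypothetical and are not the data of Table 2.6.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return the display as a list of text lines, one per stem."""
    leaves = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)  # tens digit is the stem, units digit the leaf
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines


# Hypothetical examination scores, for illustration only.
scores = [75, 98, 42, 75, 84, 87, 65, 59, 63, 86]
for line in stem_and_leaf(scores):
    print(line)
```

Unlike a histogram, the display keeps every original observation readable: the row for stem 8 with leaves 4, 6, 7 stands for the scores 84, 86, and 87.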