Chapter 2: Organizing Data
STP 226: Elements of Statistics
Jenifer Boshes
Arizona State University
2.1: Variables and Data
Variable
A variable is a characteristic that varies from one person or thing to another.
Example 1: Height, age, number of siblings, sex, marital status, blood type.
Qualitative Variable
A qualitative variable (also called a categorical variable) is a non-numerically valued variable.
Example 2:
Quantitative Variable
A quantitative variable is a numerically valued variable.
Example 3:
Types of Quantitative Variables
A discrete variable is a quantitative variable whose possible values form a finite (or countably infinite) set of numbers.
A continuous variable is a quantitative variable whose possible values form some interval of numbers.
Example 4: Classify each of the following variables as qualitative, discrete, or continuous: height, age, number of siblings, place of birth, number of credit hours, eye color, ounces of coffee drunk per day, number of times visited the Grand Canyon.
Variables
Data
Data: Information obtained by observing values of a variable. Data are classified as qualitative, quantitative, discrete, or continuous according to the type of variable that was observed.
Example 5: Determine whether each of the following examples of data is quantitative or qualitative.
(a) You ask a sample of students how many hours of sleep they get.
(b) A census is taken of the number of cars per household in a city.
(c) First graders are asked about their favorite color.
2.2: Grouping Data
Why do we group data?
(1) To simplify large or complicated data sets.
(2) To further organize data.
(3) To study a particular variable of interest.
Example 1 - Cholesterol: The total cholesterol levels of 30 twenty-year-old males are given. Construct a grouped-data table for the data. Use a class width of 20 and a first cutpoint of
Grouped-Data Table
Be sure to include the following columns: Classes, Frequency, Relative Frequency, Midpoint.
Example 1 - Cholesterol:
Cholesterol Level | Frequency | Relative Frequency | Midpoint
Guidelines for Grouping Quantitative Data
(1) The number of classes should be small enough to effectively describe the data, but large enough to display the relevant characteristics. (Usually 5-20.)
(2) Each observation must belong to one, and only one, class.
(3) Whenever possible, all classes should have the same width.
Terminology in Grouping Data
Classes: Categories for grouping data.
Frequency: The number of observations that fall into a class.
Frequency distribution: A table that provides all classes and their frequencies.
Relative frequency: The ratio of the frequency of a class to the total number of observations.
Lower cutpoint: The smallest value that could go into a class.
Upper cutpoint: The smallest value that could go into the next-higher class.
Midpoint: The middle of a class, found by taking the average of its lower and upper cutpoints.
Width: The difference between the upper and lower cutpoints of a class.
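To make the terminology concrete, here is a minimal Python sketch that builds a grouped-data table from a list of numbers, a first cutpoint, and a class width. The function name `grouped_data_table` and the sample values are hypothetical (the cholesterol and maple-tree data are not reproduced here). The sketch assumes the first cutpoint is at or below the smallest observation, and it treats each class as running from its lower cutpoint up to, but not including, its upper cutpoint, so that every observation lands in exactly one class, as guideline (2) requires.

```python
def grouped_data_table(data, first_cutpoint, width):
    """Return rows (lower, upper, frequency, relative frequency, midpoint).

    Hypothetical helper: each class is [lower, upper), where the upper
    cutpoint is the smallest value that could go into the next class.
    """
    if min(data) < first_cutpoint:
        raise ValueError("first cutpoint must not exceed the smallest observation")
    n = len(data)
    rows = []
    lower = first_cutpoint
    while lower <= max(data):
        upper = lower + width
        frequency = sum(1 for x in data if lower <= x < upper)
        midpoint = (lower + upper) / 2  # average of the two cutpoints
        rows.append((lower, upper, frequency, frequency / n, midpoint))
        lower = upper  # the next class starts at this class's upper cutpoint
    return rows


# Hypothetical sample of 10 observations, grouped with classes of width 5.
sample = [12, 15, 17, 21, 23, 24, 28, 30, 31, 36]
for lower, upper, f, rf, mid in grouped_data_table(sample, 10, 5):
    print(f"{lower} - under {upper}: frequency {f}, relative frequency {rf:.2f}, midpoint {mid}")
```

With these made-up values the class frequencies are 1, 2, 3, 1, 2, 1, and the relative frequencies add up to 1.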
Example 2 – Maple Trees: The heights of 10 maple trees were recorded as follows. Determine the frequency and relative-frequency distributions for these data. (Use classes of width 5.)
2.3: Graphs and Charts
Graphical Displays for Quantitative Data
Frequency histogram: A graph that displays the classes on the horizontal axis and the frequencies on the vertical axis.
Relative-frequency histogram: A graph that displays the classes on the horizontal axis and the relative frequencies on the vertical axis.
Bar graph: Similar to a relative-frequency histogram, but the bars do not touch. This is used for qualitative data.
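A histogram is normally drawn with plotting software, but the idea can be sketched in plain Python. The hypothetical function `text_histogram` below prints one bar per class, turned sideways so the classes run down the page instead of along a horizontal axis. Bar lengths are proportional to the frequencies; the relative-frequency version differs only in the number printed beside each bar, because dividing every frequency by the same total does not change the shape of the histogram.

```python
def text_histogram(classes, frequencies, relative=False):
    """Return one text line per class; bar length equals the class frequency."""
    total = sum(frequencies)
    lines = []
    for label, f in zip(classes, frequencies):
        # Relative frequency = frequency / total; the bar itself is unchanged.
        height = f"{f / total:.2f}" if relative else str(f)
        lines.append(f"{label:<8}| {'#' * f} {height}")
    return lines


# Hypothetical classes (lower and upper cutpoints) and their frequencies.
classes = ["10-15", "15-20", "20-25", "25-30", "30-35", "35-40"]
frequencies = [1, 2, 3, 1, 2, 1]
print("\n".join(text_histogram(classes, frequencies)))
print("\n".join(text_histogram(classes, frequencies, relative=True)))
```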
Example 1 - Cholesterol: Using the grouped-data table (Cholesterol Level, Frequency, Relative Frequency, Midpoint), construct a histogram for the cholesterol data, showing both frequencies and relative frequencies.
Example 1-Cholesterol:
Dotplots
Dotplots are particularly useful for showing the relative positions of the data in a data set or for comparing two or more data sets. They are most useful for a small data set with a moderate range of values.
To construct a dotplot:
(1) Draw a horizontal axis.
(2) Record each data point by placing a dot over the appropriate value.
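The two construction steps can be mimicked in text. In this minimal sketch, the hypothetical function `text_dotplot` assumes integer-valued data with a moderate range: it draws the horizontal axis with every integer from the smallest to the largest value, then stacks one `*` above a value for each time that value occurs.

```python
from collections import Counter


def text_dotplot(data):
    """Return the rows of a text dotplot, top row first (integer data assumed)."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)  # every integer on the axis
    width = max(len(str(v)) for v in values) + 1  # column width per value
    rows = []
    # Build the stacks from the top down: a dot appears in a row if the
    # value occurs at least that many times.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(("*" if counts[v] >= level else " ").rjust(width) for v in values))
    rows.append("-" * (width * len(values)))  # the horizontal axis
    rows.append("".join(str(v).rjust(width) for v in values))  # axis labels
    return rows


# Hypothetical integer data.
print("\n".join(text_dotplot([2, 3, 3, 4, 4, 4, 6, 7])))
```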
Example 2 - Cholesterol: Construct a dotplot for the cholesterol data.
Stem-and-Leaf Diagram
(1) Select the leading digit(s) from the data and list in a vertical column. (STEM)
(2) Write the final digit of each number to the right of the appropriate leading digit. (LEAVES)
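For non-negative whole numbers, step (1) amounts to taking `value // 10` as the stem (every digit but the last) and step (2) to taking `value % 10` as the leaf (the final digit); a three-digit value such as 195 gets stem 19 and leaf 5. The hypothetical function `stem_and_leaf` below is a minimal sketch of that rule. It also sorts the leaves on each stem and lists any empty stems in between, which are common choices rather than requirements of the two steps.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return one line per stem: stem = all digits but the last, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):  # sorting gives the leaves in increasing order
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    stems = range(min(leaves), max(leaves) + 1)  # include any empty stems
    return [f"{s:>3} | {''.join(str(d) for d in leaves[s])}" for s in stems]


# Hypothetical three-digit values, so the first two digits form the stem.
print("\n".join(stem_and_leaf([195, 203, 188, 210, 199, 182])))
```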
Example 3a – Pre-natal Visits: A group of women who had just given birth were asked how many prenatal visits they had made to a doctor. The following information was recorded.
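Example 3a – Pre-natal Visits: A stem-and-leaf diagram can also use two lines per stem, with leaves 0 through 4 on the first line and leaves 5 through 9 on the second. For these data, the stem-and-leaf diagram with two lines per stem would be: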
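Splitting each stem into two lines can be sketched by tagging every leaf according to whether it is 0 to 4 or 5 to 9. The function name `stem_and_leaf_split` is hypothetical, and the sketch assumes non-negative whole numbers; it prints both lines for every stem from the smallest to the largest, even when a line has no leaves.

```python
from collections import defaultdict


def stem_and_leaf_split(data):
    """Two lines per stem: the first holds leaves 0-4, the second holds leaves 5-9."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[(stem, leaf >= 5)].append(leaf)  # False = first line, True = second
    lines = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for second_line in (False, True):
            digits = "".join(str(d) for d in leaves[(stem, second_line)])
            lines.append(f"{stem:>3} | {digits}")
    return lines


# Hypothetical counts of visits.
print("\n".join(stem_and_leaf_split([3, 8, 12, 17, 19, 10, 14])))
```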
Example 4 - Cholesterol: Create a stem-and-leaf diagram for the cholesterol data, using the first two digits as the stem.
Example 4 - Cholesterol: Compare this stem-and-leaf diagram with the histogram for the same data. What similarities do we see? (Note: the histogram groups the data into classes of width 20, whereas each stem here covers 10 values, so the two displays are not exactly the same.)
Graphing Qualitative Data
A pie chart is a circle divided into wedge-shaped pieces that are proportional to the relative frequencies.
A bar graph is like a histogram, but the bars do not touch.
Example 5 – Blood Type: A sample of 105 blood donors at a clinic can be described as follows:
Type | Frequency
A | 47
B | 22
AB | 20
O | 16
Total | 105

Type, x | Frequency | Relative Frequency
A | |
B | |
AB | |
O | |
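The relative frequencies for Example 5 come from dividing each frequency by the total of 105; for a pie chart, each wedge's angle is that relative frequency times 360 degrees. The sketch below does this arithmetic in Python using the frequencies from the table; the function name `relative_frequencies` is our own choice for illustration.

```python
def relative_frequencies(freqs):
    """Map each category to its frequency divided by the total number of observations."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}


# Frequencies from Example 5 (105 blood donors).
blood_types = {"A": 47, "B": 22, "AB": 20, "O": 16}
rel = relative_frequencies(blood_types)
for category, rf in rel.items():
    # Pie-chart wedge angle = relative frequency * 360 degrees.
    print(f"{category:<3}{blood_types[category]:>4}  {rf:.4f}  {rf * 360:6.1f} degrees")
```

For type A, for instance, 47/105 is about 0.4476, which corresponds to a wedge of about 161.1 degrees.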
2.4: Distribution Shapes
Group Assignment/Practice
Here are the ages of the CEOs of the 30 top-ranked small companies in America, according to Forbes. Using these data, produce the following:
(a) A grouped-data table using classes of width 5, starting with age 35.
(b) A histogram for the data.
(c) A dotplot.
(d) A stem-and-leaf diagram with one line per stem.
(e) A stem-and-leaf diagram with two lines per stem.
Bibliography
Some of the textbook images embedded in the slides were taken from: Weiss, Elementary Statistics, Sixth Edition. Addison Wesley Publishing Company. Copyright © 2005, Pearson Education, Inc.