Chapter 1 Describing Data with Graphs
Variables and Data: A variable is a characteristic that changes over time and/or for different individuals or objects under consideration. Examples: body temperature; hair color; time to failure of a computer component.
Experimental Unit and Measurement: An experimental unit is the individual or object on which a variable is measured. A single measurement results when a variable is actually measured on an experimental unit. A set of measurements is called data.
Example: Hair Color. Variable: hair color. Experimental unit: person. Typical measurements: brown, black, blonde, etc.
Example: Light Bulb. Variable: time until a light bulb burns out. Experimental unit: light bulb. Typical measurements: 1500 hours, etc.
Population and Sample: A population is the set of all measurements of interest to the investigator. Examples: body temperatures of all healthy people in the world; lifetimes of a batch of 1000 light bulbs. It might be too expensive or even impossible to enumerate the entire population.
Sample: A sample is a subset of measurements selected from the population of interest.
Sampling [Figure: a sample drawn from the population]
How many variables have you measured? Univariate data: one variable is measured on a single experimental unit. Bivariate data: two variables are measured on a single experimental unit. Multivariate data: more than two variables are measured on a single experimental unit.
Types of Variables: Qualitative; Quantitative (Discrete or Continuous).
Qualitative Variables: Qualitative variables measure a quality or characteristic on each experimental unit. (Data collected is sometimes called categorical data.) Examples: hair color (black, brown, blonde, …); make of car (Dodge, Honda, Ford, …); gender (male, female); state of birth (California, Arizona, …).
Quantitative Variables: Quantitative variables measure a numerical quantity on each experimental unit. A quantitative variable is discrete if it can assume only a finite or countable number of values, and continuous if it can assume the infinitely many values corresponding to the points on a line interval.
Examples: For each orange tree in a grove, the number of oranges is measured (quantitative, discrete). For a particular day, the number of cars entering a college campus is measured (quantitative, discrete). Time until a light bulb burns out (quantitative, continuous).
Graphing Qualitative Variables: Use a data distribution to describe what values (measurements) of the variable have been measured and how often each value (measurement) has occurred. “How often” can be measured 3 ways: Frequency; Relative frequency = Frequency/n; Percent = 100 × Relative frequency.
Example: A bag of M&Ms contains 25 candies. Raw Data: [Figure: the 25 candies] Statistical Table:
Color    Tally    Frequency  Relative Frequency  Percent
Red      mmm      3          3/25 = .12          12%
Blue     mmmmmm   6          6/25 = .24          24%
Green    mmmm     4          4/25 = .16          16%
Orange   mmmmm    5          5/25 = .20          20%
Brown    mmm      3          3/25 = .12          12%
Yellow   mmmm     4          4/25 = .16          16%
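The three ways of measuring “how often” can be sketched in a few lines of Python. This is only an illustration: the list `candies` is rebuilt from the frequencies in the table (the original candy-by-candy order is not known and does not matter for counting), and the function name `frequency_table` is made up for this sketch.

```python
from collections import Counter

# Raw data rebuilt from the table's frequencies; the order is an assumption.
candies = (["Red"] * 3 + ["Blue"] * 6 + ["Green"] * 4 +
           ["Orange"] * 5 + ["Brown"] * 3 + ["Yellow"] * 4)

def frequency_table(data):
    """Return {value: (frequency, relative frequency, percent)}."""
    n = len(data)
    counts = Counter(data)
    return {value: (freq, freq / n, 100 * freq / n)
            for value, freq in counts.items()}

table = frequency_table(candies)
for color, (freq, rel, pct) in table.items():
    # One row per color: frequency, relative frequency, percent
    print(f"{color:<8}{freq:>3}{rel:>7.2f}{pct:>6.0f}%")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on any hand-built table.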
Bar Chart and Pie Chart [Figures]
Pareto Bar Chart: A Pareto bar chart is a bar chart where the bars are ordered from largest to smallest.
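As a rough illustration, the Pareto ordering is just a sort by frequency, largest first. The sketch below reuses the M&M frequencies from the earlier example and draws the bars as rows of `#` characters; the name `pareto_order` is hypothetical.

```python
# Frequencies taken from the M&M example.
counts = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}

def pareto_order(freqs):
    """Return (category, frequency) pairs sorted from largest to smallest."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(counts)
for color, freq in ordered:
    # Text bar: one '#' per candy
    print(f"{color:<8}{'#' * freq} {freq}")
```

Categories with equal frequencies keep their original relative order, since Python's sort is stable.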
Graphing Quantitative Variables: A single quantitative variable measured for different population segments or for different categories of classification can be graphed using a pie or bar chart. Example: A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in the U.S., and $1.86 in South Africa.
Time Series: A single quantitative variable measured over time is called a time series. It can be graphed using a line chart or bar chart. Example: Consumer Price Index, September through March (source: Bureau of Labor Statistics).
Dotplots: For quantitative data, a dotplot plots the measurements as points on a horizontal axis, stacking the points that duplicate existing points. Example: The set 4, 5, 5, 7,
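A dotplot can be sketched as text by stacking one mark per repeated value. The function `dotplot` and the data in `sample` are hypothetical and are not the slide's example set; the alignment below assumes single-digit integer values.

```python
from collections import Counter

def dotplot(values):
    """Return a text dotplot: one column per integer value, marks stacked upward."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build the tallest row first so the stacks grow upward from the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))  # horizontal axis labels
    return "\n".join(rows)

sample = [2, 3, 3, 4, 4, 4, 6]  # hypothetical data
print(dotplot(sample))
```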
Stem and Leaf Plots: For quantitative data, use the actual numerical values of each data point.
– Divide each measurement into two parts: the stem and the leaf.
– List the stems in a column, with a vertical line to their right.
– For each measurement, record the leaf portion in the same row as its matching stem.
– Order the leaves from lowest to highest in each stem.
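The four steps can be sketched as follows, assuming non-negative integers with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` and the data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # step 1: split into stem and leaf
        leaves[stem].append(leaf)    # step 3: record the leaf beside its stem
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # step 2: stems in a column
        ordered = sorted(leaves.get(stem, []))          # step 4: order the leaves
        row = f"{stem:>2} | " + " ".join(str(leaf) for leaf in ordered)
        rows.append(row.rstrip())
    return rows

data = [23, 31, 25, 47, 31, 38, 20]  # hypothetical measurements
print("\n".join(stem_and_leaf(data)))
```

Stems with no leaves are still listed, so gaps in the data stay visible in the plot.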
Example: The prices ($) of 18 brands of walking shoes. [Figure: stem and leaf plot, before and after reordering the leaves]
Interpreting Graphs: Location and Spread. Where is the data centered on the horizontal axis, and how does it spread out from the center?
Interpreting Graphs: Shapes
– Mound shaped and symmetric (mirror images)
– Skewed right: a few unusually large measurements
– Skewed left: a few unusually small measurements
– Bimodal: two local peaks
Interpreting Graphs: Outliers. Are there any strange or unusual measurements that stand out in the data set? [Figures: a distribution with an outlier; a distribution with no outliers]
Example: A quality control process measures the diameter (cm) of a gear being made by a machine. The technician records 15 diameters, but inadvertently makes a typing mistake on the second entry.
Interpreting Graphs:
– Check the horizontal and vertical scales.
– Examine the location of the data distribution.
– Examine the shape of the distribution.
– Look for any unusual measurements or outliers.
Relative Frequency Histograms: A relative frequency histogram for a quantitative data set is a bar graph in which the height of the bar shows “how often” (measured as a proportion or relative frequency) measurements fall in a particular class or subinterval. Create intervals, then stack and draw bars.
Example: The ages of 50 tenured faculty at a state university. We choose to use 6 intervals. Minimum class width = (70 – 26)/6 = 7.33. Convenient class width = 8. Use 6 classes of length 8, starting at 25.
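The class-width arithmetic can be checked with a short sketch. Rounding up with `math.ceil` happens to give the slide's convenient width of 8; in general, “convenient” is a judgment call, and the starting point 25 is the slide's choice rather than something computed.

```python
import math

smallest, largest, n_classes = 26, 70, 6

min_width = (largest - smallest) / n_classes   # (70 - 26)/6 = 7.33...
width = math.ceil(min_width)                   # rounded up to 8
start = 25                                     # starting point chosen on the slide

# Each class is (lower boundary, upper boundary).
classes = [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]
for lower, upper in classes:
    print(f"{lower} to < {upper}")
```

The last class ends at 73, so the largest age (70) is covered.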
Age          Frequency  Relative Frequency  Percent
25 to < 33   5          5/50 = .10          10%
33 to < 41   14         14/50 = .28         28%
41 to < 49   13         13/50 = .26         26%
49 to < 57   9          9/50 = .18          18%
57 to < 65   7          7/50 = .14          14%
65 to < 73   2          2/50 = .04          4%
Relative Frequency Histograms:
– Divide the range of the data into 5-12 subintervals of equal length.
– Calculate the minimum width of the subinterval as Range/Number of subintervals.
– Round the minimum width up to a convenient value.
– Use the method of left inclusion, including the left endpoint, but not the right, in your tally.
– Create a statistical table including the subintervals, their frequencies and relative frequencies.
– Draw the relative frequency histogram, plotting the subintervals on the horizontal axis and the relative frequencies on the vertical axis.
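The tallying step with left inclusion can be sketched as below. The function name `relative_frequency_table` and the ages in `ages` are hypothetical (the slide's 50 faculty ages are not reproduced here); only the class layout, 6 classes of width 8 starting at 25, comes from the example.

```python
def relative_frequency_table(data, start, width, n_classes):
    """Tally data into classes [lower, upper) and return rows of
    (lower, upper, frequency, relative frequency)."""
    freqs = [0] * n_classes
    for x in data:
        # Left inclusion: a value equal to a lower boundary belongs to that class.
        index = int((x - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        freqs[index] += 1
    n = len(data)
    return [(start + i * width, start + (i + 1) * width, f, f / n)
            for i, f in enumerate(freqs)]

ages = [26, 33, 34, 40, 41, 48, 49, 57, 64, 70]  # hypothetical ages
rows = relative_frequency_table(ages, start=25, width=8, n_classes=6)
for lower, upper, freq, rel in rows:
    # Text histogram: bar length proportional to relative frequency
    print(f"{lower} to < {upper}  {'#' * round(rel * 20):<8} {rel:.2f}")
```

Note that 33 and 41 land in the classes that start at those values, not in the classes that end there.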
The height of the bar represents: the proportion of measurements falling in that class or subinterval; the probability that a single measurement, drawn randomly from the set, will belong to that class or subinterval.
Describing the Distribution: Shape? Skewed right. Outliers? No. What proportion of the tenured faculty are younger than 41? (14 + 5)/50 = 19/50 = .38. What is the probability that a randomly selected faculty member is 49 or older? (9 + 7 + 2)/50 = 18/50 = .36.
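Both answers are sums of class frequencies divided by n, which a short sketch can confirm. The frequencies below are the table's relative frequencies multiplied by n = 50; the helper `proportion` is hypothetical and only counts classes that lie entirely inside the requested interval.

```python
# (lower, upper, frequency) for the six faculty-age classes.
classes = [(25, 33, 5), (33, 41, 14), (41, 49, 13),
           (49, 57, 9), (57, 65, 7), (65, 73, 2)]

def proportion(classes, lower=float("-inf"), upper=float("inf")):
    """Proportion of measurements in classes lying entirely within [lower, upper)."""
    total = sum(freq for _, _, freq in classes)
    inside = sum(freq for lo, hi, freq in classes if lo >= lower and hi <= upper)
    return inside / total

younger_than_41 = proportion(classes, upper=41)   # (5 + 14)/50
at_least_49 = proportion(classes, lower=49)       # (9 + 7 + 2)/50
print(f"{younger_than_41:.2f} {at_least_49:.2f}")
```

A cutoff that falls inside a class (say, age 45) cannot be answered from the table alone; it would need the raw data.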
Chapter Review
I. How Data Are Generated
1. Experimental units, variables, measurements
2. Samples and populations
3. Univariate, bivariate, and multivariate data
II. Types of Variables
1. Qualitative or categorical
2. Quantitative
a. Discrete
b. Continuous
III. Graphs for Univariate Data Distributions
1. Qualitative or categorical data
a. Pie charts
b. Bar charts
2. Quantitative data
a. Pie and bar charts
b. Line charts
c. Dot plots
d. Stem and leaf plots
e. Relative frequency histograms
3. Describing data distributions
a. Shapes: symmetric, skewed left, skewed right, unimodal, bimodal
b. Proportion of measurements in certain intervals
c. Outliers
Example: A manufacturer of jeans has plants in CA, AZ, and TX. A randomly selected group of 25 pairs of jeans shows their plants of origin as follows: CA AZ TX CA TX AZ CA AZ TX CA AZ TX CA AZ CA
What is the variable? State. Is it qualitative or quantitative? Qualitative. What is the experimental unit? Pair of jeans.
Construct a pie chart. Construct a statistical table:
State  Frequency  Relative Frequency  Sector Angle
CA
AZ
TX
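The Sector Angle column is 360° times the relative frequency. A minimal sketch follows; the function name `sector_angle` is hypothetical, and only the TX count (8 of 25) is stated in this example, so the full set of angles is illustrated with the M&M frequencies instead.

```python
def sector_angle(frequency, n):
    """Pie-chart sector angle in degrees: 360 times the relative frequency."""
    return 360 * frequency / n

tx_angle = sector_angle(8, 25)   # TX: 8 of the 25 pairs of jeans

# Check on a complete table (M&M frequencies): the angles should total 360.
mm_angles = [sector_angle(freq, 25) for freq in (3, 6, 4, 5, 3, 4)]
print(f"{tx_angle:.1f} {sum(mm_angles):.1f}")
```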
Construct a bar chart to describe the data.
What state produces the most jeans in the group? California. What proportion of the jeans are made in TX? 8/25 = 32%.
Example: The ages (in months) at which 50 children were enrolled in a preschool are listed.
Construct a stem and leaf plot to display the data. Use the tens digit as the stem and the ones digit as the leaf, dividing each stem into two parts.
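Splitting each stem into two parts can be sketched as below. The usual convention, assumed here, puts leaves 0-4 on the first line of a stem and leaves 5-9 on the second. The function name `split_stem_and_leaf` and the ages in `ages_months` are hypothetical, not the 50 preschool ages from the example.

```python
def split_stem_and_leaf(values):
    """Stem-and-leaf rows with each stem split in two:
    leaves 0-4 on the first line, leaves 5-9 on the second."""
    rows = {}
    for v in sorted(values):                 # sorting first keeps leaves in order
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:    # skip empty lines outside the data
                leaves = rows.get((stem, half), [])
                lines.append(f"{stem:>2} | {' '.join(map(str, leaves))}".rstrip())
    return lines

ages_months = [32, 36, 38, 41, 43, 44, 47, 55]  # hypothetical ages in months
print("\n".join(split_stem_and_leaf(ages_months)))
```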
[Figure: stem and leaf plot with the leaves reordered]
What is the shape of the measurements? Rotate the plot 90 degrees counterclockwise: the distribution is unimodal.
Construct a relative frequency histogram. Start the lower boundary of the first class at 30 and use a class width of 5.
Class  Boundary     Frequency  Relative Freq.
1      30 to < 35
2      35 to < 40
3      40 to < 45
4      45 to < 50
5      50 to < 55
6      55 to < 60
If one child is selected at random, what is the probability that the child was less than 50 months old? ( )/50 = 0.94. What proportion of the children were 35 months or older, but less than 45 months of age? (15 + 12)/50 = 0.54.
Example: The value of a quantitative variable is measured once a year for a ten-year period. [Table: year and measurement]
Create a line chart to describe the variable as it changes over time.
Describe the measurements using the line chart. Observing the change in y as x increases, we see that the measurements are decreasing over time.
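What the line chart shows visually can also be checked numerically by looking at the change from each year to the next. This is a sketch only: the ten values in `series` are hypothetical (the example's measurements are not reproduced here), and `yearly_changes` is a made-up helper name.

```python
def yearly_changes(measurements):
    """Differences between consecutive measurements (later minus earlier)."""
    return [b - a for a, b in zip(measurements, measurements[1:])]

series = [61, 58, 56, 55, 51, 48, 47, 44, 40, 38]  # hypothetical, ten yearly values
changes = yearly_changes(series)

# The series is strictly decreasing when every year-to-year change is negative.
is_decreasing = all(change < 0 for change in changes)
print(changes, is_decreasing)
```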
Status of Students (Minitab): frequency by status (Freshman, Sophomore, Junior, Senior, Grad).
Assignment Questions (due at 12:00, Wed, Sept. 5) 1.2, 1.3, 1.5, 1.10, 1.20, 1.23