Percentages of State Residents in 2000 Who Were 65 or Older
AL 13.0   AK  5.7   AZ 13.0   AR 14.0   CA 10.6   CO  9.7   CT 13.8   DE 13.0   FL 17.6   GA  9.6
HI 13.3   ID 11.3   IL 12.1   IN 12.4   IA 14.9   KS 13.3   KY 12.5   LA 11.6   ME 14.4   MD 11.3
MA 13.5   MI 12.3   MN 12.1   MS 12.1   MO 13.5   MT 13.4   NE 13.6   NV 11.0   NH 12.0   NJ 13.2
NM 11.7   NY 12.9   NC 12.0   ND 14.7   OH 13.3   OK 13.2   OR 12.8   PA 15.6   RI 14.5   SC 12.1
SD 14.3   TN 12.4   TX  9.9   UT  8.5   VT 12.7   VA 11.2   WA 11.2   WV 15.3   WI 13.1   WY 11.7
Statistics and Data (Graphical) Section 9.6
Statistics
Statistics measures characteristics of individuals (people, animals, things, etc.). These characteristics are called variables, and they come in two varieties:
Categorical variable: identifies individuals as belonging to a distinct class (e.g., gender, school grade).
Quantitative variable: takes on numerical values for the characteristic being measured (e.g., height, weight).
Leading Causes of Death in the U.S. in 1999
[Table: Cause of Death (Heart Disease, Cancer, Stroke, Other), Number of Deaths, Percentage. Of the death counts, only the fragments "725," and ",003" survive.]
What type of variable is the cause of death? Categorical!
We can display categorical data using a bar chart or a circle graph (also called a pie chart).
Leading Causes of Death in the U.S. in 1999
[Bar chart. Axes: Causes of Death (Heart Disease, Cancer, Stroke, Other) and Number of Deaths (thousands).]
Leading Causes of Death in the U.S. in 1999
[Circle graph. Heart Disease 30.3%, Cancer 23.0%, Stroke 7.0%, Other 39.7%.]
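To draw a circle graph by hand, each percentage has to be turned into a sector angle. The following is a minimal sketch of that arithmetic, assuming the usual convention that the whole circle is 360 degrees; the variable names are illustrative and do not come from the slides.

```python
# Percentages from the 1999 causes-of-death circle graph.
percentages = {
    "Heart Disease": 30.3,
    "Cancer": 23.0,
    "Stroke": 7.0,
    "Other": 39.7,
}

# Each 1% of the data corresponds to 3.6 degrees of the circle.
angles = {cause: pct / 100 * 360 for cause, pct in percentages.items()}

for cause, angle in angles.items():
    print(f"{cause:<14}{angle:6.1f} degrees")
```

Because the four percentages add up to 100, the four angles add up to the full circle.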
Stemplots
Stemplot (also called a stem-and-leaf plot): a quick way to organize and analyze a small set of quantitative data.
Each number in the data set is split into a stem, consisting of its initial digit or digits, and a leaf, which is its final digit.
Now, let’s create a stemplot from the “Do Now” data…
To create the stem-and-leaf plot:
1. Use the whole number part of each number as the stem, and the tenths digit as the leaf.
2. Write the stems in order down the first column and, for each number, write the leaf in the appropriate stem row.
3. Finally, arrange the leaves in each stem row in ascending order.
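The three steps above can be sketched in Python using the 50 state percentages from the opening table. This is one possible sketch, not a prescribed method; the function name `stemplot` is an assumption made for illustration.

```python
from collections import defaultdict

# The 50 state percentages from the table (residents 65 or older, 2000).
values = [
    13.0, 5.7, 13.0, 14.0, 10.6, 9.7, 13.8, 13.0, 17.6, 9.6,
    13.3, 11.3, 12.1, 12.4, 14.9, 13.3, 12.5, 11.6, 14.4, 11.3,
    13.5, 12.3, 12.1, 12.1, 13.5, 13.4, 13.6, 11.0, 12.0, 13.2,
    11.7, 12.9, 12.0, 14.7, 13.3, 13.2, 12.8, 15.6, 14.5, 12.1,
    14.3, 12.4, 9.9, 8.5, 12.7, 11.2, 11.2, 15.3, 13.1, 11.7,
]

def stemplot(data):
    """Return {stem: sorted leaves}, keeping stems that have no leaves."""
    leaves = defaultdict(list)
    for x in data:
        # Step 1: whole-number part is the stem, tenths digit is the leaf.
        # Working in tenths (13.8 -> 138) avoids floating-point surprises.
        stem, leaf = divmod(round(x * 10), 10)
        leaves[stem].append(leaf)
    # Steps 2 and 3: list every stem in order and sort each row of leaves.
    return {s: sorted(leaves[s]) for s in range(min(leaves), max(leaves) + 1)}

rows = stemplot(values)
for stem, row in rows.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in row)}")
```

The sketch prints a row for every stem between the smallest and the largest, even when the row is empty, so gaps in the data remain visible.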
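Stem | Leaf
[Stemplot of the state percentages]
Notes:
Keep the “leafless stems” (stems with no leaves) so that gaps in the data stay visible.
Space the leaves evenly so that the length of each row reflects how many values it holds.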
By looking at both the stemplot and the table, answer the following questions about the distribution of senior citizens among the 50 states.
1. Judging from the stemplot, what was the approximate average national percentage of residents who were 65 or older? Answer: 12-13%
2. In how many states were more than 15% of the residents 65 or older? Answer: 3 states
3. Which states were in the bottom tenth of all states in this statistic? Answer: the bottom 5 states in the stemplot: AK, CO, GA, TX, UT
4. The numbers 5.7 and 17.6 are so far above or below the other numbers in this stemplot that statisticians would call them outliers. Quite often there is some special circumstance that explains the presence of outliers. What could explain the two outliers in this stemplot?
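The answers to questions 1 through 3 can be checked directly against the table. The sketch below is only a cross-check under one stated assumption: it uses the unweighted mean of the 50 state values, which approximates the national percentage but is not identical to it, since states differ in population.

```python
# State percentages from the table (residents 65 or older, 2000).
pct = {
    "AL": 13.0, "AK": 5.7, "AZ": 13.0, "AR": 14.0, "CA": 10.6,
    "CO": 9.7, "CT": 13.8, "DE": 13.0, "FL": 17.6, "GA": 9.6,
    "HI": 13.3, "ID": 11.3, "IL": 12.1, "IN": 12.4, "IA": 14.9,
    "KS": 13.3, "KY": 12.5, "LA": 11.6, "ME": 14.4, "MD": 11.3,
    "MA": 13.5, "MI": 12.3, "MN": 12.1, "MS": 12.1, "MO": 13.5,
    "MT": 13.4, "NE": 13.6, "NV": 11.0, "NH": 12.0, "NJ": 13.2,
    "NM": 11.7, "NY": 12.9, "NC": 12.0, "ND": 14.7, "OH": 13.3,
    "OK": 13.2, "OR": 12.8, "PA": 15.6, "RI": 14.5, "SC": 12.1,
    "SD": 14.3, "TN": 12.4, "TX": 9.9, "UT": 8.5, "VT": 12.7,
    "VA": 11.2, "WA": 11.2, "WV": 15.3, "WI": 13.1, "WY": 11.7,
}

# Question 1: unweighted mean of the 50 state values (an approximation).
mean = sum(pct.values()) / len(pct)

# Question 2: states where more than 15% of residents were 65 or older.
over_15 = sorted(state for state, p in pct.items() if p > 15)

# Question 3: the bottom tenth is the lowest 5 of the 50 states.
bottom_5 = sorted(pct, key=pct.get)[: len(pct) // 10]

print(f"mean of state percentages: {mean:.1f}%")
print("over 15%:", over_15)
print("bottom tenth:", bottom_5)
```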
The average annual salaries for the top 15 U.S. metro areas are shown below. Make a stemplot that provides a good visualization of the data. What is the average of the 15 numbers? Why is the stemplot a better summary of the data than the average?
Metro area             Average salary ($)
San Jose, CA           76,076
San Francisco, CA      59,314
New York, NY           56,377
New Haven, CT          50,585
Middlesex, NJ          48,977
Newark, NJ             48,733
Jersey City, NJ        47,514
Boulder, CO            45,565
Washington, D.C.       45,333
Boston, MA             45,191
Seattle, WA            45,171
Trenton, NJ            44,576
Oakland, CA            44,170
Bergen, NJ             43,789
Hartford, CT           42,349
Round the data to $1000 units, then create a split-stemplot:
Stem | Leaf
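A split-stemplot gives each stem two rows, one for leaves 0 through 4 and one for leaves 5 through 9. The following is a hedged sketch of that construction for the salary data; the function name `split_stemplot` and the (stem, half) row keys are illustrative choices, not part of the exercise.

```python
salaries = [76076, 59314, 56377, 50585, 48977, 48733, 47514, 45565,
            45333, 45191, 45171, 44576, 44170, 43789, 42349]

# Round to the nearest $1000 with integer arithmetic: 45,565 -> 46.
thousands = [(s + 500) // 1000 for s in salaries]

def split_stemplot(data):
    """Return {(stem, half): sorted leaves}.

    half 0 holds leaves 0-4 and half 1 holds leaves 5-9.
    Assumes two-digit whole numbers (tens digit = stem, ones digit = leaf).
    """
    stems = range(min(data) // 10, max(data) // 10 + 1)
    rows = {(stem, half): [] for stem in stems for half in (0, 1)}
    for x in data:
        stem, leaf = divmod(x, 10)
        rows[(stem, leaf // 5)].append(leaf)
    return {key: sorted(leaves) for key, leaves in rows.items()}

rows = split_stemplot(thousands)
for (stem, half), leaves in rows.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Splitting the stems spreads the crowded 40s across two rows, which makes the cluster in the mid-40s easier to see than it would be with a single row per stem.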
The average of the 15 numbers is about $49,581, but this is misleading. The salaries are actually fairly tightly clustered around $45,000; the few highest salaries skew the average upward.
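The effect of the few highest salaries on the average can be checked numerically. This is a small sketch that computes the mean of the 15 salaries and counts how many salaries fall on each side of it; the variable names are illustrative.

```python
salaries = [76076, 59314, 56377, 50585, 48977, 48733, 47514, 45565,
            45333, 45191, 45171, 44576, 44170, 43789, 42349]

mean = sum(salaries) / len(salaries)

# Count how many salaries fall on each side of the mean.
below = [s for s in salaries if s < mean]
above = [s for s in salaries if s > mean]

print(f"mean = ${mean:,.0f}")
print(f"{len(below)} salaries below the mean, {len(above)} above")
```

If far more values sit below the mean than above it, the mean is being pulled upward by a few large values, which is exactly what the stemplot shows at a glance.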
Mark McGwire and Barry Bonds entered the major leagues in ____. From 1986 to 2001, they averaged ____ and ____ home runs per year, respectively. Compare their annual home run totals with a back-to-back stemplot.
[Table: Year, McGwire, Bonds]
[Back-to-back stemplot: Mark McGwire | Barry Bonds]
Which player do you think was more consistent as a home run hitter?
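A back-to-back stemplot shares one column of stems between two data sets, with one set's leaves running to the left and the other's to the right. The sketch below shows the layout only. The two lists are made-up illustrative values, not the players' actual home run totals, and the function name `back_to_back` is hypothetical.

```python
def back_to_back(left, right):
    """Return the lines of a back-to-back stemplot for two-digit whole numbers."""
    both = left + right
    lines = []
    for stem in range(min(both) // 10, max(both) // 10 + 1):
        # Left leaves are printed in descending order so that they
        # increase as you read outward from the stem.
        left_leaves = sorted((x % 10 for x in left if x // 10 == stem), reverse=True)
        right_leaves = sorted(x % 10 for x in right if x // 10 == stem)
        left_text = " ".join(map(str, left_leaves))
        right_text = " ".join(map(str, right_leaves))
        lines.append(f"{left_text:>12} | {stem} | {right_text}")
    return lines

# Hypothetical yearly totals, for illustration only.
player_a = [3, 22, 29, 31, 35, 47]
player_b = [16, 19, 24, 25, 25, 33]

lines = back_to_back(player_a, player_b)
print("\n".join(lines))
```

A player whose leaves are bunched into a few adjacent stems is more consistent than one whose leaves are spread over many stems.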
Frequency Tables and Histograms
Frequency Tables
First, think back to the stemplots we just completed. Where does the visual impact of a stemplot come from?
The rows of leaves let us see how many leaves branch off each stem!
Ex. from last class: Mark McGwire HRs
The number of leaves for a particular stem is the frequency of observations within that stem's interval.
We can also record this information in a frequency table, which gives a frequency distribution of the data.
Frequency Tables
Frequency Table for Mark McGwire’s Yearly HR Totals
Home Runs          Frequency
0–9 through 60–69  [values not recovered]
70–79              1
Total              16
Notes:
Higher frequencies in this table correspond to longer leaf rows in a stemplot.
Unlike a stemplot, a frequency table does not display what the numbers in each interval actually are.
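The tallying behind a frequency table can be sketched in a few lines. Since the yearly home run totals are not reproduced here, the example below uses the metro-area salaries from the stemplot exercise, rounded to $1000 units. The function name `frequency_table` and its parameters are assumptions made for illustration.

```python
def frequency_table(data, width, start):
    """Return [(low, high, count)] for consecutive intervals of equal width.

    Assumes whole-number data with every value >= start.
    """
    n_bins = (max(data) - start) // width + 1
    counts = [0] * n_bins
    for x in data:
        counts[(x - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width - 1, count)
            for i, count in enumerate(counts)]

# Salaries from the stemplot exercise, rounded to $1000 units.
thousands = [76, 59, 56, 51, 49, 49, 48, 46, 45, 45, 45, 45, 44, 44, 42]

table = frequency_table(thousands, width=10, start=40)
for low, high, count in table:
    print(f"{low}-{high}: {count}")
print("Total:", sum(count for _, _, count in table))
```

Each count equals the number of leaves on the matching stem, and an interval with a count of 0 corresponds to a leafless stem.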
Histograms
Histogram: gives a visual display of information from a frequency table. A histogram is to quantitative data what a bar chart is to categorical data.
[Figure: Histogram for HR frequency of Mark McGwire. Axes: HR Intervals, Frequency.]
What are the differences between a histogram and a bar chart?
Histograms
To create this histogram with your calculator:
Put the lowest value of each subinterval into L1 (start with 0, 10, 20,…)
Put the corresponding frequencies into L2
Settings for STAT PLOT1 – Type: Histogram (it’s a picture!), Xlist: L1, Freq: L2
Settings for WINDOW: Xmin = –10, Xmax = 80, Xscl = 10, Ymin = –1, Ymax = 6, Yscl = 1
NOW GRAPH!
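The same two lists, interval starting values and frequencies, are enough to draw a rough histogram without a calculator. The following is a text-only sketch rather than a real plot. The lists hold the frequencies of the metro-area salaries rounded to $1000 units (intervals 40–49 through 70–79), and the function name `text_histogram` is hypothetical.

```python
def text_histogram(lows, freqs, width):
    """Return one text bar per interval; bar length equals the frequency."""
    rows = []
    for low, freq in zip(lows, freqs):
        label = f"{low}-{low + width - 1}"
        rows.append(f"{label:>7} | {'#' * freq} ({freq})")
    return rows

# Lowest value of each interval (like list L1 on the calculator) and
# the matching frequencies (like list L2), for the rounded salary data.
lows = [40, 50, 60, 70]
freqs = [11, 3, 0, 1]

bars = text_histogram(lows, freqs, width=10)
print("\n".join(bars))
```

The intervals are listed in numerical order with none skipped, so an empty interval shows up as a gap between bars.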
Histograms
Now, create a frequency table and histogram for the HR data for Barry Bonds from last class.
The stemplot: [Barry Bonds stemplot]
Frequency Table:
Home Runs          Frequency
0–9 through 60–69  [values not recovered]
70–79              1
Total              16
Histograms
[Figure: Histogram for HR frequency of Barry Bonds. Axes: HR Intervals, Frequency.]
Make a histogram of Hank Aaron’s annual home run totals given below, using interval width 5.
[Table: Year, HR]
First, create a frequency table with the HR intervals 10–14, 15–19, …, 45–49 and a Total row.
Next, create the histogram (by hand and using a calculator).
Histograms
[Figure: Histogram for HR frequency of Hank Aaron. Axes: HR Intervals, Frequency.]