Published by Rafe Jefferson. Modified over 9 years ago.
1
Ch. 1 Looking at Data – Distributions Displaying Distributions with Graphs Section 1.1 IPS © 2006 W.H. Freeman and Company
2
Data Statistics is the science of learning from data. Data are numerical facts. Data are numbers with a context. To make sense of the numbers, we must understand the context.
3
Data Set A data set is a list of individual objects and variables. A case is the data for one individual. A variable is a characteristic of an individual. Its value varies among individuals. The distribution of a variable tells us what values the variable takes and how often it takes these values.
4
Two types of variables A variable is either quantitative or categorical, depending on the type of value it can take. A quantitative variable takes numerical values. A categorical variable places individuals into one of several categories (or groups).
5
Individuals in sample   Diagnosis       Age at death
Patient A               Heart disease   56
Patient B               Stroke          70
Patient C               Stroke          75
Patient D               Lung cancer     60
Patient E               Heart disease   80
Patient F               Accident        73
Patient G               Diabetes        69
Categorical (diagnosis): Each individual is assigned to one of several categories.
Quantitative (age at death): Each individual is attributed a numerical value.
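The distribution of the categorical variable in this table can be tallied in a few lines. The sketch below is only an illustration: the function name `categorical_distribution` is hypothetical, and the diagnoses are copied from the seven patients above.

```python
from collections import Counter

# Diagnoses copied from the patient table above (Patients A to G).
diagnoses = [
    "Heart disease", "Stroke", "Stroke", "Lung cancer",
    "Heart disease", "Accident", "Diabetes",
]

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

dist = categorical_distribution(diagnoses)

# Print the categories from most to least frequent.
for category, (count, percent) in sorted(dist.items(), key=lambda kv: -kv[1][0]):
    print(f"{category:<14} {count}  {percent:5.1f}%")
```

The counts are what a bar graph would show as bar heights, and the percents are what a pie chart would show as slices.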
6
Graphs of Variables Graphs highlight important features of a data set and often reveal relationships that are not apparent from a listing of the data. The type of graph we use depends on the type of variable.
7
Graphs for Categorical Variables The distribution of a categorical variable gives the count or percent of individuals in each category. It is represented visually by a bar graph or a pie chart. Bar graph: The count of each category is represented by the height of a bar. Pie chart: The percent in each category is represented by a slice of the pie.
8
Top 10 causes of death in the United States, 2001. Bar graph sorted by rank: easy to analyze. Bar graph sorted alphabetically: much less useful.
9
Pie charts Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents. Figure: Percent of people dying from top 10 causes of death in the United States in 2000.
10
Child poverty before and after government intervention (UNICEF, 1996). The poverty line is defined as 50% of national median income. What does this chart tell you? The United States has the highest rate of child poverty among developed nations (22% of those under 18). Its government does the least, through taxes and subsidies, to remedy the problem (size of orange bars and percent difference between orange and blue bars). Could you transform this bar graph to fit in one pie chart? In two pie charts? Why?
11
Figure 1.1 p. 8 Which graph is a better representation of the data on p. 7?
12
Graphs for Quantitative Variables Stemplots Histograms Line graphs and time plots
13
Stem plots How to make a stemplot:
1) Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which displays the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2) Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column.
3) Write each leaf in the row to the right of its stem, in increasing order out from the stem.
STEM | LEAVES
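The three steps above can be sketched in code. This is a minimal illustration, assuming non-negative whole-number observations; the function name `stemplot` is hypothetical, and the ages are the ages at death from the patient table earlier in the deck. Listing stems that have no leaves is an added convention, not something the steps state.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    # Sorting first means each stem's leaves end up in increasing order (step 3).
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # step 1: stem = all but final digit, leaf = final digit
        leaves[stem].append(leaf)
    rows = []
    # Step 2: stems in a column, smallest at the top.
    # Assumption: stems with no leaves are still listed so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}")
    return rows

ages = [56, 70, 75, 60, 80, 73, 69]
print("\n".join(stemplot(ages)))
```

For these seven ages the rows are `5 | 6`, `6 | 09`, `7 | 035`, and `8 | 0`.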
14
Example 1.5 p. 11. a. Do a stem plot for the female percent. b. Then do a histogram of the same data set. c. Split the stems
15
Example 1.5 p. 11. Do a back-to-back stem plot of the female and male percent.
16
Histograms The range of values a variable can take is divided into equal size intervals called classes or bins. The height of each bar shows the number (or %) of individual data points that fall in each interval. The first bar represents all states where the percent of Hispanics in their population is between 0% and 4.99%. The height of the bar shows how many states (27) have a percent Hispanic in this range. The last bar represents all states with a percent Hispanic between 40% and 44.99%. There is only one such state: New Mexico, at 42.1% Hispanics.
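Counting observations per class is the core of a histogram. The sketch below assumes classes of the form [lower, lower + width), with a hypothetical function name `histogram_counts`; it uses the ages at death from the patient table because the state-by-state Hispanic percentages are not listed on the slide.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in each equal-width class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)
        # Values outside the covered range are ignored in this sketch.
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

ages = [56, 70, 75, 60, 80, 73, 69]
counts = histogram_counts(ages, start=50, width=10, n_classes=4)

# Text version of the histogram: one row per class, one star per observation.
for i, c in enumerate(counts):
    lower = 50 + 10 * i
    print(f"{lower}-{lower + 9}: {'*' * c} ({c})")
```

With classes of width 5 starting at 0, a value of 42.1 (New Mexico on the slide) lands in the ninth class, 40 to 44.99.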
17
Stemplots versus histograms Stemplots are quick and dirty histograms that can easily be done by hand, which makes them very convenient for back-of-the-envelope calculations. However, they are rarely found in scientific or lay publications.
18
Distribution of a Variable The distribution of a variable tells us what values the variable takes and how often it takes these values. When examining a distribution, look for the following:
SHAPE of the distribution. A distribution may be symmetric or skewed, and it may have one or more modes (major peaks).
CENTER of the distribution. The center is the middle of the data.
SPREAD of the distribution. The spread is the range of values.
OUTLIERS and deviations from the overall shape. Outliers are observations that lie outside the overall pattern of a distribution.
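Center and spread can be put into numbers. The sketch below is one possible reading, not the only one: it assumes the median as "the middle of the data" and reports spread as the smallest and largest values and their difference. The function name `describe` is hypothetical, and the data are the ages at death from the patient table.

```python
from statistics import median

def describe(values):
    """Summarize center (median, an assumed choice) and spread (min to max)."""
    smallest, largest = min(values), max(values)
    return {
        "center": median(values),
        "min": smallest,
        "max": largest,
        "range": largest - smallest,
    }

ages = [56, 70, 75, 60, 80, 73, 69]
summary = describe(ages)
print(summary)
```

Shape and outliers are still best judged from a graph such as a stemplot or histogram.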
19
Most common distribution shapes
Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
20
Outliers Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have an unusual representation of the elderly in their population. A large gap in the distribution is typically a sign of an outlier.
21
How to create a histogram The shape of a histogram is determined by the bin size. What bin size should you use?
Not too many bins with either 0 or 1 counts.
Not so summarized (large bins) that you lose all the information.
Not so detailed (small bins) that it is no longer a summary.
Rule of thumb: start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn't a unique or "perfect" solution.)
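One way to try out bin sizes is to count how many bins end up with 0 or 1 observations. The sketch below assumes equal-width bins from the smallest to the largest value, with the largest value placed in the last bin; the function names `bin_counts` and `sparse_bins` are hypothetical, and the data are the ages at death from the patient table, a very small data set used only for illustration.

```python
def bin_counts(values, n_bins):
    """Counts per equal-width bin between min and max (assumes max > min)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would index one past the end, so clamp it into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

def sparse_bins(values, n_bins):
    """Number of bins holding 0 or 1 observations."""
    return sum(1 for c in bin_counts(values, n_bins) if c <= 1)

ages = [56, 70, 75, 60, 80, 73, 69]
for n in (3, 5, 12):
    print(n, bin_counts(ages, n), "sparse bins:", sparse_bins(ages, n))
```

A choice where most bins are sparse is too detailed; a choice with only a couple of bins hides the shape. The counts are a guide, and the final decision still comes from looking at the graph.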
22
Figure: the same data set shown twice, once not summarized enough and once too summarized.
23
Line graphs: time plots In a time plot, time always goes on the horizontal (x) axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. A trend is a rise or fall that persists over time, despite small irregularities. A pattern that repeats itself at regular intervals of time is called seasonal variation.
24
Scales matter A picture is worth a thousand words, but there is nothing like hard numbers. Look at the scales. How you stretch the axes and choose your scales can give a different impression.
25
Using Excel
1. Do a bar graph for Problem 1.14 (page 28). Format the data set for its appearance, then copy and paste the data set and graph into Text Boxes in MS Word. Arrange the layout so that it has a professional look.
2. Manually do a stem plot for Problem 1.17 (page 29). Construct a histogram by doing a frequency count on the leaves and then a bar graph of the frequency count.
3. Do a back-to-back stem plot for Problem 1.18 (page 29).
4. Do a bar graph and time plot for Problem 1.12 (page 27).