Published by Jane Thompson. Modified over 9 years ago.
1
The Practice of Statistics Third Edition Chapter 1: Exploring Data Copyright © 2008 by W. H. Freeman & Company Daniel S. Yates
2
Objectives for 1.1 What is exploratory data analysis? What is meant by the distribution of a variable? How do you construct a bar graph, a pie chart, a stemplot, a histogram, and a time plot? What type of data is each of these graphs used for? How do you describe a distribution?
3
Review Previous Lesson
4
AP Statistics Conceptual Themes
The four broad activities from the previous slide are incorporated in the course curriculum themes:
Data Production
Data Analysis
Probability
Statistical Inference
– “…allows us to use results of properly designed experiments, sample surveys, and observational studies to draw conclusions that go beyond the data themselves.”
5
Data Terminology
Data – any set of information about some group of individuals.
Individuals – objects described by a set of data.
– Individuals may be people, animals, or things.
– (Usually the rows in a table.)
Variable – any characteristic of an individual.
– A variable can take different values for different individuals.
– (Usually the columns in a table.)
6
Data Terminology Continued
There are two general types of variables:
– A categorical variable places an individual into one of several categories, such as gender or job title.
– A quantitative variable takes numerical values for which arithmetic operations make sense.
Distribution of a variable – tells us what values the variable takes and how often it takes these values. In other words, the pattern of variation.
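As a small illustration of "what values the variable takes and how often", the sketch below tabulates the distribution of a categorical variable as counts and percents. The job titles are hypothetical data made up for the example, and the function name `distribution` is an assumption, not something from the slides.

```python
from collections import Counter

# Hypothetical data (not from the slides): job titles of eight individuals.
job_titles = ["clerk", "manager", "clerk", "analyst",
              "clerk", "manager", "analyst", "clerk"]


def distribution(values):
    """Return {value: (count, percent)} for a list of observations."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, 100 * n / total) for value, n in counts.items()}


# Print one row per category: the value, its count, and its percent.
for value, (count, percent) in sorted(distribution(job_titles).items()):
    print(f"{value:<8} count={count} percent={percent:.1f}")
```

The same table of counts is what a bar graph or pie chart of a categorical variable would display.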
7
Exploratory Data Analysis
Statistics is the examination of data. We must keep in mind:
– Who the individuals are that the data describe.
– What variables were recorded about each individual (types of variables and units of measure).
– When and where the data were collected.
– How the data were obtained (from a randomized comparative experiment, a convenience sample, a survey, or another observational study).
– Why the data were collected, and for whom.
8
Analyzing Data entails organizing data in order to identify patterns Graphs
9
What types of graphs are used to display categorical variables?
Bar graphs
Pie charts
Pareto charts
10
Pie charts are awkward to draw by hand, but software will do the job for you. A pie chart must include all the categories that make up the whole. Use one only when you want to emphasize each category’s relation to the whole.
11
Categorical data displayed as a bar graph. A bar graph can display a distribution, but it can also compare any set of quantities that are measured in the same units.
12
Example 1.2
Below are the percents of people in various age groups who own a portable MP3 player:

Age group (years)    Percent owning an MP3 player
12-17                27
18-24                18
25-34                20
35-44                16
45-54                10
55-64                 6
65+                   2
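A rough way to see these percents as a bar graph without plotting software is to print one bar of characters per age group. This is only a sketch: the function name `text_bar_graph` and the choice of one `#` per percentage point are assumptions made for illustration.

```python
# Percent owning an MP3 player, by age group (values from Example 1.2).
mp3_percent = {
    "12-17": 27,
    "18-24": 18,
    "25-34": 20,
    "35-44": 16,
    "45-54": 10,
    "55-64": 6,
    "65+": 2,
}


def text_bar_graph(data, symbol="#"):
    """Return one text line per category; bar length equals the value."""
    lines = []
    for label, value in data.items():
        lines.append(f"{label:>5} | {symbol * value} {value}%")
    return lines


for line in text_bar_graph(mp3_percent):
    print(line)
```

Each percent here describes a different age group, so the values are not parts of one whole (they sum to 99, not 100). That is why a bar graph suits these data and a pie chart does not.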
13
Example 1.2 Continued
14
What types of graphs are used to display quantitative variables?
Histograms
Stem-and-leaf plots (stemplots)
Dot plots
Frequency polygons
Ogives
Time plots
15
Stemplots Stemplots display the actual values of the observations. Stemplots are awkward for large data sets.
16
Example 1.4 (Stemplot)
The table below shows the percent of men and women at least 15 years old who were literate in 2002 in the major Islamic nations. (Note: Afghanistan, Iraq, and countries with populations of less than 3 million are omitted.)

Country         Female percent    Male percent
Algeria         60                78
Bangladesh      31                50
Egypt           46                68
Iran            71                85
Jordan          86                96
Kazakhstan      99                100
Lebanon         82                95
Libya           71                92
Malaysia        85
Morocco         38                68
Saudi Arabia    70                84
Syria           63                89
Tajikistan      99                100
Tunisia         63                83
Turkey          78                94
Uzbekistan      99                100
Yemen           29                70
17
Example 1.4 Continued Construct a stemplot.
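One possible way to build the stemplot is sketched below for the female literacy percents from Example 1.4, using the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` is an assumption. Stems with no observations are kept so that gaps in the distribution stay visible.

```python
from collections import defaultdict

# Female literacy percents from Example 1.4, in the order listed in the table.
female_percent = [60, 31, 46, 71, 86, 99, 82, 71, 85,
                  38, 70, 63, 99, 63, 78, 99, 29]


def stemplot(values):
    """Return stemplot lines: tens digit as stem, ones digits as sorted leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines


for line in stemplot(female_percent):
    print(line)
```

The empty stem 5 shows a gap between the low values (Yemen, Bangladesh, Morocco, Egypt) and the rest of the distribution.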
18
Example 1.4 Continued (Back to Back Stemplot)
19
Dotplots Exercise 1.3 p 47
20
Histograms
A histogram breaks the range of values of one variable into classes.
It displays only the count or percent of observations that fall into each class.
The number of classes is not fixed, but use at least five.
The classes should always be of equal width.
Unlike in a bar graph, the bars touch each other.
21
General Steps for Constructing a Histogram
Step 1: Divide the range of the data into classes of equal width.
Step 2: Count the number of observations that fall within each class.
– Count = frequency.
– A table of frequencies for all classes is a frequency table.
Step 3: Draw the histogram and label your axes.
– There is no horizontal space between bars unless a class is empty.
NOTE: Use histograms of percents when comparing several distributions with different numbers of observations.
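Steps 1 and 2 can be sketched in a few lines of code. The observations below are hypothetical (they are not from the slides), and the function name `frequency_table`, the starting value, and the class width are assumptions chosen for the example. Each class includes its left endpoint and excludes its right endpoint.

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in n_classes equal-width classes [lo, lo + width).

    Returns a list of (lo, hi, count, percent) tuples, one per class.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    total = len(values)
    return [
        (start + i * width, start + (i + 1) * width, c, 100 * c / total)
        for i, c in enumerate(counts)
    ]


# Hypothetical observations, e.g. 20 test scores.
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 74,
          75, 78, 79, 82, 85, 88, 91, 94, 67, 72]

table = frequency_table(scores, start=50, width=10, n_classes=5)

# Step 3 as text: one bar per class, bar length equal to the count.
for lo, hi, count, percent in table:
    print(f"{lo:>3}-{hi:<3} {'#' * count:<10} count={count} percent={percent:.0f}")
```

The percent column is what you would plot when comparing this distribution with another one that has a different number of observations.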
23
Example – Histogram (Exercise 1.11, Presidential Age, page 57) Link to Applet
25
Describing the Overall Pattern of a Distribution From a Graph
Describe the shape
– Symmetric
– Uniform
– Mound shaped
– Bimodal
– Skewed right or left (look at the direction of the tail)
Center
Spread
– Smallest to largest values (range)
– Gaps
– Outliers. You should investigate any outlier; it may point to an error.
26
Example 1 – Describe The Distribution
27
Example 2 – Describe The Distribution Note: This is a relative frequency histogram (in percents).
28
Example 3
29
Example 4
30
Example 5
31
Ogive – The Relative Cumulative Frequency Graph
Histograms do a good job of displaying the distribution of values of a variable, but not the relative standing of an individual observation. The relative cumulative frequency graph shows relative standing (percent below and percent above).
32
General Steps for Constructing a Relative Cumulative Frequency Graph
Step 1: Decide on the class intervals and make a frequency table. Add three columns: one for relative frequency (frequency / total), one for cumulative frequency, and one for relative cumulative frequency (cumulative frequency / total).
Step 2: Label the axes and title the graph.
Step 3: Plot a point corresponding to the relative cumulative frequency in each class at the left endpoint of the next class interval.
33
Example 1.9 p. 60 (Presidents' Ages at Inauguration)

Class    Frequency    Relative frequency       Cumulative frequency    Relative cumulative frequency
40-44     2           2/43 = 0.047 (4.7%)       2                      2/43 = 0.047 (4.7%)
45-49     6           6/43 = 0.140 (14.0%)      8                      8/43 = 0.186 (18.6%)
50-54    13           13/43 = 0.302 (30.2%)    21                      21/43 = 0.488 (48.8%)
55-59    12           12/43 = 0.279 (27.9%)    33                      33/43 = 0.767 (76.7%)
60-64     7           7/43 = 0.163 (16.3%)     40                      40/43 = 0.930 (93.0%)
65-69     3           3/43 = 0.070 (7.0%)      43                      43/43 = 1.000 (100%)
Total    43
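The three extra columns of the table can be computed from the class frequencies alone. The sketch below uses the frequencies from Example 1.9. The function name `cumulative_table` is an assumption introduced for illustration.

```python
from itertools import accumulate

# Classes and frequencies from Example 1.9 (presidents' ages at inauguration).
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
frequencies = [2, 6, 13, 12, 7, 3]


def cumulative_table(freqs):
    """Return (frequency, relative, cumulative, relative cumulative) per class."""
    total = sum(freqs)
    rows = []
    for f, cum in zip(freqs, accumulate(freqs)):
        rows.append((f, f / total, cum, cum / total))
    return rows


rows = cumulative_table(frequencies)
for label, (f, rel, cum, rel_cum) in zip(classes, rows):
    print(f"{label}  {f:>2}  {rel:.3f}  {cum:>2}  {rel_cum:.3f}")
```

The last column gives the heights of the points on the ogive. Following Step 3, each height is plotted at the left endpoint of the next class.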
34
Example 1.9 Continued
35
What Percentile is Clinton’s Age?
36
Find the 50th Percentile.
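Reading a percentile off the ogive amounts to finding where the graph crosses a given height. The sketch below does this numerically for the presidents' ages in Example 1.9, assuming the plotted points are joined by straight line segments and that the graph starts at 0% at the left endpoint of the first class (age 40). The function name `percentile_value` is hypothetical.

```python
from itertools import accumulate

# Class frequencies from Example 1.9; classes are 40-44, 45-49, ..., 65-69.
frequencies = [2, 6, 13, 12, 7, 3]
total = sum(frequencies)

# Ogive points: each cumulative proportion is plotted at the left endpoint
# of the next class. The starting point (40, 0) is an assumption.
boundaries = [40, 45, 50, 55, 60, 65, 70]
cum_rel = [0.0] + [c / total for c in accumulate(frequencies)]


def percentile_value(xs, ys, p):
    """Return the x where the piecewise-linear ogive reaches proportion p."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:
            # Linear interpolation within this segment.
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p must lie between 0 and 1")


median_age = percentile_value(boundaries, cum_rel, 0.50)
print(f"50th percentile is about {median_age:.1f} years")  # about 55.2
```

The result falls just above 55 because 48.8% of the presidents were younger than 55 at inauguration, so the 50% line is crossed shortly after that age.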
37
Time Plot Time plots show how a variable changes over time, revealing trends and patterns such as seasonal variation.
38
Example of a Time Plot Gas Prices