Published by Lesley Booker. Modified over 8 years ago.
1
Class Two
Before Class Two:
Chapter 8: 34, 36, 38, 44, 46
Chapter 9: 28, 48
Chapter 10: 32, 36
Read Chapters 1 & 2
For Class Three:
Chapter 1: 24, 30, 32, 36, 44
Chapter 2: 26, 28, 38, 42, 50
Complete Quiz #1
Read Chapters 3, 4 & 5
2
Objectives for Class Two
Identify categorical and quantitative variables.
Represent data graphically using:
–bar charts
–pie charts
–histograms
–stem plots
–box plots
–time plots
Describe the distribution of a variable in terms of its overall pattern, and identify potential exceptions or outliers.
Compute standard measures of the center and spread of a distribution and interpret their values.
3
Answering the question: Will?
Making sense of what you’ve got.
Now that you have designed and completed your study or experiment, you must make sense of all of the data you have collected before it can be interpreted. This process is called exploratory data analysis. Its purpose is to organize the data, to determine which statistical tools we may use to make predictions or decisions about the population the data were collected from, and to give us some basic insights into the content of the data.
Organizing Data:
–Group and count each variable
–Look for relationships between or among variables or groups
–Create simple graphs
–Create numerical summaries
4
Answering the question: Will? (continued)
Remember that the two types of variables are:
–categorical (values are words or labels)
–quantitative (values are numbers)
Distribution: tells what values a variable takes and how often (the frequency with which) each value appears in the data set.
–For categorical variables, the values are the specific word(s) used to describe the individuals, and the frequency is a count of how many individuals are described by each word.
–For quantitative variables, the values are the number(s) recorded for each individual, and the frequency is a count of how many individuals have each number.
–With quantitative variables it is sometimes necessary to group closely related numerical values into classes; the frequency is then a count of how many individuals have a value in each class.
5
Answering the question: Will? Working with categorical variables
Identify or calculate the distribution of each categorical variable:
–count: the number of individuals in that category
–percent: the number of individuals in that category divided by the total number of individuals in all of the categories, times 100
–round-off error: because each category's percent is rounded separately, the rounded percents may not add to exactly 100%
Select a graphical display that conveys the important relationships:
–pie charts give a good visual comparison of percentages (parts of a whole) but are a poor way to communicate counts
–bar graphs provide a good visual comparison of counts
–make sure all diagrams are clearly labeled so that the viewer easily understands the information and relationships being displayed
–include an “Other” category in a pie chart if the listed categories do not total 100%
6
Data Table
Year        Count   Percent
Freshman      18     41.9%
Sophomore     10     23.3%
Junior         6     14.0%
Senior         9     20.9%
Total         43    100.1%
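The percent column can be recomputed from the counts. The short Python sketch below does this. The names `counts` and `percent_table` are illustrative and are not from the slides. It rounds each category on its own, which shows why the table totals 100.1% instead of 100%.

```python
# Counts for each class year, taken from the data table above.
counts = {"Freshman": 18, "Sophomore": 10, "Junior": 6, "Senior": 9}

def percent_table(counts, digits=1):
    """Return each category's percent of the total, rounded separately."""
    total = sum(counts.values())
    return {k: round(100 * v / total, digits) for k, v in counts.items()}

pct = percent_table(counts)
for year, p in pct.items():
    print(f"{year:<10} {counts[year]:>3} {p:>6.1f}%")

# Each percent was rounded on its own, so the rounded values need not sum to 100.
print(f"{'Total':<10} {sum(counts.values()):>3} {sum(pct.values()):>6.1f}%")
```

Adding the unrounded ratios would give exactly 100%. The extra 0.1% appears only because the four percents were rounded before they were added.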
7
Pie Chart
8
Bar Graph
9
Answering the question: Will? Working with quantitative variables
Display the distribution of a quantitative variable with a histogram, a stem plot, or a time plot.
–A histogram is like a bar graph, except that instead of being labeled with categories, the x-axis is labeled with classes (groups) of closely related values. It is used to display cross-sectional data collected at a fixed moment in time.
–A stem plot saves time and space by writing numbers that share the same leading digits, called stems, in rows, each followed by a list of their final digits, called leaves. A stem plot can often be converted into a histogram by using the stems as the classes.
–A time plot tracks data over time and can reveal trends that would not be evident if only a single moment were analyzed.
10
Creating a Histogram
Choose classes: divide the range of the data into classes of equal width.
–Because all of the bases are the same width, the eye responds to the area of each rectangle, which is then proportional to its height (the count in that class).
–Too few classes will give a “skyscraper” effect.
–Too many classes will give a “pancake” effect.
Count the individuals in each class.
Draw the histogram. Note: Microsoft Excel® labels the classes on the x-axis differently from the histograms in your text. Excel centers each bar over its bin value and counts the individuals that are less than or equal to that bin value but greater than the next lower bin value.
11
Weight Data
12
Weight Data: Frequency Table
13
Weight Data: Histogram
[Figure: histogram of weight (x-axis: Weight, from 100 to 280 in steps of 20; y-axis: Number of students)]
*Left endpoint is included in the group, right endpoint is not.
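The class counts behind a histogram like this one can be computed directly. The frequency-table slide's numbers did not survive, so this Python sketch rebuilds the counts under two assumptions. First, the weights are the 53 values read from the weight stem plot in this deck. Second, the classes are 20 pounds wide starting at 100, which is what the axis ticks suggest. `class_counts` is a hypothetical helper name.

```python
# Weight data (pounds) as read from the stem plot of this data set (n = 53).
weights = [
    100, 101, 106, 106,
    110, 110, 119,
    120, 120, 123, 124, 125, 127, 128,
    130, 130, 133, 135, 139,
    140, 148,
    150, 150, 152, 155, 157,
    165, 165, 165,
    170, 170, 170, 172, 175, 175,
    180, 180, 180, 180, 185, 185, 185, 186, 187,
    192, 194, 195,
    203,
    210, 212, 215,
    220,
    260,
]

def class_counts(data, start, width, n_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width).

    The left endpoint of each class is included and the right endpoint is not,
    matching the note on the histogram (Excel's bins work differently).
    """
    counts = [0] * n_classes
    for x in data:
        k = (x - start) // width   # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

counts = class_counts(weights, start=100, width=20, n_classes=9)
for k, c in enumerate(counts):
    lo = 100 + 20 * k
    print(f"[{lo}, {lo + 20}): {'*' * c}  ({c})")
```

Printing one row of stars per class gives a sideways text histogram. To get Excel's convention instead, you would count values greater than the previous bin value and less than or equal to the current one.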
14
Interpreting Histograms
Look for the overall pattern as well as any striking deviations from that pattern.
Describe the overall pattern in terms of:
–shape: give the number of peaks and say whether the distribution is skewed right (the right tail, made up of low bars, is much longer than the left), skewed left (the left tail is much longer than the right), symmetric (the two sides are roughly mirror images, for example bell shaped), or made up of clusters of bars, each with its own shape, center and spread.
–center: the midpoint of the values, that is, the class where half of the observations are below and half are above
–spread: give the smallest and largest values, usually excluding outliers
Striking deviations are called outliers because they lie outside the overall pattern.
15
Shape: Symmetric Bell-Shaped
16
Shape: Symmetric Mound-Shaped
17
Shape: Symmetric Uniform
18
Shape: Asymmetric Skewed to the Left
19
Shape: Asymmetric Skewed to the Right
20
Creating a Stem Plot
Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
Write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of this column. Do not skip any stem values, even if no data have that particular stem.
Write each leaf in the row to the right of its stem, in increasing order out from the stem. If a stem has no leaves, leave the area next to it blank.
Special Circumstances:
–rounding: if the data have more than three digits, it is sometimes better to round the numbers to three significant digits before creating the stem plot
–split stems: each stem can be split into two, with leaves 0-4 on the first stem and leaves 5-9 on the second
–back-to-back stem plots are helpful when comparing two distributions
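The basic construction rules above translate into a few lines of Python. This is a sketch under stated assumptions: the observations are non-negative whole numbers, and the special cases (rounding, split stems, back-to-back plots) are not handled. `stem_plot` is a hypothetical helper name.

```python
def stem_plot(data):
    """Build stem-and-leaf rows for non-negative integers.

    Stem = all but the final digit, leaf = the final digit. Empty stems
    between the smallest and largest stem are kept, as the rules require.
    """
    leaves = {}
    for x in sorted(data):                 # sorting puts leaves in increasing order
        stem, leaf = divmod(x, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))                   # right-align stems in one column
    rows = []
    for stem in range(lo, hi + 1):         # do not skip stems with no data
        row = f"{stem:>{width}} | {''.join(leaves.get(stem, []))}"
        rows.append(row.rstrip())          # blank area for stems with no leaves
    return rows

for row in stem_plot([192, 152, 135, 203, 157, 130]):
    print(row)
```

Here `divmod(x, 10)` performs the stem/leaf split in one step. For example, 192 becomes stem 19 and leaf 2, and stems 14 and 16 to 18 appear with no leaves.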
21
Weight Data
22
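Weight Data: Stemplot (Stem & Leaf Plot), being built
Stems 10 through 26 are written in a column, with no stems skipped.
Key: 20 | 3 means 203 pounds (Stems = 10’s, Leaves = 1’s)
First leaves placed: 192 goes on stem 19 with leaf 2; 152 goes on stem 15 with leaf 2; 135 goes on stem 13 with leaf 5.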
23
Weight Data: Stemplot (Stem & Leaf Plot)
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Key: 20 | 3 means 203 pounds (Stems = 10’s, Leaves = 1’s)
24
Creating a time plot
Time plots are used for quantitative variables that are measured at regular intervals over time. Time is always plotted on the x-axis. Connecting the data points with line segments often emphasizes the trend over time.
25
Class Make-up on First Day (Fall Semesters: 1985-1993)
26
Average Tuition (Public vs. Private)
27
Numbers as a measure of center
Mean (x̄): the arithmetic average, found by adding all of the observations and dividing by the number of observations. It is NOT a resistant measure of center: outliers pull the mean toward themselves, so we generally use the mean only with roughly symmetric data.
Median (M): the midpoint of the ordered data. It is a resistant measure of center, meaning it is affected little by outliers, so we use the median with skewed data.
–arrange the data in order by size from least to greatest
–if n is odd, M is the center observation of the ordered list, the one (n+1)/2 positions from the beginning of the list
–if n is even, M is the mean of the two center observations
Mode: the most frequent observation
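The three measures of center can be written out step by step. Python's standard library already provides `statistics.mean`, `statistics.median` and `statistics.multimode`. This sketch spells them out by hand so that each line mirrors a rule on the slide. The sample lists are made up for illustration.

```python
from collections import Counter

def mean(data):
    """Arithmetic average: sum of the observations divided by n."""
    return sum(data) / len(data)

def median(data):
    """Midpoint of the ordered data."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # the (n+1)/2-th observation (counting from 1)
    return (xs[mid - 1] + xs[mid]) / 2   # mean of the two center observations

def modes(data):
    """Most frequent observation(s); a data set can have more than one mode."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

odd = [3, 7, 7, 2, 9]
even = [4, 1, 3, 2]
print(mean(odd), median(odd), modes(odd))   # 5.6 7 [7]
print(median(even))                         # 2.5
```

The mode function returns a list because a data set can have ties for the most frequent value.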
28
Comparisons of Measures of Center
Symmetric distributions: the mean and the median are close together. If the distribution is perfectly symmetric, the mean and the median are exactly equal.
Skewed distributions: the mean is pulled away from the median along the long tail of the distribution, toward any outliers.
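A quick way to see this comparison is to compute both measures for a symmetric list and for a right-skewed list. The sketch uses Python's `statistics` module. The two data sets are invented for illustration.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value forms a long right tail

print(mean(symmetric), median(symmetric))        # equal for perfectly symmetric data
print(mean(right_skewed), median(right_skewed))  # mean pulled toward the tail
```

In the skewed list the single value 20 raises the mean to 4.75. The median stays at 3.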
29
Basic Measures of Spread
Range: the difference between the maximum and minimum observations (outliers are usually omitted)
Quartiles: mark out the middle half of the data
–1st quartile (Q1): one-quarter of the way up the ordered list, that is, larger than 25% of the observations
–2nd quartile (M): the median, halfway up the list, that is, larger than 50% of the observations
–3rd quartile (Q3): three-quarters of the way up the list, that is, larger than 75% of the observations
Interquartile Range (IQR): the difference between the 3rd and 1st quartiles, IQR = Q3 - Q1
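The slide does not fix a rule for locating the quartiles exactly. This Python sketch assumes the median-of-halves rule common in introductory texts: Q1 is the median of the observations below M, and Q3 is the median of those above M. That rule reproduces the weight-data quartiles on the Five-Number Summary slide. Software such as Excel or `statistics.quantiles` may interpolate instead and give slightly different values. `quartiles` is a hypothetical helper name.

```python
def median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    Q1 is the median of the observations below M and Q3 the median of those
    above M; when n is odd, the center observation is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, m, q3, "IQR =", q3 - q1)   # 3 7 11 IQR = 8
```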
30
Weight Data: Sorted
31
Weight Data: Quartiles
10 | 0166
11 | 009
12 | 0034578     ← first quartile (Q1 = 127.5, between 127 and 128)
13 | 00359
14 | 08
15 | 00257
16 | 555         ← median or second quartile (M = 165)
17 | 000255
18 | 000055567   ← third quartile (Q3 = 185)
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
32
Five-Number Summary
minimum = 100
Q1 = 127.5
M = 165
Q3 = 185
maximum = 260
Interquartile Range (IQR) = Q3 - Q1 = 185 - 127.5 = 57.5
The IQR gives the spread of the middle 50% of the data.
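The five-number summary can be checked against the stem plot itself. This Python sketch reads the observations back out of the stem-plot rows (value = 10 × stem + leaf). It then takes quartiles as medians of the lower and upper halves. The helper names are hypothetical.

```python
# Rows of the weight stem plot: stems = 10's, leaves = 1's (20 | 3 means 203 pounds).
STEM_ROWS = [
    "10 | 0166", "11 | 009", "12 | 0034578", "13 | 00359", "14 | 08",
    "15 | 00257", "16 | 555", "17 | 000255", "18 | 000055567", "19 | 245",
    "20 | 3", "21 | 025", "22 | 0", "23 |", "24 |", "25 |", "26 | 0",
]

def values_from_stem_plot(rows):
    """Rebuild the observations: value = 10 * stem + leaf."""
    values = []
    for row in rows:
        stem, leaves = row.split("|")
        values.extend(10 * int(stem) + int(leaf) for leaf in leaves.strip())
    return sorted(values)

def median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """(min, Q1, M, Q3, max), with quartiles as medians of the lower/upper halves."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])
    q3 = median(xs[(n + 1) // 2 :])
    return xs[0], q1, median(xs), q3, xs[-1]

weights = values_from_stem_plot(STEM_ROWS)
summary = five_number_summary(weights)
print(len(weights), summary, "IQR =", summary[3] - summary[1])
```

There are 53 observations. M is therefore the 27th value, Q1 averages the 13th and 14th values (127 and 128), and Q3 averages the 40th and 41st (both 185).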
33
Diagramming the Basic Measures of Spread
Five-Number Summary: the minimum observation, Q1, M, Q3, and the maximum observation.
The five-number summary is diagrammed using a box plot, sometimes also called a box-and-whisker plot.
–a central box spans the quartiles Q1 and Q3 (its length is the IQR)
–a line inside the box marks the median
–lines (whiskers) extend from the box out to the minimum and maximum observations
–a whisker longer than 1.5 times the IQR (the length of the box) means the observation at its end lies more than 1.5 × IQR beyond a quartile, which indicates a possible outlier
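The whisker check can be restated as the usual 1.5 × IQR rule. Under this rule, an observation below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is flagged. When the whiskers run to the minimum and maximum, this is the same as asking whether a whisker is longer than 1.5 × IQR. The Python sketch applies the rule to the weight-data quartiles. The extra value 300 is hypothetical, and the function names are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(data, q1, q3):
    """Observations lying beyond either fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Weight data: Q1 = 127.5, Q3 = 185, min = 100, max = 260 (five-number summary).
lower, upper = outlier_fences(127.5, 185)
print(lower, upper)                                    # 41.25 271.25
print(flag_outliers([100, 260], 127.5, 185))           # [] -> no whisker exceeds 1.5 * IQR
print(flag_outliers([100, 260, 300], 127.5, 185))      # [300]: a hypothetical extra value
```

For the weight data the upper whisker runs from 185 to 260, a length of 75. That is less than 1.5 × 57.5 = 86.25, so the maximum of 260 is not flagged.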
34
Weight Data: Boxplot
[Figure: box plot of weight (x-axis: Weight, 100 to 275), marking min, Q1, M, Q3 and max]
35
More Measures of Spread
Variance (s²): roughly the average of the squares of the deviations of the observations from the mean; the sum of the squared deviations is divided by n - 1 rather than n.
Standard Deviation (s): the square root of the variance.
Degrees of freedom: equal to n - 1, the divisor used in the variance.
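Written as a formula, s² = Σ(xᵢ - x̄)² / (n - 1) and s = √s². The Python sketch below follows that formula step by step on a made-up data set. It then cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    squared_deviations = [(x - xbar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)   # n - 1 = degrees of freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(s2, s)
# Cross-check against the standard library's sample variance and standard deviation.
print(statistics.variance(data), statistics.stdev(data))
```

Here x̄ = 5 and the squared deviations sum to 32, so s² = 32/7.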
36
Usefulness of Standard Deviation
s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
s = 0 only when there is no spread, which happens only when all of the observations have the same value. Otherwise s > 0. As the observations become more spread out about the mean, s gets larger.
s has the same units of measure as the original observations.
s is NOT resistant: strong skewness or a few outliers can greatly increase s.
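Two of these properties are easy to demonstrate with `statistics.stdev`. First, identical observations give s = 0. Second, one outlier inflates s, while the median stays put. The lists below are invented for illustration.

```python
from statistics import median, stdev

same = [5, 5, 5, 5]
base = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 50]   # hypothetical: the 14 replaced by 50

print(stdev(same))                                          # no spread, so s = 0
print(round(stdev(base), 3), median(base))                  # s is about 1.581, M = 12
print(round(stdev(with_outlier), 3), median(with_outlier))  # s jumps, M unchanged
```

Replacing a single value raises s more than tenfold, while M is still 12.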
37
Choosing Descriptive Statistics
Use the five-number summary and box plots for distributions with skewness or outliers.
Use the mean and standard deviation for distributions that are symmetric.
Always plot your data; a picture is worth a thousand words.
Keep in mind that bar graphs and pie charts are best for categorical variables, while histograms, time plots and stem plots are best for quantitative variables.
38
Objectives for Class Two
Identify categorical and quantitative variables.
Represent data graphically using:
–bar charts
–pie charts
–histograms
–stem plots
–box plots
–time plots
Describe the distribution of a variable in terms of its overall pattern, and identify potential exceptions or outliers.
Compute standard measures of the center and spread of a distribution and interpret their values.
39
Next Week: Class Three
To Be Completed Before Class Three:
Chapter 1: 24, 30, 32, 36, 44
Chapter 2: 26, 28, 38, 42, 50
Complete Quiz #1
Read Chapters 3, 4 & 5