1
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 2.1 Chapter Two Graphical and Tabular Descriptive Techniques
2
2.2 Introduction & Re-cap… Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information is produced. Its methods make use of graphical techniques and numerical descriptive measures (such as averages) to summarize and present the data. [Diagram: Data → Statistics → Information]
3
2.3 Populations & Samples The graphical & tabular methods presented here apply to both entire populations and samples drawn from populations. [Diagram: a sample is a subset of a population]
4
2.4 Definitions… A variable is some characteristic of a population or sample, e.g. student grades. Variables are typically denoted with a capital letter: X, Y, Z… The values of a variable are the possible observations of that variable, e.g. student marks (0..100). Data are the observed values of a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.
5
2.5 Types of Data & Information Data (at least for purposes of Statistics) fall into three main groups: Interval Data, Nominal Data, and Ordinal Data.
6
2.6 Interval Data… Interval data are real numbers, e.g. heights, weights, prices, etc. They are also referred to as quantitative or numerical. Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
7
2.7 Nominal Data… The values of nominal data are categories, e.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4. Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!). Nominal data are also called qualitative or categorical.
8
2.8 Ordinal Data… Ordinal data appear to be categorical in nature, but their values have an order, a ranking, to them. E.g. a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5. While it’s still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like: excellent > poor, or fair < very good. That is, order is maintained no matter what numeric values are assigned to each category.
9
2.9 Types of Data & Information… [Decision tree] Is the data categorical? N → Interval Data. Y → Categorical Data; is it ordered? Y → Ordinal Data. N → Nominal Data.
10
2.10 E.g. Representing Student Grades… [Decision tree] Is the data categorical? N → Interval Data, e.g. {0..100}. Y → Categorical Data; is it ordered? Y → Ordinal Data, e.g. {F, D, C, B, A} (rank order to data). N → Nominal Data, e.g. {Pass | Fail} (NO rank order to data).
11
2.11 Calculations for Types of Data As mentioned above: all calculations are permitted on interval data; only calculations involving a ranking process are allowed for ordinal data; no calculations are allowed for nominal data, save counting the number of observations in each category. This lends itself to the following “hierarchy of data”…
12
2.12 Hierarchy of Data…
Interval: Values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.
Ordinal: Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.
Nominal: Values are the arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.
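The hierarchy can be illustrated with one summary per data type. This is only a sketch: the marks are the ones from the Definitions slide, while the rating and marital-status responses are hypothetical, and the variable names are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Interval data: real numbers, so arithmetic such as the mean is meaningful.
marks = [67, 74, 71, 83, 93, 55, 48]            # marks from the Definitions slide
mean_mark = mean(marks)

# Ordinal data: the codes carry rank only, so use an order-based summary.
rating_codes = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
ratings = ["good", "fair", "excellent", "good", "poor"]    # hypothetical responses
ranked = sorted(ratings, key=rating_codes.get)
median_rating = ranked[len(ranked) // 2]         # middle-ranked response

# Nominal data: the codes are arbitrary labels, so only counting is valid.
marital = ["Single", "Married", "Married", "Widowed", "Single", "Married"]  # hypothetical
marital_counts = Counter(marital)

print(mean_mark, median_rating, marital_counts.most_common())
```

Averaging the marital-status codes would run without error, which is exactly why the hierarchy has to be kept in mind by the analyst rather than enforced by the software.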
13
2.13 Graphical & Tabular Techniques for Nominal Data… The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts. A relative frequency distribution lists the categories and the proportion with which each occurs. Refer to Example 2.1.
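A frequency distribution and its relative version can be sketched in a few lines. The Example 2.1 data are not part of this transcript, so the coded responses below are hypothetical, and `frequency_distribution` is a name made up for this illustration.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {category: (count, proportion)} for nominal observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical coded responses (1 = Single, 2 = Married, 3 = Divorced, 4 = Widowed)
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 2]
table = frequency_distribution(responses)
for cat in sorted(table):
    count, share = table[cat]
    print(f"{cat}: count={count}, relative frequency={share:.2f}")
```

The counts feed a bar chart and the proportions feed a pie chart, so both displays come from the same table.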
14
2.14 Nominal Data (Tabular Summary)
15
2.15 Nominal Data (Frequency) Bar Charts are often used to display frequencies…
16
2.16 Nominal Data (Relative Frequency) Pie Charts show relative frequencies…
17
2.17 Nominal Data It’s all the same information (based on the same data), just a different presentation.
18
2.18 Graphical Techniques for Interval Data There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these graphical methods is the histogram. The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
19
2.19 Building a Histogram… 1) Collect the data (Example 2.4). 2) Create a frequency distribution for the data… How? a) Determine the number of classes to use… How? Refer to Table 2.6: with 200 observations, we should have between 7 & 10 classes… Alternatively, we could use Sturges’ formula: Number of class intervals = 1 + 3.3 log(n), where n is the number of observations and log is the base-10 logarithm.
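As a quick check of Sturges' formula for the 200 observations, here is a minimal sketch. It assumes the logarithm is base 10, which is the usual reading of the 3.3 coefficient; `sturges_classes` is a name invented for this example.

```python
import math

def sturges_classes(n):
    """Sturges' formula as quoted on the slide: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(200)
print(round(k, 2))   # falls inside the 7-to-10 range suggested for 200 observations
```

The formula gives a non-integer guide value, so the analyst still picks a convenient whole number of classes nearby.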
20
2.20 Building a Histogram… 1) Collect the data. 2) Create a frequency distribution for the data… How? a) Determine the number of classes to use. [8] b) Determine how large to make each class… How? Look at the range of the data, that is, Range = Largest Observation – Smallest Observation = $119.63 – $0 = $119.63. Then each class width becomes: Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15
21
2.21 Building a Histogram… 1) Collect the data. 2) Create a frequency distribution for the data… How? a) Determine the number of classes to use. [8] b) Determine how large to make each class. [15] c) Place the data into each class… each item can only belong to one class; classes contain observations greater than their lower limits and less than or equal to their upper limits.
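Steps (b) and (c) can be sketched as follows. The two extremes ($0 and $119.63) come from the slides; the other bills are hypothetical, `class_index` is an invented helper, and placing a $0 bill in the first class is an assumption, since the "greater than the lower limit" rule would otherwise leave it unclassified.

```python
import math

def class_index(value, width, n_classes):
    """Index of the class holding `value`, with classes of the form (lower, upper]."""
    if value <= 0:
        return 0                       # assumption: a $0 bill goes in the first class
    return min(math.ceil(value / width) - 1, n_classes - 1)

smallest, largest, n_classes = 0.0, 119.63, 8
# 119.63 / 8 is just under 15, so round up to a convenient width of 15.
width = math.ceil((largest - smallest) / n_classes)

bills = [0.0, 15.0, 15.01, 42.19, 119.63]   # hypothetical, except the two extremes
counts = [0] * n_classes
for b in bills:
    counts[class_index(b, width, n_classes)] += 1
print(width, counts)
```

A bill of exactly $15.00 lands in the first class and $15.01 in the second, which is the "less than or equal to the upper limit" rule in action.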
22
2.22 Building a Histogram… 1) Collect the data. 2) Create a frequency distribution for the data. 3) Draw the histogram…
24
2.24 Interpret… About half (71+37=108) of the bills are “small”, i.e. less than $30. There are only a few telephone bills in the middle range. (18+28+14=60)÷200 = 30%, i.e. nearly a third of the phone bills are $75 or more (the last three of the eight $15-wide classes).
25
2.25 Shapes of Histograms… Symmetry A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size. [Figure: three symmetric histograms, Frequency vs. Variable]
26
2.26 Shapes of Histograms… Skewness A skewed histogram is one with a long tail extending to either the right or the left. [Figure: two histograms, Frequency vs. Variable: Positively Skewed (tail to the right) and Negatively Skewed (tail to the left)]
27
2.27 Shapes of Histograms… Modality A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks. A modal class is the class with the largest number of observations. [Figure: Unimodal and Bimodal histograms, Frequency vs. Variable]
28
2.28 Shapes of Histograms… Bell Shape A special type of symmetric unimodal histogram is one that is bell shaped. Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the population in question. [Figure: Bell Shaped histogram, Frequency vs. Variable]
29
2.29 Histogram Comparison… Compare & contrast the following histograms based on data from Example 2.6 & Example 2.7. The two courses have very different histograms: unimodal vs. bimodal, and a different spread of the marks (narrower | wider).
30
2.30 Stem & Leaf Display… Retains information about individual observations that would normally be lost in the creation of a histogram. Split each observation into two parts, a stem and a leaf: e.g. Observation value: 42.19. There are several ways to split it up… We could split it at the decimal point (Stem 42, Leaf 19). Or split it at the “tens” position, while rounding to the nearest integer in the “ones” position (Stem 4, Leaf 2).
31
2.31 Stem & Leaf Display… Continue this process for all the observations. Then, use the “stems” for the classes and each leaf becomes part of the histogram (based on Example 2.4 data) as follows…
Stem | Leaf
 0 | 0000000000111112222223333345555556666666778888999999
 1 | 000001111233333334455555667889999
 2 | 0000111112344666778999
 3 | 001335589
 4 | 124445589
 5 | 33566
 6 | 3458
 7 | 022224556789
 8 | 334457889999
 9 | 00112222233344555999
10 | 001344446699
11 | 124557889
Thus, we still have access to our original data point’s value!
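The tens-position split described above can be sketched like this. Only 42.19 comes from the slides; the other values are hypothetical, `stem_and_leaf` is an invented name, and rounding halves upward is an assumption about how ties are handled.

```python
from collections import defaultdict

def stem_and_leaf(observations):
    """Group values by tens-digit stem; the leaf is the ones digit after rounding."""
    rows = defaultdict(list)
    for x in observations:
        # int(x + 0.5) rounds halves up for non-negative values
        # (Python's built-in round() would round halves to even).
        stem, leaf = divmod(int(x + 0.5), 10)
        rows[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

bills = [42.19, 0.0, 8.6, 15.3, 47.8, 3.2, 112.4]   # hypothetical, apart from 42.19
display = stem_and_leaf(bills)
for stem, leaves in display.items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```

Each printed row is one class of width 10, and the length of the leaf string plays the role of the histogram bar.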
32
2.32 Histogram and Stem & Leaf…
33
2.33 Ogive… (pronounced “Oh-jive”) is a graph of a cumulative frequency distribution. We create an ogive in three steps… First, from the frequency distribution created earlier, calculate relative frequencies: Relative Frequency = (# of observations in a class) ÷ (Total # of observations)
34
2.34 Relative Frequencies… For example, we had 71 observations in our first class (telephone bills from $0.00 to $15.00). Thus, the relative frequency for this class is 71 ÷ 200 (the total # of phone bills) = 0.355 (or 35.5%)
35
2.35 Ogive… Is a graph of a cumulative frequency distribution. We create an ogive in three steps… 1) Calculate relative frequencies. 2) Calculate cumulative relative frequencies by adding the current class’ relative frequency to the previous class’ cumulative relative frequency. (For the first class, its cumulative relative frequency is just its relative frequency)
36
2.36 Cumulative Relative Frequencies… first class: .355 (its relative frequency); next class: .355 + .185 = .540; … ; last class: .930 + .070 = 1.00
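The running totals can be reproduced with a short sketch. Note the assumption: the transcript states only five of the eight class counts (71, 37, 18, 28, 14) and the total of 200, so the three middle counts used here (13, 9, 10) are assumed values chosen to sum to the remaining 32.

```python
from itertools import accumulate

# Class counts for the 200 telephone bills. 71, 37 and the last three (18, 28, 14)
# appear in the slides; the middle three (13, 9, 10) are ASSUMED for illustration.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(frequencies)

relative = [f / total for f in frequencies]
# Accumulate the integer counts first, then divide, to limit rounding drift.
cumulative = [c / total for c in accumulate(frequencies)]

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

Whatever the middle counts are, the first two and the last two cumulative values match the figures quoted on the slide, because they depend only on the stated counts and the total.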
37
2.37 Ogive… Is a graph of a cumulative frequency distribution. 1) Calculate relative frequencies. 2) Calculate cumulative relative frequencies. 3) Graph the cumulative relative frequencies…
38
2.38 Ogive… The ogive can be used to answer questions like: What telephone bill value is at the 50th percentile? (Refer also to Fig. 2.13 in your textbook) A little under $30: the cumulative relative frequency is .355 at $15 and already .540 at $30.
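Reading a percentile off an ogive amounts to linear interpolation between plotted points. The sketch below assumes the points sit at the upper class limits, using the cumulative relative frequencies quoted earlier (.355 at $15 and .540 at $30); a value read by eye from the printed figure may differ. `percentile_from_ogive` is a hypothetical helper.

```python
def percentile_from_ogive(points, p):
    """Linearly interpolate the value at cumulative proportion p.

    `points` is a list of (class limit, cumulative relative frequency) pairs,
    starting with (lower limit of the first class, 0.0).
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= p <= c1 and c1 > c0:
            return x0 + (p - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p is outside the range covered by the points")

# Only the first two classes are needed to locate the 50th percentile.
ogive = [(0, 0.0), (15, 0.355), (30, 0.540)]
median_bill = percentile_from_ogive(ogive, 0.50)
print(round(median_bill, 1))
```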
39
2.39 Two Nominal Variables… So far we’ve looked at tabular and graphical techniques for one variable (either nominal or interval data). A contingency table (also called a cross-classification table or cross-tabulation table) is used to describe the relationship between two nominal variables. A contingency table lists the frequency of each combination of the values of the two variables…
40
2.40 Contingency Table… In Example 2.8, a sample of newspaper readers was asked to report which newspaper they read: Globe and Mail (1), Post (2), Star (3), or Sun (4), and to indicate whether they were a blue-collar worker (1), white-collar worker (2), or professional (3). Each reader’s response is captured as part of the total number in the corresponding cell of the contingency table…
41
2.41 Contingency Table… Interpretation: The relative frequencies in columns 2 & 3 are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3. This tells us that blue-collar workers tend to read different newspapers from both white-collar workers and professionals, and that white-collar workers and professionals are quite similar in their newspaper choice.
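The comparison of columns relies on relative frequencies computed within each occupation. A minimal sketch follows; the coding matches the slide, but the responses are hypothetical (the Example 2.8 data are not in this transcript) and the function names are invented.

```python
from collections import Counter

NEWSPAPERS = {1: "Globe and Mail", 2: "Post", 3: "Star", 4: "Sun"}
OCCUPATIONS = {1: "Blue collar", 2: "White collar", 3: "Professional"}

def contingency_table(pairs):
    """Count each (newspaper, occupation) combination."""
    return Counter(pairs)

def column_relative_frequencies(table):
    """Divide each cell by its column (occupation) total."""
    col_totals = Counter()
    for (_, occ), n in table.items():
        col_totals[occ] += n
    return {(paper, occ): n / col_totals[occ] for (paper, occ), n in table.items()}

# Hypothetical responses as (newspaper code, occupation code) pairs.
responses = [(1, 3), (1, 3), (3, 1), (4, 1), (2, 2), (1, 2), (3, 1), (4, 3)]
table = contingency_table(responses)
shares = column_relative_frequencies(table)
for occ in OCCUPATIONS:
    row = {NEWSPAPERS[p]: round(shares.get((p, occ), 0.0), 2) for p in NEWSPAPERS}
    print(OCCUPATIONS[occ], row)
```

Dividing by column totals rather than the grand total removes the effect of unequal group sizes, which is what makes the columns comparable.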
42
2.42 Graphing the Relationship Between Two Nominal Variables… Use the data from the contingency table to create bar charts… Professionals tend to read the Globe & Mail more than twice as often as the Star or Sun…
43
2.43 Graphing the Relationship Between Two Interval Variables… Moving from nominal data to interval data, we are frequently interested in how two interval variables are related. To explore this relationship, we employ a scatter diagram, which plots two variables against one another. The independent variable is labeled X and is usually placed on the horizontal axis, while the dependent variable, Y, is placed on the vertical axis.
44
2.44 Scatter Diagram… Example 2.9: A real estate agent wanted to know to what extent the selling price of a home is related to its size… 1) Collect the data. 2) Determine the independent variable (X – house size) and the dependent variable (Y – selling price). 3) Use Excel to create a “scatter diagram”…
45
2.45 Scatter Diagram… It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price…
46
2.46 Patterns of Scatter Diagrams… Linearity and direction are two concepts we are interested in. [Figure: three scatter diagrams: Positive Linear Relationship, Negative Linear Relationship, Weak or Non-Linear Relationship]
47
2.47 Time Series Data… Observations measured at the same point in time are called cross-sectional data. Observations measured at successive points in time are called time-series data. Time-series data are often graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
48
2.48 Line Chart… From Example 2.10, plot the total amounts of U.S. income tax for the years 1987 to 2002…
49
2.49 Line Chart… From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid increase in taxes until 2001. Finally, there was a downturn in 2002.
50
2.50 Summary I…
Factors That Identify When to Use Frequency and Relative Frequency Tables, Bar and Pie Charts: 1. Objective: Describe a single set of data. 2. Data type: Nominal
Factors That Identify When to Use a Histogram, Ogive, or Stem-and-Leaf Display: 1. Objective: Describe a single set of data. 2. Data type: Interval
Factors That Identify When to Use a Contingency Table: 1. Objective: Describe the relationship between two variables. 2. Data type: Nominal
Factors That Identify When to Use a Scatter Diagram: 1. Objective: Describe the relationship between two variables. 2. Data type: Interval
51
2.51 Summary II…
                                   | Interval Data                              | Nominal Data
Single Set of Data                 | Histogram, Ogive, or Stem-and-Leaf Display | Frequency and Relative Frequency Tables, Bar and Pie Charts
Relationship Between Two Variables | Scatter Diagram                            | Contingency Table, Bar Charts
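The summary can also be written as a small lookup keyed by objective and data type. The key strings and the `suggest_techniques` function are invented for this sketch; the entries themselves are taken from the two summary slides.

```python
# Lookup built from the two summary slides; key names are illustrative only.
TECHNIQUES = {
    ("single", "nominal"): ["frequency table", "relative frequency table",
                            "bar chart", "pie chart"],
    ("single", "interval"): ["histogram", "ogive", "stem-and-leaf display"],
    ("relationship", "nominal"): ["contingency table", "bar charts"],
    ("relationship", "interval"): ["scatter diagram"],
}

def suggest_techniques(objective, data_type):
    """objective: 'single' or 'relationship'; data_type: 'nominal' or 'interval'."""
    return TECHNIQUES[(objective.lower(), data_type.lower())]

print(suggest_techniques("relationship", "interval"))
```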