1 Describing Categorical Data Here we study ways of describing a variable that is categorical.

Presentation transcript:

1 Describing Categorical Data Here we study ways of describing a variable that is categorical.

2 Say 50 people you know purchased a soft drink from a machine recently. A variable of interest might be the BRAND PURCHASED. Say the brands are made up of the 5 soft drinks Coke Classic, Diet Coke, Dr. Pepper, Pepsi-Cola, and Sprite (of course there are more varieties of soft drinks, but this is an illustrative example). Here each specific brand represents a different value of the variable brand purchased. Each specific brand represents a nonoverlapping class; each specific class represents a mutually exclusive category. Here the variable brand purchased is a categorical or qualitative variable: values of the variable represent categories. One thing that makes sense to do is ask each of the 50 people what they purchased. Then we could count the number of people who purchased Coke Classic, and likewise for each of the other brands. The total number of people of the 50 who purchased Coke Classic would be the frequency for Coke Classic.

3 [Table. Columns: Soft Drink, Frequency, Relative Frequency, Percent Frequency. Rows: Coke Classic, Diet Coke, Dr. Pepper, Pepsi-Cola, Sprite, Total. The cell values are not preserved in this transcript.]

4 The first two columns on the previous screen, the Soft Drink and Frequency columns, make up what is called a frequency distribution. It is a tabular summary of data showing the number, or frequency, of items in each of several nonoverlapping classes. The third column shows the relative frequency. We need the second column to create the third. To get the relative frequency in each row take the frequency in that row and divide by the total frequency. The fourth column shows the percent frequency. The fourth column equals the third column multiplied by 100.
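
The column arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the list `purchases` and the function name `frequency_table` are made up for the example, and the ten purchases are hypothetical, not the 50 responses from the slide.

```python
from collections import Counter

# Hypothetical purchases (illustrative only; NOT the data from the slide table).
purchases = [
    "Coke Classic", "Diet Coke", "Pepsi-Cola", "Coke Classic", "Sprite",
    "Dr. Pepper", "Coke Classic", "Pepsi-Cola", "Diet Coke", "Coke Classic",
]

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)            # frequency of each category
    total = sum(counts.values())        # total frequency, n
    table = {}
    for category, freq in counts.items():
        relative = freq / total         # third column = frequency / total
        percent = relative * 100        # fourth column = third column * 100
        table[category] = (freq, relative, percent)
    return table

table = frequency_table(purchases)
for brand, (freq, relative, percent) in sorted(table.items()):
    print(f"{brand:<14}{freq:>5}{relative:>8.2f}{percent:>8.1f}")
```

Each printed row has the same four entries as a row of the table: the class, its frequency, its relative frequency and its percent frequency.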

5 Do you know why we put information in columns? Because then we can call’um as we see’um. Sorry:) So, the frequency, relative frequency and percent frequency distributions are different ways of summarizing information about a categorical variable. Notes about our table. 1) The total, or sum, of the frequency column is equal to the number of observations, sometimes called n, in general. 2) The total, or sum, of the relative frequency column is equal to 1. 3) The total, or sum, of the percent frequency column is equal to 100 (sometimes it may be a little off due to rounding of decimal places).
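
The three notes about column totals, including the rounding caveat, can be checked with a small sketch. The counts below are hypothetical and were picked so that rounding visibly matters; the variable names are made up for the example.

```python
# Hypothetical counts: three categories with one observation each.
counts = {"A": 1, "B": 1, "C": 1}

n = sum(counts.values())                                 # note 1: total frequency is n
relative = {k: v / n for k, v in counts.items()}
percent_rounded = {k: round(100 * r, 1) for k, r in relative.items()}

total_relative = sum(relative.values())                  # note 2: should be 1
total_percent_rounded = sum(percent_rounded.values())    # note 3: 100, up to rounding

print(n)
print(f"{total_relative:.4f}")
print(f"{total_percent_rounded:.1f}")
```

Each percent here rounds to 33.3, so the rounded percent column adds up to 99.9 rather than 100. That is the "a little off due to rounding" case.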

6 In our example here we had 50 people and we asked what soft drink they purchased. Studies occur that have thousands of people and they are asked several questions. Using a computer can help in the counting of responses. Bar Graphs Bar graphs just put the frequency, relative frequency and percent frequency distributions into visual form. The form is a graph with certain properties. The horizontal axis does not have numbers on it and the axis represents the categories. In our soft drink example we would put each brand in a different location on the axis.

7 Imagine you have a piece of construction paper that is red. Do you remember way back when in school you would cut strips of paper and then curl the paper with the scissors? Well, we will not need to curl the paper here! I mention this silly example because I want you to think about cutting strips that are one inch wide. The height of each strip would then represent the frequency, relative frequency or percent frequency on the variable. You would tape each strip onto the graph above each category. (You could also put the bars sideways.) So the vertical axis, or height, in the bar graph is the frequency, the relative frequency or the percent frequency. In constructing the bar graph on a qualitative variable a space is left between each bar to help us remember we have a qualitative variable.
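
As a rough sketch of the paper-strip idea, here is a text bar graph with the bars put sideways. The function name `text_bar_chart` and the frequencies are hypothetical, and the blank line between bars stands in for the space left between bars on a qualitative variable.

```python
def text_bar_chart(frequencies):
    """Return the lines of a sideways bar graph, one bar per category."""
    width = max(len(name) for name in frequencies)
    lines = []
    for name, freq in frequencies.items():
        # The length of the bar (number of '#') is the frequency.
        lines.append(f"{name:<{width}} | {'#' * freq} {freq}")
        lines.append("")  # gap between bars: the variable is qualitative
    return lines[:-1]     # drop the trailing gap

# Hypothetical frequencies, for illustration only.
chart = text_bar_chart({"Coke Classic": 4, "Sprite": 2})
print("\n".join(chart))
```

Passing relative or percent frequencies instead would need a scaling step, since a bar cannot be 0.4 characters long; the shape of the graph would stay the same.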

8 This is an example of what a percent frequency graph would look like. The variable is “what is the type of area in which you live” and the height of each bar is the percent frequency. (See how each bar is like a cut out from a piece of paper?)

9 Pie Charts Say we order a pizza pie and it is cut up into pieces. Below I show a pizza pie cut into slices that meet in the middle. If you get a quarter of the pie, the one section you get is an example of a relative frequency. So, pie charts show each category getting its relative share of the pie. A pie chart could be labeled with the frequency, relative frequency or percent frequency, but the size of each piece of the pie is always the relative frequency.
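
Since a full circle is 360 degrees, the size of each slice can be worked out as the relative frequency times 360. A minimal sketch, with a made-up function name `slice_angles` and hypothetical frequencies:

```python
def slice_angles(frequencies):
    """Return the angle in degrees of each pie slice."""
    total = sum(frequencies.values())
    # slice angle = relative frequency * 360
    return {name: 360 * freq / total for name, freq in frequencies.items()}

# Hypothetical frequencies: one category holds a quarter of the observations.
angles = slice_angles({"A": 25, "B": 75})
print(angles)
```

The category with a quarter of the observations gets a quarter of the pie, whichever of the three distributions is printed on the labels.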

10 Describing Numerical Data Here we study ways of describing a variable that is numerical.

11 Numerical variables have values that are numbers. Remember that categorical variables may use numbers, but the variable really has values that represent groups. Example of a categorical variable: eye color, coded 1 = blue, 2 = green, 3 = red (especially on Friday morning). Our initial method of describing a numerical variable will be basically the same as with a qualitative variable, with some modification in our understanding. Let’s consider the variable age. Consider the first 20 people you see today. Count yourself if you look in the mirror, but just count yourself once. The age of these folks could be 1 day to 110 years in Nebraska, right?

12 Remember, a frequency distribution is a tabular summary of data showing the number, or frequency, of items in each of several nonoverlapping classes. With a variable like eye color (qualitative), we typically make each color a class. But with a variable like age (quantitative), if we make each age a class then we could have so many classes that the distribution is hard to interpret. The authors suggest grouping the ages into classes and having anywhere from 5 to 15 classes. Let’s digress for a minute and think about a data set. Say I have data on people. Say I have social security number, eye color, age and blood alcohol level last Thursday night at 11:30. On the next screen I have what the data might look like in Excel, or other computer programs. Note each column is a variable. Each row represents a person in this example. Thus in each row we see the values of the variables for each person.

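13 [Table. Columns: SS#, Eye color, Age, Blood alcohol level. Eye color values by row: Blue, Blue, Green, Blue, Brown, Red. Most other cell values are not preserved in this transcript; the only surviving numeric fragment is "22.023" at the end of the Red row.]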

14 The reason for my digression was to have you begin to think about data sets. (Typically) A variable is in a column. The values down the column are for different people (or whatever the subject might be). I believe it is useful to think about data as you consider statistical ideas. Here we are looking at how to describe a column of data, one variable. Now, when we have a numerical variable like age we have to think about how many classes to have. We want each class to have more than a few people in it. For now, let’s not worry too much about how many classes to have. The “width” of each class should be equal. Using age as an example, we might have classes that have 5 consecutive ages included. The first class might be year olds, then year olds and so on.

15 Class “limits” need to be considered. Each person should be in only one class. Each class has a lower limit and an upper limit and these limits are exclusive to the class. On the next screen I have an example of the frequency, relative frequency and percent frequency distributions for the variable age for 50 people. The frequency column is just the counting of the number of people in each class. The relative frequency is the frequency of each class divided by the total number of people in the data set. The percent frequency is the relative frequency times 100. (Look back at the distributions we had for the qualitative variable. Does it look the same?)
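
Here is a minimal sketch of counting a numerical variable into equal-width classes so that each person lands in exactly one class. Everything specific is an assumption made for the illustration: the ten ages, the first lower limit of 20, the width of 5 and the function name `grouped_frequency` are hypothetical, not the classes used on the slides.

```python
def grouped_frequency(ages, first_lower, width, n_classes):
    """Count whole-number ages into n_classes classes of equal width.

    Each class is a (lower limit, upper limit) pair, e.g. (20, 24) for width 5.
    """
    classes = [
        (first_lower + i * width, first_lower + (i + 1) * width - 1)
        for i in range(n_classes)
    ]
    counts = {limits: 0 for limits in classes}
    for age in ages:
        index = (age - first_lower) // width   # which class this age falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"age {age} falls outside every class")
        counts[classes[index]] += 1            # each person counted exactly once
    return counts

# Hypothetical ages, for illustration only.
ages = [21, 23, 25, 29, 30, 34, 22, 27, 31, 38]
age_counts = grouped_frequency(ages, first_lower=20, width=5, n_classes=4)

n = len(ages)
for (lower, upper), freq in age_counts.items():
    print(f"{lower}-{upper}: {freq}  {freq / n:.2f}  {100 * freq / n:.1f}")
```

Because the limits do not overlap, the frequencies add up to the number of people, just as in the qualitative case.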

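16 [Table. Columns: Age, Frequency, Relative Frequency, Percent Frequency; the last row is Total. The age classes and cell values are not preserved in this transcript.]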

17 Do you know why we put information in columns? Because then we can call’um as we see’um. Sorry:) So, the frequency, relative frequency and percent frequency distributions are different ways of summarizing information about a numerical variable. Notes about our table. 1) The total, or sum, of the frequency column is equal to the number of observations, n. 2) The total, or sum, of the relative frequency column is equal to 1. 3) The total, or sum, of the percent frequency column is equal to 100.

18 Bar graphs are used for qualitative variables. What amounts to the same thing for a quantitative variable is called a histogram. Histograms just put the frequency, relative frequency and percent frequency distributions into visual form. The form is a graph with certain properties. The variable of interest is put along the horizontal axis. We would have the variable age on the axis.

19 Imagine you have a piece of construction paper that is blue. Do you remember way back when in school you would cut strips of paper and then curl the paper with the scissors? Well, we will not need to curl the paper here! I mention this silly example because I want you to think about cutting strips that are all of the same width and are as wide as the class width (remember class widths are equal). The height of each strip would then represent the frequency, relative frequency or percent frequency on the variable. You would tape each strip onto the graph above each class. So the vertical axis, or height, in the histogram is the frequency, the relative frequency or the percent frequency. In constructing the histogram on a quantitative variable THERE IS NO SPACE between each bar to help us remember we have a quantitative variable.

20 Pie Charts The authors do not mention it, but pie charts could be made in a similar fashion to what we saw before. Cumulative Distributions Have you ever accumulated a bunch of junk in your room? Yeah, me too. Each day more stuff just shows up. So tomorrow I will have all the stuff I have today and more. Cumulative distributions are kind of like my story. When you look at the frequency distributions we just saw, a slight modification can make them into cumulative distributions. For the cumulative frequency, start with the first class in the first row. The cumulative value for this row is the frequency.

21 But the cumulative value for the second row is the frequency for the first row plus the frequency for the second row. So to get the cumulative frequency for a given row, add up the frequencies for that row and all previous rows. The cumulative relative frequency and cumulative percent frequency are found as before: cumulative relative frequency is cumulative frequency divided by total and the cumulative percent frequency is the cumulative relative frequency times 100. What’s a henway? About 4 or 5 pounds! What’s an Ogive? It is what we call a graph of a cumulative frequency distribution. The horizontal axis has values of the variable and the vertical axis has the appropriate cumulative frequency.
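
The running total described above can be sketched with `itertools.accumulate` from the Python standard library. The class frequencies below are hypothetical, chosen only to show the arithmetic.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the first class to the last.
frequencies = [3, 3, 3, 1]

# Cumulative frequency: this row's frequency plus all previous rows.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]                      # the last cumulative value is the total

cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * r for r in cumulative_relative]

for row in zip(frequencies, cumulative, cumulative_relative, cumulative_percent):
    print(row)
```

Plotting the `cumulative` values against the classes would give the ogive; the last point always sits at the total (or at 1, or at 100, for the relative and percent versions).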

22 What is the most frequently occurring age group in this example? How many times does it occur? (the group is and the frequency 17)

23 This is a frequency Ogive (or polygon). Note here that what was accumulated was just the frequency. The highest frequency is 50 because that was the total number of folks in the study. What would be the highest value if we had a cumulative relative frequency? (1, right?)

24 Summary With both categorical and numerical data one way to summarize the data is to look at frequency information of groups. The categorical data are already in natural groups and the numerical data have to be grouped. Then we might look at the frequency, relative frequency, or the percent frequency of each group. The relative frequency of a group = group frequency divided by the total frequency across all groups. The percent frequency is the relative frequency times 100. Bar charts, pie charts and histograms are based on these ideas.