Unit 4: A Brief Look at the World of Statistics

WHAT IS STATISTICS?

WEBSTER’S DEFINITION

Another way to think about it:

According to our Math 2 book: Statistics are numerical values used to summarize and compare sets of data.

NEW TERMINOLOGY

DAY 1: CENTRAL TENDENCY & MEASURES OF DISPERSION

EXPLORATORY DATA ANALYSIS

CENTER: Measures of central tendency
Mean: the traditional “average” of a data set. It is found by adding up all of the values and dividing by the number of values.
Median: the value that would be in the middle of the data set if all of the values were written in order.
Mode: the value in a data set that occurs most frequently.

CENTER: MEAN
Mean: the traditional “average” of a data set. It is found by adding up all of the values and dividing by the number of values.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
The 22 grades add up to 1743, so the mean is x̄ = 1743/22 ≈ 79.2273 (note the symbol x̄, read “x-bar,” used for the mean).
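To make the arithmetic concrete, here is a minimal Python sketch of the mean, using the quiz grades from the slide. The helper name `mean` is our own illustration; Python's standard `statistics.mean` computes the same value.

```python
# Quiz grades from the slide (22 values)
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def mean(values):
    """Add up all of the values and divide by the number of values."""
    return sum(values) / len(values)


x_bar = mean(grades)
print(sum(grades), len(grades), round(x_bar, 4))  # 1743 22 79.2273
```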

CENTER: MEDIAN
Median: the value that would be in the middle of the data set if all of the values were written in order.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
First, put them in order: 53, 69, 70, 70, 70, 72, 72, 74, 75, 76, 76, 78, 80, 82, 84, 87, 89, 90, 91, 92, 96, 97
With 22 numbers, the median is the average of the two middle numbers. Here, 76 and 78 are the 11th and 12th terms, so the median is (76 + 78)/2 = 77.
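A sketch of the median rule in Python, assuming the same 22 quiz grades. The hypothetical helper `median` sorts the data and, because the count is even, averages the 11th and 12th values (list positions 10 and 11, since Python counts from 0).

```python
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def median(values):
    """Middle value of the sorted data; with an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: exactly one middle value
    # even count: average the two middle values (the 11th and 12th here)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(grades))  # 77.0
```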

CENTER: MODE
Mode: the value in a data set that occurs most frequently.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
Put them in order (this helps detect the mode or modes): 53, 69, 70, 70, 70, 72, 72, 74, 75, 76, 76, 78, 80, 82, 84, 87, 89, 90, 91, 92, 96, 97
70 occurs three times, while no other value occurs more than twice. Therefore, 70 is the mode. Note that the mode doesn’t tell us much about the data set as a whole. Also note that a data set can have no mode or multiple modes.
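The following Python sketch finds the mode(s) with `collections.Counter`. It follows the slide's point that a data set can have several modes or none; treating "every value appears exactly once" as "no mode" is our assumption here. (Python's `statistics.multimode` instead returns every value in that case.)

```python
from collections import Counter

grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def modes(values):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumption: all values appear once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes(grades))            # [70]
print(modes([1, 2, 3]))         # []
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
```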

SPREAD: RANGE
Range: a simplistic measure of spread, calculated as the difference between the greatest and least data values.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
First, put them in order: 53, 69, 70, 70, 70, 72, 72, 74, 75, 76, 76, 78, 80, 82, 84, 87, 89, 90, 91, 92, 96, 97
The lowest number is 53 and the highest is 97. Therefore, the range is 97 - 53 = 44.
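The range is a one-line calculation; this Python sketch (the helper name `data_range` is illustrative) shows it on the quiz grades.

```python
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def data_range(values):
    """Greatest value minus least value."""
    return max(values) - min(values)


print(min(grades), max(grades), data_range(grades))  # 53 97 44
```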

Assignment: Write a paragraph about what would be your most useful indicator of central tendency in this case: You are ordering a small number of shoes for your shoe store based on the sales of sizes from last year. What would you use (mean, median, mode, range) to represent your data of shoe sizes? Why did you choose that method? Why didn’t you choose the other 3?

DAY 2: MEASURES OF DISPERSION

SPREAD: Measures of dispersion
Range: a simplistic measure of spread, calculated as the difference between the greatest and least data values.
Mean Absolute Deviation: you learned about this measure last year. It is the average of the absolute deviations from the mean.
Standard Deviation: a more complex calculation; it is the most commonly used measure of spread in the practice of statistics.
Interquartile Range (IQR): the difference between the 3rd and 1st quartiles. It is often used to help identify outliers.

SPREAD: MEAN ABSOLUTE DEVIATION
Mean Absolute Deviation: you learned about this measure last year. It is the average of the absolute deviations from the mean.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
Recall that the mean is 79.2273. To calculate the mean absolute deviation, subtract the mean from each value, take the absolute value, add up all such values, and divide by the number of values. The 22 absolute deviations add up to about 191.45, so the mean absolute deviation is 191.45/22 ≈ 8.7025.
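A minimal Python sketch of the mean absolute deviation, following the steps in the paragraph above: subtract the mean, take absolute values, add them up, and divide by the number of values. The function name is illustrative only.

```python
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def mean_absolute_deviation(values):
    m = sum(values) / len(values)              # the mean
    abs_devs = [abs(x - m) for x in values]    # distance of each value from the mean
    return sum(abs_devs) / len(values)         # average of those distances


print(round(mean_absolute_deviation(grades), 4))  # 8.7025
```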

SPREAD: STANDARD DEVIATION
Standard Deviation: a more complex calculation; it is the most commonly used measure of spread in the practice of statistics.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
Recall that the mean is 79.2273. To calculate the standard deviation, subtract the mean from each value, square each result, add up all such values, and divide by one less than the number of values: n - 1 = 22 - 1 = 21. Then take the square root. For this data set, the squared deviations add up to about 2421.86, so the standard deviation is √(2421.86/21) ≈ 10.74.

Calculating the Standard Deviation: Steps
Step 1: Calculate the mean, x̄.
Step 2: Subtract the mean from each value xᵢ.
Step 3: Square each value found in Step 2.
Step 4: Add up all the values from Step 3 and divide by n - 1.
Step 5: Take the square root.
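The five steps translate almost line for line into Python. This is a sketch (the function name `sample_std_dev` is hypothetical); the last line checks it against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def sample_std_dev(values):
    """Standard deviation following Steps 1-5 (dividing by n - 1)."""
    n = len(values)
    x_bar = sum(values) / n                    # Step 1: the mean
    deviations = [x - x_bar for x in values]   # Step 2: subtract the mean from each value
    squares = [d ** 2 for d in deviations]     # Step 3: square each result
    variance = sum(squares) / (n - 1)          # Step 4: add them up, divide by n - 1
    return math.sqrt(variance)                 # Step 5: take the square root


s = sample_std_dev(grades)
print(round(s, 2))                                # 10.74
print(math.isclose(s, statistics.stdev(grades)))  # True
```

For comparison, `statistics.pstdev` divides by n instead of n - 1 and gives about 10.49 for these grades.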

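Variance, what is it?
Variance is the sum of the squared deviations from the mean divided by n - 1; in other words, it is the result of Step 4 before taking the square root. Just so you are aware, variance = standard deviation squared. So, variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, while standard deviation = $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$. Of course, that means you can also consider the standard deviation to be the square root of the variance. Our book doesn’t directly address variance, but you may see it in some situations.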
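A short Python check of the variance relationship, assuming the same quiz grades; `statistics.variance` and `statistics.stdev` are the standard library's sample (n - 1) versions.

```python
import math
import statistics

grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]

var = statistics.variance(grades)  # sample variance (divides by n - 1)
sd = statistics.stdev(grades)      # sample standard deviation

print(round(var, 2))  # 115.33
# variance = standard deviation squared
print(math.isclose(sd ** 2, var))        # True
# standard deviation = square root of the variance
print(math.isclose(math.sqrt(var), sd))  # True
```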

SPREAD: INTERQUARTILE RANGE (IQR)
Interquartile Range (IQR): the difference between the 3rd and 1st quartiles. It is often used to help identify outliers.
Example: The grades for a quiz are as follows: 90 70 96 92 69 53 70 87 80 89 78 72 91 76 97 70 82 75 74 72 84 76
First, put them in order: 53, 69, 70, 70, 70, 72, 72, 74, 75, 76, 76, 78, 80, 82, 84, 87, 89, 90, 91, 92, 96, 97
Then divide them into 4 equal parts: with 22 values, there are 5.5 (22/4) values in each quarter of the data set.

SPREAD: INTERQUARTILE RANGE (IQR) cont.
53, 69, 70, 70, 70, 72, 72, 74, 75, 76, 76, 78, 80, 82, 84, 87, 89, 90, 91, 92, 96, 97
We had already determined that the median was 77 (the average of 76 and 78). That divides the set into two halves of 11 values each. To find Q1 and Q3, we simply find the median of the lower half and of the upper half. Here, Q1 is 72 (the 6th value), the median is 77, and Q3 is 89 (the 17th value). So, IQR = Q3 - Q1 = 89 - 72 = 17.
This measure tells us how close together the middle 50% of the data is located, or how spread out the middle 50% is.
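Here is a sketch of the "median of each half" method in Python. The helper names are hypothetical, and the rule for an odd count (leave the middle value out of both halves) is an assumption, since the slide's 22-value example never needs it. Other tools use other quartile conventions; `statistics.quantiles`, for instance, may return slightly different Q1 and Q3 values for the same data.

```python
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def median_of_sorted(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) by taking the median of each half of the sorted data.

    Assumption: with an odd count, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + len(ordered) % 2:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


q1, m, q3 = quartiles(grades)
print(q1, m, q3, q3 - q1)  # 72 77.0 89 17
```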

5 Number Summary
The 5 Number Summary is: Minimum, Q1, Median, Q3, Maximum. For our example, the minimum (lowest value) was 53 and the maximum (highest value) was 97. So our 5 Number Summary for this data set is: 53, 72, 77, 89, 97.
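The 5 Number Summary can be collected in one function. This Python sketch repeats the assumed median-of-halves quartile rule so that it runs on its own; the function names are illustrative.

```python
grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def median_of_sorted(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum), with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + len(ordered) % 2:]  # skip the middle value when the count is odd
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


print(five_number_summary(grades))  # (53, 72, 77.0, 89, 97)
```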

OUTLIERS: Deviations from the majority of the data
When you look at a graph of a set of data, an outlier is typically a visibly different point. It will not “fit” with the rest of the data. There are multiple ways to define an outlier; here, an outlier is a data point that is more than two standard deviations from the mean.

Outlier Example
Assume that the mean is 75 and the standard deviation is 11. Two standard deviations is 2 × 11 = 22, so we would consider anything above 97 (75 + 22) an outlier. Likewise, we would consider anything below 53 (75 - 22) an outlier. To determine whether there are any outliers, we simply look at the set of data to see if any values are more than 2 standard deviations away from the mean.
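A Python sketch of the two-standard-deviation rule. It first reproduces the slide's cutoffs for the assumed mean of 75 and standard deviation of 11, then, as an illustration the slide does not itself carry out, applies the rule to the quiz grades using their own mean and sample standard deviation (about 79.23 and 10.74). The function names are hypothetical.

```python
import statistics

grades = [90, 70, 96, 92, 69, 53, 70, 87, 80, 89, 78,
          72, 91, 76, 97, 70, 82, 75, 74, 72, 84, 76]


def two_sd_bounds(mean, sd):
    """Cutoffs two standard deviations below and above the mean."""
    return mean - 2 * sd, mean + 2 * sd


def two_sd_outliers(values, mean, sd):
    """Values MORE than two standard deviations from the mean (strict inequality)."""
    low, high = two_sd_bounds(mean, sd)
    return [x for x in values if x < low or x > high]


print(two_sd_bounds(75, 11))  # (53, 97), the slide's assumed mean and SD

x_bar = statistics.mean(grades)
s = statistics.stdev(grades)  # assumption: sample standard deviation (n - 1)
print(two_sd_outliers(grades, x_bar, s))  # [53]
```

Because the rule says "more than" two standard deviations, a value sitting exactly on a cutoff (such as 53 or 97 in the slide's example) would not be flagged.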

Tonight’s Assignment: Worksheet 1 (Mean Absolute Deviation, Standard Deviation, Finding Outliers)

DAY 3: Exploring Basic One-Variable Graphs

DISTRIBUTION?

How do we display a distribution?

Why do we graph?

What graph & when?

Categorical variables
Given categorical variables, we can use bar charts and pie charts to express them in a visual manner.
Ex. 1 – For the given data, create a bar chart and a pie chart to express them in a clear and visual manner.

Favorite Music Genre    Count (thousands)    Percent
Classical               20                   6.5%
Rock                    100                  32.3%
Country                 40                   12.9%
Alternative             90                   29.0%
Heavy metal             60                   19.3%

BAR CHART

Pie Chart

Graphing Focus In this class, we will not be creating graphs for categorical variables. We will focus on graphing quantitative variables. By virtue of learning how to create these, you should be more comfortable reading these graphs when they are presented to you.

Constructing Histograms

Interpreting Histograms (and other similar graphs)
There are really three things for us to consider: CENTER, SPREAD (or dispersion), and OUTLIERS. We have already spent some time exploring measures of center and spread. We also want to consider outliers.

Tonight’s Assignment: Worksheet 2 (Histograms)

DAY 4: Samples & Populations

How can we gather data about a very large group?

Population vs. Sample
A population is a group of people or objects that you want information about. A sample is a subset of a population.
Example (population): The heights of 15-year-old girls in the U.S.
Example (sample): The heights of a sample of 15-year-old girls in the U.S.

Types of Samples
Self-selected sample: People volunteer to participate in the group.
Systematic sample: A rule is used to select members of a population (people or data) to participate in the group.
Convenience sample: The easiest members of a population to reach are selected (such as people sitting in the 1st row).
Random sample: Each member of the population has an equal chance of being selected.

What is good & bad about these sample types?

The Goal: Unbiased Sample
An unbiased sample accurately represents the population. A biased sample may over-represent some members of the population, so it is less likely to represent the entire population accurately.

How do we know if we have a good sample?

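Margin of Error
We can estimate how closely a sample measures the exact population by using the margin of error. The margin of error gives a limit on how much the responses of the sample are likely to differ from the responses of the entire population. For a random sample, it is calculated as $\pm\frac{1}{\sqrt{n}}$, so the interval likely to contain the population percent runs from $p - \frac{1}{\sqrt{n}}$ to $p + \frac{1}{\sqrt{n}}$, where p = percent of the sample responding a certain way and n = total in sample.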

Lunch Habits
In a survey of 990 workers, 30% said that they eat lunch at home during a typical work week. What is the margin of error for the survey? What interval is likely to contain the exact percent of all workers who eat lunch at home each week?
Margin of error = $\pm\frac{1}{\sqrt{990}} \approx \pm 0.032 = \pm 3.2\%$
Find the low end and high end of the population range: 30% - 3.2% = 26.8% and 30% + 3.2% = 33.2%.
So, between 26.8% and 33.2% of all workers are likely to eat lunch at home each week.
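The margin-of-error arithmetic can be sketched in Python as follows, assuming the ±1/√n rule that reproduces the slide's 26.8% to 33.2% answer; the function names are illustrative.

```python
import math


def margin_of_error(n):
    """Approximate margin of error for a random sample of size n: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def likely_interval(p, n):
    """Interval p - moe to p + moe, with p as a decimal (30% -> 0.30)."""
    moe = margin_of_error(n)
    return p - moe, p + moe


moe = margin_of_error(990)
low, high = likely_interval(0.30, 990)
print(round(moe, 3))                  # 0.032
print(round(low, 3), round(high, 3))  # 0.268 0.332
```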

Tonight’s Assignment: Textbook p. 270 #1-25 odd; p. 275 #5-9 all. QUIZ TOMORROW on Central Tendency, Samples & Populations!

Normal Distribution
Normal Distribution: the modeling of data with a bell-shaped curve.
Normal Curve: the bell-shaped curve itself (the “bell” of the bell curve).

The Bell Curve
Remember: μ and x̄ denote the mean, and σ denotes the standard deviation.

Normal Curve: What does it mean?
68.2% of the data falls within 1 standard deviation of the mean (center).
95.4% of the data falls within 2 standard deviations of the mean (center).
99.7% of the data falls within 3 standard deviations of the mean (center).

Empirical Rule
The Empirical Rule is the 68.2% - 95.4% - 99.7% rule:
68.2% within 1 standard deviation (left or right)
95.4% within 2 standard deviations (left or right)
99.7% within 3 standard deviations (left or right)
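As a cross-check, Python's `statistics.NormalDist` (Python 3.8 or later) gives the exact shares of a normal curve within 1, 2, and 3 standard deviations. They round to 68.27%, 95.45%, and 99.73%; the Empirical Rule percentages above are these values cut to one decimal place. The helper name `share_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal curve


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```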

How do we apply this?
We can use the bell curve to find probabilities by converting the % into a decimal (hint: move the decimal point 2 places to the left and drop the %). For example, 68.2% becomes 0.682.

Probability
What is the probability that the data will fall between μ and μ + σ? Read the graph: 34.1%. So the probability is 0.341.

Another Example
What is the probability that the data will fall between μ - σ and μ + 2σ? Find μ - σ on the graph and add up all the %s up to μ + 2σ: 34.1% + 34.1% + 13.6% = 81.8%. Probability: 0.818
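A sketch of the "read the graph and add the bands" method in Python. The dictionary holds the band percentages from the bell-curve diagram (our own encoding of that picture), and the last line compares the Empirical Rule answer with the exact normal-curve value from `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

# Band percentages from the bell curve, keyed by (start, end) in standard deviations
bands = {(-3, -2): 2.15, (-2, -1): 13.6, (-1, 0): 34.1,
         (0, 1): 34.1, (1, 2): 13.6, (2, 3): 2.15}


def band_probability(lo, hi):
    """Add the band percentages between lo and hi (in SDs) and convert to a decimal."""
    total = sum(pct for (a, b), pct in bands.items() if a >= lo and b <= hi)
    return round(total / 100, 4)


print(band_probability(0, 1))   # 0.341
print(band_probability(-1, 2))  # 0.818

z = NormalDist()  # standard normal curve
print(round(z.cdf(2) - z.cdf(-1), 4))  # 0.8186 (exact normal value)
```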

One more example Find the probability of the data falling 0.0215

Last one
A normal distribution has a mean of 27 and a standard deviation of 5. What is the probability that a randomly selected value will be between 22 and 37?
Step 1: Draw and label your bell curve with the data given.

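Graph the bell curve: place the mean, 27, at the center, and mark 12, 17, and 22 at three, two, and one standard deviations below the mean and 32, 37, and 42 at one, two, and three standard deviations above it. From left to right, the regions between these marks hold 2.15%, 13.6%, 34.1%, 34.1%, 13.6%, and 2.15% of the data.
Now, add up all the %s between 22 and 37: 34.1% + 34.1% + 13.6% = 81.8%. Probability: 0.818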
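Finally, a Python sketch of this last example, assuming Python 3.8+ for `statistics.NormalDist`. It computes the tick marks used to label the curve, converts 22 and 37 into standard-deviation units, and checks the Empirical Rule answer (0.818) against the exact normal probability.

```python
from statistics import NormalDist

mu, sigma = 27, 5
lo, hi = 22, 37

# Tick marks for the sketch: the mean plus or minus 1, 2, and 3 standard deviations
ticks = [mu + k * sigma for k in range(-3, 4)]
print(ticks)  # [12, 17, 22, 27, 32, 37, 42]

# How many standard deviations each endpoint is from the mean
z_lo = (lo - mu) / sigma
z_hi = (hi - mu) / sigma
print(z_lo, z_hi)  # -1.0 2.0

dist = NormalDist(mu, sigma)
print(round(dist.cdf(hi) - dist.cdf(lo), 4))  # 0.8186
```

The small gap between 0.818 and 0.8186 comes from the rounded band percentages on the graph.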