Introduction to Statistics

Introduction to Statistics Topics 7 - 10 Nellie Hedrick

Topic 7 – Displaying and Describing Distributions
Center – the center of a distribution is the most important aspect to analyze.
Spread (variability, consistency) – how widely the data are spread out is the second most important aspect.
Shape – the shape of the distribution is the third important component of the analysis.

Symmetric and Skewed Distributions
[Figure: four example distributions labeled Skewed to the Left, Skewed to the Right, Symmetric – Single Peak, and Symmetric – Two Peaks.]

Graphical Representations of Data: Quantitative Variables
Stem plot of the data 21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10

Leaves in data order:
Stem | Leaf
1 | 9 8 0
2 | 1 0 2 5 3 2
3 | 1
4 | 0

Leaves sorted:
Stem | Leaf
1 | 0 8 9
2 | 0 1 2 2 3 5
3 | 1
4 | 0
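The stem plot for this data set can also be built programmatically. The following is a minimal sketch, assuming two-digit non-negative integers (stem = tens digit, leaf = ones digit); the function name stem_plot is hypothetical and not part of the original slides.

```python
from collections import defaultdict

def stem_plot(values, sort_leaves=True):
    """Return stem plot rows for non-negative integers.

    Assumption: the stem is the tens digit and the leaf is the ones digit.
    """
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = sorted(leaves[stem]) if sort_leaves else leaves[stem]
        lines.append(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
    return lines

data = [21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10]
print("\n".join(stem_plot(data)))
```

Passing sort_leaves=False keeps the leaves in the order the data were entered, which matches the first (unsorted) version of the plot.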

Activity 7-5 Exercise 7-10 Exercise 7-21

Definitions
Side-by-side stem plot – a common set of stems is placed in the middle of the display, with leaves branching out to the left and to the right. The convention is to order the leaves from the middle out, from least to greatest.
Histogram – a graphical display similar to a dot plot or stem plot, but more feasible with larger data sets. To construct one:
Divide the range of the data into subintervals (bins) of equal length.
Count the number (frequency) of observational units in each subinterval.
Draw a bar over each subinterval whose height represents the frequency, or the proportion (relative frequency), of observational units in that subinterval.
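The histogram construction steps can be sketched in a few lines. This example reuses the stem plot data from the earlier slide; the bin start (10), bin width (10), and the function name histogram_counts are assumptions chosen for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in n_bins equal-width bins.

    Bin i covers the half-open interval [start + i*width, start + (i+1)*width).
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

data = [21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10]
counts = histogram_counts(data, start=10, width=10, n_bins=4)
proportions = [c / len(data) for c in counts]  # relative frequencies

# Text rendering: one row per bin, bar length equal to the frequency.
for i, (c, p) in enumerate(zip(counts, proportions)):
    low = 10 + 10 * i
    print(f"[{low}, {low + 10}): {'#' * c}  count={c}  proportion={p:.2f}")
```

Changing the bin width changes the picture, so it is worth trying more than one choice before describing the shape.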

Wrap Up, Watch Out, and In Brief
The direction of skewness is indicated by the longer tail.
Pay attention to the units of the stem plot.
Pay attention to outliers: identify them and investigate possible explanations for their occurrence. Make sure an outlier is not simply a typing error.
Remember context! Your description of the data should be clear enough for anyone to read.
Remember to label your graphs.
Examine different types of graphs to see which gives the better representation.
Anticipate features of the data by considering the nature of the variable involved.

Topic 8 – Measures of Center
Mean – the ordinary average, calculated by adding all the values and dividing by the number of observational units.
Median – the value of the middle observational unit when the observational units are sorted from low to high. With an odd number n of observational units, the median is at location (n+1)/2. With an even number, the median is the average of the middle two values.
Resistant – describes a measure whose value is relatively unaffected by the presence of outliers in a distribution. The median is resistant; the mean is not.
Mode – the value that appears most often in a distribution.

Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
Mean – the average
Median – the middle value, a measure of center
Mode – the most repeated value
Minimum – the smallest value in the data set
Maximum – the largest value in the data set

Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
Sort the data: 19 20 21 22 22 23 25 31 40
Mean: the values add up to 223, so the mean is 223/9 ≈ 24.8.
Median: there are n = 9 values, so (9 + 1)/2 = 5 and the median is the value in the 5th location, which is 22.
Minimum = 19, Maximum = 40, Mode = 22
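The same summary can be checked with a short script. This is a sketch using Python's standard statistics module; the variable names are illustrative only.

```python
import statistics

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
ordered = sorted(data)
n = len(ordered)

mean = sum(data) / n

# Odd n: the median sits at location (n + 1) / 2, counting from 1.
# Even n: the median is the average of the two middle values.
if n % 2 == 1:
    median = ordered[(n + 1) // 2 - 1]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

mode = statistics.mode(data)  # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode} "
      f"min={ordered[0]} max={ordered[-1]}")
```

Replacing 40 with 400 changes the mean substantially but leaves the median at 22, which illustrates why the median is called resistant.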

Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
TI-83: [STAT] [1:Edit]. Enter all the data from the example into L1, pressing [ENTER] after each entry. After completing data entry, press [QUIT] [STAT] [CALC] [1:1-Var Stats] [2nd] [L1] [ENTER]. Use the down (or up) arrow key to view all the information.

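Median and Mean of a Density Curve
[Figure: three density curves. Symmetric: the mean, median, and mode coincide. Skewed right: the mean is pulled toward the long right tail, beyond the median. Skewed left: the mean is pulled toward the long left tail, below the median.]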

Wrap Up and Warning
Center is a property of a distribution. The mean and the median are two ways to measure center; neither one is synonymous with center, and each has its own properties and strengths.
Center is only one aspect of a distribution of data. Measures of center do not tell the whole story; other important features are spread, shape, clusters, and outliers.
The mode applies to categorical as well as quantitative variables, but the notion of center does not make sense for categorical variables.

Exercise 8-7 page 161 Exercise 8-9 page 161 Exercise 8-17 page 163

Topic 9 – Measures of Spread
Range – the difference between the maximum and the minimum.
Lower quartile (Q1) – the value at the 1/4 (25%) location in the sorted data.
Upper quartile (Q3) – the value at the 3/4 (75%) location in the sorted data.
Interquartile range (IQR) – the difference between the upper and lower quartiles.
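These measures can be computed for the Topic 8 example data. The sketch below assumes one common textbook rule for quartiles: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. The function name quartiles is hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of values is odd, the middle value
    belongs to neither half. Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.median(lower), statistics.median(upper)

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1

print(f"range={data_range} Q1={q1} Q3={q3} IQR={iqr}")
```

Both the range and the IQR come out as single numbers, not intervals.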

Measuring the Spread: The Standard Deviation (s)
The standard deviation measures the spread of a distribution about its mean. The variance is (roughly) the average of the squared deviations of the observations from their mean, and the standard deviation is the square root of the variance.
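A step-by-step calculation makes the relationship between variance and standard deviation concrete. This sketch assumes the sample convention of dividing by n - 1 rather than n, which is what the s notation usually denotes; the data are the Topic 8 example values.

```python
import math
import statistics

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: divide by n - 1 (assumed convention), not n.
variance = sum(squared_deviations) / (n - 1)
s = math.sqrt(variance)

print(f"variance={variance:.3f} s={s:.3f}")
# Cross-check against the standard library's sample standard deviation.
print(f"statistics.stdev={statistics.stdev(data):.3f}")
```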

Describing Distributions with Numbers
Be aware that various software packages and calculators might use slightly different rules for calculating quartiles.
It can be tempting to regard the range and the IQR as intervals of values, but each should be reported as a single number that measures the spread of the distribution.
Measures of spread apply only to quantitative variables, not categorical ones.
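The warning about differing quartile rules is easy to see in practice. As an illustration, Python's statistics.quantiles (Python 3.8 or later) offers two methods, "exclusive" and "inclusive", and they give different quartiles for the Topic 8 example data. Other packages and calculators may use yet other rules.

```python
import statistics

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]

# n=4 asks for the three cut points that split the data into quarters.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

iqr_exclusive = exclusive[2] - exclusive[0]
iqr_inclusive = inclusive[2] - inclusive[0]

print("exclusive:", exclusive, "IQR =", iqr_exclusive)
print("inclusive:", inclusive, "IQR =", iqr_inclusive)
```

The median agrees under both methods, but Q1, Q3, and therefore the IQR do not, so it helps to state which rule was used when reporting quartiles.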

Activity 9-5 page 182 Exercise 9-12 page 190 Exercise 9-22 page 193

Watch Out
Variability can be a tricky concept to grasp, but it is absolutely fundamental to working with data.
When looking at the distribution of a variable, focus on variability in the horizontal values (the variable) and not in the heights (the frequencies).
A larger number of distinct values in a histogram does not necessarily indicate greater variability. Consider how far the values fall from the center rather than the variety of their exact numerical values.

Mound-Shaped Distributions – The Empirical Rule (the 68-95-99.7 rule)
About 68% of the data fall within one standard deviation of the mean.
About 95% of the data fall within two standard deviations of the mean.
About 99.7% of the data fall within three standard deviations of the mean.

Attendance at a university's basketball games follows a normal distribution with mean µ = 8,000 and standard deviation σ = 1,000. Use the 68–95–99.7 rule and give each answer as a percent. Estimate the percentage of games that have:
between 6,000 and 8,000 people in attendance
more than 7,000 people in attendance
fewer than 6,000 people in attendance
fewer than 8,000 people in attendance
fewer than 5,000 people in attendance
more than 10,000 people in attendance

Mound-Shaped Distributions – The Empirical Rule (the 68-95-99.7 rule)
Splitting the curve at whole standard deviations from the mean gives approximately these areas, from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%.
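The band percentages can be added up to answer questions such as the basketball attendance exercise. The sketch below is one way to organize that arithmetic; it only handles cut points that fall a whole number of standard deviations (from -3 to 3) from the mean, and the function names are hypothetical.

```python
# Approximate areas (percent) from the 68-95-99.7 rule, left to right:
# below -3, -3..-2, -2..-1, -1..0, 0..1, 1..2, 2..3, above 3.
BAND_PERCENTS = [0.15, 2.35, 13.5, 34.0, 34.0, 13.5, 2.35, 0.15]
CUTS = [-3, -2, -1, 0, 1, 2, 3]

def percent_below(x, mu, sigma):
    """Approximate percent of values below x using the empirical rule."""
    z = (x - mu) / sigma
    if z not in CUTS:
        raise ValueError("x must be a whole number of SDs (-3..3) from the mean")
    return round(sum(BAND_PERCENTS[: CUTS.index(z) + 1]), 2)

def percent_above(x, mu, sigma):
    return round(100 - percent_below(x, mu, sigma), 2)

def percent_between(low, high, mu, sigma):
    return round(percent_below(high, mu, sigma) - percent_below(low, mu, sigma), 2)

mu, sigma = 8000, 1000
print("6,000 to 8,000:", percent_between(6000, 8000, mu, sigma))
print("more than 7,000:", percent_above(7000, mu, sigma))
print("fewer than 6,000:", percent_below(6000, mu, sigma))
print("fewer than 8,000:", percent_below(8000, mu, sigma))
print("fewer than 5,000:", percent_below(5000, mu, sigma))
print("more than 10,000:", percent_above(10000, mu, sigma))
```

These are rule-of-thumb estimates; exact normal probabilities differ slightly in the later decimal places.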

The Standard Normal Distribution
As the 68-95-99.7 rule suggests, all normal distributions share a common property.
Z-score – computing a z-score is the process of standardization. If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is
z = (x - µ) / σ

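Calculating the Standard Normal Z
Example: Calculate the z-score for x = 120, where the mean µ = 170 and the standard deviation σ = 30.
z = (120 - 170) / 30 ≈ -1.67
[Figure: a normal curve with µ = 170 and σ = 30, with x = 120 marked, mapped onto the standard normal curve with µ = 0 and σ = 1, with z = -1.67 marked.]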
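The standardization step is a one-line calculation. A minimal sketch, with z_score as an illustrative function name:

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

z = z_score(120, mu=170, sigma=30)
print(round(z, 2))  # -1.67
```

A negative z-score means the observation lies below the mean; here 120 is about 1.67 standard deviations below 170.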

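Normal Distributions: Same Mean, Different Standard Deviations
Two normal curves with the same mean but different standard deviations (S2 < S1): the curve with the larger standard deviation has the larger spread.
[Figure: two normal curves centered at the same mean, labeled S2 and S1.]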

The length of human pregnancies from conception to birth is known to be normally distributed with a mean of 266 days and standard deviation of 16 days. 1. What proportion of pregnancies last between 250 and 282 days? 2. What proportion of pregnancies last between 232 and 282 days?
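One way to check answers to this exercise is with the normal cumulative distribution function. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper name proportion_between is hypothetical. Note that 250 and 282 are exactly one standard deviation from the mean, so the 68-95-99.7 rule applies directly to question 1, while 232 is 2.125 standard deviations below the mean, so question 2 needs the exact calculation or a table.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)

def proportion_between(dist, low, high):
    """Proportion of a normal distribution falling between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p1 = proportion_between(pregnancy, 250, 282)
p2 = proportion_between(pregnancy, 232, 282)

print(f"250 to 282 days: {p1:.4f}")
print(f"232 to 282 days: {p2:.4f}")
```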

Wrap Up
In studying variability, you see that even if two data sets have similar centers, the spread of the values might differ substantially.
The z-score is a useful tool when you are comparing two or more data sets. It serves as a ruler for measuring distances.
Variability is a property of a distribution; the standard deviation and the IQR are two ways to measure variability.
The standard deviation, like the mean absolute deviation, can be loosely interpreted as the typical deviation of an observation from the mean.

Topic 10 – More Summary Measures and Graphs
Five-number summary (FNS) – provides a quick and convenient description of where the four quarters of the data in a distribution fall. It consists of the median, the quartiles (Q1, Q3), and the extremes (min, max).
Box plot – a graph based on the FNS. Box plots are especially useful for comparing distributions of a quantitative variable across two or three groups.

Measuring the Center and Spread: Choosing a Summary
Five-number summary – for a skewed distribution or a distribution with outliers.
Mean and standard deviation – for a reasonably symmetric distribution without outliers.

The Five-Number Summary: Box Plot
[Figure: a box plot labeled, from top to bottom, Maximum, Q3, Median, Q1, Minimum.]

Modified Box Plot
Modified box plot – conveys additional information by treating outliers differently. On these graphs each outlier is marked with a special symbol, and each whisker extends only to the most extreme non-outlier. We call any observation falling more than 1.5 times the IQR away from the nearer quartile an outlier.
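The numbers behind a modified box plot can be sketched as follows, using the Topic 8 example data. The quartile rule here (medians of the lower and upper halves, middle value excluded when n is odd) is an assumption, since calculators and software differ; the function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

def find_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in sorted(values) if v < low_fence or v > high_fence]

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
summary = five_number_summary(data)
outliers = find_outliers(data)

# Whiskers stop at the most extreme values that are not outliers.
non_outliers = [v for v in data if v not in outliers]
whiskers = (min(non_outliers), max(non_outliers))

print("five-number summary:", summary)
print("outliers:", outliers)
print("whisker ends:", whiskers)
```

With these data the upper fence is 28 + 1.5 × 7.5 = 39.25, so 40 is flagged as an outlier and the upper whisker stops at 31.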

Activity 10-1 page Exercise 10-22 page 217

Watch Out and Wrap Up
Box plots can be tricky to read and interpret. A box plot shows only that the data are divided into four pieces, each containing about 25% of the data.
Box plots and modified box plots are nice tools for comparing groups. Make sure to use the same scale for each group.