Looking at Data-Distributions

Presentation transcript:

Looking at Data-Distributions: 1.1 Displaying Distributions with Graphs

Basic definitions
Data: numbers with a context. E.g., if your friend's new baby weighed 10.5 pounds, we know the baby is quite large. If the figure were 10.5 ounces or 10.5 kg, we would know it is impossible. The context makes the number informative.
Individuals: the objects described by the data (people, animals, things).
Variable: any property or characteristic of an individual (e.g., a person's IQ score).
Distribution: the distribution of a variable tells us what values it takes and how often it takes them (the frequency of each value).

Types of variables
Categorical variable: places an individual into one of several categories (male/female, smoker/nonsmoker).
Quantitative variable: takes numerical values for which arithmetic operations such as adding and averaging can be performed (shoe size, age).

How to represent data?
Categorical variables: use a pie chart or a bar graph. E.g., make a pie chart or bar graph for the distribution of gender.
Quantitative variables: use a histogram.

Example 1: The color of your car (distribution of the most popular colors for 2005 model luxury cars made in North America)

Color          Percent
Silver         20
White, pearl   18
Black          16
Blue           13
Light brown    10
Red            7
Yellow, gold   6

What percent of vehicles are some other color?
Make a bar graph.
Can we make a pie chart for the given colors?
Would it be correct to make a pie chart if you added an “Other” category?
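The first question in Example 1 is a matter of arithmetic: the listed percents must leave room for every other color. The short Python sketch below adds up the table (the variable names are illustrative, not part of the slides). A pie chart shows parts of a whole, so it only makes sense once the slices account for 100%.

```python
# Percents from the Example 1 table (2005 model luxury cars, North America).
color_percent = {
    "Silver": 20,
    "White, pearl": 18,
    "Black": 16,
    "Blue": 13,
    "Light brown": 10,
    "Red": 7,
    "Yellow, gold": 6,
}

listed_total = sum(color_percent.values())   # share covered by the listed colors
other_percent = 100 - listed_total           # share left for all other colors

print(f"Listed colors cover {listed_total}% of vehicles")
print(f"Other colors account for {other_percent}%")
```

Because the listed colors do not add up to 100%, a pie chart of only these seven colors would misrepresent the whole; adding an "Other" slice fixes that.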

Example 2: The density of the earth (the variable recorded was the density of the earth as a multiple of the density of water)

Using a TI-84, create a histogram. Discuss the shape, center, spread, and outliers.

Example 3: Do women study more than men?
Variable: minutes studied on a typical weeknight by students in a first-year college class. Here are the responses of random samples of 30 women and 30 men from the class:
Women: 180, 120, 150, 200, 120, 90, 120, 180, 120, 150, 60, 240, 180, 120, 180, 180, 120, 180, 360, 240, 180, 150, 180, 115, 240, 170, 150, 180, 180, 120
Men: 90, 90, 150, 240, 30, 0, 120, 45, 120, 60, 230, 200, 30, 30, 60, 120, 120, 120, 90, 120, 240, 60, 95, 120, 200, 75, 300, 30, 150, 180
Examine the data. Why are you not surprised that most responses are multiples of 10 minutes?
We eliminated one student who claimed to study 30,000 minutes per night. Are there any other responses you consider suspicious?
Make a back-to-back stemplot to compare the two samples. That is, use one set of stems with two sets of leaves, one to the right and one to the left of the stems. (Draw a line on either side of the stems to separate stems and leaves.) Order both sets of leaves from smallest at the stem to largest away from the stem.
Report the approximate midpoints of both groups. Does it appear that women study more than men (or at least claim that they do)?

Answers
a) Most people round their answers. The responses of 0 minutes, 300 minutes, and 360 minutes might be considered suspicious.
b) The stemplots suggest that women (claim to) study more than men. The approximate centers are 175 minutes for women and 120 minutes for men.
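The centers quoted in the answers can be checked directly from the Example 3 data. The sketch below uses Python's standard `statistics` module; the helper `share_multiple_of_ten` is a hypothetical name introduced here to show how common rounded answers are.

```python
from statistics import median

women = [180, 120, 150, 200, 120, 90, 120, 180, 120, 150, 60, 240, 180, 120, 180,
         180, 120, 180, 360, 240, 180, 150, 180, 115, 240, 170, 150, 180, 180, 120]
men = [90, 90, 150, 240, 30, 0, 120, 45, 120, 60, 230, 200, 30, 30, 60,
       120, 120, 120, 90, 120, 240, 60, 95, 120, 200, 75, 300, 30, 150, 180]

def share_multiple_of_ten(values):
    """Fraction of responses that are exact multiples of 10 minutes."""
    return sum(v % 10 == 0 for v in values) / len(values)

women_median = median(women)
men_median = median(men)

print("Median minutes studied, women:", women_median)
print("Median minutes studied, men:  ", men_median)
print("Share of rounded answers, women:", share_multiple_of_ten(women))
print("Share of rounded answers, men:  ", share_multiple_of_ten(men))
```

The medians agree with the approximate centers read off the stemplots, and nearly all responses in both groups are multiples of 10.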

Looking at Data-Distributions: 1.2 Describing Distributions with Numbers

Mean & Median
Mean = sum of numbers / number of numbers
Median = middle value (when the numbers are in ascending order)
Example 1: 103, 105, 109, 140, 170. The median is 109, the number in position (n+1)/2 from the bottom of the list, where n is the number of values.
Example 2: 18, 19, 20, 20, 26, 28. The median is 20, the average of the numbers in positions n/2 and n/2 + 1. Mean = 21.83.
Example 3: Replace 28 in Example 2 by 100 and recompute the mean and median: 18, 19, 20, 20, 26, 100. Mean = 33.83. The median does not change.
The mean is affected by outliers.
The median is not affected by outliers.
A measure of center alone can be misleading.
Solution: we also need a measure of spread (variability).
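The three examples can be reproduced with Python's standard `statistics` module. This is only a sketch to show how the outlier moves the mean while the median stays put; the list names are illustrative.

```python
from statistics import mean, median

example_1 = [103, 105, 109, 140, 170]      # odd n: the middle value is the median
example_2 = [18, 19, 20, 20, 26, 28]       # even n: average the two middle values
example_3 = [18, 19, 20, 20, 26, 100]      # Example 2 with 28 replaced by 100

for name, data in [("Example 1", example_1),
                   ("Example 2", example_2),
                   ("Example 3", example_3)]:
    print(f"{name}: mean = {mean(data):.2f}, median = {median(data)}")
```

Replacing a single value raises the mean from about 21.83 to about 33.83, while the median remains 20.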

Measuring spread: Quartiles
Example 4: Ages of 10 students: 26, 19, 20, 18, 20, 19, 19, 19, 19, 21
Sorted in ascending order: 18, 19, 19, 19, 19, 19, 20, 20, 21, 26
Median (Q2) = 19
First quartile (Q1) = median of the lower half of the data = 19
Third quartile (Q3) = median of the upper half of the data = 20
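The median-of-halves rule used in Example 4 can be sketched in Python as follows. The function name `quartiles` is hypothetical, and dropping the middle observation from both halves when n is odd is an assumption, since the slide only shows an even-sized data set. Note that library routines (including `statistics.quantiles`) may use different interpolation rules and give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: for odd n the middle observation belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # observations below the median position
    upper = data[half + n % 2:]     # skips the middle observation when n is odd
    return median(lower), median(data), median(upper)

ages = [26, 19, 20, 18, 20, 19, 19, 19, 19, 21]
q1, q2, q3 = quartiles(ages)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
```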

IQR (interquartile range) = Q3 - Q1
Five-number summary: Min, Q1, Q2 (median), Q3, Max
Box plot: a picture of the five-number summary. Can be used to compare two distributions.
[Figure: box plot labeled, from top to bottom, Max, Q3, Median (Q2), Q1, Min; the IQR spans the box from Q1 to Q3]

The 1.5 × IQR rule for suspected outliers
Example 5 (travel times to work in New York, in minutes): 10, 30, 5, 25, 40, 20, 10, 15, 30, 20, 15, 20, 85, 15, 65, 15, 60, 60, 40, 45
(Single-peaked, right-skewed; there is no center observation, but there is a center pair.)
Five-number summary: 5, 15, 22.5, 42.5, 85
IQR = 42.5 - 15 = 27.5
Apply the 1.5 × IQR rule:
Step 1: Calculate 1.5 × IQR = 1.5 × 27.5 = 41.25
Step 2: Calculate Q1 - (1.5 × IQR) = 15 - 41.25 = -26.25
Step 3: Calculate Q3 + (1.5 × IQR) = 42.5 + 41.25 = 83.75
Any values outside the interval (-26.25, 83.75) are flagged as suspected outliers.
The suspected outlier in the data is 85.
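The three steps translate into a few lines of Python. This sketch takes Q1 and Q3 from the five-number summary quoted in Example 5 rather than recomputing them; the names `lower_fence` and `upper_fence` are illustrative.

```python
travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, q3 = 15, 42.5            # quartiles from the five-number summary
iqr = q3 - q1                # interquartile range
step = 1.5 * iqr             # Step 1
lower_fence = q1 - step      # Step 2
upper_fence = q3 + step      # Step 3

# Observations beyond either fence are suspected outliers.
suspected = [t for t in travel_times if t < lower_fence or t > upper_fence]

print(f"IQR = {iqr}, 1.5 x IQR = {step}")
print(f"Fences: ({lower_fence}, {upper_fence})")
print("Suspected outliers:", suspected)
```

The rule only flags values for a closer look; it does not by itself justify discarding them.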

Standard deviation (s)
Used as a measure of spread when the mean is the measure of center
Units of s = same as the data units
s is never negative (s ≥ 0)
Higher s -> more spread
s = 0 -> no spread -> all observations equal
s is affected by outliers
Example: 1, 1, 2, 5, 3

Formulas for variance & standard deviation
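Assuming the usual sample formulas, variance s² = (sum of squared deviations from the mean) / (n - 1) and s = square root of s², the example data 1, 1, 2, 5, 3 can be worked step by step as in the Python sketch below. The n - 1 divisor is an assumption here; it matches what `statistics.stdev` computes.

```python
from math import sqrt
from statistics import stdev

data = [1, 1, 2, 5, 3]
n = len(data)

mean = sum(data) / n                                 # center of the data
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / (n - 1)         # sample variance (assumed n - 1 divisor)
s = sqrt(variance)                                   # standard deviation, in data units

print(f"mean = {mean}, variance = {variance:.2f}, s = {s:.2f}")
print(f"statistics.stdev agrees: {stdev(data):.2f}")
```

For this data the mean is 2.4, the variance is 2.8, and s is about 1.67.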

Looking at Data-Distributions: 1.3 Density Curves and Normal Distributions

Definitions
Density curve: a smooth curve that idealizes a histogram, scaled so that the total area under the curve is 1.
[Figure: a typical histogram (relative frequency against bin limits) beside an example of a density curve]
Characteristics of a density curve:
All y values are nonnegative (the curve lies on or above the horizontal axis)
Total area under the curve = 1
The curve approaches zero for extreme left and right x values

Definitions
Normal distribution formula
It can be shown that the probability density function of a normal random variable X with mean $\mu_X$ and standard deviation $\sigma_X$ has the following form:
$$ f(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left( -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right) $$
TI-84 calculator:
1) Turn STAT PLOT off
2) In Y1, enter normalpdf(x, mean, standard deviation)
3) normalpdf( is found under 2nd -> DISTR
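For readers without a TI-84, the normal density can be evaluated in Python. The sketch below writes out the standard normal density formula by hand (the function name `normal_pdf` is hypothetical) and cross-checks it against `statistics.NormalDist`, which plays the role of the calculator's normalpdf.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal density curve at x for the given mean and sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

# Compare the hand-written formula with the standard library at a few points.
for x in (-1.0, 0.0, 2.5):
    by_hand = normal_pdf(x, mean=0.0, sd=1.0)
    library = NormalDist(mu=0.0, sigma=1.0).pdf(x)
    print(f"x = {x:5.1f}: formula = {by_hand:.6f}, NormalDist = {library:.6f}")
```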

Definitions
The 68-95-99.7 rule
Example: when the mean is 0 and the standard deviation is 1
Approximately 68% of the observations fall within one standard deviation of the mean
Approximately 95% of the observations fall within two standard deviations of the mean
Approximately 99.7% of the observations fall within three standard deviations of the mean
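The three percentages are rounded areas under the standard normal curve. As a quick check, the sketch below computes those areas with `statistics.NormalDist`; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the curve within k standard deviations of the mean, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in within.items():
    print(f"Within {k} standard deviation(s): {100 * area:.1f}%")
```

The same areas hold for any normal distribution, because distances are measured in units of its own standard deviation.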

Definitions
Tables: allow us to calculate probabilities for a normal distribution
How do we get the numbers? There are too many normal distributions (one for each possible combination of mean and standard deviation), so infinitely many. We need to standardize.
Standardization of normal random variables: if X is normally distributed with mean $\mu_X$ and standard deviation $\sigma_X$, its standardization is
$$ Z = \frac{X - \mu_X}{\sigma_X} $$

Definitions
Standard normal (Z): N(0,1), with mean 0 and standard deviation 1
After standardizing, we can calculate the fraction of a data set between any two limits
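A minimal Python sketch of that calculation: convert both limits to z-scores by subtracting the mean and dividing by the standard deviation, then take the difference of standard normal cumulative areas, which is what a table lookup does. The function name `fraction_between` and the numbers in the usage line (mean 100, standard deviation 15, limits 85 and 130) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def fraction_between(low, high, mean, sd):
    """Fraction of a normal population with the given mean and sd lying in [low, high]."""
    z_low = (low - mean) / sd      # standardize the lower limit
    z_high = (high - mean) / sd    # standardize the upper limit
    # Area to the left of z_high minus area to the left of z_low.
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

# Hypothetical example: limits one sd below and two sd above the mean.
example = fraction_between(85, 130, mean=100, sd=15)
print(f"Fraction between 85 and 130: {example:.4f}")
```

Standardizing first means a single table (or a single standard normal object) serves every normal distribution.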