Unit 1: Mr. Lang’s AP Statistics PowerPoint
Homework Assignment
• For the A: 1, 3, 5, 7, 8, Odd, 27 – 32, 37 – 59 Odd, 60, 69 – 74, 79 – 105 Odd (except 85, 99, 101), 107 – 110, R1 – R10
• For the C: 1, 3, 5, 8, Odd, 37 – 59 Odd, 79 – 103 Odd (except 85, 99, 101), R1 – R10
• For the D-: 1, 3, 5, 11, 15, 19, 23, 37, 41, 45, 49, 79, 83, 87, 91, 97, 103, R1 – R10
All problems must be complete, including explanations in complete sentences and/or work shown where the question asks for it. All Multiple Choice problems will be graded for correctness.
Statistics
• the science of collecting, analyzing, and drawing conclusions from data
Descriptive statistics
• the methods of organizing & summarizing data
Inferential statistics
• involves making generalizations from a sample to a population
Population
• the entire collection of individuals or objects about which information is desired
Sample
• a subset of the population, selected for study in some prescribed manner
Variable
• any characteristic whose value may change from one individual to another
Data
• observations on a single variable or simultaneously on two or more variables
Types of variables
Categorical variables
• or qualitative
• identifies basic differentiating characteristics of the population
Numerical variables
• or quantitative
• observations or measurements take on numerical values
• makes sense to average these values
• two types: discrete & continuous
Discrete (numerical)
• listable set of values
• usually counts of items
Continuous (numerical)
• data can take on any values in the domain of the variable
• usually measurements of something
Classification by the number of variables
• Univariate: data that describes a single characteristic of the population
• Bivariate: data that describes two characteristics of the population
• Multivariate: data that describes more than two characteristics (beyond the scope of this course)
Identify each of the following variables as numerical or categorical:
1. the income of adults in your city
2. the color of M&M candies selected at random from a bag
3. the number of speeding tickets each student in AP Statistics has received
4. the area code of an individual
5. the birth weights of female babies born at a large hospital over the course of a year
Self Check #1
Assignment #1
Graphs for categorical data
Bar Graph
• Used for categorical data
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• To describe – comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
Using class survey data:
• graph birth month
• graph gender & handedness
Pie (Circle) graph
• Used for categorical data
• To make:
  – Proportion × 360°
  – Using a protractor, mark off each part
• To describe – comment on which occurred the most often or least often
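The "proportion × 360°" step can be sketched in a few lines of Python. The survey responses below are hypothetical, made up only to illustrate the arithmetic; they are not the class survey data.

```python
from collections import Counter

# Hypothetical class-survey responses (illustrative only, not from the slides).
responses = ["blue", "red", "blue", "green", "blue", "red", "green", "blue"]

counts = Counter(responses)
total = sum(counts.values())

# Central angle for each category = proportion of the whole x 360 degrees.
angles = {category: count / total * 360 for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles should always add up to 360°, which is a quick check before picking up the protractor.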
Graphs for numerical data
Dotplot
• Used with numerical data (either discrete or continuous)
• Made by putting dots (or X’s) on a number line
• Can make comparative dotplots by using the same axis for multiple groups
Stemplots (stem & leaf plots)
• Used with univariate, numerical data
• Must have a key so that we know how to read the numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?
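As a rough illustration of how a stemplot is built, here is a minimal Python sketch. It assumes non-negative whole numbers, uses the tens as the stem and the ones digit as the leaf, and does not split stems. The function name `stemplot` and the shoe counts are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a simple stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = tens, leaf = ones digit
    lines = []
    # Walk every stem from smallest to largest so that gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem:>2} | " + " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(row.rstrip())
    lines.append("Key: 2 | 5 means 25")
    return lines

# Hypothetical numbers of pairs of shoes owned (illustrative only).
shoes = [12, 15, 21, 25, 25, 40]
plot_lines = stemplot(shoes)
print("\n".join(plot_lines))
```

Empty stems are printed on purpose: leaving them out would hide gaps in the distribution.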
Example: The following data are price per ounce for various brands of dandruff shampoo at a local grocery store. Can you make a stemplot with this data?
Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
Total tobacco exposure time (in seconds) for other studios’ movies:
Make a comparative stemplot.
Graphing Activity
Self Check #2
Assignment #2
Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
  – Discrete: bars are centered over discrete values
  – Continuous: bars cover a class (interval) of values
• For comparative histograms – use two separate graphs with the same scale on the horizontal axis
Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Cumulative Relative Frequency Plot (Ogive)
• ... is used to answer questions about percentiles.
• Percentiles are the percent of individuals that are at or below a certain value.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data. IQR = Q3 – Q1
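The values plotted on an ogive can be sketched in Python as running totals of the class frequencies divided by n. The class boundaries and frequencies below are hypothetical, and `class_reaching` is an illustrative helper name, not something from the slides.

```python
from itertools import accumulate

# Hypothetical class upper bounds and frequencies (illustrative only).
upper_bounds = [10, 20, 30, 40, 50]
frequencies = [2, 6, 8, 3, 1]

n = sum(frequencies)
# Cumulative relative frequency = running total of frequencies / n.
cumulative_rel_freq = [c / n for c in accumulate(frequencies)]

def class_reaching(p):
    """Upper bound of the first class whose cumulative relative frequency is at least p."""
    for bound, crf in zip(upper_bounds, cumulative_rel_freq):
        if crf >= p:
            return bound
    return None

for bound, crf in zip(upper_bounds, cumulative_rel_freq):
    print(f"up to {bound}: {crf:.2f}")
print("Q1 falls in the class ending at", class_reaching(0.25))
print("Q3 falls in the class ending at", class_reaching(0.75))
```

This only locates the class that contains a percentile; reading a more precise value off the graph is what the ogive itself is for.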
Ogive Activity
Self Check #3
Multiple Choice Test #1
Types (shapes) of Distributions
Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type
  – has a center mound with two sloping tails
Uniform
• refers to data in which every class has equal or approximately equal frequency
Skewed (left or right)
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
Bimodal (multi-modal)
• refers to data in which two (or more) classes have the largest frequency & are separated by at least one other class
Distribution Activity...
Self Check #4
How to describe a numerical, univariate graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
1. Center
• discuss where the middle of the data falls
• three types of central tendency
  – mean, median, & mode
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data is
• refers to the variability of the data
  – range, standard deviation, IQR
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal
What strikes you as the most distinctive feature of the distribution of exam scores in class K?
4. Unusual occurrences
• outliers: values that lie away from the rest of the data
• gaps
• clusters
• anything else unusual
5. In context
• You must write your answer in reference to the specifics in the problem, using correct statistical vocabulary and using complete sentences!
Features of the Distribution Activity
Means & Medians
Parameter
• Fixed value about a population
• Typically unknown
Statistic
• Value calculated from a sample
Measures of Central Tendency
• Median: the middle of the data; 50th percentile
  – Observations must be in numerical order
  – Is the single middle value if n is odd
  – Is the average of the middle two values if n is even
NOTE: n denotes the sample size
Measures of Central Tendency
• Mean: the arithmetic average
  – Use $\mu$ to represent a population mean (parameter)
  – Use $\bar{x}$ to represent a sample mean (statistic)
Formula: $\mu = \frac{\sum x}{N}$ (parameter), $\bar{x} = \frac{\sum x}{n}$ (statistic)
$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow
Measures of Central Tendency
• Mode: the observation that occurs the most often
  – Can be more than one mode
  – If all values occur only once, there is no mode
  – Not used as often as mean & median
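The three measures of center can be checked with Python's standard `statistics` module. The lollipop counts below are hypothetical values, picked only so that there is one sample with odd n and one with even n.

```python
import statistics

# Hypothetical lollipop counts (illustrative only).
odd_sample = [2, 3, 4, 8, 12]       # n = 5, already in order
even_sample = [2, 3, 4, 6, 8, 12]   # n = 6, already in order

median_odd = statistics.median(odd_sample)    # the single middle value
median_even = statistics.median(even_sample)  # average of the middle two values
mean_even = statistics.mean(even_sample)      # sum of the observations divided by n

# multimode returns every value tied for the highest count.
# Note: when all values occur once it returns all of them, whereas
# the slides would say there is no mode in that case.
modes = statistics.multimode([1, 2, 2, 3, 3])

print(median_odd, median_even, mean_even, modes)
```

`statistics.median` sorts the data itself, so the "must be in numerical order" step only matters when finding the median by hand.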
Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median. The numbers are in order & n is odd, so find the middle observation. The median is 4 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. The median is … The numbers are in order & n is even, so find the middle two observations. Now, average these two values. The median is 5 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean. To find the mean number of lollipops, add the observations and divide by n.
Using the calculator...
What would happen to the median & mean if the 12 lollipops were 20? The median is... 5. The mean is... What happened?
What would happen to the median & mean if the 20 lollipops were 50? The median is... 5. The mean is... What happened?
Resistant
• Statistics that are not affected by outliers
• Is the median resistant? YES
• Is the mean resistant? NO
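A quick way to see resistance in action is to replace the largest observation and recompute both measures. The sketch below uses hypothetical lollipop counts with a largest value of 12 that is then changed to 20 and to 50; the data are assumptions made for illustration.

```python
import statistics

# Hypothetical sample of 6 lollipop counts (illustrative only).
base = [2, 3, 4, 6, 8, 12]

results = {}
for largest in (12, 20, 50):
    sample = base[:-1] + [largest]  # swap only the largest observation
    results[largest] = (statistics.median(sample), statistics.mean(sample))
    median, mean = results[largest]
    print(f"largest = {largest}: median = {median}, mean = {mean:.2f}")
```

The median stays at 5 in every case because the middle two values never change, while the mean is dragged upward by the single extreme value.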
Look at the following data set. Find the mean. Now find how each observation deviates from the mean. This is the deviation from the mean. What is the sum of the deviations from the mean? Will this sum always equal zero? YES
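One way to see why the deviations always sum to zero, writing $\bar{x}$ for the mean of $n$ observations $x_1, \dots, x_n$ (this subscript notation is an assumption, not taken from the slides):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

The middle step uses the definition of the mean, $\bar{x} = \frac{1}{n}\sum x_i$, which rearranges to $\sum x_i = n\bar{x}$.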
Look at the following data set. Find the mean & median. Mean = Median = Create a histogram with the data (use x-scale of 2). Then find the mean and median. 27 Look at the placement of the mean and median in this symmetrical distribution.
Look at the following data set. Find the mean & median. Mean = Median = Create a histogram with the data (use x-scale of 8). Then find the mean and median. Look at the placement of the mean and median in this right skewed distribution.
Look at the following data set. Find the mean & median. Mean = Median = Create a histogram with the data. Then find the mean and median. Look at the placement of the mean and median in this skewed left distribution.
Recap:
• In a symmetrical distribution, the mean and median are equal.
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!
Trimmed mean
To calculate a trimmed mean:
• Multiply the % to trim by n
• Truncate that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean with the shortened data set
Find a 10% trimmed mean with the following data.
10%(10) = 1
So remove one observation from each side!
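The trimmed-mean steps above can be sketched as a short Python function. The name `trimmed_mean`, the whole-number `percent` argument, and the ten data values are all assumptions made for illustration.

```python
import statistics

def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the observations from EACH end.

    `percent` is a whole-number percentage (e.g. 10 for 10%); the number
    of observations to drop is truncated to a whole number.
    """
    ordered = sorted(values)
    k = (percent * len(ordered)) // 100   # e.g. 10% of 10 observations -> 1
    trimmed = ordered[k:len(ordered) - k]
    return statistics.mean(trimmed)

# Hypothetical data set with n = 10 (illustrative only).
data = [1, 3, 4, 4, 5, 6, 6, 7, 8, 40]
print(statistics.mean(data))    # ordinary mean, pulled up by the 40
print(trimmed_mean(data, 10))   # mean after removing the 1 and the 40
```

Integer arithmetic is used for the count so that the truncation step does not depend on floating-point rounding.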
Matching Graphs Activity
Mean and Median Assignment
Why use boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (unlike histograms)
• used with medium or large size data sets (n > 10)
• useful for comparative displays
Disadvantage of boxplots
• does not retain the individual observations
• should not be used with small data sets (n < 10)
How to construct
• find the five-number summary: Min, Q1, Med, Q3, Max
• draw box from Q1 to Q3
• draw median as center line in the box
• extend whiskers to min & max
Modified boxplots
• display outliers
• fences mark off mild & extreme outliers
• whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence
• Q1 – 1.5IQR and Q3 + 1.5IQR
• Any observation outside this fence is an outlier! Put a dot for the outliers.
• Interquartile Range (IQR) is the range (length) of the box: Q3 – Q1
Modified Boxplot...
• Draw the “whisker” from each quartile to the most extreme observation that is within the fence!
Outer fence
• Q1 – 3IQR and Q3 + 3IQR
• Any observation outside this fence is an extreme outlier!
• Any observation between the fences is considered a mild outlier.
For the AP Exam you just need to find outliers; you DO NOT need to identify them as mild or extreme. Therefore, you just need to use the 1.5IQR fences.
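The five-number summary and the 1.5IQR outlier check can be sketched in Python as follows. This version assumes quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the overall median when n is odd); other quartile conventions exist and can give slightly different fences. The function names and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle observation when n is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

def outliers(values):
    """Observations outside the inner fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data set (illustrative only).
data = [3, 5, 6, 7, 8, 9, 10, 12, 13, 30]
print(five_number_summary(data))
print(outliers(data))
```

For this data the IQR is 6, so the fences sit at -3 and 21 and only the 30 is flagged; in a modified boxplot the upper whisker would stop at 13.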
A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & midwestern states. Create a modified boxplot. Describe the distribution. Use the calculator to create a modified boxplot.
Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (see data on note page) Create parallel boxplots. Compare the distributions.
[Parallel boxplots of radon concentration for the Cancer and No Cancer groups]
The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.
Matching Box Plots, Histograms, and Summary Statistics Activity
Self Check #5
Comparative Boxplots Assignment
Why is the study of variability important?
• Allows us to distinguish between usual & unusual values
• In some situations, want more/less variability
  – scores on standardized tests
  – time bombs
  – medicine
Measures of Variability
• range (max – min)
• interquartile range (Q3 – Q1)
• deviations
• variance
• standard deviation ($\sigma$, lower case Greek letter sigma)
Suppose that we have these data values: Find the mean. Find the deviations. What is the sum of the deviations from the mean?
Square the deviations: Find the average of the squared deviations:
The average of the deviations squared is called the variance.
• Population (parameter): $\sigma^2$
• Sample (statistic): $s^2$
Calculation of variance of a sample
• $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where df = n – 1
Degrees of Freedom (df)
• n deviations contain (n – 1) independent pieces of information about variability
A standard deviation is a measure of the average deviation from the mean.
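The path from deviations to variance to standard deviation can be traced step by step in Python. The data values are hypothetical; the population version divides by n, while the sample version divides by the degrees of freedom, n - 1.

```python
import math
import statistics

# Hypothetical data values (illustrative only).
data = [2, 4, 4, 6, 9]
n = len(data)

mean = sum(data) / n
deviations = [x - mean for x in data]      # these always sum to zero
squared = [d ** 2 for d in deviations]

population_variance = sum(squared) / n     # divide by n
sample_variance = sum(squared) / (n - 1)   # divide by df = n - 1
sample_sd = math.sqrt(sample_variance)

print(sum(deviations), population_variance, sample_variance, round(sample_sd, 4))

# The standard library gives the same values.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
```

Squaring is what keeps the positive and negative deviations from cancelling, and taking the square root at the end returns the measure to the original units.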
Use calculator
Which measure(s) of variability is/are resistant?
Mean and Variance Activity
Mean and Variance Worksheet
Self Check #6
Show me the Money Assignment
Multiple Choice Test #2
Assignment #3
Linear transformation rule
• When adding a constant to a random variable, the mean changes but the standard deviation does not.
• When multiplying a random variable by a constant, both the mean and the standard deviation change.
An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation of the charges?
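One way to work the repair-shop problem is to write the total charge as 30 + 25 × (hours of labor) and apply the linear transformation rule. The sketch below uses the figures stated in the problem; the variable names are illustrative.

```python
# Figures from the problem statement.
service_call = 30.0     # flat fee in dollars (added constant)
hourly_rate = 25.0      # dollars per hour (multiplied constant)
mean_hours = 1.25       # mean repair length in hours
sd_hours = 1 / 3        # standard deviation of repair length in hours

# Charge = service_call + hourly_rate * hours
# Multiplying rescales both the mean and the SD; adding shifts only the mean.
mean_charge = service_call + hourly_rate * mean_hours
sd_charge = hourly_rate * sd_hours

print(f"mean charge = ${mean_charge:.2f}, SD of charge = ${sd_charge:.2f}")
```

The $30 service call moves every charge by the same amount, so it shows up in the mean ($61.25) but not in the standard deviation (about $8.33).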
Rules for Combining two variables
• To find the mean for the sum (or difference), add (or subtract) the two means
• To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root.
• Formulas: $\mu_{X \pm Y} = \mu_X \pm \mu_Y$ and, if the variables are independent, $\sigma_{X \pm Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$
Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?
Phase        Mean    SD
Unpacking
Assembly
Tuning
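The combining rules extend to three independent phases in the same way: add the means, add the variances, then take the square root. The means and standard deviations below are hypothetical placeholders, not the values from the original table.

```python
import math

# Hypothetical (mean, SD) in minutes for each phase; NOT the slide's values.
phases = {
    "Unpacking": (4.0, 1.0),
    "Assembly": (20.0, 2.0),
    "Tuning": (10.0, 2.0),
}

# Mean of the total = sum of the means.
total_mean = sum(mean for mean, _ in phases.values())

# SD of the total = square root of the sum of the variances
# (valid because the phases are assumed independent).
total_sd = math.sqrt(sum(sd ** 2 for _, sd in phases.values()))

print(f"total mean = {total_mean} min, total SD = {total_sd} min")
```

Note that simply adding the standard deviations (1 + 2 + 2 = 5) would overstate the spread; adding the variances gives 3 minutes here.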
Self Check #7