AP Statistics One-Variable Data Analysis. Key Ideas Shape of a Distribution Dotplot, stemplot, histogram Measures of Center and Spread Five-Number Summary.

1 AP Statistics One-Variable Data Analysis

2 Key Ideas Shape of a Distribution Dotplot, stemplot, histogram Measures of Center and Spread Five-Number Summary and Boxplots Density Curves, z-Scores, and Normal Distributions The Empirical Rule

3 Graphical Analysis We graph data in order to get a visual sense of it. Shape, gaps, clusters and outliers Shape: Symmetric or skewed, bell-shaped, bimodal, uniform There are four types of graphs we want to look at in order to help us understand shape: dotplot, stemplot, histogram and boxplot.

4 Data 31 scores from a 50-point quiz: 28, 38, 42, 33, 29, 28, 41, 40, 15, 36, 27, 34, 22, 23, 28, 50, 42, 46, 28, 27, 43, 29, 50, 29, 32, 34, 27, 26, 27, 41, 18

5 Dotplot Involves plotting the data values, with dots, above the corresponding values on a number line. Most calculators do not have a built-in function for drawing dotplots. See page 47 in your textbook for an example.

6 Stemplot Also known as a stem-and-leaf plot, it is a bit more complicated than a dotplot. Each data value has a stem and a leaf. There are no fixed rules about what constitutes a stem and what constitutes a leaf. The nature of the data will suggest reasonable choices. Split stems are useful when the leaves get unwieldy. Back-to-back (or side-to-side) stemplots can be used to compare two sets of data.
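As a rough illustration, a stemplot of the 31 quiz scores can be sketched in Python. The function name `stemplot` is hypothetical, and using the tens digit as the stem and the units digit as the leaf is an assumption made for this sketch, not a rule.

```python
from collections import defaultdict

# The 31 quiz scores from the Data slide, read as two-digit values.
scores = [28, 38, 42, 33, 29, 28, 41, 40, 15, 36, 27, 34, 22, 23,
          28, 50, 42, 46, 28, 27, 43, 29, 50, 29, 32, 34, 27, 26,
          27, 41, 18]

def stemplot(data):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    return dict(stems)

plot = stemplot(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each printed row is one stem followed by its leaves, so the row lengths show the shape of the distribution.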

7 Assignment Read pages 38 – 48 in your textbook. Exercises start on page 46 – problems 1.1 – 1.6

8 Histogram A bar graph is used to illustrate qualitative data and a histogram is used to illustrate quantitative data. The horizontal axis in a histogram contains numerical values, and the vertical axis contains the frequencies of the values. A histogram is composed of bars of equal width, usually with common edges. When you choose the intervals, be sure that each datapoint fits into a category. A histogram is like a stemplot that has been rotated 90 degrees.

9 Calculator Tip If you are using your calculator to draw a histogram, be careful about using ZoomStat. This command causes the calculator to choose a window for your graph – which means the calculator will choose an interval width. Set XScl and Xmin and Xmax for your data rather than using ZoomStat.
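The effect of choosing the window yourself can be sketched in Python. The helper `bin_counts` is hypothetical; its parameters are named after the calculator's Xmin, Xmax, and XScl settings, and the values 10, 60, and 10 are just one reasonable choice for the quiz scores.

```python
# The 31 quiz scores from the Data slide, read as two-digit values.
scores = [28, 38, 42, 33, 29, 28, 41, 40, 15, 36, 27, 34, 22, 23,
          28, 50, 42, 46, 28, 27, 43, 29, 50, 29, 32, 34, 27, 26,
          27, 41, 18]

def bin_counts(data, xmin, xmax, xscl):
    """Count the values in each interval [xmin + k*xscl, xmin + (k+1)*xscl)."""
    counts = [0] * ((xmax - xmin) // xscl)
    for value in data:
        if xmin <= value < xmax:
            counts[(value - xmin) // xscl] += 1
    return counts

counts = bin_counts(scores, xmin=10, xmax=60, xscl=10)
for k, count in enumerate(counts):
    low = 10 + 10 * k
    print(f"{low}-{low + 9}: {'*' * count}")
```

Because the intervals cover 10 through 59 with no gaps, every score falls into exactly one bar.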

10 Additional Info Exercises page 55, 1.7 – 1.12 Technology Toolbox page 59 Relative Frequency, Cumulative Frequency and ogive: page 60 Time plot: page 63 Exercises page 64, 1.13 – 1.18 Section Summary, page 67, 1.19 – 1.26

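11 Example Heights of 100 college-age women. Describe the graph of the data using either a histogram or a stemplot, or both. 63, 65, 68, 62, 63, 67, 66, 64, 68, 64, 65, 67, 66, 65, 66, 65, 66, 63, 64, 70, 65, 66, 62, 64, 65, 67, 66, 62, 66, 65, 68, 61, 66, 63, 67, 65, 63, 67, 66, 61, 66, 61, 67, 65, 63, 69, 63, 65, 62, 68, 63, 59, 67, 62, 70, 63, 69, 66, 65, 66, 67, 65, 63, 67, 66, 60, 66, 72, 67, 66, 68, 64, 68, 60, 61, 64, 65, 64, 60, 69, 63, 64, 65, 66, 67, 64, 63, 68, 67, 66, 65, 60, 63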

12 Measures of Center There are two primary measures of center: the mean and the median. The mean of the set is defined as the sum of the x’s divided by n. Symbolically, $\bar{x} = \frac{\sum x}{n}$. We use $\bar{x}$ when we are talking about the mean of a sample. For the mean of a population, we use $\mu$.

13 Example During Babe Ruth’s major league career, he hit the following number of homeruns: 0, 4, 3, 2, 11, 29, 54, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22, 6. What was the mean number of homeruns per year for his major league career?
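Although the course expects a calculator here, the same arithmetic can be sketched in Python as a check. The variable names are illustrative; the data are the 21 values listed on the slide.

```python
# Home runs per season, as listed on the slide (21 values).
home_runs = [0, 4, 3, 2, 11, 29, 54, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]

# Mean = sum of the x's divided by n.
total = sum(home_runs)
n = len(home_runs)
mean_hr = total / n

print(total, n, round(mean_hr, 2))
```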

14 Calculator You should use (and are expected to use) a calculator to do examples like the previous one. Vital things you must know: how to clear lists, name lists, add data to a list, define a list, find one-variable statistics.

15 Median The median of an ordered dataset is the “middle” number. If the dataset has an odd number of values, the median is a member of the set and is the middle value. If the dataset has an even number of values, the median is the mean of the two middle numbers.

16 Example Returning to the data on homeruns hit by Babe Ruth, what was the median number of homeruns he hit during his career?
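The odd/even rule for the median can be written out directly. This is a minimal sketch; the function name `median` is our own, and Python's built-in `statistics.median` applies the same rule.

```python
# Home runs per season, as listed on the earlier slide (21 values).
home_runs = [0, 4, 3, 2, 11, 29, 54, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]

def median(data):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of middle two

median_hr = median(home_runs)
print(median_hr)
```

With 21 values the median is the 11th value in the ordered list.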

17 Resistant Mean and median are both measures of center. The choice of which one to use depends on the shape of the distribution. If the distribution is symmetric and bell-shaped, the mean and median will be close. If the distribution is strongly skewed or has outliers, the median is the best measure of center. The median is a resistant statistic while the mean is not. Resistant means that the value is not dramatically affected by extreme values.

18 Example A group of five teachers in a school have salaries of $32,700, $32,700, $38,500, $41,600, and $44,500. What are the mean and median? The highest paid teacher gets sick and the school superintendent volunteers to substitute for her. The superintendent’s salary is $174,300. What are the mean and median now?
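The salary example can be checked with Python's standard `statistics` module. This is a sketch of the comparison, with variable names of our own choosing.

```python
import statistics

# Original five teacher salaries.
before = [32700, 32700, 38500, 41600, 44500]
# The highest-paid teacher is replaced by the superintendent.
after = [32700, 32700, 38500, 41600, 174300]

mean_before, median_before = statistics.mean(before), statistics.median(before)
mean_after, median_after = statistics.mean(after), statistics.median(after)

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean jumps when the extreme salary enters the data, while the median does not move, which is what "resistant" means.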

19 Measures of Spread Knowing the center of a distribution doesn’t tell you all you need to know about it. Two sets of data can have the same mean and median but differ in terms of their spread, or variability. We have measures of spread based on the mean and on the median.

20 Variance and Standard Deviation One measure of spread based on the mean is the variance. The variance is the average squared deviation from the mean. Symbolically, $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$.

21 Why n – 1? We average by n – 1 rather than n. There are only n – 1 independent datapoints, not n, if you know $\bar{x}$. If you know n – 1 of the values and you also know $\bar{x}$, then the nth datapoint is determined. Exercises: page 74, 1.27 – 1.32

22 Standard Deviation One problem with using variance is that the units for variance won’t match the units of the original data because each difference was squared. To correct this, we take the square root of the variance as our measure of spread. This is known as the standard deviation. Symbolically, $s = \sqrt{\frac{1}{n-1}\sum (x - \bar{x})^2}$.

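23 Calculator Tip When you use 1-Var Stats, the calculator will give you both $S_x$ and $\sigma_x$. The formal definition of the population standard deviation is $\sigma = \sqrt{\frac{1}{n}\sum (x - \mu)^2}$. This assumes you know the population mean $\mu$, which you rarely do. In most cases, we will use $S_x$.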
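The difference between dividing by n – 1 and dividing by n can be sketched in Python. The names `s_x` and `sigma_x` are chosen here to echo the calculator's two outputs; the data are the Babe Ruth values from the earlier example.

```python
import math
import statistics

# Home runs per season, as listed on the earlier slide (21 values).
home_runs = [0, 4, 3, 2, 11, 29, 54, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]

n = len(home_runs)
x_bar = sum(home_runs) / n
squared_deviations = [(x - x_bar) ** 2 for x in home_runs]

# Sample form: divide by n - 1, then take the square root.
sample_variance = sum(squared_deviations) / (n - 1)
s_x = math.sqrt(sample_variance)

# Population form: divide by n instead.
sigma_x = math.sqrt(sum(squared_deviations) / n)

print(round(s_x, 2), round(sigma_x, 2))
```

The standard library's `statistics.stdev` and `statistics.pstdev` compute the same two quantities, and the sample version is always the larger of the two.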

24 Useful Things to Know Standard deviation is independent of the mean. Standard deviation is sensitive to the spread. Standard deviation is independent of n. Standard deviation, like the mean, is not resistant to extreme values.

25 Assignment Pages 89 – 90, 1.39 – 1.44 Read “Changing the Unit of Measurement”: pages 90 – 96

26 Interquartile Range When the mean is not the preferred measure of center, we use the interquartile range as a measure of spread. The median of a distribution divides it in two. The medians of the lower and upper halves of the distribution are called quartiles. The median of the lower half is called quartile one (Q1) and is the 25th percentile. The median of the upper half is called quartile three (Q3) and is the 75th percentile. The interquartile range (IQR) is the difference between Q3 and Q1.

27 Example Find Q1, Q3, and IQR for the following dataset: 5, 5, 6, 7, 8, 9, 11, 13, 17 Find the standard deviation and IQR for the number of homeruns hit by Babe Ruth in his major league career.
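The quartiles in this example can be sketched in Python using the "median of each half" definition. One assumption is made explicit in the code: when n is odd, the overall median is left out of both halves. Other software may use a different quartile convention and give slightly different values. The function names are our own.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                  # values below the median position
    upper = ordered[len(ordered) - half:]   # values above the median position
    return median(lower), median(upper)

small = [5, 5, 6, 7, 8, 9, 11, 13, 17]
q1, q3 = quartiles(small)
iqr = q3 - q1

home_runs = [0, 4, 3, 2, 11, 29, 54, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]
ruth_q1, ruth_q3 = quartiles(home_runs)
ruth_iqr = ruth_q3 - ruth_q1

print(q1, q3, iqr)
print(ruth_q1, ruth_q3, ruth_iqr)
```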

28 Outliers An outlier is a value far removed from the others. An outlier can be defined as a datapoint that is more than two or three standard deviations from the mean. However, an outlier is typically defined in terms of how far above or below the quartiles it is.

29 1.5 (IQR) Rule Find the IQR Multiply the IQR by 1.5 Find Q1 – 1.5(IQR) and Q3 + 1.5(IQR) Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is an outlier.

30 Example The following data represent the amount of money, in British pounds, spent weekly on tobacco for 11 regions in Britain: 4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56 Do any of the regions seem to be spending a lot more or less than the other regions? That is, are there any outliers in the data?
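The 1.5 (IQR) rule can be applied to the tobacco data as a short Python sketch. The helper names are hypothetical, and the quartiles are computed as medians of the lower and upper halves with the overall median excluded when n is odd, which is an assumption about the convention in use.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])

tobacco = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]

q1, q3 = quartiles(tobacco)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as an outlier.
outliers = [x for x in tobacco if x < lower_fence or x > upper_fence]
print(q1, q3, round(iqr, 2), outliers)
```

Under this convention every region falls inside the fences, so the rule flags no outliers.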

31 Five-Number Summary The five-number summary of a dataset is composed of the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. The following data are standard of living indices for 20 cities: 2.8, 3.9, 4.6, 5.3, 10.2, 9.8, 7.7, 13, 2.1, 0.3, 9.8, 5.3, 9.8, 2.7, 3.9, 7.7, 7.6, 10.1, 8.4, 8.3. Find the five-number summary for the data.
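A five-number summary for the 20 indices can be sketched in Python. The function `five_number_summary` is a hypothetical helper, and the quartiles are taken as medians of the lower and upper halves of the ordered data (an assumed convention; other conventions exist).

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

indices = [2.8, 3.9, 4.6, 5.3, 10.2, 9.8, 7.7, 13, 2.1, 0.3,
           9.8, 5.3, 9.8, 2.7, 3.9, 7.7, 7.6, 10.1, 8.4, 8.3]

summary = five_number_summary(indices)
print(summary)
```

These five values are exactly what a boxplot draws: the ends of the whiskers, the edges of the box, and the line inside the box.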

32 Boxplot We have discussed three types of graph: dotplot, stemplot, and histogram. Now we can add a fourth: boxplot. A boxplot is a graphical version of the five-number summary. Technology Toolbox: page 81 – 82

33 Percentile Rank of a Term The percentile rank of a term in a distribution equals the proportion of terms in the distribution less than the term. A term that is at the 75th percentile is larger than 75% of the terms in a distribution. Some define the percentile rank of a term to be the proportion of terms less than or equal to the term – this makes it possible to be at the 100th percentile.
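The two definitions of percentile rank can be compared in a short Python sketch. Both function names and the four-value dataset are made up for illustration.

```python
def rank_below(data, term):
    """Proportion of values strictly less than the term."""
    return sum(1 for x in data if x < term) / len(data)

def rank_at_or_below(data, term):
    """Proportion of values less than or equal to the term."""
    return sum(1 for x in data if x <= term) / len(data)

values = [10, 20, 30, 40]  # hypothetical data

print(rank_below(values, 40))        # strictly-less-than definition
print(rank_at_or_below(values, 40))  # less-than-or-equal definition
```

Under the first definition the largest value can never reach the 100th percentile; under the second it always does.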

34 Assignment Page 82 – 84, 1.33 – 1.38 Section Summary: page 100, 1.51 – 1.58 Chapter Review: page 106, 1.59 – 1.70

35 z-Scores One way to identify the position of a term in a distribution is to note how many standard deviations the term is above or below the mean. The statistic that does this is the z-score: $z = \frac{x - \bar{x}}{s}$.

36 Example For the first test of the year, Harvey got a 68. The class average was 73, and the standard deviation was 3. What was Harvey’s z-score on this test? Page 118, 2.1 – 2.4 Page 121, 2.5 – 2.8
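Harvey's z-score can be computed with a one-line function. This is a sketch; `z_score` is our own name for the usual formula (value minus mean, divided by standard deviation).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

harvey_z = z_score(68, mean=73, sd=3)
print(round(harvey_z, 2))
```

The negative sign indicates that Harvey scored below the class average.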

37 Density Curve A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

38 Median and Mean of a Density Curve The median of a density curve is the “equal areas point,” the point that divides the area under the curve in half. The mean of a density curve is the “balance point,” at which the curve would balance if made of solid material. The median and mean are the same for a symmetric density curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.

39 Assignment Page 128, 2.9 – 2.13 Section Exercises, page 131, 2.15 – 2.19

40 Normal Distribution Certain distributions hold particular interest to us in statistics. Those distributions that are symmetric and bell-shaped are very special. If we “model” a symmetric, bell-shaped distribution with a continuous curve, we get what is known as a normal curve. A normal curve is defined completely in terms of its mean and standard deviation.

41 The Empirical Rule The Empirical Rule is also known as the 68-95-99.7 Rule. It states that approximately 68% of the values in a normal distribution are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations.
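The three percentages can be checked numerically with `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). The helper name `within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The exact areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule says "approximately."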

42 Assignment Page 137, 2.23 – 2.26

43 Standard Normal Distribution Because we are dealing with a theoretical distribution, we will use mu and sigma, rather than x-bar and s. If X is a variable that is normally distributed with a mean of mu and a standard deviation of sigma, we use the notation $X \sim N(\mu, \sigma)$. If we convert the data to z-scores, we say we have standardized the data. The new distribution is called a standard normal distribution.

44 Example Use Table A to find the proportion of the area under a normal curve that lies to the left of z = -1.37. Page 142, 2.29 – 2.30
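A table lookup like this one can be cross-checked in Python. This sketch uses `statistics.NormalDist` (Python 3.8+), whose `cdf` method returns the area to the left of a value, which is the same quantity Table A tabulates.

```python
from statistics import NormalDist

# Area under the standard normal curve to the left of z = -1.37.
area_left = NormalDist().cdf(-1.37)
print(round(area_left, 4))
```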

45 Solving Problems Involving Normal Distributions Step 1: State the problem Step 2: Standardize and draw a picture Step 3: Use the table Step 4: Conclusion

46 Calculator Tip For a standard normal distribution: normalcdf(lower bound, upper bound). For a nonstandard normal distribution: normalcdf(lower bound, upper bound, mean, standard deviation). “CtlgHelp” can be activated by choosing it from the APPS menu and pressing ENTER twice. To use CtlgHelp, move the cursor to the desired function on the DISTR menu and press +. The function syntax will be displayed. Then press ENTER to use the function on the home screen.

47 Examples What proportion of the area under a normal curve lies between z = -1.2 and z = 0.58? The heights of men are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. What proportion of men are more than 6’ (72 inches) tall? Be sure to include a sketch. For the population of men in the above problem, how tall must a man be to be in the top 10% of all men in terms of height?
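These three examples can be worked in Python as a check on the table or calculator. The function `normalcdf` below is a hypothetical stand-in that mimics the calculator command using `statistics.NormalDist`; the sketch also assumes the heights are measured in inches, so that 6 feet is 72.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Example 1: area between z = -1.2 and z = 0.58.
between = normalcdf(-1.2, 0.58)

# Example 2: proportion of men taller than 72 inches (area to the right of 72).
heights = NormalDist(70, 3)
taller_than_72 = 1 - heights.cdf(72)

# Example 3: height with 90% of the area to its left (top 10% cutoff).
top_10_cutoff = heights.inv_cdf(0.90)

print(round(between, 4), round(taller_than_72, 4), round(top_10_cutoff, 2))
```

The third calculation runs in the opposite direction from the first two: it starts from an area and returns a value.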

48 Calculator Tip invNorm essentially reverses normalcdf. That is, rather than reading from the margins in, it reads from the table out. invNorm(A) returns the z-score that corresponds to an area equal to A lying to the left of z. invNorm(A, mu, sigma) returns the value of x that has area A to the left of x if x is normally distributed with mean mu and standard deviation sigma.

49 Example Scores on the SAT Verbal test in recent years follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?
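The SAT cutoff can be found in Python with `NormalDist.inv_cdf`, which plays the role of invNorm: it takes the area to the left and returns the corresponding value. This is a sketch, and the variable names are our own.

```python
from statistics import NormalDist

sat_verbal = NormalDist(mu=505, sigma=110)

# The top 10% begins where 90% of the area lies to the left.
cutoff = sat_verbal.inv_cdf(0.90)
z_cutoff = (cutoff - 505) / 110

print(round(cutoff, 1), round(z_cutoff, 2))
```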

50 Assignment Page 147, 2.31 – 2.36

51 Assessing Normality Method 1: Construct a histogram or stemplot. Method 2: Construct a Normal probability plot.

52 Normal Probability Plot Arrange the observed values from smallest to largest. Record what percentile of the data each value occupies. Use Table A to find the z-scores at these same percentiles. Plot each data point x against the corresponding z. If the data distribution is close to Normal, the plotted points will lie close to a straight line. Technology Toolbox: page 153
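The steps above can be sketched in Python for the 31 quiz scores. Two choices here are assumptions, not the textbook's stated method: the percentile for the i-th ordered value is taken as (i - 0.5)/n, and `statistics.NormalDist().inv_cdf` stands in for the Table A lookup. The function name is hypothetical.

```python
from statistics import NormalDist

# The 31 quiz scores from the Data slide, read as two-digit values.
scores = [28, 38, 42, 33, 29, 28, 41, 40, 15, 36, 27, 34, 22, 23,
          28, 50, 42, 46, 28, 27, 43, 29, 50, 29, 32, 34, 27, 26,
          27, 41, 18]

def normal_plot_points(data):
    """Pair each ordered value x with the z-score at its percentile."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n             # assumed plotting position
        z = NormalDist().inv_cdf(percentile)   # z-score at that percentile
        points.append((x, z))
    return points

points = normal_plot_points(scores)
for x, z in points:
    print(x, round(z, 2))
```

Plotting these (x, z) pairs and looking for a roughly straight line is the visual check described on the slide.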

53 Assignment Section Summary, page 157, 2.43 – 2.50 Review Exercises: page 162, 2.51 – 2.60 VERY IMPORTANT! Technology Toolboxes on pages 166 and 167.

