Ten things about Descriptive Statistics


1 Ten things about Descriptive Statistics
AP Statistics, Second Semester Review

2 What are descriptive statistics?
Describing or summarizing a set of data, either numerically or graphically. Why do we summarize data? Data in its raw form is overwhelming. Descriptive statistics allow us to digest the data more easily.

3 Overwhelming…
3, 2, 2, 4, 7, 6, 11, 5, 3, 5, 4, 5, 5, 0, 0, 6, 7, 3, 7, 4, 7, 4, 4, 2, 10, 0, 3, 6, 4, 5, 0, 6, 7, 3, 2, 8, 2, 1, 8, 3, 8, 6, 3, 4, 5, 6, 8, 7, 13, 12, 3, 4, 7, 5, 5, 6, 1, 3, 5, 4, 3, 0, 3, 2, 9, 3, 3, 4, 3, 7, 8, 4, 6, 7, 5, 0

4 Numerical summaries
Center: Mean, Median
Spread: Standard Deviation, Interquartile Range, Range
Other: Min, Max, Q1, Q3, Percentiles

5 Numerical Summaries: Center
Mean: The sum of the data values, divided by the number of values. Median: The middle data value when the values are sorted from low to high (with an even number of values, the average of the two middle values).
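As a minimal sketch of these two definitions, the following Python computes both by hand and compares them with the standard library's `statistics` module. The ten values are just the first ten from the "Overwhelming…" slide, picked for illustration.

```python
from statistics import mean, median

# Illustrative sample: the first ten values from the "Overwhelming…" slide.
values = [3, 2, 2, 4, 7, 6, 11, 5, 3, 5]

# Mean: sum of the values divided by how many there are.
manual_mean = sum(values) / len(values)

# Median: middle of the sorted values; with an even count,
# average the two middle values.
ordered = sorted(values)
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    manual_median = ordered[mid]
else:
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_mean, mean(values))      # 4.8 4.8
print(manual_median, median(values))  # 4.5 4.5
```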

6 Numerical Summaries: Center
When should we avoid using the mean? In the presence of outliers or skewness, the mean gets pulled toward the outliers or the long tail. The mean is not a resistant measure of center, but the median is. Why should we use the mean? Because the Central Limit Theorem and other great tools are based on the mean.
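A small sketch of what "resistant" means in practice: replace one value with an extreme one and compare how far each measure moves. The value 500 is a made-up outlier, not part of the original data.

```python
from statistics import mean, median

# First ten values from the "Overwhelming…" slide.
values = [3, 2, 2, 4, 7, 6, 11, 5, 3, 5]

# Hypothetical: swap the last value for an extreme outlier.
with_outlier = values[:-1] + [500]

mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)

print(mean(values), mean(with_outlier))      # the mean jumps from 4.8 to 54.3
print(median(values), median(with_outlier))  # the median stays at 4.5
```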

7 Numerical Summaries: Spread
Measures of spread describe the variation in a data set. Standard Deviation: roughly the average distance of the individual data values from the mean (formally, the square root of the variance). Interquartile Range (IQR): the distance from Q1 to Q3. Range: the distance from Min to Max.
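The three measures can be sketched in a few lines of Python. The `quartiles` helper is a hypothetical name and uses one common classroom convention (Q1 and Q3 as the medians of the lower and upper halves); calculators and software may use slightly different quartile rules, so treat the IQR here as illustrative.

```python
from statistics import median, stdev

# First ten values from the "Overwhelming…" slide.
values = [3, 2, 2, 4, 7, 6, 11, 5, 3, 5]

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumes at least two values; for an odd count the overall
    median is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

q1, q3 = quartiles(values)
sd = stdev(values)                       # sample standard deviation (divides by n - 1)
iqr = q3 - q1                            # distance from Q1 to Q3
data_range = max(values) - min(values)   # distance from Min to Max

print(round(sd, 2), iqr, data_range)     # 2.74 3 9
```

Each result is a single number, which matches the point that a spread is reported as one value rather than as an interval.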

8 Numerical Summaries: Spread
Remember: Measures of spread are single values. For example, the range is given as “4”, not “from 6 to 10”. When should we avoid using standard deviation? Like the mean, standard deviation is not resistant to the effects of outliers and skewness. Why should we use standard deviation? Because the Central Limit Theorem and other great tools are based on standard deviation.

9 Numerical Summaries: Percentiles
“The pth percentile of a distribution is the value with p percent of the observations less than it.” The median is the 50th percentile. Q1 is the 25th percentile. Q3 is the 75th percentile.
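Read literally, the definition above says a value's percentile is the percent of observations below it. A minimal sketch of that reading follows; `percent_below` is a hypothetical helper name, and software often uses interpolation or other tie-handling rules that give slightly different answers on small data sets.

```python
def percent_below(data, x):
    """Percent of observations strictly less than x."""
    return 100 * sum(1 for v in data if v < x) / len(data)

# First ten values from the "Overwhelming…" slide.
values = [3, 2, 2, 4, 7, 6, 11, 5, 3, 5]

# Five of the ten values (3, 2, 2, 4, 3) are below 5.
print(percent_below(values, 5))   # 50.0
```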

10 Outliers
Outliers are atypical data values: values that are unusually far from the center. Using Tukey’s rule for outliers: any data value greater than Q3 + 1.5*IQR is an outlier, and any data value less than Q1 - 1.5*IQR is an outlier.
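Tukey's rule translates directly into code. This sketch assumes quartiles are computed as medians of the lower and upper halves (one common classroom convention); `tukey_outliers` is a hypothetical helper name.

```python
from statistics import median

def tukey_outliers(data):
    """Return the values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])     # median of the lower half
    q3 = median(ordered[-half:])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in ordered if v < low_fence or v > high_fence]

# First ten values from the "Overwhelming…" slide.
values = [3, 2, 2, 4, 7, 6, 11, 5, 3, 5]

# Q1 = 3, Q3 = 6, IQR = 3, so the fences are -1.5 and 10.5.
print(tukey_outliers(values))   # [11]
```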

11 Comparing Apples to Oranges
When comparing performance, you can look at percentiles or at standardized values (z-values or t-values).
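A z-value measures how many standard deviations a value sits above or below its mean, which puts scores from different scales on a common footing. The test names, scores, means, and standard deviations below are invented purely for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores on two tests with different scales.
test_a = z_score(85, mean=75, sd=5)    # 2.0
test_b = z_score(90, mean=80, sd=10)   # 1.0

# The raw score on test B is higher, but relative to its own
# distribution the test A score is the stronger performance.
print(test_a, test_b)   # 2.0 1.0
```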

12 Graphical Display
Choosing the right graphical display depends upon the kind of variable. There are two types of variables. Quantitative: variables that are numbers, where adding and averaging make sense. Categorical: variables that take on one of a list of categories.

13 Dotplots
A quick summary of the distribution of the values. What is the shape of this distribution?

14 Histograms
Great for seeing the shape of the distribution. The vertical axis can show frequency or relative frequency (percents).

15 Stemplots
Like a histogram, but with the ability to see all the data. What is the shape of this distribution? What is the median? What is the Q1 score?

16 Ogives (Graphs of Cumulative Relative Frequency)
What is the median of this data set? What is the Q1 score? What is the Q3 score?

17 Normal Probability Plots
When the plot is roughly linear, the data it came from are approximately normal.

18 Boxplots
Emphasize the five-number summary and outliers.

19 Side by Side Boxplots
Great for comparing two distributions. Compare and contrast the number of texts sent by males and females.

20 Graphical Displays: Categorical Variables
Pie Charts: not my favorite.
Bar Charts: show frequencies or relative frequencies (percents).
Stacked Bar Charts: each bar is 100%, but broken into sub-categories.

21 Pie Charts

22 Bar Charts
Show frequencies or relative frequencies (percents). The bars don’t touch because they don’t represent a continuum.

23 Segmented Bar Charts
Used for showing the relationship between two categorical variables. Each bar is 100%, but broken into sub-categories.

24 Scatterplots
Used for showing relationships between two quantitative variables. Typically the explanatory variable is placed on the horizontal axis and the response variable on the vertical axis.

25 Residuals
A residual is the difference between an observed y value and the predicted y value (observed minus predicted). A Least-Squares Regression Line (LSRL) minimizes the sum of the squared residuals.
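To make the "least squares" idea concrete, this sketch fits a line with the usual textbook formulas, computes the residuals, and checks that nudging the line makes the sum of squared residuals larger. The five (x, y) points and the helper names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # vertical intercept
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)   # approximately a = 2.2, b = 0.6
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

best = sum_squared_residuals(a, b, xs, ys)
steeper = sum_squared_residuals(a, b + 0.1, xs, ys)
higher = sum_squared_residuals(a + 0.1, b, xs, ys)
print(best < steeper and best < higher)   # True
```

For a line fitted this way the residuals also sum to zero, up to floating-point rounding.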

26 Residual Plots
Help us to determine whether the relationship between two quantitative variables is linear. When the residual plot shows no pattern, a linear model is appropriate. When the residual plot shows a pattern (such as a curve), the relationship is not linear.

27 LSRL statistics
a = vertical intercept, b = slope. Slope is the predicted change in the response variable for every unit change in the explanatory variable. r = coefficient of correlation: ranges from -1 to 1 and measures the strength and direction of the linear relationship. r² = coefficient of determination: the proportion of the variation in the response variable that can be attributed to the linear relationship with the explanatory variable.
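The following sketch computes r from its definition and shows, for one made-up data set, that r² equals the share of the variation in y that the fitted line accounts for. The data points and the `correlation` helper name are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Coefficient of correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2

# Cross-check: fit the least-squares line, then compare the
# unexplained variation (squared residuals) with the total variation.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
total = sum((y - y_bar) ** 2 for y in ys)
explained_share = 1 - unexplained / total

print(round(r, 3), round(r_squared, 3), round(explained_share, 3))   # 0.775 0.6 0.6
```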

