STAT 250 Dr. Kari Lock Morgan


1 STAT 250 Dr. Kari Lock Morgan
Describing Data II, SECTIONS 2.3, 2.4, 2.5: one quantitative variable (2.3, 2.4); one quantitative by one categorical (2.4); two quantitative (2.5)

2 The 95% Rule: The standard deviation for hours of sleep per night is closest to: 1 / 2 / 4 / I have no idea
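The 95% rule (about 95% of a bell-shaped distribution lies within two standard deviations of the mean) can be checked by simulation. The Python sketch below draws a bell-shaped sample whose mean of 100 and SD of 15 are arbitrary illustrative values, not the sleep data. It counts how many values fall within two SDs, then reads the rule backwards: the middle 95% of the data spans about four SDs, so a quarter of that span is a rough SD estimate.

```python
import random
import statistics

# Simulated bell-shaped data; mean 100 and SD 15 are arbitrary, not class data.
random.seed(250)
sample = [random.gauss(100, 15) for _ in range(10_000)]

xbar = statistics.mean(sample)
s = statistics.stdev(sample)

# Fraction of values within two standard deviations of the mean.
within_2sd = sum(xbar - 2 * s <= x <= xbar + 2 * s for x in sample) / len(sample)
print(f"mean={xbar:.1f}, sd={s:.1f}, within 2 SD: {within_2sd:.3f}")

# Reading the rule backwards: the middle ~95% spans about 4 SDs.
cuts = statistics.quantiles(sample, n=40)  # cut points every 2.5%
rough_sd = (cuts[-1] - cuts[0]) / 4        # (97.5th - 2.5th percentile) / 4
print(f"rough SD from the middle 95%: {rough_sd:.1f}")
```

The same "span of the middle 95%, divided by 4" reasoning applies to a histogram read by eye.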

3 z-score The z-score for a data value, $x$, is $z = \frac{x - \bar{x}}{s}$ for sample data, and $z = \frac{x - \mu}{\sigma}$ for population data. The z-score measures the number of standard deviations a value lies away from the mean.
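A minimal Python sketch of both formulas. The function names and the small data list are hypothetical, chosen only for illustration.

```python
import statistics

def z_score_sample(x, data):
    """z = (x - xbar) / s, using the sample mean and sample SD of `data`."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

def z_score_population(x, mu, sigma):
    """z = (x - mu) / sigma, when the population mean and SD are known."""
    return (x - mu) / sigma

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, mean 5
print(z_score_sample(9, data))     # positive: 9 lies above the mean
print(z_score_population(130, mu=100, sigma=15))  # 2.0
```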

4 z-score A z-score puts values on a common scale.
A z-score is the number of standard deviations a value falls from the mean. For symmetric, bell-shaped distributions, about 95% of all z-scores fall between -2 and 2, so z-scores beyond these values can be considered extreme.

5 z-score Which is better, an ACT score of 28 or a combined SAT score of 2100? ACT: μ = 21, σ = 5. SAT: μ = 1500, σ = 325. Assume ACT and SAT scores have approximately bell-shaped distributions. ACT score of 28 / SAT score of 2100 / I don't know
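The comparison reduces to two population z-scores, using the means and standard deviations given on the slide. A short Python sketch of the arithmetic:

```python
# Means and SDs from the slide; both distributions assumed bell-shaped.
act_z = (28 - 21) / 5        # (x - mu) / sigma for the ACT score
sat_z = (2100 - 1500) / 325  # (x - mu) / sigma for the SAT score
better = "ACT 28" if act_z > sat_z else "SAT 2100"
print(f"ACT z = {act_z:.2f}, SAT z = {sat_z:.2f}; higher relative score: {better}")
```

The SAT score sits about 1.85 standard deviations above its mean versus 1.4 for the ACT score, so it is the more unusual (better) result.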

6 Other Measures of Location
Maximum = largest data value. Minimum = smallest data value. Quartiles: Q1 = median of the values below m (the median); Q3 = median of the values above m.

7 Five Number Summary: Min, Q1, m, Q3, Max (about 25% of the data lies between each pair of consecutive values)
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics
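The quartile definition from slide 6 translates directly into code. This Python sketch returns the five number summary. Leaving the median out of both halves when the number of values is odd is an assumed convention; Minitab and R interpolate differently, so their quartiles may differ slightly.

```python
import statistics

def five_number_summary(data):
    """Return (Min, Q1, median, Q3, Max); needs at least two values.

    Q1 is the median of the values below the median and Q3 the median of
    the values above it. With an odd count, the middle value is excluded
    from both halves (one common convention, not the only one).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # values below the median
    upper = xs[half + n % 2:]     # values above the median
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
```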

8 Five Number Summary
> summary(study_hours)
Min. 1st Qu. Median 3rd Qu. Max.
The distribution of the number of hours spent studying each week is: Symmetric / Right-skewed / Left-skewed / Impossible to tell

9 The Pth percentile is the value that is greater than P% of the data.
We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better. We could also have used percentiles: an ACT score of 28 is at the 91st percentile; an SAT score of 2100 is at the 97th percentile.
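A percentile rank following the slide's wording (the percent of data values that a value exceeds) takes one line in Python. Other conventions count ties or interpolate, so software may report slightly different percentiles; the scores below are hypothetical.

```python
def percentile_rank(data, x):
    """Percent of data values strictly less than x."""
    return 100 * sum(v < x for v in data) / len(data)

scores = list(range(1, 101))        # hypothetical scores 1, 2, ..., 100
print(percentile_rank(scores, 92))  # 91.0: 92 exceeds 91 of the 100 values
```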

10 Five Number Summary: Min (0th percentile), Q1 (25th percentile), m (50th percentile), Q3 (75th percentile), Max (100th percentile); about 25% of the data lies between each pair of consecutive values.

11 Measures of Spread Range = Max − Min
Interquartile Range (IQR) = Q3 − Q1. Is the range resistant to outliers? Yes / No. Is the IQR resistant to outliers? Yes / No
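To see resistance in action, this Python sketch computes the range and IQR of a small hypothetical dataset before and after replacing its largest value with an extreme outlier. Quartiles come from `statistics.quantiles` with `method="inclusive"`, one convention among several.

```python
import statistics

def spread(data):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return max(data) - min(data), q3 - q1

data = [10, 12, 13, 15, 16, 18, 20]   # hypothetical values
with_outlier = data[:-1] + [200]      # largest value replaced by an outlier

print(spread(data))          # (10, 4.5)
print(spread(with_outlier))  # (190, 4.5): range jumps, IQR unchanged
```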

12 Comparing Statistics
Measures of Center: mean (not resistant), median (resistant). Measures of Spread: standard deviation (not resistant), IQR (resistant), range (not resistant). Most often, we use the mean and the standard deviation, because they are calculated from all the data values and so use all the available information.

13 Boxplot The box runs from Q1 to Q3 (the middle 50% of the data), with a line at the median. Lines ("whiskers") extend from each quartile to the most extreme value that is not an outlier. Minitab: Graph -> Boxplot -> One Y -> Simple

14 Boxplot [Figure: boxplot with a point labeled "Outlier"] *For boxplots, outliers are defined as any point more than 1.5 IQRs beyond the quartiles (although you don't have to know that).
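The 1.5 IQR rule and the whisker rule from slide 13 can be sketched together in Python. The quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, so Minitab's fences may differ slightly; the data are hypothetical.

```python
import statistics

def boxplot_parts(data):
    """Return (outliers, whisker_ends) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Whiskers reach the most extreme values that are not outliers.
    return outliers, (min(inside), max(inside))

print(boxplot_parts([10, 12, 13, 15, 16, 18, 200]))  # ([200], (10, 18))
```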

15 Boxplot This boxplot shows a distribution that is: Symmetric / Left-skewed / Right-skewed

16 Summary: One Quantitative Variable
Summary Statistics: center (mean, median); spread (standard deviation, range, IQR); five number summary; percentiles. Visualization: dotplot, histogram, boxplot. Other concepts: shape (symmetric, skewed, bell-shaped); outliers and resistance; z-scores.

17 Quantitative and Categorical Relationships
Interested in a quantitative variable broken down by categorical groups

18 Side-by-Side Boxplots
Minitab: Graph -> Boxplot -> One Y -> With Groups

19 Stacked Dotplots Minitab: Graph -> Dotplot -> One Y -> With Groups

20 Overlaid Histograms Minitab: Graph -> Histogram -> With Groups

21 Quantitative Statistics by a Categorical Variable
Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable. Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics -> By variables

22 Difference in Means Often, when comparing a quantitative variable across two categories, we compute the difference in means: $\bar{x}_F - \bar{x}_M = \ldots - 24.466 = 1.12$
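Computing group means and their difference is a split-then-summarize step. In this Python sketch the values are hypothetical; only the group labels F and M mirror the slide's subscripts.

```python
import statistics
from collections import defaultdict

def group_means(values, groups):
    """Mean of the quantitative variable within each level of the categorical one."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    return {g: statistics.mean(vs) for g, vs in by_group.items()}

values = [24, 27, 27, 22, 25, 22]          # hypothetical measurements
groups = ["F", "F", "F", "M", "M", "M"]    # group label for each value
means = group_means(values, groups)
diff = means["F"] - means["M"]             # xbar_F - xbar_M
print(means, diff)  # {'F': 26, 'M': 23} 3
```

Any other statistic (median, SD, quartiles) can replace `statistics.mean` to get that summary by group.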

23 Summary: One Quantitative and One Categorical
Summary Statistics: any summary statistics for quantitative variables, broken down by groups; difference in means. Visualization: side-by-side graphs.

24 Two Quantitative Variables
Summary Statistics: correlation Visualization: scatterplot

25 Scatterplot A scatterplot is the graph of the relationship between two quantitative variables. Minitab: Graph -> Scatterplot -> Simple

26 Direction of Association
A positive association means that values of one variable tend to be higher when values of the other variable are higher. A negative association means that values of one variable tend to be lower when values of the other variable are higher. Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

27 Exploring Associations
In the states data, explore the associations between obesity rate and the following variables: PhysicalActivity: % doing physical activity in the past month; Smokers: % who smoke; Population: state population (in millions); HouseholdIncome: mean household income (in $); McCainVote: % voting for McCain in the 2008 election; IQ: mean IQ score. Make your initial guesses...

28 Associations Minitab: Graph -> Scatterplot -> Simple -> Multiple Graphs -> In separate panels of the same graph

29 Correlation The correlation is a measure of the strength and direction of linear association between two quantitative variables. Sample correlation: r. Population correlation: ρ ("rho"). Minitab: Stat -> Basic Statistics -> Correlation
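The sample correlation r is the standardized covariance of the two variables. A minimal Python sketch of the usual (Pearson) formula follows; Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (perfect positive line)
print(correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect negative line)
```

Because the formula is symmetric in x and y, swapping the two variables leaves r unchanged.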

30 What are the properties of correlation?

31 Correlation

32 Correlation Guessing Game
Enter PennState for the group ID.

33 Correlation NFL Teams r = 0.43

34 Testosterone Levels and Time
What is the correlation between testosterone level and hour of the day? Positive / Negative / About 0. Are testosterone level and hour of the day associated? Yes / No

35 TVs and Life Expectancy

36 Correlation Cautions Correlation can be heavily affected by outliers. Always plot your data! r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data! Correlation does not imply causation!

37 Summary: Two Quantitative Variables
Summary Statistics: correlation Visualization: scatterplot

38 Lots of Scatterplots Minitab: Graph -> Matrix Plot

39 3 Variables: Adding a Categorical Variable to a Scatterplot
Minitab: Graph -> Scatterplot -> With Groups

40 3 Variables: Adding a Quantitative Variable to a Scatterplot
Minitab: Graph -> Bubble Plot -> Simple

41 Four Variables!: Adding a categorical and a quantitative variable to a scatterplot
Minitab: Graph -> Bubble Plot -> With Groups

42 Scatterplot with Histograms/Boxplots/Dotplots
Minitab: Graph -> Marginal Plot

43 To Do: Read Sections 2.4 and 2.5. Do Homework 2.2, 2.3, 2.4, 2.5 (due Friday, 2/6).

