Statistics: Unlocking the Power of Data Lock 5
STAT 101, Dr. Kari Lock Morgan
Describing Data: Two Variables
SECTIONS 2.1, 2.4, 2.5
Two categorical (2.1)
Quantitative and categorical (2.4)
Two quantitative (2.5)

The Big Picture
[Diagram labels: Population, Sample, Sampling, Statistical Inference, Descriptive Statistics]

Two Categorical Variables
Look at the relationship between two categorical variables:
1. Relationship status
2. Gender

Two-Way Table
                     Female   Male   Total
In a Relationship        32     10      42
It’s Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169
It doesn’t matter which variable is displayed in the rows and which in the columns.
R: table(relationship, gender)
Data from Duke students

Two-Way Table
What proportion of students in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

Two-Way Table
What proportion of females in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

Male and Female Proportions
30% of females in the sample say they are in a relationship.
16% of males in the sample say they are in a relationship.
Why the difference?

Difference in Proportions
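A minimal sketch of the arithmetic, written in Python as an illustration alongside the slides' R commands: the difference in proportions subtracts the sample proportion of males in a relationship from the sample proportion of females in a relationship, using the counts from the two-way table (32 of 107 females, 10 of 62 males). The variable names are illustrative, not from the course.

```python
# Students "In a Relationship" and total students, by gender (Duke sample)
in_relationship = {"Female": 32, "Male": 10}
group_totals = {"Female": 107, "Male": 62}

# Sample proportion in a relationship within each gender
p_hat = {g: in_relationship[g] / group_totals[g] for g in group_totals}

# Difference in proportions: p-hat for females minus p-hat for males
diff = p_hat["Female"] - p_hat["Male"]

print(f"p_F = {p_hat['Female']:.2f}, p_M = {p_hat['Male']:.2f}, "
      f"difference = {diff:.2f}")
```

The order of subtraction is a choice; reversing it only flips the sign.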

Two-Way Table
What proportion of people in a relationship in this sample are female?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%

Two-Way Table
CAUTION: The proportion of females who are in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female!
30% ≠ 76%!
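The two proportions differ because they condition on different groups: "females who are in a relationship" divides by the Female column total, while "people in a relationship who are female" divides by the In a Relationship row total. A small Python sketch, assuming the counts from the Duke two-way table (the Single row, 63 and 45, is obtained by subtracting the other rows from the column totals 107 and 62); the helper names are hypothetical.

```python
# Two-way table of counts: rows are relationship status, columns are gender.
# The Single row is derived from the column totals (107 females, 62 males).
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

def column_total(col):
    """Total count in one column (one gender)."""
    return sum(row[col] for row in table.values())

def row_total(row):
    """Total count in one row (one relationship status)."""
    return sum(table[row].values())

cell = table["In a Relationship"]["Female"]

# Proportion of females who are in a relationship: divide by the Female column total
p_rel_given_female = cell / column_total("Female")

# Proportion of people in a relationship who are female: divide by the row total
p_female_given_rel = cell / row_total("In a Relationship")

print(f"{p_rel_given_female:.0%} of females are in a relationship")
print(f"{p_female_given_rel:.0%} of people in a relationship are female")
```

The numerator (32) is the same cell in both cases; only the denominator changes.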

Side-by-Side Bar Chart
R: barplot(relationship~gender, beside=TRUE)
The height of each bar is the count in the corresponding cell of the two-way table.

Segmented Bar Chart
A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side by side.
R: barplot(relationship~gender)

Vitamin D Injections
Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol. The records of 67,000 dialysis patients were examined: half had received one drug and the other half had received the other. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived.
Construct an approximate two-way table of the data (due to rounding of the percentages, we can’t recover the exact counts; round to whole numbers).
Source: Teng, M., et al., “Survival of patients undergoing hemodialysis with paricalcitol or calcitriol therapy,” New England Journal of Medicine, July 31, 2003; 349(5).

Vitamin D Injections
                 Survived     Died     Total
Calcitriol         17,252   16,248    33,500
Paricalcitol       19,665   13,835    33,500
Total              36,917   30,083    67,000
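The approximate table can be checked by recomputing the survival rates from the counts. A Python sketch (the dictionary layout is an illustrative choice, not part of the exercise): each survivor count implied by the percentages ends in .5 (0.515 × 33,500 = 17,252.5 and 0.587 × 33,500 = 19,664.5), so either neighboring whole number is a reasonable answer, and the counts in the table reproduce the reported 51.5% and 58.7%.

```python
# Approximate two-way table reconstructed from the reported survival percentages
group_size = 67_000 // 2  # half the patients received each drug: 33,500
table = {
    "Calcitriol":   {"Survived": 17_252, "Died": 16_248},
    "Paricalcitol": {"Survived": 19_665, "Died": 13_835},
}

# Unrounded survivor counts implied by the percentages (each ends in .5)
implied = {"Calcitriol": 0.515 * group_size, "Paricalcitol": 0.587 * group_size}

for drug, row in table.items():
    total = row["Survived"] + row["Died"]
    rate = row["Survived"] / total
    print(f"{drug}: total = {total:,}, survival rate = {rate:.1%}, "
          f"implied survivors = {implied[drug]:.1f}")

# Column totals of the two-way table
survived_total = sum(row["Survived"] for row in table.values())
died_total = sum(row["Died"] for row in table.values())
print(f"Totals: survived = {survived_total:,}, died = {died_total:,}")
```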

Kidney Stones
Source: C. R. Charig, D. R. Webb, S. R. Payne, J. E. Wickham (1986). “Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy.” Br Med J (Clin Res Ed) 292 (6524): 879–882.
               Success   Failure
Treatment A        273        77
Treatment B        289        61
Which treatment is better at removing kidney stones?
a) Treatment A   b) Treatment B

Kidney Stones
SMALL STONES   Success   Failure
Treatment A         81         6
Treatment B        234        36
Which treatment is better at removing small kidney stones?
a) Treatment A   b) Treatment B

Kidney Stones
LARGE STONES   Success   Failure
Treatment A        192        71
Treatment B         55        25
Which treatment is better at removing large kidney stones?
a) Treatment A   b) Treatment B

Kidney Stones
Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall! How is this possible?

Kidney Stones – Simpson’s Paradox
Large Stones    Success   Failure   Success Rate
Treatment A         192        71            73%
Treatment B          55        25            69%

Small Stones    Success   Failure   Success Rate
Treatment A          81         6            93%
Treatment B         234        36            87%

ALL STONES      Success   Failure   Success Rate
Treatment A         273        77            78%
Treatment B         289        61            83%

Kidney Stones
Treatment A is used more often on large stones, which are harder to treat.
This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered.
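A Python sketch of the reversal, using the Charig et al. counts from the slides: the success rate is computed within each stone size (holding the third variable fixed), and then for the combined table, whose counts are simply sums over stone sizes. Function and variable names are illustrative.

```python
# Kidney stone data (Charig et al., 1986): (successes, failures) by stone size
data = {
    "Small": {"A": (81, 6), "B": (234, 36)},
    "Large": {"A": (192, 71), "B": (55, 25)},
}

def success_rate(successes, failures):
    """Proportion of treatments that succeeded."""
    return successes / (successes + failures)

# Success rate within each stone size: Treatment A wins both comparisons
for size, arms in data.items():
    rates = {t: success_rate(*arms[t]) for t in arms}
    print(size, {t: f"{r:.0%}" for t, r in rates.items()})

# Combined table: add the counts across stone sizes, then compute rates
combined = {
    t: tuple(sum(data[size][t][i] for size in data) for i in range(2))
    for t in ("A", "B")
}
overall = {t: success_rate(*combined[t]) for t in combined}
print("All", {t: f"{r:.0%}" for t, r in overall.items()})
```

The reversal happens because the combined rate for each treatment is a weighted average of its small- and large-stone rates, and Treatment A's weight falls mostly on the harder large stones.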

Kidney Stones


Small Stones
                Treatment A   Treatment B
Successful         81 (93%)     234 (87%)
Unsuccessful        6            36
[Graph of # successful against # unsuccessful for each treatment]
Slope = # successful / # unsuccessful = odds

Large Stones
                Treatment A   Treatment B
Successful        192 (73%)      55 (69%)
Unsuccessful       71            25
[Graph of # successful against # unsuccessful for each treatment]
Slope = # successful / # unsuccessful = odds

Combined
                Treatment A         Treatment B
Successful      81 + 192 = 273      234 + 55 = 289
Unsuccessful    6 + 71 = 77         36 + 25 = 61

Combined
                Treatment A   Treatment B
Successful        273 (78%)     289 (83%)
Unsuccessful       77            61
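Because the combined counts are sums of the stratum counts, each treatment's combined odds (successes/failures) is the mediant of its two stratum odds, so it must lie between them, pulled toward the stratum where that treatment was used most. Treatment A was used mostly on large stones, so its combined odds sit near its large-stone odds. A Python sketch checking this with the slide counts (names are illustrative):

```python
# (successes, failures) for each treatment, from the kidney stone slides
counts = {
    "Small": {"A": (81, 6), "B": (234, 36)},
    "Large": {"A": (192, 71), "B": (55, 25)},
}

def odds(successes, failures):
    """Odds of success: the slope # successful / # unsuccessful."""
    return successes / failures

# Combined counts are sums over stone sizes (adding the two points as vectors)
combined = {
    t: (counts["Small"][t][0] + counts["Large"][t][0],
        counts["Small"][t][1] + counts["Large"][t][1])
    for t in ("A", "B")
}

for t in ("A", "B"):
    small, large = odds(*counts["Small"][t]), odds(*counts["Large"][t])
    both = odds(*combined[t])
    # The combined odds (a mediant) lies between the two stratum odds
    print(f"Treatment {t}: small {small:.2f}, large {large:.2f}, combined {both:.2f}")
```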



Summary: Two Categorical Variables
Summary statistics:
• Two-way table
• Difference in proportions
Visualization:
• Side-by-side bar chart
• Segmented bar chart

Quantitative and Categorical Relationships
We are interested in a quantitative variable broken down by the groups of a categorical variable.

Tea and the Immune System
Source: Mednick, Cai, Kanady, and Drummond (2008). “Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory,” Behavioral Brain Research, 193.
Participants were randomized to drink five or six cups of either tea or coffee every day for two weeks (both drinks have caffeine, but only tea has L-theanine).
After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (an immune system response) was measured.
Explanatory variable: tea or coffee
Response variable: measure of interferon gamma

Tea and the Immune System
If the tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response?
a) Yes   b) No
Randomized experiment: it is possible to draw conclusions about causality.

Side-by-Side Boxplots
R: boxplot(InterferonGamma~Drink)
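R's boxplot() draws each box from Tukey's five-number summary (the values returned by fivenum()) and plots individually any point more than 1.5 times the box length (the hinge spread) beyond the box; the whiskers stop at the most extreme points inside those limits. A Python sketch of that calculation for one box per group. The measurements below are made-up values for illustration, not the study data, and the function names are hypothetical. Hinges can differ slightly from quartiles computed by other methods.

```python
import math

def fivenum(values):
    """Tukey's five-number summary (min, lower hinge, median, upper hinge, max),
    following the same position rule as R's fivenum()."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based positions
    # Average the two order statistics around each (possibly fractional) position
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]

def box_stats(values, coef=1.5):
    """Box, whisker ends, and outliers as a boxplot would draw them."""
    _, lower, median, upper, _ = fivenum(values)
    spread = upper - lower
    fence_lo, fence_hi = lower - coef * spread, upper + coef * spread
    inside = [v for v in values if fence_lo <= v <= fence_hi]
    outliers = sorted(v for v in values if v < fence_lo or v > fence_hi)
    return {"whiskers": (min(inside), max(inside)), "hinges": (lower, upper),
            "median": median, "outliers": outliers}

# Hypothetical response values for two groups (illustration only)
groups = {
    "Coffee": [2, 4, 5, 7, 9, 30],
    "Tea": [10, 12, 15, 18, 20, 22, 25],
}

for drink, vals in groups.items():
    print(drink, box_stats(vals))
```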

Quantitative Statistics by a Categorical Variable

Difference in Means
R: compareMean(InterferonGamma~Drink)
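compareMean is not a base R function (it appears to come from the course's own materials), so the Python sketch below does not reproduce its output; it only shows the idea: split the quantitative values by group, summarize each group, and subtract the group means. The data are hypothetical values chosen only so the example runs, and the subtraction order (Tea minus Coffee) is an arbitrary choice.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical data in "long" format: one response value and one group label
# per case, mirroring the formula InterferonGamma ~ Drink (values are made up)
interferon = [10, 12, 15, 18, 20, 22, 25, 2, 4, 5, 7, 9, 30]
drink = ["Tea"] * 7 + ["Coffee"] * 6

# Split the quantitative variable by the categorical variable
by_group = defaultdict(list)
for value, group in zip(interferon, drink):
    by_group[group].append(value)

# Any quantitative summary statistic can be computed within each group
for group, vals in by_group.items():
    print(f"{group}: n = {len(vals)}, mean = {mean(vals):.2f}, "
          f"median = {median(vals)}, sd = {stdev(vals):.2f}")

# Difference in means (order of subtraction is a choice: Tea - Coffee)
diff_means = mean(by_group["Tea"]) - mean(by_group["Coffee"])
print(f"Difference in means (Tea - Coffee): {diff_means:.2f}")
```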

Summary: One Quantitative and One Categorical
Summary statistics:
• Any summary statistics for quantitative variables, broken down by groups
• Difference in means
Visualization:
• Side-by-side boxplots

Two Quantitative Variables
Summary statistics: correlation
Visualization: scatterplot

Scatterplot
A scatterplot is a graph of the relationship between two quantitative variables.
R: plot(study_hours, gpa)

Direction of Association
A positive association means that values of one variable tend to be higher when values of the other variable are higher.
A negative association means that values of one variable tend to be lower when values of the other variable are higher.
Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

Cars Data Handout
Quantitative variables:
• Weight (pounds)
• City MPG
• Fuel capacity (gallons)
• Page number (in Consumer Reports)
• Time to go ¼ mile (in seconds)
• Acceleration time from 0 to 60 mph
Relationships:
• Weight vs. CityMPG
• Weight vs. FuelCapacity
• PageNum vs. FuelCapacity
• Weight vs. QtrMile
• Acc060 vs. QtrMile
• CityMPG vs. QtrMile

Car Associations

Correlation
The correlation is a measure of the strength and direction of the linear association between two quantitative variables.
Sample correlation: r
Population correlation: ρ (“rho”)
R: cor(x,y)
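In formula form, the sample correlation averages the products of the two variables' z-scores: r = (1/(n − 1)) Σ z_x z_y, where z_x = (x − x̄)/s_x and z_y = (y − ȳ)/s_y. A minimal Python sketch of that formula (R's cor(x, y) computes this Pearson correlation by default); the function name and the small inputs are hypothetical.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r = sum(z_x * z_y) / (n - 1), where each z-score
    uses the sample mean and sample standard deviation of its variable."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 values")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # r is undefined if either is 0 (constant data)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical values: points on a rising line, then on a falling line
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # close to 1
print(correlation([1, 2, 3], [3, 2, 1]))        # close to -1
```

Points in the upper-right and lower-left quadrants (both z-scores with the same sign) push r up; points in the other two quadrants push it down.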

Car Correlations
What are the properties of correlation?
[Six scatterplots from the cars data, with r = −0.91, 0.89, −0.08, −0.45, 0.99, 0.51]

Correlation
1. −1 ≤ r ≤ 1
2. The sign indicates the direction of association:
   • positive association: r > 0
   • negative association: r < 0
   • no linear association: r ≈ 0
3. The closer r is to ±1, the stronger the linear association.
4. r has no units and does not depend on the units of measurement.
5. The correlation between X and Y is the same as the correlation between Y and X.
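A Python sketch that checks properties 1, 2, 4, and 5 numerically. The weights and MPG values are hypothetical, not the Consumer Reports data, and converting pounds to kilograms stands in for a change of units; the correlation function repeats the z-score formula so the block runs on its own.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: sum of z-score products divided by n - 1."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical car data, for illustration only
weight_lb = [2500, 3000, 3200, 3600, 4100]
city_mpg = [30, 26, 24, 20, 17]

r = correlation(weight_lb, city_mpg)

# Property 4: changing units (pounds to kilograms) leaves r unchanged
weight_kg = [w * 0.45359237 for w in weight_lb]
r_kg = correlation(weight_kg, city_mpg)

# Property 5: swapping X and Y leaves r unchanged
r_swapped = correlation(city_mpg, weight_lb)

# Properties 1 and 2: r lies in [-1, 1]; heavier cars with lower MPG give r < 0
print(f"r = {r:.3f}, in kg = {r_kg:.3f}, swapped = {r_swapped:.3f}")
```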

Correlation Guessing Game
The highest scorer in the class by the first exam gets one extra credit point!

Correlation
[Scatterplot of NFL teams]
r = 0.43

Correlation
Same plot, but with the Dolphins and Raiders (outliers) removed: r = 0.08

Human Cannonball
[Plot of Y vs. X]
What is the correlation between X and Y?
a) r > 0   b) r < 0   c) r = 0
Are X and Y associated?
a) Yes   b) No

Correlation Cautions
1. Correlation can be heavily affected by outliers. Always plot your data!
2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!
3. Correlation does not imply causation!
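The first two cautions can be reproduced with small hypothetical data sets in Python: points on a perfect downward-opening parabola (a rough stand-in for a cannonball's path) are completely determined by X yet have r = 0, and adding one extreme point to otherwise unrelated data pushes r above 0.9. The data values and the function name are made up for illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: sum of z-score products divided by n - 1."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (len(x) - 1)

# Caution 2: a perfect but nonlinear association can have r = 0.
# Hypothetical parabola-shaped data: y = 9 - x^2
x = [-3, -2, -1, 0, 1, 2, 3]
y = [9 - xi ** 2 for xi in x]
r_parabola = correlation(x, y)  # essentially 0, yet y depends exactly on x

# Caution 1: a single outlier can change r dramatically.
x_base = [1, 2, 3, 4, 5]
y_base = [2, 1, 2, 1, 2]
r_base = correlation(x_base, y_base)                     # essentially 0
r_outlier = correlation(x_base + [20], y_base + [20])    # one extreme point added

print(f"parabola: {r_parabola:.3f}, no outlier: {r_base:.3f}, "
      f"with outlier: {r_outlier:.3f}")
```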

Summary: Two Quantitative Variables
Summary statistics: correlation
Visualization: scatterplot

Categorical
  Visualization: bar chart, pie chart
  Summary statistics: frequency table, relative frequency table, proportion
Quantitative
  Visualization: dotplot, histogram, boxplot
  Summary statistics: mean, median, max, min, standard deviation, range, IQR, five-number summary
Categorical vs. Categorical
  Visualization: side-by-side bar chart, segmented bar chart
  Summary statistics: two-way table, difference in proportions
Quantitative vs. Categorical
  Visualization: side-by-side boxplots
  Summary statistics: statistics by group, difference in means
Quantitative vs. Quantitative
  Visualization: scatterplot
  Summary statistics: correlation

To Do
Read Sections 2.1, 2.4, 2.5
Do HW 2 (due Wednesday, 1/29)