STAT 101 Dr. Kari Lock Morgan

Synthesis: Big Picture, Essential Synthesis, Review, Speed Dating

Final: Monday, April 28th, 2 – 5pm. No make-ups, no excuses. Worth 30% of your course grade. Cumulative over the entire course. Open only to a calculator and 3 double-sided pages of notes prepared only by you.

Help Before Final:
Wednesday, 4/23: 3 – 4pm, Prof Morgan, Old Chem 216; 4 – 9pm, Stat Ed Help, Old Chem 211A
Thursday, 4/24: 5 – 7pm, Yating, Old Chem 211A
Friday, 4/25: 1 – 3pm, Prof Morgan, Old Chem 216; 3 – 4pm, REVIEW SESSION, room tbd
Sunday, 4/27: 4 – 6pm, Tori, Old Chem 211A; 6 – 7pm, Stat Ed Help, Old Chem 211A; 7 – 9pm, David, Old Chem 211A
Monday, 4/28: 12:30 – 1:30, Prof Morgan, Old Chem 216

Review: What is Bayes' Rule? (a) A way of getting from P(A if B) to P(B if A) (b) A way of calculating P(A and B) (c) A way of calculating P(A or B)
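As a quick illustration of Bayes' Rule, P(B | A) = P(A | B) P(B) / P(A), the sketch below recovers P(male says yes | female says yes) from P(female says yes | male says yes), using the counts from the reciprocity table in the speed dating slides. The helper name `bayes_rule` is hypothetical, chosen for this example.

```python
def bayes_rule(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Return P(B | A) from P(A | B), P(B), and P(A) using Bayes' Rule."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a


# Counts from the reciprocity table (276 speed dates).
# A = "female says yes", B = "male says yes".
n_dates = 276
p_male_yes = 146 / n_dates        # 63 + 83 dates where the male said yes
p_female_yes = 127 / n_dates      # 63 + 64 dates where the female said yes
p_female_yes_given_male_yes = 63 / 146

p_male_yes_given_female_yes = bayes_rule(
    p_female_yes_given_male_yes, p_male_yes, p_female_yes
)
print(round(p_male_yes_given_female_yes, 4))  # 0.4961, i.e. 63/127
```

The result matches reading P(male says yes | female says yes) = 63/127 directly off the table, which is a handy sanity check when applying Bayes' Rule by hand.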

Data Collection: The way the data are/were collected determines the scope of inference. For generalizing to the population: was it a random sample? Was there sampling bias? For assessing causality: was it a randomized experiment? Collecting good data is crucial to making good inferences based on the data.

Exploratory Data Analysis: Before doing inference, always explore your data with descriptive statistics. Always visualize your data! Visualize your variables and the relationships between variables. Calculate summary statistics for variables and for relationships between variables; these will be key for later inference. The appropriate visualizations and summary statistics depend on whether the variable(s) are categorical or quantitative.

Estimation: For good estimation, provide not just a point estimate but an interval estimate, which takes into account the uncertainty of the statistic. Confidence intervals are designed to capture the true parameter for a specified proportion of all samples. A P% confidence interval can be created by bootstrapping (sampling with replacement from the sample) and using the middle P% of the bootstrap statistics.
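The bootstrap recipe above can be sketched in a few lines of standard-library Python. This is a minimal percentile-bootstrap sketch, not the course's software; the function names, the 5000 resamples, and the seed are our own choices. It uses the 63 successful speed dates out of 276 from later in these slides, coded as 1 (success) and 0 (not).

```python
import random


def bootstrap_percentile_ci(sample, statistic, level=0.95, reps=5000, seed=1):
    """Percentile bootstrap CI: resample with replacement, keep the middle `level` of the statistics."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = sorted(
        statistic(rng.choices(sample, k=n))  # one bootstrap sample, same size as the original
        for _ in range(reps)
    )
    tail = (1 - level) / 2
    lower = boot_stats[round(tail * reps)]
    upper = boot_stats[round((1 - tail) * reps) - 1]
    return lower, upper


def proportion(values):
    return sum(values) / len(values)


# 63 successful speed dates out of 276, coded 1 = success, 0 = not.
dates = [1] * 63 + [0] * (276 - 63)
lower, upper = bootstrap_percentile_ci(dates, proportion)
print(f"95% bootstrap CI: ({lower:.3f}, {upper:.3f})")
```

Because resampling is random, the endpoints shift slightly with the seed and the number of resamples; more resamples make the percentiles more stable.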

Hypothesis Testing: A p-value is the probability of getting a statistic as extreme as the one observed, if H0 is true. The p-value measures the strength of the evidence the data provide against H0. "If the p-value is low, the H0 must go." If the p-value is not low, then you cannot reject H0, and the test is inconclusive.

p-value: A p-value can be calculated by either (1) a randomization test: simulate statistics assuming H0 is true, and see what proportion of the simulated statistics are as extreme as the one observed; or (2) calculating a test statistic and comparing it to a theoretical reference distribution (normal, t, χ², F).
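A randomization test for a difference in proportions can be sketched by pooling all responses (which is what H0: p1 = p2 implies) and repeatedly re-dealing them to the two groups. The sketch below applies this to the "yes" counts in the pickiness table (males 146 of 276, females 127 of 276); the function name, number of shuffles, and seed are assumptions made for illustration.

```python
import random


def randomization_test_two_props(yes1, n1, yes2, n2, reps=5000, seed=1):
    """Two-sided randomization test for p1 - p2, simulating H0: p1 = p2."""
    rng = random.Random(seed)
    observed = yes1 / n1 - yes2 / n2
    # Under H0 the group labels are arbitrary, so pool every response and re-deal them.
    pooled = [1] * (yes1 + yes2) + [0] * (n1 + n2 - yes1 - yes2)
    as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        simulated = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if abs(simulated) >= abs(observed) - 1e-12:  # small tolerance for floating-point ties
            as_extreme += 1
    return observed, as_extreme / reps


observed, p_value = randomization_test_two_props(146, 276, 127, 276)
print(f"observed difference = {observed:.3f}, p-value ≈ {p_value:.3f}")
```

The proportion of simulated differences at least as extreme as the observed one is the p-value; a one-sided test would compare `simulated >= observed` instead of absolute values.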

Hypothesis Tests:
Variables                            Appropriate Test
One quantitative                     Single mean (t)
One categorical                      Single proportion (normal); chi-square goodness of fit
Two categorical                      Difference in proportions (normal); chi-square test for association
One quantitative, one categorical    Difference in means (t); matched pairs (t); ANOVA (F)
Two quantitative                     Correlation (t); slope in simple linear regression (t)
More than two                        Multiple regression (t, F)

Regression: Regression is a way to predict one response variable from one or more explanatory variables. Regression fits the coefficients of the model. The model can be used to analyze relationships between the explanatory variables and the response, predict Y based on the explanatory variables, and adjust for confounding variables.
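To make "fitting the coefficients" concrete, here is a minimal least-squares sketch for the one-predictor case; multiple regression generalizes the same least-squares criterion to several explanatory variables. The function name and the toy points are hypothetical, chosen so the correct answer is known exactly.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y ≈ b0 + b1·x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x̄, ȳ)
    return b0, b1


# Toy check: points lying exactly on y = 2 + 3x recover the coefficients.
b0, b1 = least_squares_line([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```

Once b0 and b1 are fitted, a prediction for a new x is simply b0 + b1·x.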

Probability

Romance: What variables help to predict romantic interest? Do these variables differ for males and females? All we need to figure this out is DATA! (For all of you, being almost done with STAT 101, this is the case for many interesting questions!)

Speed Dating: We will use data from speed dating conducted at Columbia University, 2002–2004: 276 males and 276 females from Columbia's various graduate and professional schools. Each person met with 10–20 people of the opposite sex for 4 minutes each. After each encounter, each person said either "yes" (they would like to be put in touch with that partner) or "no".

Speed Dating Data: What are the cases? (a) Students participating in speed dating (b) Speed dates (c) Ratings of each student

Speed Dating: What is the population? Ideal population? More realistic population?

Speed Dating: It is randomly determined which students will be paired for the speed dates. We find that people are significantly more likely to say "yes" to people they think are more intelligent. Can we infer causality between perceived intelligence and wanting a second date? (a) Yes (b) No

Successful Speed Date? What is the probability that a speed date is successful (results in both people wanting a second date)? To best answer this question, we should use (a) Descriptive statistics (b) Confidence interval (c) Hypothesis test (d) Regression (e) Bayes' Rule

Successful Speed Date? 63 of the 276 speed dates were deemed successful (both the male and the female said yes). A 95% confidence interval for the true proportion of successful speed dates is (a) (0.2, 0.3) (b) (0.18, 0.28) (c) (0.21, 0.25) (d) (0.13, 0.33)
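Besides bootstrapping, the interval can be checked with the usual normal-based formula p̂ ± z*·√(p̂(1 − p̂)/n). The sketch below is a minimal version of that formula (the function name is our own); it relies on the sample being large enough for the normal approximation, which holds here with 63 successes and 213 failures.

```python
import math


def proportion_ci(successes, n, z_star=1.96):
    """Normal-approximation CI for a single proportion: p_hat ± z* · SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


lower, upper = proportion_ci(63, 276)
print(f"({lower:.2f}, {upper:.2f})")  # (0.18, 0.28)
```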

Pickiness and Gender: Are males or females more picky when it comes to saying yes? Guesses? (a) Males (b) Females

Pickiness and Gender:
           Yes   No
Males      146   130
Females    127   149
Are males or females more picky when it comes to saying yes? How could you answer this? (a) Test for a single proportion (b) Test for a difference in proportions (c) Chi-square test for association (d) ANOVA (e) Either (b) or (c)

Pickiness and Gender: Do males and females differ in their pickiness? Using α = 0.05, how would you answer this? (a) Yes (b) No (c) Not enough information
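With the formula approach, the question can be answered with a two-proportion z-test on the pickiness table, using the pooled proportion for the standard error under H0: p_M = p_F. This is a minimal standard-library sketch; the function name is hypothetical.

```python
import math


def two_proportion_z_test(yes1, n1, yes2, n2):
    """z-test of H0: p1 = p2, using the pooled proportion for the standard error."""
    p1, p2 = yes1 / n1, yes2 / n2
    p_pooled = (yes1 + yes2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area under N(0, 1)
    return z, p_value


z, p_value = two_proportion_z_test(146, 276, 127, 276)  # males vs. females saying yes
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Compare the printed p-value with α = 0.05 to answer the question. For a 2×2 table, the chi-square test for association (without continuity correction) gives the same p-value, because its statistic equals z².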

Reciprocity:
                   Male says Yes   Male says No
Female says Yes         63              64
Female says No          83              66
Are people more likely to say yes to someone who says yes back? How would you best answer this? (a) Descriptive statistics (b) Confidence interval (c) Hypothesis test (d) Regression (e) Bayes' Rule

Reciprocity:
                   Male says Yes   Male says No
Female says Yes         63              64
Female says No          83              66
Are people more likely to say yes to someone who says yes back? How could you answer this? (a) Test for a single proportion (b) Test for a difference in proportions (c) Chi-square test for association (d) ANOVA (e) Either (b) or (c)

Reciprocity: Are people more likely to say yes to someone who says yes back? p-value = 0.3731. Based on these data, we cannot determine whether people are more likely to say yes to someone who says yes back.
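One way to reproduce a p-value like this is a chi-square test for association on the reciprocity table. The sketch below (standard library only; the helper name is hypothetical) computes the statistic by hand and uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal. We assume the slide's 0.3731 comes from the version with Yates' continuity correction, the default in R's `chisq.test` for 2×2 tables; without the correction the p-value is about 0.31, and the conclusion is the same either way.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test for association on a 2x2 table; returns (statistic, p-value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
            statistic += diff ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


reciprocity = [[63, 64],   # female says yes: male yes, male no
               [83, 66]]   # female says no:  male yes, male no
stat_yates, p_yates = chi_square_2x2(reciprocity)
stat_plain, p_plain = chi_square_2x2(reciprocity, yates=False)
print(f"with correction: p = {p_yates:.4f}; without: p = {p_plain:.4f}")
```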

Race and Response: Females: Does the chance of females saying yes to males differ by race? How could you answer this question? (a) Test for a single proportion (b) Test for a difference in proportions (c) Chi-square goodness of fit (d) Chi-square test for association (e) ANOVA
Proportion saying yes, by race: Asian 0.50, Black 0.57, Caucasian 0.42, Latino 0.48, Other 0.53

Race and Response: Males: Each person rated their date on a scale of 1–10 based on how much they liked them overall. Does how much males like females differ by race? How would you test this? (a) Chi-square test (b) t-test for a difference in means (c) Matched pairs test (d) ANOVA (e) Either (b) or (d)

Physical Attractiveness: Each person also rated their date from 1–10 on physical attractiveness. Do males rate females higher, or do females rate males higher? Which tool would you use to answer this question? (a) Two-sample difference in means (b) Matched pairs difference in means (c) Chi-square (d) ANOVA (e) Correlation

Physical Attractiveness: [Histogram] The histogram shown is of the (a) data (b) bootstrap distribution (c) randomization distribution (d) sampling distribution
x̄_M − x̄_F = 0.406; 95% CI: (0.10, 0.71); p-value = 0.01
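Intervals and tests should tell a consistent story. As a rough cross-check, the sketch below backs out a standard error from the reported 95% CI, assuming the bootstrap distribution is roughly normal and symmetric so that the CI is approximately estimate ± 1.96·SE, and then converts the resulting z-statistic into a two-sided p-value. This is an approximation of our own, not how the slide's p-value was computed.

```python
import math

# Numbers reported on the slide.
estimate = 0.406            # difference in mean ratings, males minus females
ci_lower, ci_upper = 0.10, 0.71

# Assumes CI ≈ estimate ± 1.96 · SE (roughly normal, symmetric bootstrap distribution).
se = (ci_upper - ci_lower) / (2 * 1.96)
z = estimate / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area under N(0, 1)
print(f"SE ≈ {se:.3f}, z ≈ {z:.2f}, p-value ≈ {p_value:.3f}")
```

The approximate p-value lands close to the reported 0.01, and, consistent with that, the 95% CI excludes 0.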

Other Ratings: Each person also rated their date from 1–10 on the following attributes: attractiveness, sincerity, intelligence, how fun the person seems, ambition, and shared interests. Which of these best predict how much someone will like their date?

Multiple Regression: [Regression output: males rating females; females rating males]

Ambition and Liking: Do people prefer their dates to be less ambitious??? How does the perceived ambition of a date relate to how much the date is liked? How would you answer this question? (a) Inference for a difference in means (b) ANOVA (c) Inference for correlation (d) Inference for simple linear regression (e) Either (b), (c), or (d)

Simple Linear Regression: [Regression output: males rating females; females rating males]

Ambition and Liking: r = 0.44, SE = 0.05. Find a 95% CI for β₁. Test whether β₁ differs from 0.
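A sketch of both calculations follows. It rests on two assumptions: that 0.44 is the estimated slope b₁ with standard error 0.05 (the extracted slide labels it r), and that the sample is large enough for t* ≈ 1.96 and a normal approximation to the t tail, since the slide does not give the degrees of freedom. The function name is hypothetical.

```python
import math


def slope_ci_and_test(b1, se, z_star=1.96):
    """Large-sample CI for beta_1 and two-sided test of H0: beta_1 = 0."""
    ci = (b1 - z_star * se, b1 + z_star * se)
    t_stat = b1 / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # normal approximation to the t tail
    return ci, t_stat, p_value


(ci_lower, ci_upper), t_stat, p_value = slope_ci_and_test(0.44, 0.05)
print(f"95% CI for beta_1: ({ci_lower:.3f}, {ci_upper:.3f}); t = {t_stat:.1f}; p-value = {p_value:.1e}")
```

With a t-statistic this large, the p-value is tiny under any reasonable degrees of freedom, so the conclusion does not hinge on the normal approximation.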

After taking STAT 101: if you have a question that needs answering… ALL YOU NEED IS DATA!!! Thank You!!!