Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions Inference –Skipped 2-way Tables –Sliced populations in 2.

Presentation transcript:

Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions Inference –Skipped 2-way Tables –Sliced populations in 2 different ways –Look for independence of factors –Chi Square Hypothesis test

Reading In Textbook Approximate Reading for Today’s Material: Pages , Approximate Reading for Next Class: Pages

Midterm I - Results Preliminary comments: Circled numbers are points taken off Total for each problem in brackets Points evenly divided among parts Page total in lower right corner Check those sum to total on front Overall score out of 100 points

Midterm I - Results Interpretation of Scores: Too early for letter grades These will change a lot: –Some with good grades will relax –Some with bad grades will wake up Don’t believe “A & C” average to “B”

Midterm I - Results Interpretation of Scores: Recall large variation over 2 midterms –No exception this semester

Midterm I - Results

Line of Equal Scores

Midterm I - Results Some have dramatically improved; others have been distracted by other things

Midterm I - Results Interpretation of Scores: Recall large variation over 2 midterms –No exception this semester Get better info from 2 test Total –So will report answers in those terms

Midterm I - Results Histogram of Results:

Midterm I - Results Interpretation of Scores (2 Test total): A 155 – 168, B 131 – 154, C 120 – 129, D, F

Midterm I - Results Where do we go from here? I see 2 rather different groups… Which are you in? What can you do? Most important: It is still early days……

Chapter 9: Two-Way Tables Main idea: Divide up populations in two ways –E.g. 1: Age & Sex –E.g. 2: Education & Income Typical Major Question: How do divisions relate? Are the divisions independent? –Similar idea to independence in prob. theory –Statistical Inference?

Two-Way Tables Big Question: Is there a relationship? Note: tallest bars French Wine ↔ French Music, Italian Wine ↔ Italian Music, Other Wine ↔ No Music Suggests there is a relationship

Two-Way Tables General Directions: Can we make this precise? Could it happen just by chance? –Really: how likely to be a chance effect? Or is it statistically significant? –I.e. music and wine purchase are related?

Two-Way Tables An alternate view: Replace counts by proportions (or %-ages) Class Example 31 (Wine & Music), Part 2 Advantage: May be more interpretable Drawback: No real difference (just rescaled)

Two-Way Tables Testing for independence: What is it? From probability theory: P{A | B} = P{A} i.e. Chances of A, when B is known, are same as when B is unknown Table version of this idea?

Independence in 2-Way Tables Counts analog of P{A|B}??? Equivalent condition for independence is: P{A & B} = P{A} x P{B} So for counts, look for: Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n i.e. Entry = Product of Marginals
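The "Entry = Product of Marginals" rule can be sketched in a few lines of Python. This is only an illustration: the counts below are made up for the sketch (they are not the Class Example 31 data), and `expected_counts` is a hypothetical helper name.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Table proportion = row marginal proportion * column marginal proportion,
    # so expected count = grand_total * (r / grand_total) * (c / grand_total).
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts (illustrative numbers only).
observed = [[20, 30, 10],
            [10, 10, 20]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

The expected table has the same row and column totals as the observed one, which is why it "shows the same structure as the marginals".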

Independence in 2-Way Tables Visualize Product of Marginals for: Class Example 31 (Wine & Music), Part 4 Shows same structure as marginals But not match between music & wine Good null hypothesis

Independence in 2-Way Tables Approach: Measure “distance between tables” –Use Chi Square Statistic –Has known probability distribution when table is independent Assess significance using P-value –Set up as: $H_0$: Indep. $H_A$: Dependent –P-value = P{what saw or m.c. | Indep.}

Independence in 2-Way Tables Chi-square statistic: $X^2 = \sum_{\text{cells}} \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$ Based on: Observed Counts (raw data), Obs; Expected Counts (under indep.), Exp. Notes: –Small for only random variation –Large for significant departure from indep.

Independence in 2-Way Tables Chi-square statistic calculation: Class example 31, Part 5: –Calculate term by term –Then sum –Is $X^2 = 18.3$ “big” or “small”?
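The "calculate term by term, then sum" recipe might look like this in Python. The table is again a made-up illustration rather than the class data, and `chi_square_statistic` is a hypothetical name.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total  # product of marginals
            total += (obs - exp) ** 2 / exp                    # one term per cell
    return total


# Hypothetical observed counts (illustrative numbers only).
observed = [[20, 30, 10],
            [10, 10, 20]]
x2 = chi_square_statistic(observed)
print(round(x2, 3))
```

Whether the resulting value is "big" still depends on the degrees of freedom, so it has to be compared against the chi-square distribution.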

Independence in 2-Way Tables $H_0$ distribution of the $X^2$ statistic: “Chi Squared” (another Greek letter, $\chi$) Parameter: “degrees of freedom” (similar to T distribution) Excel Computation: –CHIDIST (given cutoff, find area = prob.) –CHIINV (given prob = area, find cutoff)

Independence in 2-Way Tables For test of independence, use: degrees of freedom = (#rows – 1) x (#cols – 1) E.g. Wine and Music: d.f. = (3 – 1) x (3 – 1) = 4

Independence in 2-Way Tables E.g. Wine and Music: P-value = P{Observed $X^2$ or m.c. | Indep.} = P{$X^2$ = 18.3 or m.c. | Indep.} = P{$X^2 \ge 18.3$ | d.f. = 4} $\approx 0.0011$ Also see Class Example 31, Part 5
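As a cross-check on the CHIDIST step, the upper-tail area can be computed directly. The sketch below uses a closed form that holds only when the degrees of freedom are even (which covers d.f. = 4 here); it is not a general replacement for CHIDIST, and the function name is hypothetical.

```python
import math


def chi_square_tail_even_df(x, df):
    """P{X^2 >= x} for a chi-square variable with an EVEN number of d.f.

    Uses the identity exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even d.f.")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Wine and Music example from the slides: X^2 = 18.3 with d.f. = 4.
p_value = chi_square_tail_even_df(18.3, 4)
print(p_value)
```

For odd degrees of freedom a numerical routine for the incomplete gamma function would be needed instead.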

Independence in 2-Way Tables E.g. Wine and Music: P-value $\approx 0.0011$ Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect Gray-Level: Also very strong evidence

Independence in 2-Way Tables Excel shortcut: CHITEST Avoids the (obs-exp)^2 / exp calculation Automatically computes d.f. Returns P-value

Independence in 2-Way Tables HW:

And Now for Something Completely Different A statistics joke, from: GARY C. RAMSEYER'S INTERNET GALLERY OF STATISTICS JOKES

And Now for Something Completely Different A somewhat advanced society has figured how to package basic knowledge in pill form. A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available.

And Now for Something Completely Different The pharmacist says "Here's a pill for English literature." The student takes the pill and swallows it and has new knowledge about English literature!

And Now for Something Completely Different "What else do you have?" asks the student. "Well, I have pills for art history, biology, and world history," replies the pharmacist. The student asks for these, and swallows them and has new knowledge about those subjects!

And Now for Something Completely Different Then the student asks, "Do you have a pill for statistics?" The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter. "I have to take that huge pill for statistics?" inquires the student.

And Now for Something Completely Different The pharmacist understandingly nods his head and replies: "Well, you know statistics always was a little hard to swallow."

Caution about 2-Way Tables Simpson’s Paradox: Aggregation into tables can be dangerous E.g. from: Study Admission rates to professional programs, look for sex bias….

Simpson’s Paradox Admissions to Business School:
         Admit  Deny
Male       480   120
Female     180    20
% Males admitted = 480 / (480 + 120) * 100% = 80%
% Females admitted = 180 / (180 + 20) * 100% = 90%
Better for females???

Simpson’s Paradox Admissions to Law School:
         Admit  Deny
Male        10    90
Female     100   200
% Males admitted = 10 / (10 + 90) * 100% = 10%
% Females admitted = 100 / (100 + 200) * 100% = 33.3%
Better for females???

Simpson’s Paradox Combined Admissions:
         Admit  Deny
Male       490   210
Female     280   220
% Males admitted = 490 / (490 + 210) * 100% = 70%
% Females admitted = 280 / (280 + 220) * 100% = 56%
Better for males???

Simpson’s Paradox How can the rate be higher for females in each school, yet higher for males overall? Reason: depends on relative proportions Notes: In Business (male applicants dominant), easier to get in (660 / 800) In Law (female applicants dominant), much harder to get in (110 / 400)
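The reversal can be checked by recomputing the rates per school and then for the pooled table. In this sketch the Business "Male denied" count (120) is not legible in the transcript and is inferred from the stated 80% rate; the dictionary layout and function names are assumptions made for the example.

```python
# (admitted, denied) counts per school; 120 is inferred from 480 admitted at 80%.
business = {"Male": (480, 120), "Female": (180, 20)}
law = {"Male": (10, 90), "Female": (100, 200)}


def admit_rate(admit, deny):
    """Percentage admitted out of all applicants."""
    return 100.0 * admit / (admit + deny)


def pooled(*schools):
    """Aggregate the counts over schools, separately for each sex."""
    totals = {}
    for school in schools:
        for sex, (admit, deny) in school.items():
            a, d = totals.get(sex, (0, 0))
            totals[sex] = (a + admit, d + deny)
    return totals


combined = pooled(business, law)
for name, table in [("Business", business), ("Law", law), ("Combined", combined)]:
    rates = {sex: round(admit_rate(*counts), 1) for sex, counts in table.items()}
    print(name, rates)
```

Females have the higher rate within each school, but most females applied to Law, where admission is much harder, so the pooled rate flips in favor of males.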

Simpson’s Paradox Lesson: Must be very careful about aggregation Worse: may not be aware that aggregation has been done…. Recall terminology: Lurking Variable Can hide in aggregation… Could be used for cheating…

Simpson’s Paradox HW:

Inference for Regression Chapter 10 Recall: Scatterplots Fitting Lines to Data Now study statistical inference associated with fit lines E.g. When is slope statistically significant?

Recall Scatterplot For data (x,y) View by plot: (1,2) (3,1) (-1,0) (2,-1)

Recall Linear Regression Idea: Fit a line to data in a scatterplot To learn about “basic structure” To “model data” To provide “prediction of new values”

Recall Linear Regression Recall some basic geometry: A line is described by an equation: y = mx + b, where m = slope and b = y intercept. Varying m & b gives a “family of lines”, indexed by “parameters” m & b (or a & b)

Recall Linear Regression Approach: Given a scatterplot of data: Find a & b (i.e. choose a line) to “best fit the data”

Recall Linear Regression Given a line, $y = ax + b$, “indexed” by $a$ & $b$, define “residuals” = “data Y” – “Y on line” = $Y_i - (aX_i + b)$. Now choose $a$ & $b$ to make these “small”

Recall Linear Regression Excellent Demo, by Charles Stanton, CSUSB More JAVA Demos, by David Lane at Rice U.

Recall Linear Regression Make Residuals ≥ 0, by squaring Least Squares: adjust $a$ & $b$ to Minimize the “Sum of Squared Errors” $SSE = \sum_{i=1}^{n} (Y_i - (aX_i + b))^2$

Least Squares in Excel Computation: 1. INTERCEPT (computes y-intercept b) 2. SLOPE (computes slope a) Revisit Class Example 14 HW: 10.17a
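For readers without Excel, the same least-squares fit can be sketched in plain Python. The data are the four scatterplot points recalled earlier in the slides; the function names simply mirror the Excel ones and are not part of any library.

```python
def slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def intercept(xs, ys):
    """Least-squares intercept: the fitted line passes through (x_bar, y_bar)."""
    n = len(xs)
    return sum(ys) / n - slope(xs, ys) * sum(xs) / n


# The four points from the "Recall Scatterplot" slide: (1,2) (3,1) (-1,0) (2,-1).
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]
fit_slope = slope(xs, ys)
fit_intercept = intercept(xs, ys)
print(fit_slope, fit_intercept)
```

With only four widely scattered points the fitted slope is close to zero, which is the kind of case where a significance test for the slope becomes relevant.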

Inference for Regression Goal: develop Hypothesis Tests and Confidence Int’s For slope & intercept parameters, a & b Also study prediction

Inference for Regression Idea: do statistical inference on: –Slope a –Intercept b Model: $Y_i = aX_i + b + e_i$ Assume: $e_i$ are random, independent and $N(0, \sigma_e)$

Inference for Regression Viewpoint: Data generated as: line $y = ax + b$, with $Y_i$ chosen from a normal distribution centered at $aX_i + b$ Note: a and b are “parameters”

Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution) Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$ (Using SLOPE and INTERCEPT in Excel, based on data)

Inference for Regression Distributions of $\hat{a}$ and $\hat{b}$? Under the above assumptions, the sampling distributions are: $\hat{a} \sim N(a, SD_a)$ and $\hat{b} \sim N(b, SD_b)$ Centerpoints are right (unbiased) Spreads are more complicated

Inference for Regression Formula for SD of : Big (small) for big (small, resp.) –Accurate data  Accurate est. of slope Small for x’s more spread out –Data more spread  More accurate Small for more data –More data  More accuracy

Inference for Regression Formula for SD of : Big (small) for big (small, resp.) –Accurate data  Accur’te est. of intercept Smaller for –Centered data  More accurate intercept Smaller for more data –More data  More accuracy

Inference for Regression One more detail: Need to estimate $\sigma_e$ using data For this use: $s_e = \sqrt{\frac{1}{n-2} \sum_i (Y_i - (\hat{a}X_i + \hat{b}))^2}$ Similar to earlier sd estimate, Except variation is about fit line $n - 2$ is similar to $n - 1$ from before

Inference for Regression Now for Probability Distributions, Since are estimating $\sigma_e$ by $s_e$ Use TDIST and TINV With degrees of freedom = $n - 2$
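The pieces of the slope test can be assembled in a short sketch. It assumes the standard textbook formulas: residual standard deviation with divisor n - 2, standard error of the slope equal to that value over the square root of the sum of squared x-deviations, and a t statistic with n - 2 degrees of freedom. The data are the four scatterplot points from earlier in the slides, and `regression_inference` is a hypothetical helper name.

```python
import math


def regression_inference(xs, ys):
    """Least-squares fit plus standard errors and the t statistic for slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variation is measured about the fitted line, with n - 2 d.f.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s_e = math.sqrt(sse / df)
    se_slope = s_e / math.sqrt(sxx)
    se_intercept = s_e * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "s_e": s_e,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "t_slope": slope / se_slope,  # compare with a t distribution on df d.f.
        "df": df,
    }


result = regression_inference([1, 3, -1, 2], [2, 1, 0, -1])
for key, value in result.items():
    print(key, value)
```

The t statistic would then be passed to TDIST (or any t-distribution routine) with the reported degrees of freedom to get a P-value.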

Inference for Regression Convenient Packaged Analysis in Excel: Tools → Data Analysis → Regression Illustrate application using: Class Example 27, Old Text Problem 8.6 (now 10.12)