Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions Inference –Skipped 2-way Tables –Sliced populations in 2.

1 Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions Inference –Skipped 2-way Tables –Sliced populations in 2 different ways –Look for independence of factors –Chi Square Hypothesis test

2 Reading In Textbook Approximate Reading for Today’s Material: Pages 582-611, 634-667 Approximate Reading for Next Class: Pages 634-667

3 Midterm I - Results Preliminary comments: Circled numbers are points taken off Total for each problem in brackets Points evenly divided among parts Page total in lower right corner Check those sum to total on front Overall score out of 100 points

4 Midterm I - Results Interpretation of Scores: Too early for letter grades These will change a lot: –Some with good grades will relax –Some with bad grades will wake up Don’t believe “A & C” average to “B”

5 Midterm I - Results Interpretation of Scores: Recall large variation over 2 midterms –No exception this semester

6 Midterm I - Results

7 Line of Equal Scores

8 Midterm I - Results Some have Dramatically Improved Others have Been distracted By other things

9 Midterm I - Results Interpretation of Scores: Recall large variation over 2 midterms –No exception this semester Get better info from 2 test Total –So will report answers in those terms

10 Midterm I - Results Histogram of Results:

11 Midterm I - Results Interpretation of Scores (2 Test total): 170–200: A; 155–168: B; 131–154: C; 120–129: D; 119 and below: F

12 Midterm I - Results Where do we go from here? I see 2 rather different groups… Which are you in? What can you do? Most important: It is still early days……

13 Chapter 9: Two-Way Tables Main idea: Divide up populations in two ways –E.g. 1: Age & Sex –E.g. 2: Education & Income Typical Major Question: How do divisions relate? Are the divisions independent? –Similar idea to independence in probability theory –Statistical Inference?

14 Two-Way Tables Big Question: Is there a relationship? Note: tallest bars French Wine ↔ French Music Italian Wine ↔ Italian Music Other Wine ↔ No Music Suggests there is a relationship

15 Two-Way Tables General Directions: Can we make this precise? Could it happen just by chance? –Really: how likely to be a chance effect? Or is it statistically significant? –I.e. music and wine purchase are related?

16 Two-Way Tables An alternate view: Replace counts by proportions (or %-ages) Class Example 31 (Wine & Music), Part 2 http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls Advantage: May be more interpretable Drawback: No real difference (just rescaled)

17 Two-Way Tables Testing for independence: What is it? From probability theory: P{A | B} = P{A} i.e. Chances of A, when B is known, are same as when B is unknown Table version of this idea?

18 Independence in 2-Way Tables Counts analog of P{A|B}??? Equivalent condition for independence is: P{A & B} = P{A} x P{B} So for counts, look for: Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n i.e. Entry = Product of Marginals
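
The "Entry = Product of Marginals" rule can be sketched in a few lines of Python. This is a minimal illustration, not the class spreadsheet: the 3 x 3 counts below are an assumption chosen for the example (rows are wine types, columns are music types), and the function name `expected_counts` is hypothetical.

```python
# Illustrative counts (an assumption for this sketch, not read from the
# class spreadsheet). Rows: French, Italian, Other wine.
# Columns: no music, French music, Italian music.
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def expected_counts(table):
    """Counts expected under independence.

    Table proportion = row marginal proportion x column marginal proportion,
    so expected count = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The expected table keeps the same row and column totals as the observed one, which is why it "shows the same structure as the marginals" while removing any match between the two factors.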

19 Independence in 2-Way Tables Visualize Product of Marginals for: Class Example 31 (Wine & Music), Part 4 http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls Shows same structure as marginals But not match between music & wine Good null hypothesis

20 Independence in 2-Way Tables Approach: Measure “distance between tables” –Use Chi Square Statistic –Has known probability distribution when table is independent Assess significance using P-value –Set up as: $H_0$: Indep. $H_A$: Dependent –P-value = P{what saw or more conclusive (m.c.) | Indep.}

21 Independence in 2-Way Tables Chi-square statistic: $X^2 = \sum_i (Obs_i - Exp_i)^2 / Exp_i$ Based on: Observed Counts (raw data), $Obs_i$ Expected Counts (under indep.), $Exp_i$ Notes: –Small for only random variation –Large for significant departure from indep.
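
A minimal sketch of the term-by-term chi-square calculation, summing (observed - expected)^2 / expected over all cells. The counts are an illustrative assumption (rows are wine types, columns are music types), not values read from the class spreadsheet, and `chi_square_statistic` is a hypothetical helper name.

```python
# Illustrative counts (an assumption for this sketch).
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: product of marginals.
            exp = row_totals[i] * col_totals[j] / n
            total += (obs - exp) ** 2 / exp
    return total


x2 = chi_square_statistic(observed)
print(round(x2, 2))
```

A table that exactly equals its product of marginals gives a statistic of 0; the further the counts depart from independence, the larger the sum.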

22 Independence in 2-Way Tables Chi-square statistic calculation: Class example 31, Part 5: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls –Calculate term by term –Then sum –Is $X^2 = 18.3$ “big” or “small”?

23 Independence in 2-Way Tables $H_0$ distribution of the $X^2$ statistic: “Chi Squared” (another Greek letter, $\chi$) Parameter: “degrees of freedom” (similar to T distribution) Excel Computation: –CHIDIST (given cutoff, find area = prob.) –CHIINV (given prob = area, find cutoff)

24 Independence in 2-Way Tables For test of independence, use: degrees of freedom = (#rows – 1) x (#cols – 1) E.g. Wine and Music: d.f. = (3 – 1) x (3 – 1) = 4

25 Independence in 2-Way Tables E.g. Wine and Music: P-value = P{Observed $X^2$ or m.c. | Indep.} = P{$X^2$ = 18.3 or m.c. | Indep.} = P{$X^2$ >= 18.3 | d.f. = 4} = 0.0011 Also see Class Example 31, Part 5 http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
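
The P-value above can be checked without Excel. The sketch below plays the role of CHIDIST using a closed-form series for the chi-square upper tail that holds only when the degrees of freedom are even (as with d.f. = 4 here); that restriction and the function name `chi2_upper_tail_even_df` are assumptions of this example, not part of the slides.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P{chi-square >= x} for positive even df.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, which is exact for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


rows, cols = 3, 3
df = (rows - 1) * (cols - 1)   # degrees of freedom for a test of independence
p_value = chi2_upper_tail_even_df(18.3, df)
print(df, round(p_value, 4))   # 4 0.0011
```

For odd degrees of freedom a general-purpose statistics library would be the safer choice.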

26 Independence in 2-Way Tables E.g. Wine and Music: P-value = 0.001 Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect Gray-Level: Also very strong evidence

27 Independence in 2-Way Tables Excel shortcut: CHITEST Avoids the (obs - exp)^2 / exp calculation Automatically computes d.f. Returns P-value

28 Independence in 2-Way Tables HW: 9.27 9.29

29 And Now for Something Completely Different A statistics joke, from: GARY C. RAMSEYER'S INTERNET GALLERY OF STATISTICS JOKES http://www.ilstu.edu/~gcramsey/Gallery.html

30 And Now for Something Completely Different A somewhat advanced society has figured how to package basic knowledge in pill form. A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available.

31 And Now for Something Completely Different The pharmacist says "Here's a pill for English literature." The student takes the pill and swallows it and has new knowledge about English literature!

32 And Now for Something Completely Different "What else do you have?" asks the student. "Well, I have pills for art history, biology, and world history," replies the pharmacist. The student asks for these, and swallows them and has new knowledge about those subjects!

33 And Now for Something Completely Different Then the student asks, "Do you have a pill for statistics?" The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter. "I have to take that huge pill for statistics?" inquires the student.

34 And Now for Something Completely Different The pharmacist understandingly nods his head and replies: "Well, you know statistics always was a little hard to swallow."

35 Caution about 2-Way Tables Simpson’s Paradox: Aggregation into tables can be dangerous E.g. from: http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/node50.html Study Admission rates to professional programs, look for sex bias….

36 Simpson’s Paradox Admissions to Business School:
         Admit  Deny
Male       480   120
Female     180    20
% Males ad’ted = 480 / (480 + 120) * 100% = 80% % Females ad’ted = 180 / (180 + 20) * 100% = 90% Better for females???

37 Simpson’s Paradox Admissions to Law School:
         Admit  Deny
Male        10    90
Female     100   200
% Males ad’ted = 10 / (10 + 90) * 100% = 10% % Females ad’ted = 100 / (100 + 200) * 100% = 33.3% Better for females???

38 Simpson’s Paradox Combined Admissions:
         Admit  Deny
Male       490   210
Female     280   220
% Males ad’ted = 490 / (490 + 210) * 100% = 70% % Females ad’ted = 280 / (280 + 220) * 100% = 56% Better for males???

39 Simpson’s Paradox How can the rate be higher for females in each school, yet higher for males overall? Reason: depends on relative proportions Notes: In Business (male applicants dominant), easier to get in (660 / 800) In Law (female applicants dominant), much harder to get in (110 / 400)
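
The reversal can be reproduced directly from the admit and deny counts on the Business and Law slides. The sketch below is only an illustration; the helper names `admit_rate` and `combine` are hypothetical.

```python
# (admitted, denied) counts taken from the Business and Law slides.
business = {"Male": (480, 120), "Female": (180, 20)}
law = {"Male": (10, 90), "Female": (100, 200)}


def admit_rate(admit, deny):
    """Fraction of applicants admitted."""
    return admit / (admit + deny)


def combine(*schools):
    """Aggregate the tables by adding counts cell by cell."""
    combined = {}
    for school in schools:
        for sex, (admit, deny) in school.items():
            a, d = combined.get(sex, (0, 0))
            combined[sex] = (a + admit, d + deny)
    return combined


combined = combine(business, law)
for name, table in [("Business", business), ("Law", law), ("Combined", combined)]:
    rates = {sex: round(100 * admit_rate(*counts), 1) for sex, counts in table.items()}
    print(name, rates)
```

Females have the higher rate within each school, but most female applicants applied to Law, where admission is much harder, so the aggregated table favors males.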

40 Simpson’s Paradox Lesson: Must be very careful about aggregation Worse: may not be aware that aggregation has been done…. Recall terminology: Lurking Variable Can hide in aggregation… Could be used for cheating…

41 Simpson’s Paradox HW: 9.15 9.17

42 Inference for Regression Chapter 10 Recall: Scatterplots Fitting Lines to Data Now study statistical inference associated with fit lines E.g. When is slope statistically significant?

43 Recall Scatterplot For data (x,y) View by plot: (1,2) (3,1) (-1,0) (2,-1)

44 Recall Linear Regression Idea: Fit a line to data in a scatterplot To learn about “basic structure” To “model data” To provide “prediction of new values”

45 Recall Linear Regression Recall some basic geometry: A line is described by an equation: y = mx + b, where m = slope and b = y intercept Varying m & b gives a “family of lines”, Indexed by “parameters” m & b (or a & b)

46 Recall Linear Regression Approach: Given a scatterplot of data: Find a & b (i.e. choose a line) to “best fit the data”

47 Recall Linear Regression Given a line, $y = ax + b$, “indexed” by $a$ & $b$ Define “residuals” = “data Y” – “Y on line” = $Y_i - (aX_i + b)$ Now choose $a$ & $b$ to make these “small”

48 Recall Linear Regression Excellent Demo, by Charles Stanton, CSUSB http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html More JAVA Demos, by David Lane at Rice U. http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html http://www.ruf.rice.edu/~lane/stat_sim/comp_r/index.html

49 Recall Linear Regression Make Residuals > 0, by squaring Least Squares: adjust $a$ & $b$ to Minimize the “Sum of Squared Errors” $SSE = \sum_i (Y_i - (aX_i + b))^2$
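
A minimal sketch of the least squares fit, using the four points from the scatterplot slide, (1,2), (3,1), (-1,0), (2,-1). It uses the standard closed-form solution for simple linear regression; the function names are hypothetical, and the slope and intercept it returns correspond to what Excel's SLOPE and INTERCEPT compute.

```python
# The four (x, y) points from the scatterplot slide.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]


def sum_squared_errors(points, slope, intercept):
    """Sum of squared residuals, residual = data Y - Y on line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def least_squares(points):
    """Slope and intercept that minimize the sum of squared errors."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(points)
print(round(slope, 4), round(intercept, 4))
print(round(sum_squared_errors(points, slope, intercept), 4))
```

Any other choice of slope and intercept gives a larger sum of squared errors on these points, which is what "best fit" means here.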

50 Least Squares in Excel Computation: 1. INTERCEPT (computes y-intercept b) 2. SLOPE (computes slope a) Revisit Class Example 14 http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls HW: 10.17a

51 Inference for Regression Goal: develop Hypothesis Tests and Confidence Int’s For slope & intercept parameters, a & b Also study prediction

52 Inference for Regression Idea: do statistical inference on: –Slope a –Intercept b Model: $Y_i = aX_i + b + e_i$ Assume: errors $e_i$ are random, independent and $N(0, \sigma_e)$

53 Inference for Regression Viewpoint: Data generated as: line y = ax + b; at each $X_i$, $Y_i$ chosen from a distribution centered on the line Note: a and b are “parameters”

54 Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution) Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$ (Using SLOPE and INTERCEPT in Excel, based on data)

55 Inference for Regression Distributions of $\hat{a}$ and $\hat{b}$? Under the above assumptions, the sampling distributions are: $\hat{a} \sim N(a, SD_{\hat{a}})$ and $\hat{b} \sim N(b, SD_{\hat{b}})$ Centerpoints are right (unbiased) Spreads are more complicated

56 Inference for Regression Formula for SD of $\hat{a}$: $SD_{\hat{a}} = \sigma_e / \sqrt{\sum_i (X_i - \bar{X})^2}$ Big (small) for big (small, resp.) error SD $\sigma_e$ –Accurate data → Accurate est. of slope Small for x’s more spread out –Data more spread → More accurate Small for more data –More data → More accuracy

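57 Inference for Regression Formula for SD of $\hat{b}$: $SD_{\hat{b}} = \sigma_e \sqrt{1/n + \bar{X}^2 / \sum_i (X_i - \bar{X})^2}$ Big (small) for big (small, resp.) error SD $\sigma_e$ –Accurate data → Accurate est. of intercept Smaller for $\bar{X} \approx 0$ –Centered data → More accurate intercept Smaller for more data –More data → More accuracy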

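58 Inference for Regression One more detail: Need to estimate the error SD $\sigma_e$ using data For this use: $s = \sqrt{\frac{1}{n-2} \sum_i (Y_i - (\hat{a}X_i + \hat{b}))^2}$ Similar to earlier sd estimate, Except variation is about fit line $n - 2$ is similar to $n - 1$ from before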

59 Inference for Regression Now for Probability Distributions, Since are estimating the error SD $\sigma_e$ by $s$, Use TDIST and TINV With degrees of freedom = $n - 2$
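
The pieces needed for a t test on the slope can be sketched as follows, assuming the standard simple linear regression formulas: s is the root of the sum of squared residuals divided by n - 2, the standard error of the slope is s divided by the root of the sum of squared x deviations, and the degrees of freedom are n - 2. The data are the four points from the scatterplot slide and the function name `slope_inference` is hypothetical. The resulting t statistic would then go to TDIST (or a t table) with the returned degrees of freedom.

```python
import math

# The four (x, y) points from the scatterplot slide.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]


def slope_inference(points):
    """Least squares fit plus the quantities used for t-based inference."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variation about the fit line, with n - 2 in place of n - 1.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    df = n - 2
    s = math.sqrt(sse / df)
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "s": s,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "t_slope": slope / se_slope,   # tests H0: slope = 0
        "df": df,
    }


result = slope_inference(points)
for name, value in result.items():
    print(name, round(value, 4))
```

With only four points and a t statistic this close to zero, these data give no evidence of a nonzero slope.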

60 Inference for Regression Convenient Packaged Analysis in Excel: Tools → Data Analysis → Regression Illustrate application using: Class Example 27, Old Text Problem 8.6 (now 10.12)

