
Last Time
Hypothesis Testing
–Yes-No Questions
–Assess with p-value: P[what was seen or more conclusive | Boundary]
–Interpretation
–Small is conclusive
–1-sided vs. 2-sided

Administrative Matters
Midterm I, coming Tuesday, Feb. 24
Numerical answers:
–No computers, no calculators
–Handwrite Excel formulas (e.g. =9+4^2)
–Don’t do arithmetic (i.e. use such formulas instead)
Bring with you:
–8.5 x 11 inch sheet of paper
–With your favorite info (formulas, Excel, etc.)
Course in Concepts, not Memorization

Administrative Matters
State of BlackBoard Discussion Board
Generally happy with result
But think carefully about “where to post”
–Look at current Thread HW 4
–Note “diffusion of questions”
–Hard to find what you want
Suggest keep HW problems all together
–i.e. One “Root node” per HW problem
Choose where to post (in tree) carefully
Use better “Subject Lines”
–Not just dumb “Replies”
–You can enter anything you want
–Try to make it clear to readers…
–Especially when “not following current line”

Reading In Textbook Approximate Reading for Today’s Material: Pages , 9-14 Approximate Reading for Next Class: , 30-34

Hypothesis Testing
In General: p-value = P[what was seen, or more conclusive | at boundary between H_0 & H_1]
Caution: “more conclusive” requires careful interpretation
Reason: Need to decide between
–1-sided Hypotheses, like H_0: p < vs. H_1: p ≥
–2-sided Hypotheses, like H_0: p = vs. H_1: p ≠

Hypothesis Testing
e.g. a slot machine bears a sign which says “Win 30% of the time”
In 10 plays, I don’t win any. Can I conclude sign is false?
(& thus have grounds for complaint, or is this a reasonable occurrence?)
Let p = P[win], let X = # wins in 10 plays
Model: X ~ Bi(10, p)
Test: H_0: p = 0.3 vs. H_1: p ≠ 0.3

Hypothesis Testing
Test: H_0: p = 0.3 vs. H_1: p ≠ 0.3
p-value = P[X = 0 or more conclusive | p = 0.3]
(understand this by visualizing # line)
–X = 3 (30% of 10) is most likely when p = 0.3, i.e. least conclusive
–so more conclusive includes values at least as far below 3 as the observed X = 0
–but since 2-sided, also include values at least as far above 3

Hypothesis Testing
Generally how to calculate?
–Observed Value: X = 0
–Most Likely Value: X = 3
–# spaces between them = 3
–so go 3 spaces in the other direction from the most likely value: 3 + 3 = 6
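The counting rule on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the helper names `binom_pmf` and `mirror_point` are made up here, and "most likely value" is taken to be the value of X with the largest binomial probability.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mirror_point(observed, n, p):
    """Return (most likely value, # spaces, point equally far on the other side)."""
    # Most likely value = the x in 0..n with the largest probability.
    most_likely = max(range(n + 1), key=lambda k: binom_pmf(k, n, p))
    spaces = abs(observed - most_likely)
    # Step the same number of spaces in the opposite direction.
    if observed <= most_likely:
        other_side = most_likely + spaces
    else:
        other_side = most_likely - spaces
    return most_likely, spaces, other_side

# Slot machine: 0 wins observed in 10 plays, sign claims p = 0.3.
print(mirror_point(0, 10, 0.3))  # (3, 3, 6)
```

So for the slot machine the "other side" starts at X = 6, matching the 3 spaces counted on the number line.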

Hypothesis Testing
Result: More conclusive means X ≤ 0 or X ≥ 6
p-value = P[X = 0 or more conclusive | p = 0.3]
= P[X ≤ 0 or X ≥ 6 | p = 0.3]
= P[X ≤ 0] + (1 – P[X ≤ 5])
= 0.076 (Excel result)
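As a cross-check outside Excel, the same sum can be evaluated in Python. This is a minimal sketch: `binom_cdf` is a hypothetical helper written here to play the role of a cumulative binomial probability P[X ≤ k], not a function from the course.

```python
from math import comb

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Bi(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, k + 1))

n, p0 = 10, 0.3  # 10 plays, boundary value p = 0.3

# Two-sided p-value: P[X <= 0] + (1 - P[X <= 5])
p_value = binom_cdf(0, n, p0) + (1 - binom_cdf(5, n, p0))
print(round(p_value, 3))  # 0.076
```

The two pieces are roughly 0.028 (lower tail) and 0.047 (upper tail), which add to the 0.076 on the slide.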

Hypothesis Testing
Test: H_0: p = 0.3 vs. H_1: p ≠ 0.3
p-value = 0.076
Yes-No Conclusion: 0.076 > 0.05, so not safe to conclude “P[win] = 0.3” sign is wrong, at level 0.05
(10 straight losses is reasonably likely)
Gray Level Conclusion: in “fuzzy zone”, some evidence, but not too strong

Hypothesis Testing
Alternate Question: Same setup, can we conclude: P[win] < 30% ???
Seems like same question?
Careful, “≠” became “<”
I.e. 2-sided hypothesis became 1-sided hypothesis
Difference can have major impact

Hypothesis Testing
Alternate Question: Same setup, can we conclude: P[win] < 30% ???
Test: H_0: p ≥ 0.3 vs. H_1: p < 0.3
p-value = P[X = 0 or more conclusive | p = 0.3] (same boundary between H_0 & H_1)
= P[X ≤ 0 | p = 0.3]
= 0.028 (Excel result)
Yes-No: 0.028 < 0.05, so now can conclude P[win] < 30%
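The 1-sided calculation is short enough to check by hand or in code. The sketch below is illustrative only (the variable names are invented here); it puts the 1-sided and 2-sided p-values side by side and compares each with the 0.05 cutoff.

```python
from math import comb

n, p0 = 10, 0.3  # 10 plays, boundary value p = 0.3

def pmf(k):
    """P[X = k] for X ~ Bi(10, 0.3)."""
    return comb(n, k) * p0**k * (1 - p0)**(n - k)

# 1-sided: more conclusive means X <= 0, i.e. only X = 0.
p_one_sided = pmf(0)
# 2-sided: also include X >= 6 on the other side.
p_two_sided = pmf(0) + sum(pmf(k) for k in range(6, n + 1))

print(round(p_one_sided, 3), p_one_sided < 0.05)  # 0.028 True
print(round(p_two_sided, 3), p_two_sided < 0.05)  # 0.076 False
```

Same data, same boundary, but the two p-values land on opposite sides of 0.05.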

Hypothesis Testing
Yes-No: Now can conclude P[win] < 30%
Paradox of Yes-No Approach:
–Have strong evidence that P[win] < 30%
–But cannot conclude P[win] diff’t from 30%
–Different from Common Sense
–I.e. “logic of statistical significance” is different from “ordinary logic”
Reason: for 2-sided, uncertainty comes from both sides, which just adds to gray level

Hypothesis Testing
Alternate Question: Same setup, can we conclude: P[win] < 30% ???
p-value = 0.028
Yes-No: Now can conclude P[win] < 30%
Gray Level: Evidence still flaky, but stronger
Note: No gray level paradox
–Since no cutoff, just “somewhat stronger…”
–This is why I recommend gray level

Hypothesis Testing
Lessons: 1-sided vs. 2-sided issues need:
1. Careful Implementation (strongly affects answer)
2. Careful Interpretation (notion of “P[win] ≠ 30%” being tested is different from usual)
But not so bad with Gray Level interpretation:
–“very strong”: p-val < 0.01
–“marginal” – “flaky”: 0.01 ≤ p-val ≤ 0.1
–“very weak”: 0.1 < p-val
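The two viewpoints can be written as small lookup rules. This is a sketch under assumptions: the function names are invented here, the gray-level cutoffs are the ones listed on the slide, and the 0.05 level for the yes-no rule is the one used in the slot machine example.

```python
def gray_level(p_val):
    """Gray-level reading of a p-value, using the slide's cutoffs."""
    if p_val < 0.01:
        return "very strong"
    if p_val <= 0.1:
        return "marginal / flaky"
    return "very weak"

def yes_no(p_val, level=0.05):
    """Yes-no reading: True means 'safe to conclude H_1' at the given level."""
    return p_val < level

for p_val in (0.076, 0.028):
    print(p_val, yes_no(p_val), gray_level(p_val))
```

For the slot machine, the yes-no answer flips between 0.076 and 0.028, while the gray-level answer stays in the same "marginal / flaky" band.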

Hypothesis Testing
HW C14: Answer from both gray-level and yes-no viewpoints:
(c) A TV ad claims that 30% of people prefer Brand X. Should we dispute this claim if a random sample of 10 people shows:
(i) 2 people who prefer Brand X (p-val = 0.733)
(ii) 3 people who prefer Brand X (p-val = 1)
(iii) 6 people who prefer Brand X (p-val = 0.076)
(iv) 10 people who prefer Brand X (p-val = 5.9e-6)
(d) A manager asks 12 workers, of whom 7 say they are satisfied with working conditions. Does this contradict the CEO’s claim that ¾ of the workers are satisfied? (p-val = 0.316)

Hypothesis Testing
HW:
8.22a, ignore “z statistic” (p-val = 0.006)
8.29a, ignore “sketch …” (p-val = 0.184)

And now for something completely different
Coin tossing & die rolling:
–Useful thought models in this course
–We’ve calculated various probabilities
–Model for “randomness”…
–But how random are they really?

And now for something completely different
Randomness in coin tossing:
Excellent source: Prof. Persi Diaconis (Stanford U.)
–Trained as performing magician
–Legendary Trick: He tosses coin, you call it, he catches it!
–Coin tosses not really random

And now for something completely different
Randomness in die rolling?

Big Picture
Hypothesis Testing (Given dist’n, answer “yes-no”)
–Can solve using BINOMDIST
Margin of Error (Find dist’n, use to measure error)
Choose Sample Size (for given amount of error)
Margin of Error and Choose Sample Size need better prob. tools
–Start with visualizing probability distributions (key to “alternate representation”)

Visualization
Idea: Visually represent “distributions” (2 types)
a) Probability Distributions (e.g. Binomial)
–Summarized by f(x)
b) Lists of numbers, x_1, x_2, …, x_n
–Use subscripts to index different ones

Visualization
Examples of lists: (will often use below)
1. Collection of “#’s of Males”, from HW ???
2. 2.3, 4.5, 4.7, 4.8, 5.1
… (there are many others)

Visualization
Connections between prob. dist’ns and lists:
(i) Given dist’n, can construct a related list by drawing sample values from dist’n
e.g. Bi(1, 0.5) (toss coins, count H’s): 1, 1, 1, 0, 0, 0, 1
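Drawing such a list can be imitated with a random number generator. The following is only a sketch: `draw_coin_tosses` is a name invented for this illustration, and the seed is arbitrary, so the particular 0/1 list printed will generally differ from the one on the slide.

```python
import random

def draw_coin_tosses(n_tosses, p_heads=0.5, seed=None):
    """Draw n_tosses values from Bi(1, p_heads): 1 = Heads, 0 = Tails."""
    rng = random.Random(seed)  # seeded generator, so a run can be repeated
    return [1 if rng.random() < p_heads else 0 for _ in range(n_tosses)]

sample = draw_coin_tosses(7, seed=1)
print(sample)       # a list of seven 0's and 1's
print(sum(sample))  # number of Heads in the list
```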

Visualization
Connections between prob. dist’ns and lists:
(ii) Given a list, x_1, x_2, …, x_n (not thinking of these as random, so use lower case), can construct a dist’n:
$\hat{f}(x) = \#\{i : x_i = x\} / n$
–Use different symbol, to distinguish from f
–Use “hat” to indicate “estimate”
E.g. For above list 1, 1, 1, 0, 0, 0, 1: $\hat{f}(0) = 3/7$, $\hat{f}(1) = 4/7$
Called the “empirical prob. dist’n” or “frequency distribution”
Provides probability model for: choose random number from list
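A frequency distribution is simple to tabulate in code: count how often each value occurs and divide by the length of the list. The sketch below assumes that reading of "frequency distribution"; the function name `empirical_dist` is invented for this illustration.

```python
from collections import Counter
from fractions import Fraction

def empirical_dist(values):
    """Map each distinct value to (# of times it appears) / (length of list)."""
    n = len(values)
    counts = Counter(values)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}

f_hat = empirical_dist([1, 1, 1, 0, 0, 0, 1])
print(f_hat)  # {0: Fraction(3, 7), 1: Fraction(4, 7)}
```

Exact fractions are used so the relative frequencies visibly sum to 1.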

Visualization
Note: if start with f(x),
and draw random sample, X_1, X_2, …, X_n (as in (i)) (random, so use capitals),
and construct frequency distribution $\hat{f}$ of X_1, X_2, …, X_n,
then for n large, $\hat{f}(x) \approx f(x)$ (so “hat” notation is sensible)
–Recall “frequentist interpretation” of probability
–Can be made precise
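A small simulation can make the "for n large" statement concrete. This is a sketch under assumptions not in the slides: it uses Bi(10, 0.3) as the starting distribution, 20,000 draws, and an arbitrary seed, and all function names are invented here.

```python
import random
from collections import Counter
from math import comb

def binom_pmf(k, n, p):
    """f(k) = P[X = k] for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def draw_binomial(n, p, rng):
    """One draw from Bi(n, p): count successes in n independent trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)       # arbitrary seed, for repeatability
n_trials, p = 10, 0.3        # assumed example distribution
n_draws = 20_000             # "n large"

sample = [draw_binomial(n_trials, p, rng) for _ in range(n_draws)]
counts = Counter(sample)

# Largest gap between the frequency distribution and f over x = 0..10.
max_gap = max(abs(counts[x] / n_draws - binom_pmf(x, n_trials, p))
              for x in range(n_trials + 1))
print(max_gap)  # small when n_draws is large
```

Rerunning with a smaller `n_draws` (say 50) typically gives a noticeably larger gap.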

Visualization
Simple visual representation for lists: Use number line, put x’s
E.g. 2 (above): 2.3, 4.5, 4.7, 4.8, 5.1
Picture already gives better impression than list of numbers
Will be much better when lists become “too long to comprehend”

Visualization
Drawbacks of: Number line, & x’s
When have many data points:
–Hard to construct
–Can’t see all (overplotting)
–Hard to interpret

Visualization
Alternatives (Text, Sec. 1.1):
Stem and leaf plots
–Clever visualization, for only pencil & paper
–But we have computers
–So won’t study further
Histograms
–Will study carefully

Statistical Folklore
Graphical Displays:
–Important Topic in Statistics
–Has large impact
–Need to think carefully to do this
–Watch for attempts to fool you

Statistical Folklore
Graphical Displays: Interesting Article:
“How to Display Data Badly”, Howard Wainer, The American Statistician, 38 (available on the Internet)
Main Idea: Point out 12 types of bad displays, with reasons behind
Here are some favorites…
–Hiding the data in the scale
–The eye perceives areas as “size”
–Change of Scales in Mid-Axis (Really trust the Post???)

Histograms
Idea: show rectangles, where area represents:
(a) Distributions: probabilities
(b) Lists (of numbers): # of observations
Note: will study these in parallel for a while (several concepts apply to both)
Caution: There are variations not based on areas, see bar graphs in text
But eye perceives area, so sensible to use it

Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
a. Prob. dist’ns: If possible values are x = 0, 1, …, n, get good picture from choice:
[-½, ½), [½, 1.5), [1.5, 2.5), …, [n-½, n+½)
where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5” (called a “half open interval”)
b. Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1 (same e.g. as above)
–Start with [1,3), [3,7)
–As above use half open intervals (to break ties)
–Note: These contain full data set
–Can use anything for class intervals
–But some choices better than others…
2. Find “probabilities” or “relative frequencies” for each class
(a) Probs: use f(x) for [x-½, x+½), etc.
(b) Lists: [1,3): rel. freq. = 1/5 = 20%; [3,7): rel. freq. = 4/5 = 80%

Histograms
Steps for Constructing Histograms:
3. Above each interval, draw rectangle where area represents class frequency
(a) Probs: If width = 1, then area = width x height = height
So get area = f(x), by taking height = f(x)
E.g. Binomial Distribution
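For the list example with classes [1,3) and [3,7), the three steps can be sketched as follows. The function name is invented for this illustration, and the height rule (relative frequency divided by class width) is an inference from "area = width x height", not something stated on the slides for lists.

```python
def histogram_rectangles(values, edges):
    """For half-open classes [edges[i], edges[i+1]), return a list of
    (left, right, relative frequency, rectangle height)."""
    n = len(values)
    rects = []
    for left, right in zip(edges, edges[1:]):
        # Half-open interval: left end included, right end excluded.
        count = sum(1 for v in values if left <= v < right)
        rel_freq = count / n
        height = rel_freq / (right - left)  # so that area = rel. freq.
        rects.append((left, right, rel_freq, height))
    return rects

rects = histogram_rectangles([2.3, 4.5, 4.7, 4.8, 5.1], [1, 3, 7])
for left, right, rel_freq, height in rects:
    print(f"[{left},{right}): rel. freq. = {rel_freq:.0%}, height = {height:.2f}")
```

The relative frequencies come out as 20% and 80%, as on the slide. Because the second class is twice as wide, its rectangle is only twice as tall (0.2 vs. 0.1) even though it holds four times as many points.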

Binomial Prob. Histograms
From Class Example 5
Construct Prob. Histo:
–Create column of x values (do 1st two, and drag box)
–Compute f(x) values (create 1st one, and drag twice)
–Make bar plot:
“Insert” tab; Choose “Column”; Right Click – Select Data (Horizontal – x’s, “Add series”, Probs); Resize, and move by dragging; Delete legend; Click and change title; Right Click on Bars, Format Data Series: Border Color, Solid Line, Black; Series Options, Gap Width = 0
–Make several, for interesting comparison
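For readers without Excel at hand, the same two columns (x and f(x)) can be built in Python, with a rough text bar standing in for the column chart. This is only a sketch: the parameters n = 10 and p = 0.3 are an assumption borrowed from the slot machine example, since the settings of Class Example 5 are not in this transcript, and `binom_table` is a name invented here.

```python
from math import comb

def binom_table(n, p):
    """Rows (x, f(x)) for x = 0..n, where f is the Bi(n, p) probability."""
    return [(x, comb(n, x) * p**x * (1 - p)**(n - x)) for x in range(n + 1)]

table = binom_table(10, 0.3)  # assumed parameters, see note above

# One text bar per x; each '#' stands for about 1% probability.
for x, fx in table:
    print(f"{x:2d}  {fx:.4f}  {'#' * round(fx * 100)}")
```

Changing the arguments to `binom_table` gives other histograms for comparison.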