StatCrunch Workshop Hector Facundo
Resources
Math Lab Website: http://www.cos.edu/Academics/MathEngineering/Pages/Math-Lab.aspx
Small Group Math Tutoring: http://www.cos.edu/Library/Services/TutorialCenter/Pages/Small-Group-Math-Tutorial-Hours.aspx
Basic Summary Stats
Calculates Mean, Median, Mode, Q1, Q3, Standard Deviation, Variance, etc.
Stat -> Summary Stats -> Column
Calculates statistics from a column variable.
Let's play around with the “Test Scores” column. Compute:
Mean
Min, Q1, Median, Q3, Max
Standard Deviation and Unadjusted Standard Deviation
Note: Standard Deviation is for sample data (it divides by n - 1); Unadjusted Standard Deviation is for population data (it divides by n).
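To see the sample versus population distinction outside of StatCrunch, here is a minimal Python sketch. The scores are made up for illustration (the workshop's “Test Scores” column is not reproduced here), and the variable names are assumptions.

```python
import statistics

# Hypothetical scores; NOT the workshop's "Test Scores" column.
test_scores = [72, 85, 90, 66, 78, 94, 81, 59]

# stdev divides by n - 1 (sample data); pstdev divides by n (population data,
# the "unadjusted" standard deviation).
sample_sd = statistics.stdev(test_scores)
population_sd = statistics.pstdev(test_scores)

print(f"mean          = {statistics.mean(test_scores):.3f}")
print(f"median        = {statistics.median(test_scores):.3f}")
print(f"sample SD     = {sample_sd:.3f}")
print(f"unadjusted SD = {population_sd:.3f}")
```

The sample value is always a little larger than the unadjusted one, and the gap shrinks as the number of observations grows.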
Simple Graphs
Let's create a histogram of the “Test Scores” data with a starting value of 50 and a class width of 10.
Graph -> Histogram
Two versions: Frequency Histogram and Relative Frequency Histogram
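The counts behind such a histogram can be sketched in Python. This is only an illustration: the scores are hypothetical, and the classes are assumed to include their lower bound and exclude their upper bound, which may differ from how a given tool treats values on a class boundary.

```python
from collections import Counter

# Hypothetical scores; NOT the workshop's "Test Scores" column.
scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 96]
START, WIDTH = 50, 10  # starting value and class width from the slide


def class_lower_bound(value, start=START, width=WIDTH):
    """Return the lower bound of the class [lower, lower + width) holding value."""
    return start + width * ((value - start) // width)


freq = Counter(class_lower_bound(s) for s in scores)
n = len(scores)

for lower in sorted(freq):
    relative = freq[lower] / n
    print(f"[{lower}, {lower + WIDTH}): frequency = {freq[lower]}, relative = {relative:.2f}")
```

A frequency histogram plots the counts; a relative frequency histogram plots the counts divided by the number of observations, so the bar heights sum to 1.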
Simple Graphs
Let's do a split bar plot of the “Education” data with the salaries for men and women.
Graph -> Chart -> Column
Data is Your Friend! Manipulate values, columns, rows, etc. Data -> Arrange -> Stack Allows you to stack observations from multiple columns into one column. Let’s Stack the “Height” Data for Men and Women into one column.
Data is Your Friend!
Data -> Compute -> Expression
Allows you to do arithmetic operations (+, -, *, /).
Allows you to do operations with more than one column.
Has built-in functions for better “equation” building.
Some built-in functions:
Mean -> mean()
Sum -> sum()
Cumulative Sum -> cumsum() (good for cumulative frequencies)
Standard Deviation -> std()
Unadjusted Standard Deviation -> ustd()
Data is Your Friend!
Some simple computations:
Add 2 to every score in “Test Scores”.
Subtract the height of women from the height of men (i.e. Height (Men) - Height (Women)).
Subtract the mean of “Test Scores” from all the values in “Test Scores”.
Graph Revisited
Create a cumulative frequency bar graph for the “Frequency” column.
Step 1: Get cumulative frequency counts from the “Frequency” column using Data -> Compute -> Expression.
Step 2: Graph the cumulative frequencies using Graph -> Chart -> Column.
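Step 1 is a running total. As a rough sketch of what a cumulative sum produces, here is the same idea in Python with hypothetical counts (the workshop's “Frequency” column is not reproduced here):

```python
from itertools import accumulate

# Hypothetical class frequencies; NOT the workshop's "Frequency" column.
frequency = [3, 7, 12, 5, 3]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequency))

print(cumulative)  # [3, 10, 22, 27, 30]
```

The last cumulative value always equals the total number of observations, which is a quick check that the column was computed correctly.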
Probability with Stat
Discrete Random Variable Example:

X (Outcome)    P(x) “Probability”
0              0.1
1              0.15
2              0.3
3              0.25
4              0.2

$\text{Mean} = \mu_X = \sum_{i=1}^{n} X_i \cdot P(X_i) = 0(0.1) + 1(0.15) + 2(0.3) + 3(0.25) + 4(0.2) = 2.3$
$\text{Variance} = \sigma_X^2 = \sum_{i=1}^{n} (X_i - \mu_X)^2 \cdot P(X_i) = (0 - 2.3)^2(0.1) + \dots + (4 - 2.3)^2(0.2) = 1.51$
$\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{1.51} \approx 1.2288$

Stat -> Calculators -> Custom
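The hand calculation above can be checked with a short Python sketch. The outcomes and probabilities come from the slide; the variable names are illustrative.

```python
import math

# Outcomes and probabilities from the slide's table.
outcomes = [0, 1, 2, 3, 4]
probs = [0.1, 0.15, 0.3, 0.25, 0.2]

# A valid probability distribution must sum to 1.
assert math.isclose(sum(probs), 1.0)

# Mean: sum of X_i * P(X_i).
mean = sum(x * p for x, p in zip(outcomes, probs))

# Variance: sum of (X_i - mean)^2 * P(X_i).
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))

sd = math.sqrt(variance)

print(f"mean = {mean:.4f}, variance = {variance:.4f}, sd = {sd:.4f}")
```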
Probability with Stat
Normal Distribution
Stat -> Calculators -> Normal
Say X comes from a normal distribution with a mean of 0 and a standard deviation of 1 (Standard Normal). How would we find the following probabilities?
P(X ≥ 0.5)
P(X ≤ -1)
P(-3 ≤ X ≤ -2) Hint: Use “Between”
Answers:
P(X ≥ 0.5) = 0.3085
P(X ≤ -1) = 0.1587
P(-3 ≤ X ≤ -2) = 0.0214
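As a cross-check on the calculator results, the same three probabilities can be sketched with Python's standard library. `NormalDist` is part of the `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# cdf(x) gives P(X <= x), so an upper tail is 1 - cdf(x)
# and a "between" probability is a difference of two cdf values.
p_upper = 1 - z.cdf(0.5)
p_lower = z.cdf(-1)
p_between = z.cdf(-2) - z.cdf(-3)

print(f"P(X >= 0.5)      = {p_upper:.4f}")
print(f"P(X <= -1)       = {p_lower:.4f}")
print(f"P(-3 <= X <= -2) = {p_between:.4f}")
```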
Probability with Stat
Normal Distribution
Stat -> Calculators -> Normal
Conversely, what if X has a normal distribution with a mean of 0 and a standard deviation of 1, and we want the value(s) that give us the upper 5%? The lower 1%? The middle 90%?
P(X ≥ ?) = 0.05
P(X ≤ ?) = 0.01
P(? ≤ X ≤ ?) = 0.90
Answers:
P(X ≥ 1.645) = 0.05
P(X ≤ -2.326) = 0.01
P(-1.645 ≤ X ≤ 1.645) = 0.90
This will be very helpful when finding “critical values”.
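Going from a probability back to a cutoff value is the inverse of the cumulative distribution function. A minimal Python sketch, assuming a standard normal distribution and using illustrative variable names:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# inv_cdf(p) returns the value x with P(X <= x) = p.
upper_5 = z.inv_cdf(0.95)    # upper 5% means 95% of the area lies below
lower_1 = z.inv_cdf(0.01)    # lower 1%
middle_90 = (z.inv_cdf(0.05), z.inv_cdf(0.95))  # 5% left in each tail

print(f"upper 5% cutoff: {upper_5:.3f}")
print(f"lower 1% cutoff: {lower_1:.3f}")
print(f"middle 90%:      {middle_90[0]:.3f} to {middle_90[1]:.3f}")
```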
Confidence Intervals
100(1 - α)% Confidence intervals for μ (Mean)
Say we want a 95% Confidence Interval for the mean of “Test Scores”.
Note: I'm assuming we are using the t-distribution for this problem.
$\bar{X} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$ (Lots of Work!)
Stat -> T-Stats -> One Sample -> With Data
Confidence Intervals
100(1 - α)% Confidence intervals for μ (Mean)
Say we want a 90% Confidence Interval for the mean and we are given the following data:
Sample Mean = 34.5, Sample Standard Deviation = 2.3, Sample Size = 20
Note: I'm still assuming we are using the t-distribution for this problem.
Stat -> T-Stats -> One Sample -> With Summary
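The arithmetic behind this summary-data interval can be sketched in Python. One assumption to flag: the critical value 1.729 is hard-coded from a standard t-table (upper-tail area 0.05, 19 degrees of freedom) rather than computed, because Python's standard library has no t-distribution.

```python
import math

# Summary values from the slide.
sample_mean = 34.5
sample_sd = 2.3
n = 20

# Assumption: t critical value for a 90% interval with n - 1 = 19 degrees of
# freedom, looked up in a t-table rather than computed here.
T_CRIT = 1.729

# Margin of error: t * s / sqrt(n).
margin = T_CRIT * sample_sd / math.sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin

print(f"90% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

StatCrunch computes the critical value itself, so its limits may differ from this sketch in the later decimal places.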
Confidence Intervals
100(1 - α)% Confidence intervals for p (Proportion)
We have Political “Party” data with the political affiliation of 50 people (Rep, Dem, Ind). We want a 92% Confidence Interval for the true proportion of people who are Republican.
Note: I'm using the Normal distribution for this problem.
$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (More Work!)
Stat -> Proportion Stats -> One Sample -> With Data
Confidence Intervals
100(1 - α)% Confidence intervals for p (Proportion)
Say we surveyed 2500 people, asking whether they eat tofu, and 768 people responded yes. We want a 98% Confidence Interval for the true proportion of people who eat tofu.
“Successes” = 768, Observations = 2500
Stat -> Proportion Stats -> One Sample -> With Summary
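For the tofu survey, the normal-based interval can be sketched with the standard library. This assumes the standard (Wald) formula, sample proportion plus or minus z times the standard error; a tool configured to use a different interval method would give slightly different limits.

```python
import math
from statistics import NormalDist

# Values from the slide.
successes = 768
n = 2500
confidence = 0.98

p_hat = successes / n

# Critical value Z_{1 - alpha/2}, where alpha = 1 - confidence.
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error: z * sqrt(p_hat * (1 - p_hat) / n).
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - margin
upper = p_hat + margin

print(f"p_hat = {p_hat:.4f}")
print(f"98% CI for p: ({lower:.4f}, {upper:.4f})")
```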
Hypothesis Testing Tips
$H_0: \mu = 80$
$H_A: \mu \neq 80$ (this is referred to as the alternative hypothesis)

Hints to set up the Alternative Hypothesis:
<    “Less Than”, “Smaller”, “Lower”
>    “Greater Than”, “More”, “Higher”
≠    “Different”, “Difference”, “Change”, “If they are the same”

Conclusions Guide:
P-Value > α    Fail to Reject Null Hypothesis
P-Value < α    Reject Null Hypothesis
Hypothesis Testing – One Sample Mean Test
Say we believe that the average of all test scores in math classes that took a particular test is 80. However, others believe it is not 80. We set up a hypothesis test to test this claim at the α = 0.05 level, using the “Test Scores” column as our random sample.
Note: I'm assuming a t-distribution for this problem.
$H_0: \mu = 80$
$H_A: \mu \neq 80$
Stat -> T-Stats -> One Sample -> With Data
Hypothesis Testing – One Sample Proportion Test
A previous study suggested that 48% of all voters in the United States identify as Republican. However, researchers believe the true value to be higher. They take a sample of people's voting preferences, stored in the “Party” column. We set up a hypothesis test to test this claim at the α = 0.10 level.
$H_0: p = 0.48$
$H_A: p > 0.48$
Stat -> Proportion Stats -> One Sample -> With Data
Hypothesis Testing – Paired Difference Test
We want to know if a structured tutoring session will increase test scores for students. We give them a test before the tutoring session, and then we test them after the session with a similar but slightly different test. We set up a hypothesis test to test this claim at the α = 0.05 level.
$d_i = \text{After}_i - \text{Before}_i$
$H_0: \mu_d = 0$
$H_A: \mu_d > 0$
Stat -> T-Stats -> Paired
Hypothesis Testing – Two Sample Mean Test
A study was conducted to compare the average body temperature of males and females. For 2355 males, the average body temperature was 98.105 degrees F with a standard deviation of 0.699 F. For 1985 females, the average body temperature was 98.342 degrees F with a standard deviation of 0.743 F. We set up a hypothesis test to test the claim that males have lower body temperatures than females at the α = 0.01 level.
$H_0: \mu_{\text{Males}} - \mu_{\text{Females}} = 0$
$H_A: \mu_{\text{Males}} - \mu_{\text{Females}} < 0$
Stat -> T-Stats -> 2 Sample -> With Summary
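A rough sketch of the calculation behind this test, using the summary values from the slide. Two assumptions to flag: the variances are not pooled, and the p-value uses the normal distribution as an approximation to the t-distribution, which is reasonable here only because both samples are in the thousands. StatCrunch's t-based p-value may differ slightly.

```python
import math

# Summary values from the slide.
mean_m, sd_m, n_m = 98.105, 0.699, 2355   # males
mean_f, sd_f, n_f = 98.342, 0.743, 1985   # females
alpha = 0.01

# Unpooled standard error of the difference in sample means.
se = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)

# Test statistic under H0: mu_males - mu_females = 0.
test_stat = (mean_m - mean_f) / se

# Left-tailed p-value, P(Z <= test_stat), via the normal approximation.
# erfc keeps precision far out in the tail.
p_value = 0.5 * math.erfc(-test_stat / math.sqrt(2))

decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"test statistic = {test_stat:.3f}")
print(f"p-value        = {p_value:.3e}")
print(decision)
```

Because the alternative is "less than", only the left tail counts toward the p-value; a two-sided alternative would double it.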
Hypothesis Testing – Two Sample Proportion Test
Time magazine reported the result of a telephone poll of 800 adult Americans. The question posed to the Americans who were surveyed was: "Should the federal tax on cigarettes be raised to pay for health care reform?" The results of the survey were:
Is there sufficient evidence at the α = 0.05 level, say, to conclude that the two populations (smokers and non-smokers) differ significantly with respect to their opinions?
$H_0: p_{\text{Smokers}} - p_{\text{Non-smokers}} = 0$
$H_A: p_{\text{Smokers}} - p_{\text{Non-smokers}} \neq 0$
Stat -> Proportion Stats -> 2 Sample -> With Summary
Thank You For Coming! If you have any suggestions on how we can improve the workshop, send an email to hector@cos.edu Don’t forget, you can get extra math help in the Math Lab in the Learning Resource Center.