Practical Statistics Abbreviated Summary
There are six statistics that will answer 90% of all questions: descriptive statistics, chi-square, Z-tests, comparison of means, correlation, and regression.
Z-tests are for proportions. This test is so easy that some computer programs, such as SPSS, do not even offer it.
For example: what is the probability that 220 out of 250 customers would like the service, when usually 70% (175 out of 250) are satisfied?
Or: in a random sample of male and female customers, what is the probability that the percentage of men who like a new product is the same as the percentage of women?
Z-tests come in two types:
1. A sample proportion against a hypothesis.
2. Two samples compared to each other.
The standard error (sampling error) for a proportion is:
$\sigma_p = \sqrt{\frac{pq}{n}}$
where p = freq/total, q = 1 − p, and n is the sample size.
Hence the test statistic is:
$Z = \frac{p_t - p}{\sqrt{pq/n}}$
where p is the hypothesized value and $p_t$ is the proportion found in a sample of size n.
Suppose that XYZ Company believed that 20% of its customers bought 80% of its product (the “heavy half”). A sample of 200 customers found that 25% bought 80% of the product. Was the company correct in its estimate?
The test statistic looks like this:
$Z = \frac{0.25 - 0.20}{\sqrt{(0.20)(0.80)/200}} = \frac{0.05}{0.0283} = 1.77$
Since the test was “two-tailed,” the critical value of Z at α = .05 would be 1.96. Because Z = 1.77 is smaller than 1.96, there is not enough evidence to overturn the assumption that 20% of the customers bought 80% of the product.
The same result can be obtained online (https://www.medcalc.org/calc/test_one_proportion.php or http://www.danielsoper.com/statcalc/calculator.aspx?id=2): P = 0.077.
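The one-proportion Z-test can also be run in a few lines of code. The sketch below assumes Python's standard library. It uses the normal approximation and `math.erfc` for the two-tailed p-value, and `one_proportion_z` is a hypothetical helper name. It reproduces the XYZ Company numbers and also answers the opening 220-of-250 question.

```python
import math


def one_proportion_z(p_sample: float, p_hyp: float, n: int) -> tuple[float, float]:
    """Hypothetical helper: Z-test of one sample proportion against a hypothesized one.

    Returns (z, two_tailed_p) using the normal approximation.
    """
    q_hyp = 1.0 - p_hyp
    se = math.sqrt(p_hyp * q_hyp / n)          # standard error: sqrt(pq / n)
    z = (p_sample - p_hyp) / se
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# XYZ Company: hypothesized 20%, observed 25% in a sample of 200 customers.
z, p = one_proportion_z(0.25, 0.20, 200)
print(f"Z = {z:.2f}, two-tailed p = {p:.3f}")  # Z = 1.77, two-tailed p = 0.077

# Opening example: 220 of 250 satisfied when 70% is the usual rate.
z_service, _ = one_proportion_z(220 / 250, 0.70, 250)
print(f"Z = {z_service:.2f}")  # Z = 6.21
```

The satisfaction example gives Z ≈ 6.21, which is far beyond 1.96. That many satisfied customers would be very unlikely if the true rate were still 70%.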
The second type of Z-test compares two samples to each other.
The test for this case looks like this:
$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Usually, the test assumes that the two groups are equal, or:
$H_0: p_1 = p_2$
There is a problem here: what is the value of p?
p is the value of the population proportion, but we usually don't know it, so p is estimated by the weighted average of the two groups:
$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that customers in both countries will respond to the product in the same way. In the U.S., 80% of a sample of 500 said they would buy the product again, while in Japan 75% of a sample of 200 said they would buy it again.
Test the hypothesis.
Since the sample proportion is 0.80 in the U.S. and 0.75 in Japan, their weighted average is used for p. So: p = ((0.80 × 500) + (0.75 × 200))/700 = 550/700 = 0.786
The standard error is √(0.786 × 0.214 × (1/500 + 1/200)) = 0.0343, so the test would be: Z = 0.05/0.0343 = 1.46. The critical value is 1.96, and the two-tailed p-value is 0.145. The null hypothesis cannot be rejected: the U.S. and Japanese customers are assumed to be the same.
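The pooled two-sample calculation can be checked the same way. This is a minimal sketch assuming Python's standard library and the normal approximation. `two_proportion_z` is a hypothetical name. The pooled p is the weighted average described above, and `math.erfc` gives the two-tailed p-value.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float, float]:
    """Hypothetical helper: pooled two-sample Z-test for proportions (H0: p1 == p2).

    Returns (pooled_p, z, two_tailed_p).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)       # weighted average of the two groups
    q = 1.0 - pooled
    se = math.sqrt(pooled * q * (1 / n1 + 1 / n2))  # sqrt(pq(1/n1 + 1/n2))
    z = (p1 - p2) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return pooled, z, p_two_tailed


# U.S.: 80% of 500; Japan: 75% of 200.
pooled, z, p_value = two_proportion_z(0.80, 500, 0.75, 200)
print(f"pooled p = {pooled:.3f}, Z = {z:.2f}, two-tailed p = {p_value:.3f}")
# pooled p = 0.786, Z = 1.46, two-tailed p = 0.145
```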
T-tests and ANOVA are for the means of interval and ratio scales. They are very common statistics.
Why is it called a t-test?
William S. Gosset (1876–1937) published under the name “Student,” which is why the test is often called Student's t-test.
T-tests come in three types:
1. A sample mean against a hypothesis.
2. Two sample means compared to each other.
3. Two means within the same sample.
The standard error for means is:
$s_{\bar{X}} = \frac{s}{\sqrt{n}}$
Hence, for one mean compared to a hypothesized value $\mu$:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
Each t value comes with a certain number of degrees of freedom: df = n − 1.
IQ has a mean of 100 and a standard deviation of 15. Suppose a group of immigrants came into Iowa. A sample of 400 of these immigrants had an average IQ of 98. Does this group have an IQ below the population average?
The test statistic looks like this:
$t = \frac{98 - 100}{15/\sqrt{400}} = \frac{-2}{0.75} = -2.67$
There are n − 1 = 399 degrees of freedom. The result is printed by a computer or looked up in a t table.
Of course, we could also look this up on the internet (http://www.danielsoper.com/statcalc/calculator.aspx?id=8). For the IQ test: t(399) = −2.67, one-tailed p = 0.00395.
Since the test was “one-tailed,” the critical value of t would be 1.65. Because |t| = 2.67 exceeds 1.65 in the predicted (negative) direction, we would conclude that the immigrants' IQ is below the population average.
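The arithmetic of the one-sample t-test can be sketched as follows, assuming Python's standard library. `one_sample_t` is a hypothetical helper. It computes only t and df, because the standard library has no t distribution. The decision uses the one-tailed critical value of 1.65 quoted above.

```python
import math


def one_sample_t(sample_mean: float, mu: float, sd: float, n: int) -> tuple[float, int]:
    """Hypothetical helper: t for one mean against a hypothesized mean, df = n - 1."""
    se = sd / math.sqrt(n)            # standard error of the mean: s / sqrt(n)
    t = (sample_mean - mu) / se
    return t, n - 1


t, df = one_sample_t(98, 100, 15, 400)
print(f"t({df}) = {t:.2f}")  # t(399) = -2.67

# One-tailed test at alpha = .05 with the critical value from the text (1.65).
print("below the population average" if t < -1.65 else "not significantly below average")
```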
The second type of t-test compares two sample means to each other.
The standard error of the difference between two means looks like this:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Therefore the test statistic would look like this:
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
with degrees of freedom = n(1) + n(2) − 2.
Here the two sample variances are combined (pooled) into a single estimate, $s_p^2$,
where:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Suppose that a new product was test marketed in the United States and in Japan. The company hypothesizes that customers in both countries would consume the product at the same rate. A sample of 500 in the U.S. used an average of 200 kilograms a year (SD = 20), while a sample of 400 in Japan used an average of 180 kilograms a year (SD = 25). Test the hypothesis.
The test would start by computing the pooled variance: $s_p^2 = \frac{499 \times 20^2 + 399 \times 25^2}{898} \approx 500$ (an SD of 22.36). The standard error is then √(500 × (1/500 + 1/400)) = √2.25 = 1.5, so t = (200 − 180)/1.5 = 13.33.
The results are written as t(898) = 13.33, p < .0001, and the conclusion is that there is a large difference in the consumption rate between the U.S. and Japanese customers.
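The pooled two-sample t-test from the U.S./Japan example can be sketched like this. It assumes Python's standard library. `pooled_t` is a hypothetical name, and it implements the equal-variance (pooled) form used above, not Welch's unequal-variance variant.

```python
import math


def pooled_t(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> tuple[float, float, int]:
    """Hypothetical helper: pooled-variance two-sample t-test.

    Returns (pooled_variance, t, df).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))             # SE of the difference
    t = (m1 - m2) / se
    return sp2, t, df


sp2, t, df = pooled_t(200, 20, 500, 180, 25, 400)
print(f"pooled variance = {sp2:.1f}, t({df}) = {t:.2f}")
# pooled variance = 500.0, t(898) = 13.33
```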
The third type of t-test compares two means within the same sample. This t-test is used with correlated samples, or when the same person or object is measured twice in the same sample.
Student  T1  T2   d
Tom      89  90   1
Jan      88  91   3
Jason    87  86  -1
Halley   90  90   0
Bill     75  79   4
The measurement of interest is d.
H0: average of d = 0. That is, the average difference between test 1 and test 2 is zero.
The sampling error for this t-test is:
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
where d = score(2) − score(1) and $s_d$ is the standard deviation of the differences.
The t-test is:
$t = \frac{\bar{d}}{s_d/\sqrt{n}}$
The degrees of freedom = n − 1.
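Applying the paired t-test to the five students in the table gives a concrete result. The sketch below assumes Python's standard library: `statistics.stdev` is the sample SD, with n − 1 in the denominator. `paired_t` is a hypothetical helper, and the scores are taken directly from the table.

```python
import math
import statistics

# Scores from the table: test 1 (T1) and test 2 (T2) for the same five students.
t1 = [89, 88, 87, 90, 75]
t2 = [90, 91, 86, 90, 79]


def paired_t(first: list[float], second: list[float]) -> tuple[float, float, int]:
    """Hypothetical helper: paired t-test on d = score(2) - score(1).

    Returns (mean_d, t, df).
    """
    d = [s2 - s1 for s1, s2 in zip(first, second)]
    n = len(d)
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)   # s_d / sqrt(n)
    return mean_d, mean_d / se, n - 1


mean_d, t, df = paired_t(t1, t2)
print(f"mean d = {mean_d:.2f}, t({df}) = {t:.2f}")  # mean d = 1.40, t(4) = 1.51
```

With df = 4, t = 1.51 is well below the two-tailed .05 critical value of 2.776. H0 (average of d = 0) would not be rejected for these five students.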
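Suppose there are more than two groups that need to be compared. The t-test cannot be used, for two reasons:
1. The number of pairs becomes large.
2. The probability attached to each t is no longer accurate: with many tests, the chance that at least one of them is “significant” by luck alone grows well beyond the nominal level.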
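The two reasons can be made concrete with a little arithmetic. This sketch assumes Python and treats the pairwise tests as independent, each run at α = .05. That is a simplifying assumption, since pairs share groups. Under it, the chance of at least one false positive among m tests is 1 − (1 − α)^m. Both function names are hypothetical.

```python
def n_pairs(k: int) -> int:
    """Number of pairwise comparisons among k groups: k(k - 1) / 2."""
    return k * (k - 1) // 2


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) over m tests, assuming independence."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5, 6):
    m = n_pairs(k)
    print(f"{k} groups: {m} t-tests, P(at least one false positive) ~ "
          f"{familywise_error(0.05, m):.2f}")
```

With five groups there are already 10 t-tests, and the chance of at least one spurious “significant” result is about 0.40 rather than 0.05.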
Hence a new statistic is needed: the F-test, or Analysis of Variance (ANOVA), developed by R.A. Fisher (1890–1962).
The F-test compares the means of two or more groups by comparing the variance between groups with the variance that exists within groups. According to the Central Limit Theorem, there is a relationship between the variance of a statistic and the variance of the population: the variance of a sample mean is the population variance divided by n. If that relationship is violated, it is likely that the statistics did not come from the same population as the other statistics.
https://en.wikipedia.org/wiki/Analysis_of_variance https://www.youtube.com/watch?v=0Vj2V2qRU10
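F is the ratio of two variances:
$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k - 1)}{SS_{within}/(N - k)}$
where k is the number of groups and N is the total number of observations.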
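The F ratio can be computed directly from its definition. This is a minimal one-way ANOVA sketch assuming Python's standard library. `one_way_anova_f` is a hypothetical name, and the three groups are made-up illustrative numbers, not data from this summary. The sketch returns F and its degrees of freedom. The p-value would still come from an F table or a statistics package.

```python
import statistics


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Hypothetical helper: one-way ANOVA F = MS_between / MS_within.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    means = [statistics.fmean(g) for g in groups]
    # Variance between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variance within groups: spread of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical example data: three groups of three observations each.
f, df_b, df_w = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F({df_b}, {df_w}) = {f:.2f}")  # F(2, 6) = 12.00
```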
F tables: http://www.statsoft.com/textbook/distribution-tables/
[Figure: typical F-test output]
[Figure: ANOVA output in SPSS]
Correlation tests the degree of association between interval and ratio measures.
[Figure: Karl Pearson, Charles Darwin, and Francis Galton]
Correlation is based on a very simple idea that Karl Pearson saw.
Take two measures of the same person or object, multiply them, and then add the products across persons or objects, such as:
Person  M1  M2  Product
1        5   5       25
2        4   4       16
3        3   3        9
4        2   2        4
5        1   1        1
Sum = 55
This is called the sum of the cross products.
The largest possible sum will occur if M1 and M2 are in the same rank order. Note what happens when only one pair of values is swapped:
Person  M1  M2  Product
1        5   4       20
2        4   5       20
3        3   3        9
4        2   2        4
5        1   1        1
Sum = 54
The smallest possible sum will occur if M1 and M2 are in perfect inverse order:
Person  M1  M2  Product
1        5   1        5
2        4   2        8
3        3   3        9
4        2   4        8
5        1   5        5
Sum = 35
The “normal score” or “standardized score” is equal to:
$Z = \frac{X - \bar{X}}{\sigma}$
where σ is the standard deviation computed with n in the denominator.
Converting the measures to Z scores:
Person     M1     M2  Product
1       -1.41  -1.41      2.0
2       -0.71  -0.71      0.5
3        0      0         0
4        0.71   0.71      0.5
5        1.41   1.41      2.0
Sum = 5.0
Note that the sum is equal to the number of people in the sample, i.e., 5.
Note now what happens when the measures in Z scores are arranged in perfect inverse order:
Person     M1     M2  Product
1       -1.41   1.41     -2.0
2       -0.71   0.71     -0.5
3        0      0         0
4        0.71  -0.71     -0.5
5        1.41  -1.41     -2.0
Sum = -5.0
This is called “the sum of the cross products”: $\sum Z_X Z_Y$
So, when X and Y are in perfect (linear) order, the sum will be equal to n, and when they are in perfect inverse order, the sum will be −n. (It is n − 1 and −(n − 1) if the Z scores are computed with the sample standard deviation.)
The average of the sum of cross products is the correlation:
$r = \frac{\sum Z_X Z_Y}{n}$
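Pearson's idea can be sketched directly in code: convert both measures to Z scores, sum the cross products, and average. This assumes Python's standard library. `statistics.pstdev` divides by n, matching the tables above. `z_scores` and `pearson_r` are hypothetical helper names. The three cases reuse the M1/M2 orderings from the tables.

```python
import statistics


def z_scores(xs: list[float]) -> list[float]:
    """Standardize with the population SD (n in the denominator), as in the tables."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - m) / sd for x in xs]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Hypothetical helper: r = (sum of cross products of Z scores) / n."""
    cross = sum(a * b for a, b in zip(z_scores(xs), z_scores(ys)))
    return cross / len(xs)


m1 = [5, 4, 3, 2, 1]
print(f"{pearson_r(m1, [5, 4, 3, 2, 1]):.2f}")  # 1.00  (perfect order)
print(f"{pearson_r(m1, [4, 5, 3, 2, 1]):.2f}")  # 0.90  (one pair swapped)
print(f"{pearson_r(m1, [1, 2, 3, 4, 5]):.2f}")  # -1.00 (perfect inverse order)
```

Using the sample SD (`statistics.stdev`) and dividing by n − 1 instead gives the same r.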
This means that a perfect association always has a value of r = 1.0 when in positive order and r = −1.0 when in negative order. A value of zero would indicate no (linear) relationship between the two variables.
The correlation can be graphically shown by using a scatter plot:
The correlation is related to the shape of the scatter plot: http://en.wikipedia.org/wiki/Scatter_plot http://www.itl.nist.gov/div898/handbook/eda/section3/eda33q.htm
The correlation is an INDEX of association. It contains three pieces of information:
1. How much association is present (an index).
2. Whether that is a “significant” association, that is, whether we can reject the H0 that the true association is zero.
3. What the magnitude of that association is: the correlation is r, and r-squared is the amount of variation in Y accounted for by knowing X.
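The second and third pieces of information can be computed from r and n. This sketch assumes Python's standard library and the usual test of H0: true correlation = 0, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. `r_summary` is a hypothetical name. The input r = 0.9 with n = 5 comes from the “one pair swapped” table.

```python
import math


def r_summary(r: float, n: int) -> tuple[float, float, int]:
    """Hypothetical helper: returns (r_squared, t, df) for H0: true correlation = 0."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation (|r| = 1)")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    return r**2, t, n - 2


r2, t, df = r_summary(0.9, 5)
print(f"r^2 = {r2:.2f}, t({df}) = {t:.2f}")  # r^2 = 0.81, t(3) = 3.58
```

Here 81% of the variation in M2 is accounted for by M1. With df = 3, t = 3.58 exceeds the two-tailed .05 critical value of 3.182.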
Be careful! The correlation does not tell you that X is the cause of Y. It is a necessary condition for cause, but it does not prove cause.
The belief that correlation proves causation is a logical fallacy in which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause.
The number of people waiting for a bus or train is highly correlated with how long a person must wait for a ride.
Trains come sooner when more people are waiting for them! Does this mean the train will come sooner if you bring your friends?
Another example of the fallacy: since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.
Regression tests the degree of association between interval and ratio measures, AND gives the best fit to the data.
Regression does three things:
1. Association
2. Best fit
3. Prediction
Regression creates an equation. A simple linear equation would be: Y = bX + a
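A common way to find the “best fit” b and a is ordinary least squares. This sketch assumes that criterion and Python's standard library. `fit_line` is a hypothetical helper, and the data reuse the “one pair swapped” M1/M2 values from the correlation tables. The last line shows the prediction step.

```python
import statistics


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Hypothetical helper: least-squares slope b and intercept a for Y = bX + a."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross products
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx          # the line passes through (mean X, mean Y)
    return b, a


b, a = fit_line([5, 4, 3, 2, 1], [4, 5, 3, 2, 1])
print(f"Y = {b:.2f}X + {a:.2f}")                 # Y = 0.90X + 0.30
print(f"prediction at X = 6: {b * 6 + a:.2f}")   # prediction at X = 6: 5.70
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.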
Can we use the correlations to create equations to estimate one variable from another?
For example: Evaluation = b × Personality + a, i.e., Y = bX + a
So: Evaluation = 0.637 × Personality − 0.530
The equations do not have to be linear.
Regression can use more than one variable to predict. This is called multiple regression.
When all these variables are put together, as they are in the real world, only the instructor's gender, the section a student took, the student's GPA, and the evaluation the student gave the instructor were related to the final grade.
Path Diagram
Question: What predicts the evaluation a class and instructor will get?
The final evaluation of the class and instructor is related to (in order):
1. Expected grade in Week 16
2. Actual grade in Week 16
3. Final grade for the class
Note: If all these grades were the same thing, only one would be related.
If we add variables one at a time, sometimes the answer is different. Notice how the deserved grade at Week 16 becomes important. Why?
This diagram shows that it is the expected grade at Week 16 that predicts the evaluation of the class and instructor. The other grades are being used by students to estimate the expected grade.