Found StatCrunch Resources


Use StatCrunch to find the correlation between two variables: http://screencast.com/t/rAbGVY5We8
Find a confidence interval for a population mean using StatCrunch: http://www.youtube.com/watch?v=G5nw2B9g19c
StatCrunch Cheat Sheet: http://www3.jjc.edu/staff/msullivan/Stats/Technology%20Step%20by%20Step%20StatCrunch.pdf
You might want to have StatCrunch open.

Final Project Notes
1. Using the MM207 Student Data Set:
What is the correlation between student cumulative GPA and the number of hours spent on school work each week? Be sure to include the computations or StatCrunch output to support your answer.
Stat->summary stats->correlation
What would be the predicted GPA for a student who spends 16 hours per week on school work? Be sure to include the computations or StatCrunch output to support your prediction.
Stat->regression->simple linear. Choose the dependent (y) and independent (x) variables. Each of the "Next" screens has useful options: "Confidence Intervals", "Predict Y for x=", and "Plot fitted line". Hit the Next button to see the graph.
To copy output, highlight the tables with the mouse, press Ctrl-C to copy to the clipboard, then Ctrl-V to paste into your document.

Final Project Notes
2. Bar graphs are great: Graphics->bar plot->with data
3. Jonathan is a 42-year-old male student and Mary is a 37-year-old female student thinking about taking this class. Based on their relative position, which student would be farther away from the average age of their gender group based on this sample of MM207 students? Compute the z-values, using the mean and standard deviation of each student's own gender group.
4. If you were to randomly select a student from the set of students who have completed the survey, what is the probability that you would select a male? Explain your answer.

Final Project Notes
5. Using the sample of MM207 students: What is the probability of randomly selecting a person who is conservative and then selecting from that group someone who is a nursing major? What is the probability of randomly selecting a liberal or a male?
Stat->tables->Contingency->with data. Choose q9 and q13.

               Business   IT   Legal Studies   Nursing   Other   Psychology   Total
Conservative          4    1               6        17       6           22      56
Liberal               2    3               2        12       8           26      53
Moderate             13    6               2        32      17           49     119
Total                19   10              10        61      31           97     228

Final Project Notes
What is the probability of randomly selecting a liberal or a male?

               Female   Male   Total
Conservative       41     15      56
Liberal            44      9      53
Moderate          101     17     118
Total             186     41     227
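The "liberal or male" question uses the addition rule: add the Liberal row total and the Male column total, then subtract the liberal males so they are not counted twice. The sketch below works this out from the counts on the slide; the function name `prob_row_or_col` is made up for illustration and is not part of StatCrunch.

```python
from fractions import Fraction

# Counts from the Female/Male contingency table on the slide.
counts = {
    "Conservative": {"Female": 41, "Male": 15},
    "Liberal": {"Female": 44, "Male": 9},
    "Moderate": {"Female": 101, "Male": 17},
}


def prob_row_or_col(table, row, col):
    """P(row or col) = P(row) + P(col) - P(row and col)."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    both = table[row][col]
    return Fraction(row_total + col_total - both, grand_total)


p = prob_row_or_col(counts, "Liberal", "Male")
print(p, float(p))
```

Here the row total is 53, the column total is 41, and the overlap is 9, so the probability is (53 + 41 - 9)/227 = 85/227.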

Final Project Notes
6. Compute the z-score using the standard deviation from the Central Limit Theorem (σ/√n), with n = 25.
7. Select a random sample of 30 student responses to question 6, "How many credit hours are you taking this term?" Using the information from this sample, and assuming that our data set is a random sample of all Kaplan statistics students, estimate the average number of credit hours that all Kaplan statistics students are taking this term using a 95% level of confidence. Be sure to show the data from your sample and the data to support your estimate.
To get the sample of 30 students, choose Data->Sample columns, choose q6, enter a sample size of 30, and hit "Sample Columns". A popup tells you that a new column, Sample(q6), has been added to the data.
Compute the 95% CI from this column: Stat->z-statistics->one sample->with data. Choose the new Sample(q6) column, hit Next, select Confidence Interval, and calculate.

Final Project Notes
8. Assume that the MM207 Student Data Set is a random sample of all Kaplan students; estimate the proportion of all Kaplan students who are female using a 90% level of confidence.
First get the number of females in the sample: Stat->table->frequency, choose Gender.

Gender   Frequency   Relative Frequency
Female         192           0.82403433
Male            41           0.17596567

Then choose Proportions->one sample->with summary. Successes = 192, observations = 233. Next, choose Confidence Intervals and calculate.

Proportion   Count   Total   Sample Prop.   Std. Err.      L. Limit    U. Limit
p              192     233   0.82403433     0.024946446    0.7751402   0.87292844

The limits in this output lie 1.96 standard errors from the sample proportion, which corresponds to a 95% level; the 90% interval asked for here is narrower.
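As a cross-check on the StatCrunch output, the interval for a proportion can be computed by hand as p̂ ± z·√(p̂(1 − p̂)/n). The sketch below is a minimal version using only the Python standard library; `proportion_ci` is an illustrative name, and the normal-approximation interval is assumed.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    std_err = sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value, e.g. about 1.645 for 90% and 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return p_hat - z * std_err, p_hat + z * std_err


for level in (0.90, 0.95):
    low, high = proportion_ci(192, 233, level)
    print(f"{level:.0%} CI: {low:.4f} to {high:.4f}")
```

Running it at both levels makes it easy to see which level a given pair of limits belongs to: the 95% call reproduces limits of about 0.7751 and 0.8729, while the 90% call gives a narrower interval.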

Final Project Notes
9. Assume you want to estimate the proportion of students who commute less than 5 miles to work to within 2%. What sample size would you need? n = 1/E^2
10. A professor at Kaplan University claims that the average age of all Kaplan students is 36 years old. Use a 95% confidence interval to test the professor's claim. Is the professor's claim reasonable or not? Explain.
Stat->z-stat->one sample->with data. Choose q2, hit Next, select confidence intervals, and calculate.

Variable               n     Sample Mean   Std. Err.    L. Limit    U. Limit
Q2 How old are you?    237   37.291138     0.67025393   35.977467   38.604813
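Both items reduce to short calculations. The sketch below applies the n = 1/E^2 rule from item 9 with E = 0.02, then rebuilds the 95% interval from the sample mean and standard error in the output table and checks whether the claimed age of 36 falls inside it. The function names are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(margin):
    """Sample size from the slide's rule of thumb n = 1/E^2."""
    # round() guards against floating-point noise before rounding up.
    return ceil(round(1 / margin**2, 6))


def mean_ci(sample_mean, std_err, level=0.95):
    """z-based confidence interval for a mean, given its standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return sample_mean - z * std_err, sample_mean + z * std_err


n_needed = sample_size_for_margin(0.02)
low, high = mean_ci(37.291138, 0.67025393)
claim_is_reasonable = low <= 36 <= high

print(n_needed)
print(round(low, 4), round(high, 4), claim_is_reasonable)
```

Because 36 lies between the lower and upper limits, the interval does not contradict the professor's claim.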

z-Scores Page 209 z-scores determine how far, in terms of standard deviations, a given score is from the mean of the distribution.

Figure 5.22 Standard scores for IQ scores of 85, 100, and 125, shown on the distribution of IQ scores from Example 6. Copyright © 2009 Pearson Education, Inc.
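A standard score is simply (value − mean) / standard deviation. The sketch below applies that to the three IQ scores in the figure. The mean of 100 matches the figure, but the standard deviation of 16 is an assumption made for illustration, since the slide does not state it.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


IQ_MEAN = 100     # centre of the distribution in the figure
IQ_STD_DEV = 16   # assumed for illustration; not given on the slide

for score in (85, 100, 125):
    print(score, round(z_score(score, IQ_MEAN, IQ_STD_DEV), 2))
```

A negative result means the score is below the mean, zero means it equals the mean, and a positive result means it is above the mean.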

Graph from: http://www.comfsm.fm/~dleeling/statistics/fx_2001_02.html

Percentiles Percentiles are normally used with large data sets. Dividing the number of data values by 100 tells us how many data values fall in each percent. The following example has the weekly grocery bills for 300 families. There are 3 data values in each percent, or 30 values in each 10%.

591 215 150 342 265 426 414 33 507 269 116 205 153 199 418 177 106 318 473 52 461 328 172 82 451 384 480 68 580 191 98 477 468 471 398 124 222 551 315 134 249 599 272 210 485 183 535 43 55 274 94 331 536 317 446 152 65 358 254 196 209 213 447 431 593 162 220 239 129 259 102 92 491 469 35 487 273 216 214 428 282 226 149 271 330 452 574 538 420 488 170 263 218 256 475 372 110 550 425 59 194 138 518 402 594 184 305 309 146 112 416 390 45 262 520 306 597 407 558 348 234 276 261 438 246 118 481 130 391 441 399 164 486 257 144 238 408 83 157 204 86 352 498 351 203 182 242 587 566 125 241 369 444 405 319 523 255 542 429 227 563 419 180 506 341 314 289 512 243 202 58 244 57 335 533 32 122 401 108 350 212 113 596 392 73 48 525 513 465 44 549 534 543 137 176 490 250 295 267 522 41 581 302 42 132 275 363 365 181 360 232 285 433 380 270 69 511 99 49 516 185 344 136 288 389 85 500 77 338 400 501 307 72 556 569

32 65 99 130 150 184 215 241 261 273 309 341 372 399 419 447 486 507 536 569 33 102 152 185 216 262 274 314 342 400 451 487 511 538 574 35 68 106 132 153 191 242 263 275 315 401 420 452 488 512 542 41 134 157 194 218 243 276 317 344 425 461 513 543 580 42 69 108 136 162 196 220 244 265 282 348 380 402 426 465 516 581 43 72 110 137 164 199 246 285 350 384 405 468 490 518 549 587 44 73 112 138 170 202 222 249 267 288 318 389 407 428 469 491 520 45 77 113 144 172 203 226 250 269 289 319 351 390 429 471 498 522 550 591 48 82 116 146 176 204 227 254 295 328 352 391 431 473 500 551 49 83 118 149 177 205 255 270 302 408 433 475 501 523 593 52 85 122 180 209 232 256 271 305 330 358 414 438 477 506 525 556 55 86 124 181 210 234 257 306 331 360 441 480 533 558 594 57 92 125 182 212 238 272 307 363 392 416 444 481 534 596 58 94 129 183 213 259 335 365 398 418 446 535 563 597 59 98 214 239 338 369 485 566 599
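The counting rule described for the grocery bills can be sketched in a few lines: divide the number of values by 100 to get the values per percent, and count how many values fall below a given bill to get its percentile. The data in the sketch are a stand-in list of 300 distinct values, not the grocery bills themselves, and the function names are illustrative.

```python
def values_per_percent(n_values):
    """How many data values make up each one percent of the data set."""
    return n_values / 100


def percentile_of(values, x):
    """Percent of data values that are strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Stand-in data: 300 distinct values, like the 300 weekly grocery bills.
bills = list(range(1, 301))

print(values_per_percent(len(bills)))   # values in each percent
print(percentile_of(bills, 31))         # 30 values lie below 31
```

With 300 values, 30 values below a bill place it at the 10th percentile, which matches the "30 values for each 10%" rule on the slide.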

The Central Limit Theorem Page 217
Suppose we take many random samples of size n for a variable with any distribution (not necessarily a normal distribution) and record the distribution of the means of each sample. Then:
- The distribution of means will be approximately a normal distribution for large sample sizes.
- The mean of the distribution of means approaches the population mean, µ, for large sample sizes.
- The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.

Figure 5.26 As the sample size increases (n = 5, 10, 30), the distribution of sample means approaches a normal distribution, regardless of the shape of the original distribution. The larger the sample size, the smaller is the standard deviation of the distribution of sample means.
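A small simulation can make the theorem concrete. The sketch below draws repeated samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely for illustration) and compares the spread of the sample means with σ/√n. The seed, sample size, and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n, trials, seed=0):
    """Return the means of `trials` random samples of size n.

    The population is exponential with mean 1 and standard deviation 1,
    which is clearly not a normal distribution.
    """
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]


n = 30
sample_means = simulate_sample_means(n, trials=5000)

print("mean of sample means:", round(mean(sample_means), 3))
print("std dev of sample means:", round(stdev(sample_means), 3))
print("sigma / sqrt(n):", round(1 / sqrt(n), 3))
```

The mean of the sample means should land close to the population mean of 1, and their standard deviation should land close to 1/√30, even though the population itself is skewed.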

EXAMPLE 1 Predicting Test Scores You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed. a. What is the likelihood that one of your eighth-graders, selected at random, will score below 375 on the exam? Solution: In dealing with an individual score, we use the method of standard scores discussed in Section 5.2. Given the mean of 400 and standard deviation of 70, a score of 375 has a standard score of $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{70} \approx -0.36$

EXAMPLE 1 Predicting Test Scores Solution: (cont.) According to Table 5.1, a standard score of -0.36 corresponds to about the 36th percentile— that is, 36% of all students can be expected to score below 375. Thus, there is about a 0.36 chance that a randomly selected student will score below 375. Notice that we need to know that the scores have a normal distribution in order to make this calculation, because the table of standard scores applies only to normal distributions.

EXAMPLE 1 Predicting Test Scores You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed. b. Your performance as a principal depends on how well your entire group of eighth-graders scores on the exam. What is the likelihood that your group of 100 eighth-graders will have a mean score below 375? Solution: The question about the mean of a group of students must be handled with the Central Limit Theorem. According to this theorem, if we take random samples of size n = 100 students and compute the mean test score of each group, the distribution of means is approximately normal.

EXAMPLE 1 Predicting Test Scores Solution: (cont.) Moreover, the mean of this distribution is µ = 400 and its standard deviation is $\sigma/\sqrt{n} = 70/\sqrt{100} = 7$. With these values for the mean and standard deviation, the standard score for a mean test score of 375 is $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{7} \approx -3.57$. Table 5.1 shows that a standard score of -3.5 corresponds to the 0.02th percentile, and the standard score in this case is even lower. In other words, fewer than 0.02% of all random samples of 100 students will have a mean score of less than 375.

EXAMPLE 1 Predicting Test Scores Solution: (cont.) Therefore, the chance that a randomly selected group of 100 students will have a mean score below 375 is less than 0.0002, or about 1 in 5,000. Notice that this calculation regarding the group mean did not depend on the individual scores’ having a normal distribution. This example has an important lesson. The likelihood of an individual scoring below 375 is more than 1 in 3 (36%), but the likelihood of a group of 100 students having a mean score below 375 is less than 1 in 5,000 (0.02%). In other words, there is much more variation in the scores of individuals than in the means of groups of individuals.
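The two parts of this example can be checked numerically. The sketch below uses the normal distribution from Python's standard library in place of Table 5.1, with the mean of 400 and standard deviation of 70 from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MEAN = 400      # mean test score
STD_DEV = 70    # standard deviation of individual scores
GROUP_SIZE = 100
CUTOFF = 375

# Part a: one randomly selected student.
individual = NormalDist(MEAN, STD_DEV)
p_individual = individual.cdf(CUTOFF)

# Part b: the mean of a group of 100 students.
# The Central Limit Theorem gives a standard deviation of sigma / sqrt(n).
group_mean = NormalDist(MEAN, STD_DEV / sqrt(GROUP_SIZE))
p_group = group_mean.cdf(CUTOFF)

print(f"P(individual < {CUTOFF}) = {p_individual:.4f}")
print(f"P(group mean < {CUTOFF}) = {p_group:.6f}")
```

The first probability comes out near 0.36 and the second below 0.0002, in line with the values read from the table.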

Types of Correlation Page 289 Figure 7.3 Types of correlation seen on scatter diagrams.

Linear Correlation Coefficient Page 294

The line of best fit (also called the regression line or the least-squares line) is the line that fits the data better than any other line: it minimizes the sum of the squared vertical distances between the data points and the line.

Regression  

Data Set 1, WT is y and HT is x  
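The correlation coefficient and the best-fit line can both be computed from the same sums of squared deviations. The sketch below is a minimal least-squares fit in plain Python; the height and weight values are made up for illustration and are not the values from Data Set 1, and `fit_line` is a hypothetical helper name.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)   # linear correlation coefficient
    return slope, intercept, r


# Hypothetical data: height in inches (x) and weight in pounds (y).
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 136, 140, 155, 168, 172]

slope, intercept, r = fit_line(heights, weights)
predicted = slope * 67 + intercept   # prediction inside the range of the data

print(round(slope, 3), round(intercept, 3), round(r, 3))
print(round(predicted, 1))
```

The same function applies to the GPA question in the final project: fit hours (x) against GPA (y) and evaluate the line at x = 16, provided 16 hours lies within the range of the data.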

Cautions in Making Predictions from Best-Fit Lines
- Don’t expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points. If the sample points lie very close to the best-fit line, the correlation is very strong and the prediction is more likely to be accurate. If the sample points lie away from the best-fit line by substantial amounts, the correlation is weak and predictions tend to be much less accurate.
- Don’t use a best-fit line to make predictions beyond the bounds of the data points to which the line was fit.
- A best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future.
- Don’t make predictions about a population that is different from the population from which the sample data were drawn.
- Remember that a best-fit line is meaningless when there is no significant correlation or when the relationship is nonlinear.

EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. You’ve found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You’ve used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day. Solution: No one exercises 18 hours per day on an ongoing basis, so this much exercise must be beyond the bounds of any data collected. Therefore, a prediction about someone who exercises 18 hours per day should not be trusted.

EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia. Solution: We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.

EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. A study in China has discovered correlations that are useful in designing museum exhibits that Chinese children enjoy. A curator suggests using this information to design a new museum exhibit for Atlanta-area school children. Solution: The suggestion to use information from the Chinese study for an Atlanta exhibit assumes that predictions made from correlations in China also apply to Atlanta. However, given the cultural differences between China and Atlanta, the curator’s suggestion should not be considered without more information to back it up.

EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. Scientific studies have shown a very strong correlation between children’s ingesting of lead and mental retardation. Based on this correlation, paints containing lead were banned. Solution: Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.

EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. Based on a large data set, you’ve made a scatter diagram for salsa consumption (per person) versus years of education. The diagram shows no significant correlation, but you’ve drawn a best-fit line anyway. The line predicts that someone who consumes a pint of salsa per week has at least 13 years of education. Solution: Because there is no significant correlation, the best-fit line and any predictions made from it are meaningless.

The square of the correlation coefficient, or r^2, is the proportion of the variation in a variable that is accounted for by the best-fit line. The use of multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color). The coefficient of determination, R^2, tells us the proportion of the scatter in the data accounted for by the best-fit equation.

EXAMPLE 4 Voter Turnout and Unemployment Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election? Note that there is a scatter diagram of the voter turnout data on page 312. Solution: The square of the correlation coefficient is r^2 = (-0.1)^2 = 0.01, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.

The Search for Causality A correlation may suggest causality, but by itself a correlation never establishes causality. Much more evidence is required to establish that one factor causes another. A correlation between two variables may be the result of (1) coincidence, (2) a common underlying cause, or (3) one variable actually having a direct influence on the other. The process of establishing causality is essentially a process of ruling out the first two explanations.

Determining Causality We can rule out coincidence by repeating the experiment many times or using a large number of subjects in the experiment. Because coincidences occur randomly, they should not occur consistently in many subjects or experiments. If the controls rule out confounding variables, any remaining effects must be caused by the variables being studied.

Guidelines for Establishing Causality
If you suspect that a particular variable (the suspected cause) is causing some effect:
- Look for situations in which the effect is correlated with the suspected cause even while other factors vary.
- Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.
- Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.
- If the effect might be produced by other potential causes (besides your suspected cause), make sure that the effect still remains after accounting for these other potential causes.
- If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.
- Try to determine the physical mechanism by which the suspected cause produces the effect.