business analytics II ▌assignment one - solutions autoparts


Managerial Economics & Decision Sciences Department
Developed for business analytics II

business analytics II ▌assignment one - solutions
► autoparts
► mba demographics

© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

assignment one - solutions
statistical models: hypotheses, tests & confidence intervals

learning objectives
► statistics
  ▪ null and alternative hypotheses
  ▪ testing a hypothesis
  ▪ p-value
  ▪ test significance level and test power: type I and type II errors
  ▪ confidence intervals: construction and interpretation
► STATA
  ▪ load, modify and save data
  ▪ basic statistical tools and graphics
  ▪ perform tests and build confidence intervals

readings
► (MSN)
  ▪ Chapter 2
► (CS)
  ▪ Autoparts
  ▪ MBA Demographics

Autoparts: hypothesis, test and decision

► The claim is that last year's sales at the auto-parts store ("Skokie Auto") were lower than one would expect for an end-cap location, due to the construction of a new building that partially obstructs the view of the store from the street.
► Available information: last year's sales for Skokie Auto and for another 24 auto-parts stores in end-cap locations.
► To keep notation consistent with the notes: let X (as a random variable) be last year's sales for all Illinois auto-parts companies in end-cap locations. Let X0 be last year's sales for Skokie Auto, thus X0 = $1,883,000.
► Skokie Auto's claim is thus E[X] > X0, and our analysis can be organized as below:

hypothesis: H0: E[X] ≥ X0    Ha: E[X] < X0

test: calculate

\[ t = \frac{\bar{X} - X_0}{s_{\bar{X}}} \]

based on the sample of 24 comparable companies

decision: reject the null hypothesis (and therefore Skokie Auto's claim) if p-value < α

Remark. Here \bar{X} stands for the sample-based mean and s_{\bar{X}} for the standard error of the sample mean.

Autoparts: hypothesis, test and decision

hypothesis: H0: E[X] ≥ X0    Ha: E[X] < X0

Remark. Skokie Auto's claim is that its sales are lower than otherwise expected, i.e. lower than those of similar companies also located in end-cap locations.

test:

    ttest sales == 1883

You have to specify the variable for which you conduct the test (sales) and the benchmark (X0). Here we use the sales of the sample of 24 comparable companies, with Skokie Auto's sales (in thousands of dollars) as the benchmark.

Figure 1. Results of ttest command

    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
       sales |      24        2847    171.6289    840.8064    2491.959    3202.041
    ------------------------------------------------------------------------------
        mean = mean(sales)                                            t =   5.6168
    Ho: mean = 1883                                  degrees of freedom =       23

       Ha: mean < 1883            Ha: mean != 1883             Ha: mean > 1883
     Pr(T < t) = 1.0000       Pr(|T| > |t|) = 0.0000        Pr(T > t) = 0.0000

decision: cannot reject the null hypothesis (and therefore Skokie Auto's claim) since p-value > α = 5%. The relevant p-value is the left-tail one, Pr(T < t) = 1.0000, because the alternative is Ha: mean < 1883.
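The standard error and t statistic in Figure 1 can be reproduced by hand from the reported summary statistics. The snippet below is a minimal sketch in Python (not STATA); the variable names are illustrative, and the inputs are copied from the ttest output rather than recomputed from the raw data.

```python
import math

# Summary statistics copied from Figure 1 (sales are in thousands of dollars).
n = 24                  # number of comparable stores
sample_mean = 2847.0    # sample mean of sales
sample_sd = 840.8064    # sample standard deviation of sales
benchmark = 1883.0      # X0: Skokie Auto's sales, the value tested against

# Standard error of the sample mean and the one-sample t statistic.
std_err = sample_sd / math.sqrt(n)
t_stat = (sample_mean - benchmark) / std_err
df = n - 1

print(f"std. err. = {std_err:.4f}, t = {t_stat:.4f}, df = {df}")
```

Both values should agree with the Std. Err. and t entries of Figure 1 up to rounding.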

Autoparts: hypothesis, test and decision

► Our conclusion is that we cannot reject Skokie Auto's claim that last year's sales were lower than one would expect for an end-cap location due to the construction of a new building, which partially obstructs the view of the auto-parts store from the street. However, keep in mind that this conclusion is based on the available "control" sample and a simple analysis of average sales.

Figure 2. Graphical results of ttest command

Remark. The left-tail p-value is calculated as 1 − ttail(df, ttest), and in our case this is 1 − ttail(23, 5.6168) = 0.99999488. To reject the null you would need a significance level of almost 1.
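For readers without STATA at hand, the left-tail p-value in the remark can be approximated numerically. This is a sketch under stated assumptions: it integrates the Student t density with Simpson's rule instead of calling STATA's ttail, and the helper names t_pdf and t_cdf are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Pr(T < x), via Simpson's rule on [0, |x|] plus symmetry around zero."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Left-tail p-value for t = 5.6168 with 23 degrees of freedom,
# the analogue of 1 - ttail(23, 5.6168) in the remark above.
p_left = t_cdf(5.6168, 23)
print(f"Pr(T < t) = {p_left:.8f}")
```

The result should be close to the 0.99999488 quoted in the remark; small differences in the last digits can come from the numerical integration.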

Autoparts: confidence interval

► The confidence interval is calculated for E[X] based on the sample mean \bar{X} and the standard error of the sample mean s_{\bar{X}} as

\[ \bar{X} - t_{\alpha/2,\,df}\; s_{\bar{X}} \;\le\; E[X] \;\le\; \bar{X} + t_{\alpha/2,\,df}\; s_{\bar{X}} \]

with probability 1 − α.

    ci means sales, level(90)

You have to specify for which statistic you want the confidence interval, i.e. you have to include "means" if you want a confidence interval for the mean of the variable of interest.

Figure 3. Results of ci command

    Variable |     Obs        Mean    Std. Err.   [90% Conf. Interval]
    ---------+--------------------------------------------------------
       sales |      24        2847    171.6289     2552.85     3141.15

Figure 4. The confidence interval (t distribution: 5.00% below −1.714, 90.00% between −1.714 and 1.714, 5.00% above 1.714)

Remark. The interval was calculated for a level α = 10%, thus the area under the t distribution between −1.714 and 1.714 is 90%. This leaves an area of α/2 = 5% in each of the tails. The cutoffs are calculated based on the degrees of freedom (df) and level α as invttail(df, α/2), that is invttail(23, 0.05) = 1.7138715 (and the negative of this number).
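The interval in Figure 3 follows directly from the mean, the standard error and the cutoff quoted in the remark. A minimal Python sketch (not STATA), taking the critical value 1.7138715 from the slide as given rather than computing it:

```python
# Inputs copied from Figure 3 and the remark above.
sample_mean = 2847.0
std_err = 171.6289
t_crit = 1.7138715   # invttail(23, 0.05), as reported on the slide

# 90% confidence interval: mean plus/minus critical value times standard error.
half_width = t_crit * std_err
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"90% CI: [{lower:.2f}, {upper:.2f}]")
```

The bounds should match the [90% Conf. Interval] columns of Figure 3 up to rounding.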

MBA Demographics: sample-based inference

► The sequence of commands that solves the requirements is provided below. The outcome is specific to each generated sample.

    generate random = runiform()
    sort random
    sample 40, count
    drop Indx
    drop random
    ttest Age == 28.8
    ci means Age, level(90)

  ▪ The first five commands generate a random sample of 40 observations after shuffling the original data set of 500 observations. The two drop commands are not strictly necessary; they "clean" the resulting sample, keeping only the relevant information.
  ▪ The ttest command provides the calculated t statistic and the three p-values for the corresponding three possible null/alternative hypotheses. You should choose the p-value that corresponds to the null/alternative pair you are actually using.
  ▪ Finally, the ci command provides the confidence interval for the desired level (here 90%). Note that you have to specify that the confidence interval is for means (try, for example, variances to see the result).
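The shuffle-then-sample logic of the first five commands can be illustrated outside STATA. The sketch below is in Python and makes several assumptions: the population is a synthetic placeholder (not the course data set), the seed is arbitrary, and keeping the first 40 shuffled rows stands in for sample 40, count.

```python
import random

rng = random.Random(2016)  # arbitrary seed so the sketch is repeatable

# Placeholder population of 500 records; the ages are synthetic, NOT the course data.
population = [{"Indx": i + 1, "Age": round(rng.gauss(30, 2), 1)} for i in range(500)]

# generate random = runiform(): attach a uniform random key to every record.
for row in population:
    row["random"] = rng.random()

# sort random: ordering by the random key shuffles the records.
shuffled = sorted(population, key=lambda row: row["random"])

# sample 40, count: keep 40 records (here simply the first 40 after shuffling).
sample = shuffled[:40]

# drop Indx / drop random: keep only the variable of interest.
ages = [row["Age"] for row in sample]

print(len(ages), "ages sampled without replacement")
```

Each run with a different seed yields a different sample, which is why the ttest and ci results are specific to each generated sample.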

MBA Demographics: different samples

► The diagrams below represent the means and confidence intervals provided by you (on the left) and for 800 samples generated at level 90% using the sequence provided in the previous slide (on the right). We are in a position to compare the inference based on our samples with the true mean of 30.0194 (for the original data set; the red line in the two diagrams).

Figure 5. Graphical results of class generated samples
Figure 6. Graphical results of 800 generated samples
(legend in both figures: sample average, true mean, upper bound, lower bound)

Remark. Notice how some intervals do not contain the true mean! This is why we call them "confidence intervals" with a given probability: at the 90% level, roughly 10% of such intervals are expected to miss the true mean.
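The coverage idea behind Figure 6 can be imitated with a small simulation. This is only a sketch in Python under loud assumptions: the population is synthetic (500 normal draws, not the MBA data set), the seed is arbitrary, and the critical value 1.685 is an approximate t-table value for 39 degrees of freedom at the 90% level.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed

# Synthetic stand-in for the 500-observation data set (NOT the course data).
population = [rng.gauss(30, 2) for _ in range(500)]
true_mean = statistics.mean(population)

T_CRIT = 1.685            # assumed: approximate 5% upper-tail cutoff for 39 df
N_SAMPLES, N_OBS = 800, 40

covered = 0
for _ in range(N_SAMPLES):
    draw = rng.sample(population, N_OBS)            # sample without replacement
    mean = statistics.mean(draw)
    std_err = statistics.stdev(draw) / N_OBS ** 0.5
    # Count the interval if it contains the true (population) mean.
    if mean - T_CRIT * std_err <= true_mean <= mean + T_CRIT * std_err:
        covered += 1

coverage = covered / N_SAMPLES
print(f"share of 90% intervals containing the true mean: {coverage:.3f}")
```

The share should land near 90%, typically slightly above it because 40 out of 500 observations are drawn without replacement.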

MBA Demographics: by categories

► First let's look at the summary statistics for the specific sample that was generated.

    summarize

Figure 7. Results of summarize command

    Variable |     Obs        Mean    Std. Dev.       Min        Max
    ---------+------------------------------------------------------
         Age |      40     29.5125    1.824504       25.9       35.1
      Gender |      40        .325    .4743416          0          1

► In the specific sample from file SampleMBA.dta the average age is 29.5125 and the proportion of females is 32.5%. If you would like to find the exact number of females you can use:

    count if Gender == 1

► You will find that there are 13 females (similarly, you can run count if Gender == 0 to find that there are 27 males in the sample). Here the mean age of 29.5125 is for all the observations in the sample.
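Because Gender is coded 0/1, its mean is the proportion of females and its standard deviation is determined by that proportion. A short Python sketch (not STATA) that rebuilds the Gender row of Figure 7 from the counts of 13 females and 27 males:

```python
import statistics

# Rebuild the 0/1 indicator from the counts reported on the slide.
gender = [1] * 13 + [0] * 27

n_females = sum(gender)                 # analogue of: count if Gender == 1
share_female = statistics.mean(gender)  # mean of a 0/1 variable = proportion of ones
sd_gender = statistics.stdev(gender)    # sample standard deviation (n - 1 divisor)

print(n_females, share_female, round(sd_gender, 7))
```

The mean and standard deviation should match the Gender row of Figure 7.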

MBA Demographics: by categories

► We can now summarize the variable Age by Gender (remember that Gender = 1 means female):

    by Gender, sort: summarize Age

  ▪ With the command by varname, sort STATA splits the sample according to the values of the varname that follows by.
  ▪ As expected, the number of observations is 13 for females and 27 for males.
  ▪ Note that we are now provided the average age by gender: 29.25 for males and 30.06 for females.

Figure 8. Results of by command

    -> Gender = 0
    Variable |     Obs        Mean    Std. Dev.       Min        Max
    ---------+------------------------------------------------------
         Age |      27    29.24815    1.685999       25.9       32.8

    -> Gender = 1
         Age |      13    30.06154    2.043501       27.6       35.1

► How different is the true mean age for males from that of females? Note that here the two average ages are sample-based averages!
► Say XM stands for male age and XF for female age. Then E[XM] is the true mean (at the population level) for males and E[XF] is the true mean (at the population level) for females. The null/alternative hypotheses are:

MBA Demographics: by categories

hypothesis: H0: E[XM] = E[XF]    Ha: E[XM] ≠ E[XF]

  ▪ Quite conveniently, STATA's ttest command accepts a by() option, so the test compares the variable across the categories of another variable. STATA interprets the null as mean age for males (Gender = 0) = mean age for females (Gender = 1), with the difference computed as mean(0) − mean(1), i.e. in increasing order of the values of the category variable.
  ▪ There are only 38 = 40 − 2 degrees of freedom, as two means are estimated in this test (age for males and age for females).

test:

    ttest Age, by(Gender)

Figure 9. Results of ttest command

       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      27    29.24815    .3244707    1.685999    28.58119    29.91511
           1 |      13    30.06154    .5667653    2.043501    28.82666    31.29641
    combined |      40     29.5125    .2884794    1.824504      28.929      30.096
        diff |           -.8133903     .609856               -2.047979    .4211986
    ------------------------------------------------------------------------------
        diff = mean(0) - mean(1)                                      t =  -1.3337
    Ho: diff = 0                                     degrees of freedom =       38

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0951         Pr(|T| > |t|) = 0.1902          Pr(T > t) = 0.9049

decision: cannot reject the null hypothesis (that the two means are equal) since p-value > α = 5%. The relevant p-value is the two-sided one, Pr(|T| > |t|) = 0.1902.
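The diff row of Figure 9 can be checked from the two group summaries alone. The sketch below is in Python (not STATA) and assumes the standard pooled-variance formula, which is what ttest uses when the unequal option is not given; variable names are illustrative.

```python
import math

# Group summaries copied from Figure 9 (group 0 = males, group 1 = females).
n0, mean0, sd0 = 27, 29.24815, 1.685999
n1, mean1, sd1 = 13, 30.06154, 2.043501

# Pooled variance: group variances weighted by their degrees of freedom.
df = n0 + n1 - 2
pooled_var = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / df

# Standard error of the difference in means, and the t statistic.
std_err = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
diff = mean0 - mean1            # mean(0) - mean(1), as STATA reports it
t_stat = diff / std_err

print(f"diff = {diff:.4f}, std. err. = {std_err:.6f}, t = {t_stat:.4f}, df = {df}")
```

The standard error, t statistic and degrees of freedom should agree with the diff row and the footer of Figure 9 up to rounding.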

MBA Demographics: by categories

► At the population level:

    by Gender, sort: summarize Age

Figure 10. Results of by command

    -> Gender = 0
        Variable |     Obs        Mean    Std. Dev.       Min        Max
    -------------+------------------------------------------------------
             Age |     339     29.9354    2.047798       24.4       36.8

    -> Gender = 1
             Age |     161    30.19627    1.937553       24.4       35.1

► At the population level the proportion of females is 32.2% (run summarize) and the two means look fairly close to each other (from the table above).

Appendix: Two-Sample t-test

► There are two ways to run a t-test for the means of two samples, depending on your underlying assumption about the standard deviations of the two populations from which the samples come.

    ttest Age, by(Gender)
    ttest Age, by(Gender) unequal

  ▪ The first command assumes that the standard deviation is the same for both populations. At the sample level, a single standard deviation is estimated by pooling the two samples.
  ▪ The second command assumes that the standard deviations are different for the two populations. At the sample level, the estimated standard deviations are calculated separately for each sample.

► The t-test is calculated as

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\mathrm{s.e.}(\bar{X}_1 - \bar{X}_2)} \]

but what differs are the standard errors and degrees of freedom:

  ▪ equal std.dev. assumption:

\[ \mathrm{s.e.} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2 \]

  ▪ unequal std.dev. assumption:

\[ \mathrm{s.e.} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad df = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]

► In the formulas above the subscripts 1 and 2 refer to samples 1 and 2 respectively, while p refers to the pooled sample.

Appendix: Two-Sample t-test

Figure 11. Results of ttest command: equal variances

       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      27    29.24815    .3244707    1.685999    28.58119    29.91511
           1 |      13    30.06154    .5667653    2.043501    28.82666    31.29641
    combined |      40     29.5125    .2884794    1.824504      28.929      30.096
        diff |           -.8133903     .609856               -2.047979    .4211986
    ------------------------------------------------------------------------------
        diff = mean(0) - mean(1)                                      t =  -1.3337
    Ho: diff = 0                                     degrees of freedom =       38

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0951         Pr(|T| > |t|) = 0.1902          Pr(T > t) = 0.9049

Figure 12. Results of ttest command: unequal variances

       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      27    29.24815    .3244707    1.685999    28.58119    29.91511
           1 |      13    30.06154    .5667653    2.043501    28.82666    31.29641
    combined |      40     29.5125    .2884794    1.824504      28.929      30.096
        diff |           -.8133903    .6530728               -2.175002     .548221
    ------------------------------------------------------------------------------
        diff = mean(0) - mean(1)                                      t =  -1.2455
    Ho: diff = 0                     Satterthwaite's degrees of freedom =  20.1558

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.1136         Pr(|T| > |t|) = 0.2272          Pr(T > t) = 0.8864
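The unequal-variances results in Figure 12 can also be recomputed from the group summaries. This Python sketch (not STATA) assumes the usual separate-variance standard error and Satterthwaite's approximation for the degrees of freedom, the method named in the output; variable names are illustrative.

```python
import math

# Group summaries copied from Figure 12 (group 0 = males, group 1 = females).
n0, mean0, sd0 = 27, 29.24815, 1.685999
n1, mean1, sd1 = 13, 30.06154, 2.043501

# Squared standard error of each group mean, estimated separately.
v0 = sd0 ** 2 / n0
v1 = sd1 ** 2 / n1

# Standard error of the difference and the t statistic.
std_err = math.sqrt(v0 + v1)
t_stat = (mean0 - mean1) / std_err

# Satterthwaite's approximate degrees of freedom.
df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))

print(f"std. err. = {std_err:.7f}, t = {t_stat:.4f}, df = {df:.4f}")
```

Compared with the pooled test in Figure 11, the standard error is larger and the degrees of freedom drop from 38 to about 20, which widens the confidence interval for the difference.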