Comparing Different Samples
©2005 Dr. B. C. Paul, modified 2009
Note: The concepts in these slides are considered common knowledge to those familiar with the subject of statistics. Formulas and ideas found here appear in many textbooks on the subject, although no one book is known by the author to outline the subject in exactly the manner identified in these slides. The specific tables shown are from Fundamental Concepts in the Design of Experiments by Hicks, although most statistics texts have similar tables.

Things We Have Learned So Far
The Normal Distribution Model is the most frequently chosen model for populations
It is fully defined by its mean and standard deviation (both of which we can calculate)
We can check the population distribution to find out how much of it is above or below some limit
One and Two Tailed Tests
The means of multiple samples are less variable than the underlying population
We can calculate the new standard deviation for either random or spatially correlated sample means
We can then do one or two tailed tests on the means of the sample sets
If the number of samples taken is under about 100 and we have estimated the standard deviation, our test statistic will follow a T distribution
We can do our one and two tailed tests using a T distribution

Sometimes We Want to Compare Samples from Two Different Populations
Why?
Most frequently for engineers we want to see if something we have done has had an effect
We sample before and after, product 1 vs. product 2
We test our two sample sets to see if things changed.

The Problem is Set Up Like A Typical Confidence Interval
You calculate a test statistic (either a Z or a T) and then check to see if the statistic is way out in the tail of the distribution somewhere.
A difference is indicated by the test statistic being way out in the loony fringe tail

Application
Red Rooster Carburetor company would like to claim that their carburetors improve fuel economy by 20% when their replacement carburetors are used.
Red Rooster assembles teams of drivers to drive two sets of cars: one that has been retrofitted with Red Rooster Carburetors and one that uses the manufacturer's original carburetors

Data Begins Coming In
The standard vehicles came in with an average of 21.4 mpg and stdev of 6.1 from 60 car and driver combinations
The Rooster Carburetor vehicles came in with 29.5 mpg and stdev of 6.2 from 41 car and driver combinations

Setting Up A Test
If the average gas mileage for the non-Rooster set is improved 20%, its adjusted mean is 25.68
The Null Hypothesis is that the mean gas mileage of the two sets of cars is the same (after the 20% adjustment)
Set the test up to reject and conclude the Rooster Carburetor set is more than 20% better if the test statistic is extreme enough

The Test Statistic
We will let Y1 be our Rooster carburetor vehicles.
We will let Y2 be our standard vehicles with the 20% improvement.
If Y1 is bigger than Y2 it will cause Z to become increasingly large. If Z is so far out in the upper tail that there is little chance it could be a random event, we will reject the null hypothesis and conclude that the Red Rooster Carburetors do improve fuel economy by more than 20%.

A Note on Our Test Statistic
The denominator is what we call a pooled estimate of variance. Strictly speaking the test is assuming that the two populations have the same variance. If the variances are close, it is accepted practice to let that small lie stand as close enough.
How much different can the variances be and still be about the same? Actually a bit of a judgment call, but I'm not worried about 6.1 and 6.2.

Plug and Chug
Z = 3.06. Go to the table to look up how much of the normal distribution is beyond 3.06 standard deviation units.
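The slide's formula image did not survive in this transcript, so the following is only a sketch of one calculation that reproduces the quoted Z = 3.06 from the figures given earlier. The form of the denominator (the standard error of the difference between two means) is an assumption, and the variable names are illustrative.

```python
import math

# Figures quoted in the slides
rooster_mean, rooster_sd, rooster_n = 29.5, 6.2, 41
standard_mean, standard_sd, standard_n = 21.4, 6.1, 60

# Null hypothesis: the standard fleet, credited with a 20% gain, matches the Rooster fleet
adjusted_mean = standard_mean * 1.20

# Assumed denominator: standard error of the difference between the two sample means
std_error = math.sqrt(rooster_sd**2 / rooster_n + standard_sd**2 / standard_n)

z = (rooster_mean - adjusted_mean) / std_error
print(f"adjusted mean = {adjusted_mean:.2f}, Z = {z:.2f}")
```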

Do A Table Look Up
The area under the curve below Z = 3.06 is 0.99889, which leaves 0.00111 (0.111% of the distribution) further out. There is about a 1/10th of 1% chance that the observed result is a fluke.
Action: Reject the null hypothesis and conclude that the Red Rooster Carburetor does improve fuel economy by more than 20%.
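For readers without a printed table handy, the same look up can be sketched with Python's standard library. This is only an illustration of the table step, using the standard normal distribution and the Z value from the slides.

```python
from statistics import NormalDist

z = 3.06  # test statistic from the carburetor example

# Area under the standard normal curve below z, and what is left in the upper tail
area_below = NormalDist().cdf(z)
area_beyond = 1.0 - area_below

print(f"area below = {area_below:.5f}, area beyond = {area_beyond:.5f}")
```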

Some Commentary
Two approaches to test statistics.
Style one (the old conventional one)
  Pick your alpha level
  Look up the Z value that corresponds to the alpha level
  Run the test statistic
  If Z is larger or more negative than the critical value then reject the null hypothesis with a confidence of 1 - alpha
Style two (gaining now with computers)
  We have a minimum confidence we would like to have (an idea of alpha level, but not necessarily an exact number)
  Calculate the test statistic
  Let the computer tell you what the probability is of a more extreme test statistic
  If you are comfortable with that value then reject the null hypothesis
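The two styles can be put side by side in a short sketch. The alpha of 0.05 and the one-tailed set up are assumptions chosen for illustration; the Z value is the one from the carburetor example.

```python
from statistics import NormalDist

z = 3.06      # test statistic from the carburetor example
alpha = 0.05  # illustrative one-tailed alpha level (an assumption)

# Style one: look up the critical value for alpha, then compare
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_style_one = z > z_critical

# Style two: report the probability of a more extreme statistic and judge it directly
p_value = 1 - NormalDist().cdf(z)

print(f"critical Z = {z_critical:.3f}, reject = {reject_style_one}")
print(f"probability of a more extreme Z = {p_value:.5f}")
```

Style one only says which side of the line the statistic fell on; style two also says how far past the line it is.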

Why the Difference
In the old days we only had printed tables
  Tables were printed for standard confidence levels
  95% is the old standby
  99% tends to show up in medical applications
  97.5% shows up (often as the two tailed version of 95%)
Nowadays, if computers are going to do the work, they can easily crunch the exact "significance" of a test statistic
  We can use more of our own judgment
  If a Z statistic is 94.97% significant we will not reject the null hypothesis at the 5% alpha level
  Does that really make sense?
  Why not report our decision to reject and just tell how sure we are

The Estimated Variance Issue
We remember that normal populations put out Z test statistics if we know their standard deviation
Or if we have estimated it with around 100 or more samples
I had 101 samples in my pooled estimate so I can defend using a Z statistic, but I'm on the edge
Some people will complain that my individual variance estimates were actually well below 100 samples and 101 is calling the shot pretty close
They would say my statistic should actually be based on the T distribution

If I decide to be a bit more careful and use the T distribution
Eek! Let me reconsider that.
It's more painful, but if I plug and chug I get 3.07
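The T formula on this slide was also an image that did not survive. As a sketch, the usual pooled-variance two-sample t statistic reproduces the quoted 3.07 from the slide figures; treating that as the formula the slide showed is an assumption.

```python
import math

# Slide figures: Rooster fleet, and standard fleet with the 20% adjustment applied
n1, s1, mean1 = 41, 6.2, 29.5
n2, s2, mean2 = 60, 6.1, 21.4 * 1.20

# Pooled variance: each sample variance weighted by its degrees of freedom
degrees_of_freedom = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / degrees_of_freedom

t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```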

Doing My Table Look Up
The statistic has n1+n2-2 degrees of freedom, in this case 99.
The table shows significance is over 99.5%.
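Printed tables rarely have a row for 99 degrees of freedom, so here is a sketch that integrates the T density numerically with Simpson's rule, using only the standard library. The helper names `t_pdf` and `t_cdf` are made up for this example; a statistics package would normally do this job (for instance `scipy.stats.t.sf`).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """Area below t, from Simpson's rule on [0, |t|] plus symmetry about zero."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 0.5 + half_area if t >= 0 else 0.5 - half_area

significance = t_cdf(3.07, 99)
print(f"significance = {significance:.4f}, upper tail = {1 - significance:.4f}")
```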

Observation
Note that having to do table look ups really does make it harder to work with anything other than the standard alpha levels
Note that when we used a T test with about 100 samples we had indeed converged to about the same value as we got with standard normal statistical tables and Z values

Variations on a Theme
The comparisons we did thus far assumed
  We had a normally distributed population
  Our two populations had about the same standard deviation
What if this had not been true?
  If our populations had not been normally distributed we have "Non-Parametric" tests that do the same thing (more on that later)
  If we had not had the same standard deviation we could have set up a "Behrens-Fisher" test which uses a modified T distribution (more on that later)

Strengthening Our Experiment
Red Rooster tested their carburetor by running two fleets of randomly chosen car and driver combinations
A lot of things contribute to variance
  Different types of cars have vastly different gas mileage
  Even in the same car different drivers will get different mileage by virtue of how they handle the car
One of the things we have noticed is that the more variable a population, the harder it is to reject the "null hypothesis"
  Often we have to resort to larger sample sizes and more tests

Experiments by Design
What if Red Rooster Carburetors is a group of students who designed their carburetor in the machine shop at school
  The idea that they can go out and build 41 carburetors and send 101 cars and drivers out to burn up a bunch of gas is kind of "iffy"
One way to get sample size down is to get rid of some of that random variance
  What if we used the same car and driver with and without the Red Rooster Carburetor?
  We just took out two sources of scatter in the data
This is called a Paired Experiment

Paired Experiments
There needs to be a solid basis for pairing
  You can make the numbers crunch pairing up anything
Experiment: I want to show that students from Illinois are smarter than students from Missouri. I give a test to 40 SIU seniors that are Illinois residents. I then give the same test to 40 kindergarteners from Missouri. I match the students up in the order in which tests were turned in and do my test.
  If my test statistic shows that my Illinois students scored higher, are you willing to believe that Illinois students are smarter than Missouri students?

OK, that last one raises some concerns about the intelligence of whoever designed that experiment
The basis for pairing should be that we are pairing like items to eliminate variation from whatever we are trying to "write out" of the experiment by pairing.
Suppose we make one Red Rooster Carburetor to go on a Dodge Neon and I have 10 students drive the vehicle over the same road course before adding the carburetor. I then add the carburetor and have the same 10 students drive the same car over the same course. I will then pair the results before and after adding the carburetor.

Looking at My Results
Driver              Standard Dodge Neon (mpg)   Neon with RR Carb (mpg)
Don Dork            26.5                        32.1
Kurt Kurtosis       25.7                        30.1
Angela Airhead      25.2                        31.8
Mark Maniac         23.9                        29.8
Katty Careful       28.1                        34.2
Jim Junkyard        26.2                        30.6
Steve Stickshift    25.9                        31.2
Burt Bunion         27.1                        33.2
Saedy Sadist        26.7                        32.8
Melvin Mizer        28.2                        34.5

The test requires us to get the differences within our pairing
Don Dork: 32.1 - 26.5 = 5.6
Kurt Kurtosis: 30.1 - 25.7 = 4.4
And so on through the pairing.
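Carrying the subtraction through all ten pairings is mechanical, so a short sketch may help. The dictionaries simply restate the results from the slides; the variable names are illustrative.

```python
# Gas mileage (mpg) for each driver, as listed in the results
standard = {
    "Don Dork": 26.5, "Kurt Kurtosis": 25.7, "Angela Airhead": 25.2,
    "Mark Maniac": 23.9, "Katty Careful": 28.1, "Jim Junkyard": 26.2,
    "Steve Stickshift": 25.9, "Burt Bunion": 27.1, "Saedy Sadist": 26.7,
    "Melvin Mizer": 28.2,
}
with_rr_carb = {
    "Don Dork": 32.1, "Kurt Kurtosis": 30.1, "Angela Airhead": 31.8,
    "Mark Maniac": 29.8, "Katty Careful": 34.2, "Jim Junkyard": 30.6,
    "Steve Stickshift": 31.2, "Burt Bunion": 33.2, "Saedy Sadist": 32.8,
    "Melvin Mizer": 34.5,
}

# Pair by driver so car and driver effects cancel out of each difference
differences = {
    driver: round(with_rr_carb[driver] - standard[driver], 1)
    for driver in standard
}

for driver, diff in differences.items():
    print(f"{driver}: {diff}")
```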

Tuning in a Little More
Red Rooster actually wants to claim a 20% increase in gas mileage so we may be able to normalize out some more variance by directly measuring % improvement.
  Results: 21.13%, 17.12%, 26.19%, 24.68%, 21.71%, 16.79%, 20.48%, 22.51%, 22.85%, 22.34%
We also are interested in how much these values differ from 20% improvement so we can subtract 20% from each value
  1.13%, -2.88%, 6.19%, 4.68%, 1.71%, -3.21%, 0.48%, 2.51%, 2.85%, 2.34%
Plug the data into SPSS to get mean and standard deviation
  Could also use Excel and function =average(data range) for the mean and =stdev(data range) for standard deviation
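As an alternative to SPSS or Excel, the same summary can be sketched with Python's standard library. `statistics.stdev` is the sample standard deviation (n - 1 in the denominator), which matches what Excel's =stdev computes. The list below is copied from the slide; the single percent calculation is shown only to illustrate how each value arises.

```python
from statistics import mean, stdev

# Percent improvement for one pairing (Don Dork), from the results table
don_gain = (32.1 / 26.5 - 1) * 100

# Improvement beyond 20% for each driver, as listed on the slide (percentage points)
excess = [1.13, -2.88, 6.19, 4.68, 1.71, -3.21, 0.48, 2.51, 2.85, 2.34]

d_bar = mean(excess)   # average difference
s_d = stdev(excess)    # sample standard deviation of the differences

print(f"Don Dork gain = {don_gain:.2f}%")
print(f"mean = {d_bar:.2f}, standard deviation = {s_d:.2f}")
```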

The Hypothesis
H0: there is no difference between our set of numbers and 0
  Specifically, this means we cannot be sure we have over 20% improvement
Rejecting the null hypothesis means we are sure we have over 20% improvement

The Test Statistic for a Paired Experiment
t = D̄ / (Sd / √N)
D̄ (D with the bar over it) is the average difference (in this case 1.58%)
Sd is the standard deviation of the individual differences as calculated (in this case 2.95%)
N is of course the number of samples (in this case 10)
Crunching the numbers we get 1.69
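A quick check of the arithmetic, as a sketch using the rounded values quoted on the slide (the variable names are illustrative):

```python
import math

d_bar = 1.58   # average difference, in percentage points
s_d = 2.95     # standard deviation of the individual differences
n = 10         # number of paired samples

# Paired t statistic: mean difference over its standard error
t = d_bar / (s_d / math.sqrt(n))
print(f"t = {t:.2f} with {n - 1} degrees of freedom")
```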

Looking Up Our Result
We have n-1 degrees of freedom (in this case 9).
1.69 is between 90% and 95% significant. We cannot reject the null hypothesis at the 95% level, but we can at about 93% confidence.
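One way to arrive at a figure like "about 93%" from a printed table is to bracket the statistic between two table rows and interpolate. The sketch below does that with standard one-tailed critical values for 9 degrees of freedom, typed in by hand. Straight-line interpolation is only an approximation, and whether the slide's figure was obtained this way is an assumption.

```python
# One-tailed t-table rows for 9 degrees of freedom: (confidence, critical t).
# Standard textbook values, entered by hand for this sketch.
T_TABLE_DF9 = [(0.90, 1.383), (0.95, 1.833), (0.975, 2.262), (0.99, 2.821), (0.995, 3.250)]

def bracket(t_stat, table):
    """Return the two neighbouring table rows that enclose t_stat, or None."""
    for lower, upper in zip(table, table[1:]):
        if lower[1] <= t_stat <= upper[1]:
            return lower, upper
    return None

def interpolate_confidence(t_stat, lower, upper):
    """Straight-line interpolation between two table rows (approximate)."""
    (c_lo, t_lo), (c_hi, t_hi) = lower, upper
    return c_lo + (t_stat - t_lo) / (t_hi - t_lo) * (c_hi - c_lo)

t_stat = 1.69
lower, upper = bracket(t_stat, T_TABLE_DF9)
confidence = interpolate_confidence(t_stat, lower, upper)
print(f"between {lower[0]:.0%} and {upper[0]:.0%}, roughly {confidence:.1%}")
```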

Limitations of Our Results
93% confidence we have over 20% improvement may fall short of the proof some people would demand
One way to strengthen the conclusion is more samples (the standard deviation of the mean shrinks with more samples, and since it is in the denominator that makes t bigger)
We may also be concerned that all our tests were on a Dodge Neon, which furnishes no data on whether the result would be improved on other cars as well
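To see the effect of sample size, here is a sketch that holds the mean difference and standard deviation at the slide values and varies only the number of pairs. That the mean and standard deviation would stay the same with more drivers is purely a hypothetical assumption; real data would move both.

```python
import math

def paired_t(d_bar, s_d, n):
    """Paired t statistic: mean difference divided by its standard error."""
    return d_bar / (s_d / math.sqrt(n))

d_bar, s_d = 1.58, 2.95  # slide values, assumed unchanged as n grows

for n in (10, 20, 40):
    print(f"n = {n}: t = {paired_t(d_bar, s_d, n):.2f}")
```

Because t grows with the square root of n, quadrupling the number of pairs only doubles the statistic.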

Now it's Your Turn
Do Unit #2 assignment #3 (also can be called assignment #4 if units are not considered)
You will be asked to compare the number of truckloads of rock delivered by mining trucks with and without the use of a computerized truck dispatch system.
The question: does computerized truck dispatching improve production?
