Statistic Methods (3.10 – Internal 4 credits)


1 Statistic Methods (3.10 – Internal 4 credits)
By Joanna Charteris

2 Skills required
posing a comparison investigative question using a given multivariate data set
selecting and using appropriate displays and summary statistics
discussing sample distributions
discussing sampling variability, including the variability of estimates
making an appropriate formal statistical inference
communicating findings in a conclusion

3 Achieved, Merit and Excellence
Achieved - Use statistical methods to make a formal inference: involves showing evidence of using each component of the statistical enquiry cycle.
Merit - Use statistical methods to make a formal inference, with justification: involves linking components of the statistical enquiry cycle to the context, and referring to evidence such as sample statistics, data values, or features of visual displays in support of statements made.
Excellence - Use statistical methods to make a formal inference, with statistical insight: involves integrating statistical and contextual knowledge throughout the statistical enquiry cycle, and may include reflecting about the process and considering other relevant explanations.

4 Recap Box and Whisker Graphs

5 PPDAC

6 How accurate are Statistics?
The mean shower length for females is 25min and the mean shower length for males is 30min. Can we make the call that males spend longer in the shower? Obviously it's not feasible to collect data on all females and males, so each mean is based on a sample. What would happen if we took a different sample?

7 Sampling Variability
To answer our question about what is happening back in the population, we need to understand that we have only calculated estimates based on a sample. If we had a different sample, our estimates WILL CHANGE! If the sample was biased in any way, then it is not a true reflection of what is happening back in the population, and our investigation is not valid.
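To see sampling variability in action, here is a minimal Python sketch. The population of shower lengths is entirely made up for illustration (the mean of 27 minutes and spread of 8 minutes are assumptions, not data from this presentation). It takes five different random samples from the same population and shows that each one gives a different estimate of the mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same output each run

# Hypothetical population: 10,000 shower lengths in minutes (made-up values).
population = [random.gauss(27, 8) for _ in range(10_000)]

def sample_mean(pop, n):
    """Mean of one simple random sample of size n, taken without replacement."""
    return statistics.mean(random.sample(pop, n))

# Five different samples of 30 people give five different estimates.
estimates = [sample_mean(population, 30) for _ in range(5)]
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: mean = {estimate:.1f} min")
```

None of the five estimates is "the" answer; each is one possible estimate of the same population mean.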

8 How do we deal with sampling variability? How can we make conclusions?
[Graph: number of messages sent a day, Males vs Females]
The median of females is bigger, therefore can we say females send more messages a day than males? NO! This is based on a sample and is only an estimate. Really we need more information to understand what is happening back in the population.

9 How do we deal with sampling variability? How can we make conclusions?
Last year (in MAT202) we used an informal confidence interval to make calls about a population. This allowed for sampling variability and is a way of making valid conclusions about a population. Remember, an informal confidence interval is a range in which the true population mean/median is likely to lie.
[Graph: number of messages sent a day, Males vs Females]
This “bar” is the informal confidence interval. This is the range where the true population median is likely to be. If the bars overlap, then we can't conclude that there is a difference between males and females.

10 How do we deal with sampling variability? How can we make conclusions?
At level 3 we use a similar idea. The ICI is just a calculation, so how accurate is it? A better way would be to go out and collect 100 different samples and then find the median of all the sample medians. WHOA! Is this feasible? This is where the idea of bootstrapping comes in.
Teacher note: explain that if each person in the class went out and took a sample and calculated its median, you could then find the median of everyone's medians.

11 Meet the Little Ponies at Paradise Estate
Ponyland is a mystical land, home to all kinds of magical creatures. The Little Ponies make their home in Paradise Estate, living a peaceful life filled with song and games. However, not all of the creatures of Ponyland are so peaceful, and the Ponies often find themselves having to fight for survival against witches, trolls, goblins and all the other beasts that would love to see the Little Ponies destroyed, enslaved or otherwise harmed.[1] Watch youtube theme to “hook” students Use Wikipedia to find out more information

12 Height distribution of the Little Ponies at Paradise Estate
The mean height of Little Ponies at Paradise Estate is 150mm with a standard deviation of 5mm. Sketch a possible height distribution for the population of Little Ponies at Paradise Estate. Remember to give an indication of scale. Discussion points: Estimation and expectation Purpose of Standard deviation Lead into normal distribution

13 Height and Distribution of the POPULATION
Emphasis on population
Mean is slightly under 150mm – why? SAMPLES (this is why we cannot just use means and medians to estimate about the population)
Re-sampling process

14 Draw your OWN Little Pony
How tall is your Pony?
Casio Graphics calculator: RandNorm#(5, 150)
Excel: =NORMSINV(RAND())*5+150
TI 84+ calculator: randNorm(150,5)
Draw your pony
Tell your neighbor about your Pony
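For anyone without a calculator or Excel to hand, the same random draw can be sketched in Python. This is only an illustrative equivalent of the commands above, assuming the population described earlier (mean 150mm, standard deviation 5mm); the function name `draw_pony_height` is made up for this sketch.

```python
import random

MEAN_HEIGHT_MM = 150  # population mean from the slide
SD_HEIGHT_MM = 5      # population standard deviation from the slide

def draw_pony_height(rng):
    """Draw one pony height (mm) from a normal distribution."""
    return rng.gauss(MEAN_HEIGHT_MM, SD_HEIGHT_MM)

rng = random.Random(2024)  # seeded so the sketch is repeatable
heights = [round(draw_pony_height(rng), 1) for _ in range(10)]
print(heights)
```

Each value in `heights` plays the role of one student's pony.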

15 Problem??? Reminder on what a CI is – seen in Time Series
Difference in informal CI and formal CI

16 Introduction to Bootstrapping
Find the mean of your sample first – write it down
Shuffle your Ponies
Select 1 and record their height in Excel
Put that Pony back and re-shuffle
Select another Pony
Repeat the process until you have recorded 10 Pony heights. THIS IS YOUR RE-SAMPLE OF 10
Using Excel (=AVERAGE) find the mean of your re-sample
Plot your mean on the board
REPEAT 3 times
Bootstrapping is how we get our confidence interval: repeating this process 1000 times or more gives us an interval in which the true mean/median is likely to lie.
Using our distribution of re-sample means, what would be an appropriate interval estimate for the population mean height of Little Ponies?
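The shuffle, pick, put back, repeat routine can be sketched in Python as follows. The ten pony heights are made-up values, and taking the middle 95% of the re-sample means (a percentile interval) is an assumption about how the interval is read off the bootstrap distribution; the software used in class may differ in detail.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, rng=None):
    """Return the means of n_resamples re-samples drawn with replacement."""
    rng = rng or random.Random()
    means = []
    for _ in range(n_resamples):
        # Pick len(sample) ponies, putting each one back before the next pick.
        resample = rng.choices(sample, k=len(sample))
        means.append(statistics.mean(resample))
    return means

def percentile_interval(values, lower=0.025, upper=0.975):
    """Return the values at the lower and upper percentiles (middle 95% by default)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]

# Hypothetical sample of 10 pony heights in mm (made-up values).
pony_heights_mm = [148.2, 151.0, 155.3, 146.7, 150.4, 152.9, 149.1, 144.8, 153.6, 150.0]

means = bootstrap_means(pony_heights_mm, rng=random.Random(7))
low, high = percentile_interval(means)
print(f"sample mean = {statistics.mean(pony_heights_mm):.1f} mm")
print(f"bootstrap interval: {low:.1f} mm to {high:.1f} mm")
```

Each entry in `means` corresponds to one dot plotted on the board.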

17 Introduction to Bootstrapping
Using iNZight to re-sample (Teacher only) Start iNZight and select the Bootstrap Confidence Interval Construction VIT module. Import the Pony sample session 1 file. Drag Height down to the variable 1 box, and then click the Analyse tab. The default quantity is “mean”. Do NOT change this, just click on “Record my choices” Play, and replicate what you have just done by hand. Check you know what each selection does. To finish, copy and paste the Bootstrap distribution of re-sample means into a word document. Use VIT to show the bootstrapping process

18 Using NZgrapher
Open NZgrapher
Import (or paste) My little pony session 1
Select bootstrap (single variable)
What is this telling you? I am fairly confident that the median height of a Little Pony from Paradise Estate will be somewhere between … and … tall.

19 How does it work? TEACHER ONLY
You will rarely have data on the whole population! This is just a teaching tool to show you how it works! Just remember: TEACHER ONLY Using iNZight to check how well this method works Start iNZight and select the Confidence interval coverage VIT module (or select FILE and VIT modules). Import the Pony height population file. Drag “Height” down to the variable 1 box, and then click the Analyse tab. The default quantity is mean. Do NOT change this. Change the CI Method to bootstrap: percentile and the Sample Size to 10, then click on Record my choices. Play. Check you know what each selection does, and how it relates to the bootstrap confidence intervals. Use VIT to show how we are “fairly” confident that the actual median from the population is included in our bootstrap interval.
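The coverage idea can also be sketched outside iNZight. The Python below is a rough stand-in for what the coverage module demonstrates, not a reproduction of it: the population is simulated (normal, mean 150mm, standard deviation 5mm), and the numbers of trials and re-samples are arbitrary choices kept small so it runs quickly.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_percentile_ci(sample, rng, n_resamples=200):
    """Percentile bootstrap interval (middle 95%) for the mean of one sample."""
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    last = n_resamples - 1
    return means[round(0.025 * last)], means[round(0.975 * last)]

def coverage_rate(population, sample_size, n_trials, rng):
    """Fraction of bootstrap intervals that contain the true population mean."""
    true_mean = mean(population)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(population, sample_size)  # a fresh random sample
        low, high = bootstrap_percentile_ci(sample, rng)
        if low <= true_mean <= high:
            hits += 1
    return hits / n_trials

rng = random.Random(42)
# Simulated population of pony heights in mm (an assumption for this sketch).
population = [rng.gauss(150, 5) for _ in range(5000)]
rate = coverage_rate(population, sample_size=10, n_trials=200, rng=rng)
print(f"{rate:.0%} of the bootstrap intervals captured the population mean")
```

Most intervals capture the population mean but not all of them do, which is why we say "fairly confident" rather than "certain".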

20 Increasing the sample size
What’s the impact on our bootstrap intervals? Complete activity. See answer sheet. How much does the width change? How could you describe the change? Can you describe how the width is related to the number of Little Ponies in the samples?
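One way to explore these questions is a small simulation. The Python sketch below is illustrative only: it assumes samples drawn from a normal population (mean 150mm, standard deviation 5mm), and the sample sizes 10, 40 and 160 are arbitrary choices. It averages the bootstrap interval width over several samples of each size so that the pattern is not hidden by one unusual sample.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_interval_width(sample, rng, n_resamples=300):
    """Width of the percentile bootstrap interval (middle 95%) for the mean."""
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    last = n_resamples - 1
    return means[round(0.975 * last)] - means[round(0.025 * last)]

def average_width(sample_size, rng, n_samples=20):
    """Average interval width over several simulated samples of one size."""
    widths = []
    for _ in range(n_samples):
        sample = [rng.gauss(150, 5) for _ in range(sample_size)]
        widths.append(bootstrap_interval_width(sample, rng))
    return mean(widths)

rng = random.Random(3)
average_widths = {n: average_width(n, rng) for n in (10, 40, 160)}
for n, width in average_widths.items():
    print(f"sample size {n:>3}: average interval width = {width:.2f} mm")
```

Larger samples give narrower intervals, so the estimate becomes more precise as the sample size grows.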

21 Writing questions

22 Autism and Vaccines – the Debate…
As you may know, the link between autism and vaccines has a long and contentious history. Use this topic to do some research into this area. The table below may help you summarise your findings. Come up with AT LEAST two different questions. I DO NOT want you to spend much time on this.
Table: Autism and Vaccines
Columns: Basic facts | One thing I didn’t know | One thing I found interesting

23 Comparing two populations Writing a question
Nightmare Moon is planning to attack her sister as she wanted to lower the moon. The princess wants to fit all the Pegasus Ponies 18 years and over with an army uniform. They are unsure if they should make wing guards especially for Females as this will take more time. The Princess has employed you to investigate this problem.
VARIABLE being examined:
GROUPS being compared:
POPULATION inferences are being made about:
STATISTIC being estimated:

24 Comparing two populations Writing a question
Nightmare Moon is planning to attack her sister as she wanted to lower the moon. The princess wants to fit all the Pegasus Ponies 18 years and over with an army uniform. They are unsure if they should make wing guards especially for Females as this will take more time. The Princess has employed you to investigate this problem.
VARIABLE being examined: Wing length (cm)
GROUPS being compared: Males/Females
POPULATION inferences are being made about: Pegasus Ponies 18 or over
STATISTIC being estimated: Difference of means
Complete “Comparison” activity in student resources

25 Comparing two populations Writing a question
Nightmare Moon is planning to attack her sister as she wanted to lower the moon. The princess wants to fit all the Pegasus Ponies 18 years and over with an army uniform. They are unsure if they should make wing guards especially for Females as this will take more time. The Princess has employed you to investigate this problem.
I wonder what the difference is between the mean wing length of Male Pegasus Ponies 18 years or over and Female Pegasus Ponies 18 years or over at Paradise Estate.
VARIABLE being examined: Wing length (cm)
GROUPS being compared: Males/Females
POPULATION inferences are being made about: Pegasus Ponies 18 or over
STATISTIC being estimated: Difference of means
Students to have two Google/Word docs, one for each question
Complete “Comparison” activity in student resources

26 Features

27 Features of B & W
This is a key aspect of the topic. We can use the acronym SUCCOS to help describe the graphs.
Spread
Unusual values
Centre
Clusters/groupings
Overlap
Shape
** Must write COMPARISON statements

28 SUCCOS
S – SPREAD: Discuss the Inter Quartile Range (IQR), which is UQ – LQ. This is the spread of the middle 50%.
U – UNUSUAL FEATURES: This is usually seen by looking at the raw data (dot plot) OR a long whisker.
C – CLUSTERS: Where does most of the data lie, OR are there any groupings?
C – CENTRE: Compare the middle 50% of the data and which is higher up the scale. This is your GO TO.
O – OVERLAP: Is there a visible overlap of the boxes?
S – SHAPE: ???

29 Spread Unusual values Centre *Clusters/groupings Overlap Shape

30 O.S.E.M
This is ANOTHER acronym to help you to write about features/SUCCOS.
Obvious
Specific
Evidence OR Example
Meaningful

31 Using iNZight
Open run mode
Import data
Choose your Variable 1 (has to be numerical)
Subset by your two groups
Import ‘Student Data’ and draw a comparison B & W for the head perimeter between males and females.
Get summary statistics
** Data is based on Year 11 students at Blah College

32 SUCCOS: SPREAD, UNUSUAL FEATURES, CLUSTERS, CENTRE, OVERLAP, SHAPE
[Graphs: Female vs Male]

33 Spread
SPREAD: Compare the IQR (middle 50% spread)
Female IQR = 58 – 55 = 3
Male IQR = 58 – 53.75 = 4.25
The middle 50% of head circumferences belonging to the male Year 11 students at Blah College are more spread out than the middle 50% of head circumferences of female Year 11 students at Blah College. This is shown by the male head circumference IQR being larger by 1.25cm. This could be because … (possible reason why)
Is it a big deal if I forgot to write middle 50%? What if I only wrote 50%?
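The spread comparison is simple arithmetic, sketched here in Python as a check. The quartile values are the ones quoted on these slides (female 55cm and 58cm, male 53.75cm and 58cm); the helper name `iqr` is just for this illustration.

```python
def iqr(lower_quartile, upper_quartile):
    """Inter Quartile Range: the spread of the middle 50% (UQ - LQ)."""
    return upper_quartile - lower_quartile

female_iqr = iqr(55, 58)
male_iqr = iqr(53.75, 58)
print(f"Female IQR = {female_iqr} cm, Male IQR = {male_iqr} cm")
print(f"Male IQR is larger by {male_iqr - female_iqr} cm")
```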

34 Unusual features/value:
There is one unusually small head circumference for Year 11 males at Blah College at 46cm whereas there are no unusual head circumferences for females at Blah College. This could be because … (possible reason why)
Can I just say there is an unusually small head circumference for the males at Blah College?

35 Clusters:
Most of the head circumferences for Year 11 females at Blah College are between 53cm and 58cm whereas most of the head circumferences for the Year 11 males at Blah College are between 54cm and 58cm. There also seem to be two groupings of Year 11 female students, with head circumferences of 57cm and 55cm, whereas the male Year 11 students seem to be more scattered with no clusters. This could be because … (possible reason why)

36 Centre:
Expectation is to compare the middle 50%
Female middle 50% = 55cm to 58cm, median = 57cm
Male middle 50% = 53.75cm to 58cm, median = 55cm
The median head circumference for Year 11 female students at Blah College is 2cm bigger than that of the male Year 11 students at Blah College. The middle 50% of Year 11 female students at Blah College is between 55cm and 58cm, which is approximately the same as the Year 11 male students at Blah College. For example, the middle 50% of students have roughly the same head circumference whether they are male or female. This could be because … (possible reason why)

37 Overlap
Overlap: Do the boxes (middle 50%) overlap?
Female middle 50% = 55cm to 58cm
Male middle 50% = 53.75cm to 58cm
There is significant overlapping of the middle 50% between male and female Year 11 students at Blah College, which suggests that we may not be able to make a call on whether there is a difference in head circumferences between males and females. This could be because … (possible reason why)
Why have I said “suggests”?

38 Shape:
Both male and female students at Blah College have asymmetric distributions, meaning the data is unevenly distributed. The head circumferences for female Year 11 students at Blah College are slightly skewed towards larger head circumferences, whereas the males have a more uniform distribution with a long tail to the left caused by one unusual head circumference. This could be because … (possible reason why)

39 Practice
I wonder what the difference is between the mean wing length of Male Pegasus Ponies 18 years or over and Female Pegasus Ponies 18 years or over.
Draw comparison box and whisker graphs of the wing length of Pegasus Ponies at Paradise Estate.
Describe any features.

40

41 Drawing the graph and writing features
Comment on the sample distribution for your TWO investigation questions Heights Spike Copy and paste ANY relevant graphs and/or statistics you have used. Describe the features

42 Making an Inference

43 Bootstrap process for a comparison situation
Creating a bootstrap confidence interval for the difference in medians (or any other statistic of interest) is essentially the same as for a summary situation.
Remember: Outline of bootstrap method
Re-sample with replacement from our original random sample.
Create a re-sample that is the same size as our original random sample.
Calculate the difference in medians (or statistic of interest) for the re-sample.
Key points…
The distribution of re-sample means (the bootstrap distribution) is similar to the distribution of means from repeated random sampling.
Therefore we can use the bootstrap distribution to model the sampling variability in our data, and base our confidence interval on this bootstrap distribution.
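The outline above can be sketched in Python for a comparison situation. This is a minimal illustration under stated assumptions, not the exact procedure used by iNZight or NZgrapher: the wing lengths are made-up values, the whole sample (both groups together) is re-sampled with replacement, and the interval is taken as the middle 95% of the re-sample differences.

```python
import random
import statistics

def difference_in_medians(rows, group_a, group_b):
    """Median of group_a minus median of group_b, or None if a group is empty."""
    a = [value for group, value in rows if group == group_a]
    b = [value for group, value in rows if group == group_b]
    if not a or not b:
        return None
    return statistics.median(a) - statistics.median(b)

def bootstrap_difference_interval(rows, group_a, group_b, n_resamples=1000, rng=None):
    """Percentile bootstrap interval for the difference in medians."""
    rng = rng or random.Random()
    differences = []
    while len(differences) < n_resamples:
        # Re-sample the whole sample with replacement, same size as the original.
        resample = rng.choices(rows, k=len(rows))
        diff = difference_in_medians(resample, group_a, group_b)
        if diff is not None:  # skip the rare re-sample that is missing a group
            differences.append(diff)
    differences.sort()
    last = n_resamples - 1
    return differences[round(0.025 * last)], differences[round(0.975 * last)]

# Hypothetical wing lengths in cm (made-up values for illustration only).
male = [31.2, 30.5, 32.1, 29.8, 31.7, 30.9, 32.4, 31.0, 30.2, 31.5]
female = [27.9, 28.4, 27.1, 28.8, 27.5, 28.0, 26.9, 28.6, 27.7, 28.2]
rows = [("Male", v) for v in male] + [("Female", v) for v in female]

low, high = bootstrap_difference_interval(rows, "Male", "Female", rng=random.Random(11))
print(f"Difference in medians (Male - Female): {low:.2f} cm to {high:.2f} cm")
```

Swapping `statistics.median` for `statistics.mean` gives the same process for a difference in means.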

44 Example
I wonder if there are any differences between the mean wing length of Male and Female Pegasus Ponies that are 18 years of age or over.
I am fairly confident that there is a difference between the wing length of female and male Pegasus Ponies that are 18 years or over. I can make the call that Males have longer wings than Females as both ends of the bootstrap interval are positive. I can also say that Male Pegasus Ponies 18 years and over have wings that are somewhere between 1.534cm and 3.595cm longer than those of Females.
Discussion points:
Sample and re-sample
Difference between means/medians
Understanding the 3 different graphs
How is a positive difference represented?
What does the arrow mean?
What is the middle number?
What does the bootstrap mean?
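The rule for making the call can be written as a short Python sketch. The function name `make_the_call` and its wording are made up for illustration; the interval values are the ones quoted in this example, with the difference taken as Male minus Female.

```python
def make_the_call(lower, upper, group_a="Group A", group_b="Group B"):
    """Interpret a bootstrap interval for (group_a statistic - group_b statistic)."""
    if lower > 0:
        # Whole interval is positive: group_a is larger back in the population.
        return f"{group_a} tends to be larger than {group_b}"
    if upper < 0:
        # Whole interval is negative: group_b is larger back in the population.
        return f"{group_b} tends to be larger than {group_a}"
    # Zero is a believable value for the difference, so no call can be made.
    return "Cannot make the call: zero is inside the interval"

print(make_the_call(1.534, 3.595, "Male", "Female"))
print(make_the_call(-0.8, 1.2, "Male", "Female"))
```

The second call uses a made-up interval that contains zero to show the case where no difference can be claimed.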

45 Activity
Answer both of your comparison questions
Open NZgrapher
Import appropriate data
Show bootstrap distribution
Calculate confidence interval
Write an inference.
Remember:
We want to create a bootstrap confidence interval for the difference between median heights of female ponies and median heights of male ponies.
We want to create a bootstrap confidence interval for the difference in median heights between the ponies chased by Spike and the ponies not chased by Spike.

46 MY CALL Complete this sheet in student resources

47 Writing a Conclusion

48 Conclusion Make a formal statistical inference.
Conclude your investigation, reflecting on your hypothesis and justifying your formal inference. This may include:
- Discussing sampling variability, including the variability of estimates.
- Reflecting on the process you have used to make the formal inference.
Emphasis on the difference between an inference and a conclusion

49 Example
Copy and paste question into your conclusion
I wonder if there are any differences between the mean wing lengths of Male and Female Pegasus Ponies that are 18 years of age or over.
When looking at the sample variation between males and females, females' wing lengths are a lot more spread out than males'. However, when you compare just the middle 50% spread, they only have a difference of 0.65cm, which is very small. This leads me to believe that if I had a different sample the spread could potentially be different, and there may not be as many female Pegasus Ponies with short wings. If this was the case, this would push up the mean, but may have little effect on the median. Through my research about Pegasus Ponies' wings I have learnt that female Pegasus Ponies have a different shape of wing as they are narrower, so looking just at the length of the wing may not be enough to make a recommendation about whether to make special female army wing guards. When looking at the SD and the bootstrap distribution, there is not much variation in the difference in means. Based on my investigation and the sample that I was given, I would conclude that there is a difference between male and female wing lengths for all Pegasus Ponies that are 18 years and over. I therefore make the recommendation that they should be making special wing guards for females.
One thing that is not written very well is sampling variability. Discuss it if you can, but make sure your writing shows understanding that this is just a sample, and that a different sample may give different results.

50 Conclusions
Because our Ponies are fictional we are not going to write a conclusion based on this.

51 Conclusion women equality in workplace
Do a bootstrap interval of the income data and make an inference and a conclusion. Things you should think about when doing a reflection:
What would happen if you had a different sample?
Am I certain that this is the case and the call can be made?
Are there other underlying factors that should be considered?
Are there other factors that may have contributed to this call being made?
How could this investigation be improved? (assuming your sample is fair and unbiased)
Any other comments/features worth commenting on?
RESEARCH, RESEARCH, RESEARCH – how does what you have researched relate to your investigation? (you should be doing this throughout your assessment)

52 Putting it all TOGETHER

53 Complete Legend activity
Research Question Graphs Features conclusion Research

