Chapter 1 Statistical Thinking

What is statistics? Why do we study statistics?
Statistics has been defined as:
– the science of collecting, organizing, and analyzing data;
– the mathematics of the collection, organization, and interpretation of numerical data;
– the branch of mathematics that studies the methods of collecting and analyzing data;
– a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters.
Statistics is a discipline concerned with:
– designing experiments and other data collection;
– summarizing information to aid understanding;
– drawing conclusions from data; and
– estimating the present or predicting the future.
"I like to think of statistics as the science of learning from data...."
Jon Kettenring, ASA President, 1997

The steps of a statistical analysis are:
– collecting information (data collection);
– evaluating the information (data analysis);
– drawing conclusions (statistical inference).
What type of information? For example:
– a test group's favorite amount of sweetness in a blend of fruit juices;
– the number of men and women hired by a city government;
– the velocity of a burning gas on the sun's surface;
– clinical trials to investigate the effectiveness of new treatments;
– field experiments to evaluate irrigation methods;
– measurements of water quality.
Problems:
– Is a new treatment for heart disease more effective than a standard one?
– Is using high-octane gasoline beneficial to car performance?
– Does reading an article on statistics improve students' statistics grades?
Is a new treatment for heart disease more effective than a standard one?
– Pick, say, 100 heart patients.
– Divide them into two groups of 50.
– Group 1 gets the new treatment.
– Group 2 gets the standard treatment.
Results:
– 40 out of 50 Group 1 patients (80%) improved.
– 30 out of 50 Group 2 patients (60%) improved.
Conclusion: the new treatment is more effective!
Questions to ask:
– How do you divide the patients?
– Have you controlled for other factors (fitness level, lifestyle, age, etc.)?
– How do you decide who gets which treatment?
– What about ethical issues?
Comparing Test Scores
– Select 10 students and give them a journal article on statistics.
– Test their knowledge about the article and record their scores.
– Repeat the test after they take STT 231.
Result: 8 out of the 10 students improved their scores.
Question: Can we conclude that reading the article has improved students' knowledge of statistics?
Look at worst-case scenarios:
– "Under the assumption that the new treatment is no better than the standard one, what is the chance that 80% of the patients benefit from this treatment?"
– "Under the assumption that STT 231 brings no benefit, how likely is it that we see 80% of the students improve their scores?"
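For the heart-disease study, one way to put a number on the first question is to hold the 70 improvers fixed. We then ask how often a purely random split into two groups of 50 would place 40 or more of those improvers in Group 1. This is a one-sided Fisher's exact test. That is a standard technique, but the slides do not name it. The function name below is our own, and "40 or more" (rather than exactly 40) is an assumption that follows the usual convention.

```python
from math import comb

def hypergeom_upper_tail(total, total_improved, group_size, observed):
    """Chance that `observed` or more of the improved patients land in Group 1
    when the two group labels are assigned completely at random, i.e. when the
    treatment makes no difference (one-sided Fisher's exact test)."""
    total_not_improved = total - total_improved
    # Count the equally likely ways to choose Group 1 with exactly k improvers,
    # for every k at least as extreme as the observed count.
    favorable = sum(
        comb(total_improved, k) * comb(total_not_improved, group_size - k)
        for k in range(observed, min(total_improved, group_size) + 1)
    )
    return favorable / comb(total, group_size)

# Numbers from the heart-disease example: 100 patients, 50 per group,
# 40 + 30 = 70 improved in total, 40 of them in Group 1 (new treatment).
p = hypergeom_upper_tail(total=100, total_improved=70, group_size=50, observed=40)
print(f"Chance of 40 or more improvers in Group 1 by luck alone: {p:.3f}")
```

This calculation only addresses chance. It says nothing about the design problems raised earlier, such as how patients were assigned to groups.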
We need a model to answer these questions! If STT 231 is not beneficial, then each student's score is equally likely to go up or down (a 50% chance each way). This is equivalent to flipping a fair coin:
– 50% chance you get Heads;
– 50% chance you get Tails.
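The coin model can be tried out by simulation. The sketch below runs a plain Monte Carlo simulation; the function and parameter names are our own. It flips a fair coin for each of 10 students, repeats the "class" many times, and counts how often exactly 8 scores go up.

```python
import random

def simulate_improvers(n_students=10, target=8, n_repeats=100_000, seed=1):
    """Coin model of 'the course has no effect': each student's score goes up
    with probability 0.5. Simulate many classes and return the fraction in
    which exactly `target` of the `n_students` scores went up."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        # One simulated class: a fair coin flip per student (True = improved).
        improved = sum(rng.random() < 0.5 for _ in range(n_students))
        if improved == target:
            hits += 1
    return hits / n_repeats

estimate = simulate_improvers()
print(f"Simulated chance that 8 of 10 improve: {estimate:.4f}")
```

With 100,000 simulated classes, the estimate should land close to the exact value 45/1024 ≈ 0.044. Changing `seed` moves the estimate slightly.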
Comparing pre- and post-test scores for 10 students is equivalent to flipping a coin 10 times and calculating the chance of observing 8 heads.

Relevant questions:
– Will the chance of observing heads 80% of the time depend on the number of students involved in the experiment?
– Will this chance go up, go down, or remain the same if you repeat the experiment with 200 students?
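The same chance can be computed exactly with the binomial formula: the number of ways to choose which k of the n students improve, C(n, k), times (1/2)^n. The sketch below is our own helper, not taken from the slides. It answers both relevant questions by comparing 10 students with 200 students; with 200 students, 160 would have to improve to match 80%.

```python
from math import comb

def prob_exactly(k, n, p=0.5):
    """Binomial probability of exactly k 'heads' (improved scores) in n
    independent coin flips, each landing heads with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for n in (10, 200):
    k = 8 * n // 10  # 80% of the students improve
    print(f"n = {n:3d}: P(exactly {k} improve) = {prob_exactly(k, n):.3g}")
```

For 10 students this gives 45/1024, about 4.4%. For 200 students, the chance that exactly 160 improve is far below one in a trillion. So the chance does depend on the number of students, and it goes down sharply as that number grows.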
Suppose the chance of observing 8 improvements in 10 trials is 4.4%. What does this mean?
– If STT 231 is not beneficial, then there is only a 4.4% chance that we will observe 8 out of 10 students' scores improve.
– There is little hope that 8 students' scores would improve just by CHANCE.
Suppose the chance of observing 8 improvements in 10 trials is 4.4%, and we observed 8 out of 10 students' scores improve. What does this mean?
There are two possible explanations:
– The course is highly effective.
– The course is ineffective, and we observed an unlikely event.
We do not know which one is true!
Suppose an outcome has only a "small" chance of happening by CHANCE alone. Then observing it is strong evidence that the change we see did not happen by CHANCE. Hence there is strong evidence that some factor is responsible for this change.
The course is highly effective!
Reasoning: what we observed is very unlikely if the course were ineffective, so the course is effective. Having 80% of the students improve their scores is unlikely if the course were ineffective.
Some remarks. For questions that involve uncertainty:
– Carefully formulate the question you want to answer (modeling).
– Collect data.
– Summarize, analyze, and present the data.
– Draw conclusions. Conclusions always include uncertainty.
– Support your conclusions by quantifying how confident you are in them.
Chapter 2 A Design Example

The Polio Vaccine Case
– Polio is caused by a virus.
– It was especially deadly in children.
– It was a big problem during the first half of the 20th century.
– Goal: develop a vaccine to fight the disease (Jonas Salk, ~1950).
Problems with vaccines:
– Are they safe?
– Are they effective?
Undertake a large-scale trial to answer these questions.
Case 1: A Simple Study
– Distribute the vaccine widely (under the assumption that it is safe).
– A decrease in the number of polio cases after vaccination is taken as evidence that the vaccine is effective.
Problem?
Problems:
– Lack of a control group: is a decrease in the number of polio cases due to the vaccine or to other factors?
– How reliable is the assumption that the vaccine is safe?
Case 2: Adding a Control Group
Have two groups:
– Control group: gets a salt solution.
– Treatment group: gets the actual vaccine.
Example (Observed Control Study)
– Control group: all 1st and 3rd grade children.
– Treatment group: all 2nd graders.
Assumption: the age difference between the control and treatment groups was felt to be unimportant.
Potential problems:
– Parents of 2nd graders may not agree to having their kids vaccinated.
– Parents of sicker kids are more likely to accept the vaccine.
– More educated parents tend to accept the vaccine.
– Parents of sick 1st and 3rd graders may object that their kids are not getting treatment.
Difficulty in diagnosing polio:
– Extreme cases of polio are easy to diagnose.
– Less severe cases of polio have symptoms similar to other common illnesses.
Potential problems:
– Physicians know who has received the vaccine and who has not.
– A less severe case of polio in a 2nd grader (who has received the vaccine) may be wrongly diagnosed as another illness.
– A less severe case in a 1st or 3rd grader will most likely be diagnosed as polio.
Case 3: Randomization, Placebo Control, Double Blindness
Random assignment to control and treatment groups:
– Select a child.
– Flip a coin: Heads → Treatment group; Tails → Control group.
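The coin-flip assignment takes only a few lines of code. This is a sketch using Python's standard `random` module. The child IDs and the function name are hypothetical.

```python
import random

def coin_flip_assignment(children, seed=None):
    """Randomize each child independently with a fair coin:
    heads -> Treatment group, tails -> Control group."""
    rng = random.Random(seed)
    return {
        child: "Treatment" if rng.random() < 0.5 else "Control"
        for child in children
    }

children = [f"child_{i:03d}" for i in range(1, 21)]  # 20 hypothetical children
groups = coin_flip_assignment(children, seed=42)
n_treatment = sum(g == "Treatment" for g in groups.values())
print(f"Treatment: {n_treatment}, Control: {len(children) - n_treatment}")
```

Each child gets an independent flip, so the two groups will usually not be exactly the same size. When equal sizes are wanted, a common alternative is to shuffle the list of children and split it in half.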
Placebo control:
– Kids in the control group receive a salt solution.
Double blind: neither the child, nor the parents, nor the doctors/nurses who make the diagnosis of polio know whether a kid receives the vaccine or the placebo.
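Double blinding can be pictured in code. Each child's dose is identified only by a random vial code. The key linking codes to vaccine or placebo is kept away from everyone who gives injections or makes diagnoses until all diagnoses are recorded. This is a hypothetical sketch of the idea; the function name and code format are our own. It is not a description of how the actual polio trial labeled its vials.

```python
import random

def blind_labels(assignments, seed=None):
    """Give every child a random, meaningless vial code.

    Returns (code_for_child, key): the first is all that children, parents,
    and diagnosing doctors/nurses see; the second maps each code back to
    'Vaccine' or 'Placebo' and is held back until all diagnoses are in."""
    rng = random.Random(seed)
    # Distinct six-digit numbers, so no two children share a code.
    numbers = rng.sample(range(100_000, 1_000_000), len(assignments))
    code_for_child, key = {}, {}
    for (child, group), number in zip(assignments.items(), numbers):
        code = f"V{number}"
        code_for_child[child] = code
        key[code] = group
    return code_for_child, key

# Hypothetical assignments produced by a randomization step.
assignments = {"child_001": "Vaccine", "child_002": "Placebo", "child_003": "Vaccine"}
code_for_child, key = blind_labels(assignments, seed=7)
print(code_for_child)  # the codes reveal nothing about the group

# Only after every diagnosis is recorded does the analyst unblind:
unblinded = {child: key[code] for child, code in code_for_child.items()}
```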
Summary. In designing experiments:
– Introduce some sort of control group.
– Use randomization to avoid bias in the selection and assignment of subjects for the study.
– Double-blind experiments give protection against biases, both intentional and unintentional.
– Perform the experiment on a large number of subjects (the polio trial involved children on the order of millions).
– Repeat the experiment several times before drawing definitive conclusions.
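Large numbers of subjects help because chance fluctuations shrink as the sample grows. Under the coin model, the observed proportion of "heads" in n independent trials typically varies around 1/2 by about sqrt(p(1 - p)/n), the standard deviation of a sample proportion. The sketch below tabulates this for a few sample sizes. The sizes are arbitrary choices for illustration.

```python
from math import sqrt

def proportion_sd(n, p=0.5):
    """Standard deviation of the observed proportion of successes in n
    independent trials with success probability p: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: typical chance fluctuation about +/-{proportion_sd(n):.4f}")
```

Going from 10 to 1,000,000 subjects shrinks the typical chance fluctuation from about 0.16 to 0.0005. Real effects are therefore much easier to tell apart from luck in a large trial.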
Basic principles of experimental design:
– Randomization
– Blocking (treatment/control groups)
– Replication