Published by Abner Jenkins. Modified over 6 years ago.
1
Thoughts on the Future of Statistics Teaching in the light of Big Data
Louisiana State University, Stephenson Dept. of Entrepreneurship and Decision Sciences
Helmut Schneider, PhD, and Xuan Wang
2
Overview
Hypothesis Testing
Causal Inference
Big Data
Judea Pearl: Causal
Miguel A. Hernan, James M. Robins: Causal Inference
3
Hypothesis Testing
Formulate a theory
State hypotheses: H0 versus H1
Take a sample
Compute statistics
Make a decision
What is the reason for these steps?
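The steps above can be sketched as a small one-sample z-test. This is only an illustration, not material from the slides: the sample values, the null mean `mu0`, the known standard deviation `sigma`, and the function name `one_sample_z_test` are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 versus H1: mu != mu0.

    Assumes the population standard deviation sigma is known.
    Returns the test statistic, the p-value, and the decision.
    """
    n = len(sample)
    # Compute statistics: standardized distance of the sample mean from mu0.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Make a decision: reject H0 when the p-value falls below alpha.
    return z, p_value, p_value < alpha


# Hypothetical sample (step "Take a sample").
sample = [10.2, 9.8, 10.5, 10.1, 10.4, 9.9, 10.3, 10.6, 10.0, 10.2]
z, p_value, reject = one_sample_z_test(sample, mu0=10.0, sigma=0.25)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

The decision depends on `alpha`, which has to be chosen before looking at the data; the test says nothing about whether the detected difference matters in practice.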
4
Problem Identification
Traditional Data Sources
Small volume: low statistical power
Limited variety: biased estimates
Low velocity: estimates may not be valid in the future
Big Data (Untapped Sources)
High volume: high statistical significance, small p-value
High variety: small bias
High velocity: dynamic update of estimates
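To make the link between volume and statistical power concrete, here is a minimal sketch of the power of a two-sided z-test as a function of sample size. The standardized effect size of 0.1 and the sample sizes are hypothetical values picked for illustration; the function name `z_test_power` is likewise an assumption.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean shift divided by the (known) standard
    deviation; n is the sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    # Probability that the test statistic lands in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


for n in (20, 200, 2_000, 1_000_000):
    print(f"n = {n:>9,}: power = {z_test_power(0.1, n):.3f}")
```

With a small sample the test rarely detects the small effect; with a very large sample it detects it almost always.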
5
Statistical Significance versus Practical Significance
Accounting faculty research… Auditors take samples…
6
Statistical Significance versus Practical Significance
Cancer Doctors Cite Risks of Drinking Alcohol
12 million women and over a quarter of a million breast cancer cases
Statistical significance versus practical significance
Risk ratio: 9% (relative increase) versus risk difference: 0.18 percentage points
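The contrast between the two measures can be worked through numerically. The sketch below assumes that "9%" means a risk ratio of 1.09 and that both figures refer to the same comparison; under those assumptions it backs out the baseline and exposed risks they imply. The function name and the derived quantities are illustrative and do not appear in the slides.

```python
def implied_risks(risk_ratio, risk_difference):
    """Recover baseline and exposed risk from a risk ratio and a risk difference.

    Uses exposed = baseline * risk_ratio and
    exposed - baseline = risk_difference.
    """
    baseline = risk_difference / (risk_ratio - 1.0)
    exposed = baseline * risk_ratio
    return baseline, exposed


# Assumed reading of the slide: risk ratio 1.09, risk difference 0.18
# percentage points (0.0018 as a proportion).
baseline, exposed = implied_risks(risk_ratio=1.09, risk_difference=0.0018)

# Number of exposed people per one additional case.
people_per_extra_case = 1 / 0.0018

print(f"baseline risk: {baseline:.2%}")
print(f"exposed risk:  {exposed:.2%}")
print(f"people per additional case: {people_per_extra_case:.0f}")
```

The same data therefore read as a "9% higher risk" or as roughly one additional case per several hundred people, which is why the choice of measure matters for judging practical significance.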
7
Big Data Implications
Big data makes everything statistically significant.
This is how the real world works.
Implications for teaching statistics
Need for students to understand practical significance versus statistical significance.
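A short calculation shows why a fixed, small difference becomes statistically significant once the sample is large enough. The two risks used here (2.00% and 2.18%, a difference of 0.18 percentage points) and the group sizes are hypothetical; the test is a standard pooled two-proportion z-test that treats the observed proportions as equal to those risks.

```python
import math


def two_proportion_p_value(p_control, p_exposed, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test.

    Assumes equal group sizes and observed proportions exactly equal to
    p_control and p_exposed.
    """
    pooled = (p_control + p_exposed) / 2
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p_exposed - p_control) / std_error
    # erfc keeps precision for very small tail probabilities.
    return math.erfc(abs(z) / math.sqrt(2))


for n in (1_000, 10_000, 100_000, 1_000_000):
    p = two_proportion_p_value(0.0200, 0.0218, n)
    print(f"n per group = {n:>9,}: p-value = {p:.3g}")
```

The effect size never changes across the rows; only the p-value does, so the p-value alone cannot tell students whether the difference is practically important.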
8
Causal Inference Correlation is not causation.
Statisticians only deal with correlations, yet they also teach students that there is spurious correlation.
Myth: In Big Data, correlation is causation.
Need for students to learn to judge causation.
9
Even in Big Data, Correlation is not causation!
Need for students to learn about causality.
10
When Can We Make Causal Claims?
Randomized Designs
Observational Data
Well-Defined Treatment
Positivity
Exchangeability
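Of the conditions listed, positivity is the easiest to check directly in data: every covariate stratum should contain both treated and untreated units. The following is a minimal sketch of such a check; the record layout (stratum, treatment), the example strata, and the function name `positivity_violations` are hypothetical.

```python
from collections import defaultdict


def positivity_violations(records):
    """Return the strata in which some treatment level never occurs.

    records is an iterable of (stratum, treatment) pairs. The result maps
    each offending stratum to the set of treatment levels missing from it.
    """
    seen = defaultdict(set)
    levels = set()
    for stratum, treatment in records:
        seen[stratum].add(treatment)
        levels.add(treatment)
    return {s: levels - t for s, t in seen.items() if t != levels}


# Hypothetical data: nobody in the "old" stratum is untreated.
records = [("young", 1), ("young", 0), ("old", 1), ("old", 1)]
print(positivity_violations(records))
```

Well-defined treatment and exchangeability cannot be verified this way; they rest on subject-matter knowledge about how the data arose.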
11
Confounding: Directed Acyclic Graphs (DAG)
[Figure: DAG with nodes Treatment, Outcome, and Confounder Factor]
Need for students to learn about confounding and DAGs.
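A small numeric example can show what a confounder does to a naive comparison. The counts below are invented so that the treatment has no effect within either level of the confounder, while the confounder drives both treatment and outcome. The adjustment shown is standardization over the confounder, one common approach among several; all names and numbers are assumptions for illustration.

```python
# Hypothetical counts: (confounder L, treatment A) -> (people, outcome events)
counts = {
    (0, 1): (200, 20),   # L=0, treated:   10% risk
    (0, 0): (800, 80),   # L=0, untreated: 10% risk
    (1, 1): (800, 240),  # L=1, treated:   30% risk
    (1, 0): (200, 60),   # L=1, untreated: 30% risk
}


def risk(cells):
    """Overall risk across a list of (people, events) cells."""
    people = sum(n for n, _ in cells)
    events = sum(e for _, e in cells)
    return events / people


def crude_risk_difference(counts):
    """Treated minus untreated risk, ignoring the confounder."""
    treated = [v for (_, a), v in counts.items() if a == 1]
    untreated = [v for (_, a), v in counts.items() if a == 0]
    return risk(treated) - risk(untreated)


def standardized_risk_difference(counts):
    """Stratum-specific risk differences averaged over the confounder distribution."""
    total = sum(n for n, _ in counts.values())
    result = 0.0
    for level in {l for l, _ in counts}:
        n1, e1 = counts[(level, 1)]
        n0, e0 = counts[(level, 0)]
        weight = (n1 + n0) / total
        result += weight * (e1 / n1 - e0 / n0)
    return result


print(f"crude risk difference:        {crude_risk_difference(counts):.2f}")
print(f"standardized risk difference: {standardized_risk_difference(counts):.2f}")
```

The crude comparison suggests a sizable effect, but it disappears once the comparison is made within levels of the confounder, which is the pattern the DAG is meant to make visible.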
12
Statistical Significance versus Unbiased Estimates
[Figure: Volume: Statistical Significance; Variety: Unbiased Estimates; Velocity: Timely Estimates; Causality]
13
Causal Inference
14
Conclusions
Students need to learn about the reasons for using hypothesis testing in today's Big Data environment.
Need for students to learn to judge practical significance versus statistical significance.
Need for students to learn about DAGs.
Students need to learn about methods to establish causation.