Copyright © 2010 Lumina Decision Systems, Inc. Statistical Hypothesis Testing (8th Session in “Gentle Introduction to Modeling Uncertainty”) Lonnie Chrisman


1 Statistical Hypothesis Testing
(8th Session in “Gentle Introduction to Modeling Uncertainty”)
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica User Group, 15 July 2010
Copyright © 2010 Lumina Decision Systems, Inc.

2 Scope of Today’s Webinar
Included:
- Conceptual underpinnings of classical hypothesis testing.
- Interpretation of statistical significance (p-values).
- General methodology for applying it in any scenario.
- Intended to promote conceptual understanding.
- Building on Monte Carlo tools.
Not included:
- Standard canned hypothesis tests (t-tests, etc.).

3 Outline
- Motivating example
- Statistical significance
- The statistic
- Methodology
- Modeling the null hypothesis
- Computing the p-value
- Interpretation of results
- Drawbacks of methodology
- Additional exercise

4 Does Stock Market Volatility Vary with Day of Week?
Randomly selected 100 trading days (from 2000-2010).
Computed day change (close-open)/open for the S&P 500 index.

Day of week   # samples   Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%
Total volatility: 18.1%

Alice: “This shows that the market volatility does depend on the day of the week.”
Bob: “No, the variation is just due to random sampling variation.”
Side note: Annualized volatility := SDeviation * sqrt(T), where T = # trading days/yr = 250.
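The side-note formula can be sketched in Python as follows. This is a minimal illustration, not the Analytica model from the talk: the price pairs are hypothetical, the function names are invented for this example, and it assumes that SDeviation corresponds to the sample standard deviation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def day_change(open_price, close_price):
    """Fractional day change: (close - open) / open."""
    return (close_price - open_price) / open_price


def annualized_volatility(day_changes, trading_days=TRADING_DAYS_PER_YEAR):
    """Sample standard deviation of day changes, scaled by sqrt(T)."""
    return statistics.stdev(day_changes) * math.sqrt(trading_days)


# Hypothetical (open, close) pairs for illustration only; NOT the S&P 500 data.
prices = [(100.0, 101.0), (101.0, 100.0), (100.0, 100.5), (100.5, 99.5), (99.5, 100.2)]
changes = [day_change(o, c) for o, c in prices]
vol = annualized_volatility(changes)
print(f"annualized volatility: {vol:.1%}")
```

Applying `annualized_volatility` separately to the day changes of each weekday would give one row of the table per weekday.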

5 Download Model with S&P Data
Please download “Hypothesis Test S&P Volatility.ana”. The download link is at the bottom of the talk abstract on the Analytica Wiki.
You’ll use this data for exercises…

6 Statistical Significance
Alice: “This shows that the market volatility depends on the day of the week.”
Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.
Null Hypothesis: The “true” underlying volatility is the same for every day of the week.
Level of significance: The probability that at least this much variation in volatility would be observed if the Null Hypothesis is true (termed the “p-value”).

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

7 Statistical Significance #2
After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”
By convention, p ≤ 5% is usually considered to be “statistically significant”; p > 5% is said to be “not statistically significant”.
What can you conclude if the p-value turns out to be 20%?

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

8 The “Statistic”
We need a scalar metric to summarize the degree of conflict with the Null-hypothesis (H0).
- Smaller value → more consistent with H0
- Larger value → greater disagreement with H0
Examples:
- Max(vol,day) – Min(vol,day)
- SDeviation(vol,day)
- F = Variance(vol,day) / Total_volatility^2
Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%
Total volatility: 18.1%
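As a rough cross-check outside Analytica, the three example statistics can be evaluated directly on the per-day volatilities in the table. The sketch below assumes that SDeviation and Variance correspond to the sample (n-1) versions; the variable names are chosen for this example.

```python
import statistics

# Observed annualized volatility by weekday, taken from the table above.
vol = {"Mon": 0.194, "Tue": 0.119, "Wed": 0.215, "Thu": 0.201, "Fri": 0.143}
total_volatility = 0.181

values = list(vol.values())

# Statistic 1: Max(vol,day) - Min(vol,day), i.e. 21.5% - 11.9% = 9.6 points.
range_stat = max(values) - min(values)

# Statistic 2: SDeviation(vol,day); sample standard deviation assumed.
sd_stat = statistics.stdev(values)

# Statistic 3: F = Variance(vol,day) / Total_volatility^2; sample variance assumed.
f_stat = statistics.variance(values) / total_volatility ** 2

print(f"range = {range_stat:.3f}")
print(f"sd    = {sd_stat:.4f}")
print(f"F     = {f_stat:.4f}")
```

All three grow as the per-day volatilities spread apart, which is the property the slide asks of a statistic.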

9 Methodology
- Construct a model that simulates measurements given that the null-hypothesis is true. This typically makes various assumptions.
- Use Monte Carlo simulation to produce several simulated data sets.
- Apply the statistic to each.
- p-value: Pr(Stat_sim ≥ Stat_meas)

10 Modeling the Null Hypothesis
Null Hypothesis: The volatility is 18.1% on every day of the week.
How could you simulate the data? (Hint: There are multiple possible approaches.)
What assumptions are you making?
Some ideas:
- Randomly generate each day’s price change from a LogNormal distribution.
- Shuffle existing data.
Exercise: Implement a model of the null-hypothesis in your Analytica model. (One random dataset for each item in Run.)

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%
Total volatility: 18.1%
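The two ideas on this slide might look like this in Python. This is a sketch under stated assumptions rather than the Analytica exercise solution: the function names are hypothetical, the LogNormal version assumes independent days with no drift (median close/open ratio of 1), and the shuffle version is fed placeholder data because the real observations live in the downloadable model.

```python
import math
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SAMPLES_PER_DAY = 20
TRADING_DAYS_PER_YEAR = 250
# Daily sigma implied by an 18.1% annualized volatility.
SIGMA_DAILY = 0.181 / math.sqrt(TRADING_DAYS_PER_YEAR)


def simulate_lognormal(rng):
    """Idea 1: draw close/open from a LogNormal with the same sigma on every weekday."""
    return {
        day: [rng.lognormvariate(0.0, SIGMA_DAILY) - 1.0 for _ in range(SAMPLES_PER_DAY)]
        for day in DAYS
    }


def simulate_shuffle(observed, rng):
    """Idea 2: pool the observed day changes and reassign them to weekdays at random."""
    pooled = [x for day in DAYS for x in observed[day]]
    rng.shuffle(pooled)
    return {
        day: pooled[i * SAMPLES_PER_DAY:(i + 1) * SAMPLES_PER_DAY]
        for i, day in enumerate(DAYS)
    }


rng = random.Random(0)
dataset_a = simulate_lognormal(rng)
# Placeholder standing in for the real observations (an assumption of this sketch).
placeholder_observed = simulate_lognormal(rng)
dataset_b = simulate_shuffle(placeholder_observed, rng)
```

The LogNormal approach assumes a distribution shape and a known volatility; the shuffle approach assumes only that day changes are exchangeable across weekdays under the null hypothesis, at the cost of requiring the raw data.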

11 Computing Statistic on Simulated
Exercise: Apply your statistic to each simulated dataset.
Note: Larger statistic values occur when the variation in volatility by day is largest.
Exercise: What fraction of simulated datasets have a statistic value at least as large as that of the actual data? This is the p-value.
Is Alice’s hypothesis statistically significant?
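Putting the pieces together, the following sketch estimates a p-value for the range statistic (max minus min volatility across weekdays). It assumes a LogNormal null model with 18.1% annualized volatility on every weekday, 20 independent samples per day, and the observed range of 21.5% - 11.9% from the table; the function names, seed, and number of simulations are arbitrary choices for illustration.

```python
import math
import random
import statistics

NUM_WEEKDAYS = 5
SAMPLES_PER_DAY = 20
T = 250                        # trading days per year
NULL_VOL = 0.181               # volatility on every weekday under the null hypothesis
OBSERVED_RANGE = 0.215 - 0.119  # Wed minus Tue, from the observed table


def simulated_range(rng):
    """Simulate one dataset under the null and return max - min weekday volatility."""
    sigma_daily = NULL_VOL / math.sqrt(T)
    vols = []
    for _ in range(NUM_WEEKDAYS):
        changes = [rng.lognormvariate(0.0, sigma_daily) - 1.0 for _ in range(SAMPLES_PER_DAY)]
        vols.append(statistics.stdev(changes) * math.sqrt(T))
    return max(vols) - min(vols)


def estimate_p_value(num_simulations=2000, seed=1):
    """Fraction of simulated datasets whose statistic is >= the observed one."""
    rng = random.Random(seed)
    hits = sum(simulated_range(rng) >= OBSERVED_RANGE for _ in range(num_simulations))
    return hits / num_simulations


p_value = estimate_p_value()
print(f"estimated p-value: {p_value:.3f}")
```

The estimate depends on the seed, the number of simulations, and above all on the null-model assumptions, so a different statistic or a shuffle-based null model can give a different answer.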

12 Common Misuse of Paradigm: Multiple Hypotheses
Scenario: Alice identifies 20 other plausible hypotheses to test, e.g.:
- Volatility on Tuesday is different from the other 4 days.
- Volatility varies by month.
- September has a higher volatility than other months.
- …
She tests each of these individually and finds one of them to be statistically significant at a 5% level. She publishes this result.
What’s wrong here? What should she do differently?
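A quick calculation shows the scale of the problem. The sketch assumes the 20 tests are independent and that every null hypothesis is actually true. The Bonferroni-style per-test threshold at the end is one common adjustment, included here as an assumption of this example and not something taken from the slides.

```python
ALPHA = 0.05
NUM_TESTS = 20

# Chance that at least one of 20 independent tests of true null hypotheses
# comes out "significant" at the 5% level purely by chance.
p_at_least_one = 1 - (1 - ALPHA) ** NUM_TESTS

# Bonferroni-style per-test threshold (an assumption of this sketch).
per_test_threshold = ALPHA / NUM_TESTS

print(f"P(at least one false positive) = {p_at_least_one:.2f}")  # about 0.64
print(f"per-test threshold = {per_test_threshold:.4f}")
```

Under these assumptions, finding one "significant" result among 20 tests is more likely than not even when nothing is going on.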

13 Interpreting p-Value
Small value (< 5%):
- Accept main hypothesis.
- Data is inconsistent with Null-hypothesis.
Otherwise (p > 5%):
- Conclude only that the data sample was too small to detect a relationship.
- Hypothesis may still be true or false: “Larger research study required.”
P-value is not:
- A measure of the strength of relationship.
- The probability that the hypothesis is true.

14 Drawbacks with Statistical Hypothesis Testing Paradigm
- 1 in 20 false hypotheses are accepted (at 5% significance level).
  Often abused by people testing many hypotheses.
- Nearly any hypothesis is confirmed with a large enough sample.
  Most hypotheses will have at least a minuscule “true” effect.
  With enough data, even the most minuscule effect becomes statistically significant.
- The “uncertainty” about the hypothesis is not available.
  Doesn’t provide P(H), which would be useful in models that use the results.
- Numerous subjective components that are not recognized or reported explicitly.
- “Cookbook tests” are very often misapplied when assumptions don’t hold, leading to greater confidence than is warranted by the data.

15 New Exercise
Number of subjects (purely fictional data):

                 Parkinson’s   No Parkinson’s
Not exposed      10            140
Exposed to TCE   4             25

Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.
Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.
Exercise:
- Identify an appropriate statistic.
- Model the null-hypothesis.
- Compute the p-value.
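One possible way to work this exercise in Python is sketched below. The choices are assumptions of this example, not the presenter's solution: the statistic is the difference in Parkinson's rates (exposed minus not exposed), and the null hypothesis is modeled by shuffling the outcomes across subjects so that exposure and outcome are unrelated. The counts are read from the table as 10/140 and 4/25.

```python
import random

# Counts from the (purely fictional) table above.
NOT_EXPOSED = {"parkinsons": 10, "no_parkinsons": 140}
EXPOSED = {"parkinsons": 4, "no_parkinsons": 25}


def rate_difference(outcomes, n_exposed):
    """Statistic: Parkinson's rate among exposed minus rate among not exposed.

    `outcomes` holds 1 (Parkinson's) or 0; the first n_exposed entries are the exposed group.
    """
    exposed, rest = outcomes[:n_exposed], outcomes[n_exposed:]
    return sum(exposed) / len(exposed) - sum(rest) / len(rest)


def estimate_p_value(num_simulations=5000, seed=2):
    n_exposed = EXPOSED["parkinsons"] + EXPOSED["no_parkinsons"]
    outcomes = (
        [1] * EXPOSED["parkinsons"] + [0] * EXPOSED["no_parkinsons"]
        + [1] * NOT_EXPOSED["parkinsons"] + [0] * NOT_EXPOSED["no_parkinsons"]
    )
    observed = rate_difference(outcomes, n_exposed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_simulations):
        rng.shuffle(outcomes)  # null model: outcome is unrelated to exposure
        if rate_difference(outcomes, n_exposed) >= observed:
            hits += 1
    return hits / num_simulations


p_value = estimate_p_value()
print(f"estimated p-value: {p_value:.3f}")
```

Because the hypothesis is about increased risk, only simulated differences at least as large as the observed one are counted, which makes this a one-sided test.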

16 Standard Hypothesis Tests
Statisticians have packaged dozens of common scenarios as standardized hypothesis tests (e.g., t-tests, F-tests, contingency table tests, Mann-Whitney-Wilcoxon tests, …).
If one fits your situation, use it!
- Usually no need for Monte Carlo.
- Better precision.
- Easy to describe in publication.
- More accepted (subjective aspects are standardized).

17 Subjective Aspects of Statistical Hypothesis Testing
- Selection of the statistic used.
  When is one choice of statistic superior to another?
- Assumptions built into the model of the null-hypothesis, e.g.:
  - Independence of data points
  - Distributional assumptions
  - Assumed parameters & values
  - Incorporated background knowledge

18 Summary
Statistical hypothesis testing tests: Is the support for a hypothesis statistically significant given a dataset?
Significance level (p-value) is: Probability of seeing data at least as extreme as the actual data when the Null hypothesis is true.
p-value > 5% → conclude nothing, need more data.
Methodology:
- Identify statistic (scalar metric): a measure of divergence from the null-hypothesis.
- Build model of null-hypothesis to “simulate” data sets.
- Compute p-value.

