MEGN 537 – Probabilistic Biomechanics
Ch.5 – Determining Distributions and Parameters from Observed Data
Anthony J Petrella, PhD
Determination of Distribution
The underlying distribution can be established in one of the following ways:
- Drawing a frequency diagram
- Plotting the data on probability paper
- Conducting statistical tests known as goodness-of-fit tests for distribution
Probability Paper: Gumbel (1954)
- Given N observations $X_1, X_2, X_3, \ldots, X_N$, arrange the data in increasing order.
- The i-th value is plotted at a CDF value of $i/(N+1)$.
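A minimal Python sketch of this plotting rule, assuming the normal distribution is the candidate: instead of reading printed probability paper, each plotting position $i/(N+1)$ is converted to a standard normal variate with `statistics.NormalDist.inv_cdf`. The function names are hypothetical. If the resulting (x, z) points fall close to a straight line, the normal distribution is a plausible model.

```python
from statistics import NormalDist


def plotting_positions(data):
    """Return (x_i, p_i) pairs, with p_i = i/(N+1) for the i-th sorted value."""
    xs = sorted(data)
    n = len(xs)
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]


def normal_paper_coords(data):
    """Map each plotting position p_i to z_i = Phi^-1(p_i) (normal probability paper)."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(x, std.inv_cdf(p)) for x, p in plotting_positions(data)]
```

For example, `normal_paper_coords([3.1, 2.4, 5.0, 4.2])` pairs the sorted values 2.4, 3.1, 4.2, 5.0 with plotting positions 0.2, 0.4, 0.6, 0.8 before converting them to z values.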
[Figure: Probability paper, data plotted versus the normal distribution]
Goodness of Fit
General question: Do two independent samples come from identical continuous distributions?
Here the dataset is compared with a theoretical distribution, so the question becomes: Is the theoretical distribution an acceptable representation of the dataset?
- Chi-square test: based on the PDF
- Kolmogorov-Smirnov test: based on the CDF
Chi-Square ($\chi^2$) Test
Based on the error between the observed and assumed PDF of the distribution.
Methodology:
- Arrange the N data points in increasing order.
- Break the data into m intervals.
- Determine $n_i$, the observed frequency of data points in interval i, and $e_i$, the theoretical frequency of data points in interval i.
Chi-Square ($\chi^2$) Test Methodology (continued):
- Determine $c_{1-\alpha,f}$, where $\alpha$ = significance level (usually between 1% and 10%) and $f$ = degrees of freedom = $m - 1 - k$ ($m$ = number of intervals; $k$ = number of distribution parameters, = 2 for normal or lognormal).
- Obtain $c_{1-\alpha,f}$ from Appendix 3.
- The assumed distribution is acceptable at significance level $\alpha$ if

$$\sum_{i=1}^{m} \frac{(n_i - e_i)^2}{e_i} < c_{1-\alpha,f}$$

NOTE: m should be ≥ 5 to obtain satisfactory results.
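The chi-square procedure can be sketched in Python as below, assuming a normal distribution whose mean and standard deviation have already been estimated from the data (so k = 2). The helper names and the interval edges are hypothetical; the critical value $c_{1-\alpha,f}$ is passed in rather than computed, since it would normally be read from Appendix 3.

```python
import math
from statistics import NormalDist


def expected_counts_normal(edges, n, mu, sigma):
    """Theoretical frequencies e_i for the m intervals defined by m+1 edges."""
    dist = NormalDist(mu, sigma)

    def cdf(x):
        # Handle infinite edges explicitly so the outer intervals are open-ended.
        if x == -math.inf:
            return 0.0
        if x == math.inf:
            return 1.0
        return dist.cdf(x)

    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]


def chi_square_statistic(observed, expected):
    """Sum over intervals of (n_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_accepts(observed, expected, critical_value):
    """Accept the assumed distribution if the statistic is below c_{1-alpha,f}."""
    return chi_square_statistic(observed, expected) < critical_value
```

For instance, with m = 5 intervals bounded by `[-math.inf, -1.5, -0.5, 0.5, 1.5, math.inf]`, a standard normal, n = 30, and hypothetical observed counts `[2, 7, 12, 7, 2]`, the degrees of freedom are f = 5 - 1 - 2 = 2, and the 5% critical value for f = 2 is approximately 5.991.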
Significance Level, $\alpha$
The significance level $\alpha$ is the probability of rejecting the assumed distribution when it is in fact correct, i.e., when the differences between the sample and the theoretical distribution are due only to chance.
A higher value of $\alpha$ implies a more stringent requirement for accepting the proposed distribution, i.e., closer agreement is required.
Values between 1% and 10% are common.
Example (Haldar 5.2)
a) For uniformly distributed random variables, ordinary graph paper can serve as probability paper, because the uniform CDF is linear in x.
b)
c) $f = m - 1 - k$
Example (Haldar 5.5)
Perform a chi-square test on the data from Problem 3.1 (n = 30 data points). Can the underlying distribution be accepted as normal at a 5% significance level?
$f$ = degrees of freedom = $m - 1 - k$, where $m$ = number of intervals and $k$ = number of distribution parameters.
Solution (Haldar 5.5a)
Kolmogorov-Smirnov (K-S) Test
Based on the error between the observed and assumed CDF of the distribution.
Methodology:
- Arrange the data in increasing order and assign an index m to each data point, where m = 1, 2, …, n.
- Determine $S_n(x)$, the observed (empirical) CDF:

$$S_n(x) = \begin{cases} 0, & x < x_1 \\ m/n, & x_m \le x < x_{m+1} \\ 1, & x \ge x_n \end{cases}$$

- Determine $F_X(x_i)$, the CDF of the assumed distribution.
K-S Test Methodology (continued):
- Determine $D_n = \max_i \left| F_X(x_i) - S_n(x_i) \right|$.
- Determine $D_n^{\alpha}$, where $\alpha$ = significance level; the value of $D_n^{\alpha}$ is found in Appendix 4.
- The assumed distribution is acceptable at significance level $\alpha$ if the maximum difference satisfies $D_n \le D_n^{\alpha}$.
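A Python sketch of the K-S statistic, assuming the assumed CDF is supplied as a function (here `statistics.NormalDist().cdf`). Because $S_n$ jumps from $(m-1)/n$ to $m/n$ at each data point, the sketch checks both sides of every step, which is how the maximum difference is usually computed. The function names are hypothetical, and the critical value $D_n^{\alpha}$ is passed in from Appendix 4.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Maximum |F_X(x) - S_n(x)| over the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for m, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare F_X(x_m) with S_n just after (m/n) and just before ((m-1)/n) the step.
        d = max(d, m / n - f, f - (m - 1) / n)
    return d


def ks_accepts(data, cdf, critical_value):
    """Accept the assumed distribution if D_n <= D_n^alpha."""
    return ks_statistic(data, cdf) <= critical_value
```

For example, `ks_statistic([-1.0, 0.0, 1.0], NormalDist().cdf)` compares three points against the standard normal CDF and returns a value of about 0.175.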
Example (Haldar 5.8)
Perform a K-S test on the data from Problem 3.1. Can the underlying distribution be accepted as normal at a 5% significance level?
Solution (Haldar 5.8)
Parameter Estimation
Method of Moments
Moments are statistical parameters of a dataset:
- 1st moment: mean, $E(X)$
- 2nd central moment: variance, $Var(X)$
- 3rd central moment: a measure of skewness
Distribution parameters are derived from the moments. PDF forms and parameters for common distributions are given in Table 5.6 on page 118; all are expressed in terms of the first two moments, $E(X)$ and $Var(X)$.
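A sketch of the method of moments for two common distributions, assuming the usual lognormal parameterization in which $\lambda$ and $\zeta$ are the mean and standard deviation of ln X (this notation is assumed here, not taken from the slide). Sample moments come from Python's `statistics` module; `statistics.variance` uses the n - 1 denominator. Function names are hypothetical.

```python
import math
import statistics


def sample_moments(data):
    """Sample mean and sample variance (n - 1 denominator)."""
    return statistics.mean(data), statistics.variance(data)


def normal_params_from_moments(mean, variance):
    """Normal: mu = E(X), sigma = sqrt(Var(X))."""
    return mean, math.sqrt(variance)


def lognormal_params_from_moments(mean, variance):
    """Lognormal: return (lambda, zeta), the mean and std. dev. of ln X."""
    delta_sq = variance / mean ** 2          # squared coefficient of variation
    zeta_sq = math.log(1.0 + delta_sq)       # zeta^2 = ln(1 + delta^2)
    lam = math.log(mean) - 0.5 * zeta_sq     # lambda = ln E(X) - zeta^2 / 2
    return lam, math.sqrt(zeta_sq)
```

As a check, a lognormal variable with $\lambda = 0$ and $\zeta = 1$ has $E(X) = e^{0.5}$ and $Var(X) = (e - 1)e$; feeding those two moments back into `lognormal_params_from_moments` recovers $\lambda = 0$, $\zeta = 1$.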
Method of Maximum Likelihood
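A hedged sketch of maximum likelihood for the normal distribution only: the likelihood is $L(\theta) = \prod_{i=1}^{n} f_X(x_i;\theta)$, and maximizing $\ln L$ gives closed-form estimates $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Note that this variance estimate divides by n, unlike the sample variance used above. Function names are hypothetical.

```python
import math


def normal_log_likelihood(data, mu, sigma):
    """ln L = sum of ln f_X(x_i) for a normal PDF with parameters mu, sigma."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2))


def normal_mle(data):
    """Closed-form maximizers of the normal log-likelihood."""
    n = len(data)
    mu_hat = sum(data) / n
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n   # divides by n, not n - 1
    return mu_hat, math.sqrt(var_hat)
```

Perturbing either estimate away from the values returned by `normal_mle` lowers `normal_log_likelihood`, which is a quick numerical check that the closed form is a maximum.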
Interval Estimation
The expected values of a population and those computed from a sample generally differ. Distribution parameters ($\theta$) are typically:
- Estimated from samples
- Applied to populations
An interval estimate gives the range of plausible values for a parameter at a specified level of confidence.
Confidence Intervals
Distributions link values of a random variable to probabilities, making it possible to predict and evaluate the likelihood of a particular occurrence. In a normal distribution, the number of standard deviations from the mean determines the percentage of the data within that range, and thus the probability of occurrence.
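To make the link between standard deviations and probability concrete, here is a short sketch using Python's `statistics.NormalDist`; the function name is hypothetical. Because any normal variable can be standardized, the result does not depend on μ or σ.

```python
from statistics import NormalDist


def prob_within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal random variable X."""
    std = NormalDist()  # standard normal
    return std.cdf(k) - std.cdf(-k)
```

Evaluating `prob_within_k_sigma` at 1, 2, and 3 gives roughly 68%, 95%, and 99.7%.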
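Interval Estimation for the Mean with Known Variance (two-tailed interval)

$$\left(\bar{x} - k_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + k_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

where $\bar{x}$ = sample mean, $\sigma$ = (known) standard deviation, $n$ = sample size, $(1-\alpha)$ = confidence level, and $k_{\alpha/2} = \Phi^{-1}(1-\alpha/2)$ = value of the standard normal variate (z), found using Appendix 1.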
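The two-tailed interval can be computed as below, a sketch assuming the data are normal (or n is large) and σ is known; `NormalDist().inv_cdf(1 - alpha / 2)` stands in for the Appendix 1 lookup of $k_{\alpha/2}$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, alpha):
    """Two-tailed (1 - alpha) confidence interval for the mean, sigma known."""
    k = NormalDist().inv_cdf(1 - alpha / 2)   # k_{alpha/2}
    half = k * sigma / math.sqrt(n)
    return xbar - half, xbar + half
```

With $\bar{x} = 10$, σ = 2, n = 16, and α = 0.05, $k_{\alpha/2} \approx 1.96$ and the interval is about (9.02, 10.98).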
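Lower and Upper Confidence Limits for the Mean with Known Variance (each is a one-tailed interval)
- Lower confidence limit for $\mu$: $\bar{x} - k_{\alpha}\dfrac{\sigma}{\sqrt{n}}$
- Upper confidence limit for $\mu$: $\bar{x} + k_{\alpha}\dfrac{\sigma}{\sqrt{n}}$

where $k_{\alpha} = \Phi^{-1}(1-\alpha)$.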
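The one-tailed limits differ from the two-tailed interval only in using $k_{\alpha} = \Phi^{-1}(1-\alpha)$ instead of $k_{\alpha/2}$. A sketch with hypothetical function names, under the same assumptions (normal data or large n, σ known):

```python
import math
from statistics import NormalDist


def mean_lower_limit_known_sigma(xbar, sigma, n, alpha):
    """One-tailed (1 - alpha) lower confidence limit for the mean."""
    k = NormalDist().inv_cdf(1 - alpha)   # k_alpha: all of alpha in one tail
    return xbar - k * sigma / math.sqrt(n)


def mean_upper_limit_known_sigma(xbar, sigma, n, alpha):
    """One-tailed (1 - alpha) upper confidence limit for the mean."""
    k = NormalDist().inv_cdf(1 - alpha)
    return xbar + k * sigma / math.sqrt(n)
```

With the same numbers ($\bar{x} = 10$, σ = 2, n = 16, α = 0.05), $k_{\alpha} \approx 1.645$, so the lower limit is about 9.18: tighter than the two-tailed bound because all of α sits in one tail.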
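Interval Estimation for the Mean with Unknown Variance

$$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$

where $s$ = sample standard deviation and $t_{\alpha/2,\,n-1}$ = value of Student's t distribution with n - 1 degrees of freedom, found using Appendix 5.
The standard normal distribution is valid for:
- Known population variance
- Large n (> 30)
If n is small (e.g., < 10), s may differ substantially from σ, so use Student's t.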
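Standard-library Python has no Student's t quantile function, so this sketch takes $t_{\alpha/2,\,n-1}$ as an argument, to be read from Appendix 5 (or computed with a statistics package). The function name is hypothetical.

```python
import math
import statistics


def mean_ci_unknown_sigma(data, t_crit):
    """Two-tailed interval for the mean using the sample standard deviation s.

    t_crit is t_{alpha/2, n-1}, supplied from a table such as Appendix 5.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample std. dev., n - 1 denominator
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half
```

For hypothetical data 1 through 10 (n = 10) at α = 0.05, $t_{0.025,9} \approx 2.262$, giving roughly (3.33, 7.67).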
Student’s t distribution: degrees of freedom (DOF) $f = n - 1$
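Interval Estimation for Variance

$$\left(\frac{(n-1)s^2}{c_{1-\alpha/2,\,n-1}},\; \frac{(n-1)s^2}{c_{\alpha/2,\,n-1}}\right)$$

where $c_{p,\,n-1}$ = value of the chi-square distribution with n - 1 degrees of freedom at cumulative probability p, found using Appendix 3.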
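A sketch of the variance interval with the two chi-square values passed in, since they would be read from Appendix 3. The function name is hypothetical; note that the upper-tail value $c_{1-\alpha/2,\,n-1}$ produces the lower bound.

```python
import statistics


def variance_ci(data, c_low, c_high):
    """Two-tailed confidence interval for the variance.

    c_low  = c_{alpha/2, n-1}   (lower-tail chi-square value)
    c_high = c_{1-alpha/2, n-1} (upper-tail chi-square value)
    """
    n = len(data)
    s2 = statistics.variance(data)      # sample variance, n - 1 denominator
    return (n - 1) * s2 / c_high, (n - 1) * s2 / c_low
```

With hypothetical data 1 through 10 (n = 10) and α = 0.05, the table values $c_{0.025,9} \approx 2.700$ and $c_{0.975,9} \approx 19.023$ give an interval of about (4.34, 30.56) for σ². The interval is not symmetric about $s^2$ because the chi-square distribution is skewed.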