1
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005
Dr. John Lipp Copyright © Dr. John Lipp
2
Copyright 2003-2005 Dr. John Lipp
Course Outline
Part 1: Rank (Order) and Non-Parametric Statistics.
Part 2: Statistical Process Control.
Part 3: Reliability.
Mid-term Exam.
3
Today’s Topics
Empirical Cumulative Distribution Function.
Rank Transform.
Sign Test.
Tukey’s Two-Sample Quick Test.
Circular Error Probability Confidence Interval.
4
Non-Parametric and Rank Statistics
Non-parametric statistical procedures are designed without the use of the underlying data distribution and its parameters.
The only assumption is that the data samples are statistically independent and come from the same distribution.
Such procedures are also known as distribution-free.
Hypothesis tests and confidence intervals are on the median, quartiles, percentiles, or other quantiles.
5
Empirical Cumulative Distribution Function
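The sample CDF, or empirical CDF, is defined by
$$\hat{F}(x) = \frac{\#(x_i \le x)}{n},$$
where $\#(x_i \le x)$ is the number of samples less than or equal to $x$.
This is equivalent to sorting the data, $y_i = \mathrm{sort}(x_i)$, and plotting $y_i$ vs. $i/n$ as a stair-step function.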
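The sort-and-plot description can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names `ecdf_points` and `ecdf_at` are hypothetical, and the sample values are borrowed from the sign-test example later in the deck.

```python
from bisect import bisect_right

def ecdf_points(data):
    """Return the stair-step points (y_i, i/n) of the empirical CDF."""
    ys = sorted(data)
    n = len(ys)
    return [(y, (i + 1) / n) for i, y in enumerate(ys)]

def ecdf_at(data, x):
    """Evaluate F_hat(x) = #(x_i <= x) / n by counting in the sorted data."""
    ys = sorted(data)
    return bisect_right(ys, x) / len(ys)

# Illustrative data (the sample used on the sign-test slides).
sample = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
points = ecdf_points(sample)
```

Each pair in `points` is one step of the stair: the ECDF jumps to `i/n` at the `i`-th smallest value.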
6
Empirical Cumulative Distribution Function (cont.)
The sample CDF is an unbiased estimator:
$$E[\hat{F}(x)] = F(x).$$
The variance is given by
$$\mathrm{Var}[\hat{F}(x)] = \frac{F(x)\,[1 - F(x)]}{n}.$$
The calculations of the expected values are actually easy: the count $\#(x_i \le x)$ has a binomial distribution with $p = F(x)$, so its mean is $nF(x)$ and its variance is $nF(x)[1 - F(x)]$.
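The binomial argument can be checked numerically. The sketch below is an illustration under assumed settings that are not in the slides: Uniform(0, 1) data (so that F(x) = x), n = 20, x = 0.3, and 20,000 simulated samples with a fixed seed.

```python
import random
from statistics import mean, pvariance

def ecdf_at(sample, x):
    """F_hat(x): fraction of the sample that is <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

random.seed(0)                    # fixed seed so the run is repeatable
n, x, trials = 20, 0.3, 20000     # assumed settings for the illustration

# For Uniform(0, 1) data the true CDF is F(x) = x.
estimates = [ecdf_at([random.random() for _ in range(n)], x)
             for _ in range(trials)]

sim_mean = mean(estimates)        # should be close to F(x) = 0.3
sim_var = pvariance(estimates)    # should be close to F(x)(1 - F(x))/n
theory_var = x * (1 - x) / n      # 0.0105
```

The simulated mean and variance should land near 0.3 and 0.0105, which is what the binomial model predicts.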
8
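Rank Transform
The rank transform simply replaces the data with the data’s ranks from sorting in ascending order.
Using the ranks often simplifies calculations:
Rank Sample Mean (unaffected by ties): $\bar{r} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}$
Rank Sample Standard Deviation (affected by ties): with the $n - 1$ divisor and no ties, $s_r = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2} = \sqrt{\frac{n(n+1)}{12}}$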
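A minimal sketch of the rank transform follows. It assumes tied values share the average of the ranks they occupy (midranks), a common convention that the slide does not spell out; the function name `rank_transform` and the tied example are hypothetical.

```python
from statistics import mean, stdev

def rank_transform(data):
    """Replace each value by its 1-based ascending rank; ties share the average rank."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block over every value tied with data[order[pos]].
        while end + 1 < len(order) and data[order[end + 1]] == data[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1   # average of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks

data = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
ranks = rank_transform(data)            # no ties in this sample
tied_ranks = rank_transform([1, 2, 2, 3])  # hypothetical sample with one tie
```

With midranks the mean stays at (n + 1)/2 whether or not ties occur, while `stdev(tied_ranks)` comes out smaller than the no-tie value for n = 4.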
9
Sign Test
Consider a hypothesis test on the median of a data set {xi}, H0: median = C.
The test is performed by subtracting C from the data {xi} and taking the sign, {si} = {sign(xi – C)}.
The numbers of “+” and “–” values in {si} are counted and denoted r+ and r–, respectively.
Example with C = 10:
xi        13.5   9.8  11.4  12.2   7.9   8.6   9.1  10.6  11.3  10.1
xi – 10    3.5  -0.2   1.4   2.2  -2.1  -1.4  -0.9   0.6   1.3   0.1
si           +     –     +     +     –     –     –     +     +     +
Here r+ = 6 and r– = 4.
10
Sign Test (cont.)
What is the distribution of r+?
r+ is a discrete random variable.
r+ can be thought of as the count of successful “+”s in {si}.
Under H0 this success rate is a constant, p = 0.5.
Ergo, r+ has a binomial distribution,
$$P(r_+ = k) = \binom{n}{k}\left(\tfrac{1}{2}\right)^{n}, \quad k = 0, 1, \ldots, n.$$
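As a sketch of how the binomial model is used, the code below counts the signs for the slide's data with C = 10 and computes an exact two-sided p-value. The p-value formulation is a swapped-in alternative to the acceptance-region table used in the slides, and the function names are hypothetical. Values exactly equal to C would be dropped from both counts, which is an assumed convention (none occur in this data).

```python
from math import comb

def sign_counts(data, c):
    """Count samples above (r+) and below (r-) the hypothesized median c."""
    r_plus = sum(1 for x in data if x > c)
    r_minus = sum(1 for x in data if x < c)
    return r_plus, r_minus

def two_sided_p_value(r_plus, n):
    """P(|r - n/2| >= |r_plus - n/2|) for r ~ Binomial(n, 1/2)."""
    dist = abs(r_plus - n / 2)
    tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= dist)
    return tail / 2**n

data = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
r_plus, r_minus = sign_counts(data, 10)
p_value = two_sided_p_value(r_plus, r_plus + r_minus)
```

With r+ = 6 out of n = 10 the p-value is large, so this sample gives no evidence against a median of 10.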
11
Sign Test (cont.)
For n large (n >> 10), a Z test can be used with
$$Z = \frac{r_+ - n/2}{\sqrt{n}/2}.$$
Otherwise, a table built from the binomial PDF is needed. For n = 10:
Two-sided
Acceptance Region    α
r+ = 5               0.754
4 ≤ r+ ≤ 6           0.344
3 ≤ r+ ≤ 7           0.109
2 ≤ r+ ≤ 8           0.022
1 ≤ r+ ≤ 9           0.002
One-sided
Acceptance Region    α
r+ ≤ 3               0.828
r+ ≤ 4               0.623
r+ ≤ 5               0.377
r+ ≤ 6               0.172
r+ ≤ 7               0.055
r+ ≤ 8               0.011
r+ ≤ 9               0.001
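The table entries can be regenerated from the binomial PDF with p = 0.5. The sketch below is an illustration with hypothetical function names; it takes α to be one minus the probability of the acceptance region, and also includes the large-sample Z statistic, whose mean n/2 and standard deviation √n/2 follow from the binomial model.

```python
from math import comb, sqrt

def pmf(k, n):
    """P(r+ = k) for r+ ~ Binomial(n, 1/2)."""
    return comb(n, k) / 2**n

def alpha_two_sided(lo, hi, n):
    """alpha when H0 is accepted for lo <= r+ <= hi."""
    return 1 - sum(pmf(k, n) for k in range(lo, hi + 1))

def alpha_one_sided(hi, n):
    """alpha when H0 is accepted for r+ <= hi."""
    return 1 - sum(pmf(k, n) for k in range(0, hi + 1))

def z_statistic(r_plus, n):
    """Normal approximation: (r+ - n/2) / (sqrt(n)/2)."""
    return (r_plus - n / 2) / (sqrt(n) / 2)

n = 10
two_sided = {(5 - w, 5 + w): alpha_two_sided(5 - w, 5 + w, n) for w in range(5)}
one_sided = {hi: alpha_one_sided(hi, n) for hi in range(3, 10)}
```

The computed values agree with the tabulated ones to within rounding in the third decimal place.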
12
Sign Test (cont.)
The sign test can be used to test any quantile $x_p$, where $F(x_p) = p$.
The null hypothesis is H0: $x_p = C$.
The distribution of the test statistic r+ is binomial, with success probability $1 - p$ (the chance that a sample exceeds $x_p$):
$$P(r_+ = k) = \binom{n}{k}(1 - p)^{k}\,p^{\,n-k}.$$
Example: test H0: first quartile = 8.5 ($q_1 = x_{0.25} = 8.5$).
xi         13.5   9.8  11.4  12.2   7.9   8.6   9.1  10.6  11.3  10.1
xi – 8.5    5.0   1.3   2.9   3.7  -0.6   0.1   0.6   2.1   2.8   1.6
si            +     +     +     +     –     +     +     +     +     +
Here r+ = 9 and r– = 1.
Acceptance Region    α
7 ≤ r+ ≤ 8           0.4682
6 ≤ r+ ≤ 9           0.1344
5 ≤ r+ ≤ 10          0.0197
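The quartile example can be reproduced with a short sketch. It assumes that, under H0, each sample exceeds the p-quantile with probability 1 − p, so r+ is Binomial(n, 1 − p); the function name `quantile_sign_alpha` is hypothetical.

```python
from math import comb

def quantile_sign_alpha(lo, hi, n, p):
    """alpha = 1 - P(lo <= r+ <= hi) when r+ ~ Binomial(n, 1 - p)."""
    q = 1 - p  # probability that one sample exceeds the p-quantile
    accept = sum(comb(n, k) * q**k * p**(n - k) for k in range(lo, hi + 1))
    return 1 - accept

data = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
r_plus = sum(1 for x in data if x > 8.5)   # samples above the hypothesized quartile

regions = [(7, 8), (6, 9), (5, 10)]
alphas = {region: quantile_sign_alpha(*region, n=10, p=0.25) for region in regions}
```

The observed r+ = 9 lies inside the acceptance regions 6 ≤ r+ ≤ 9 and 5 ≤ r+ ≤ 10 but outside 7 ≤ r+ ≤ 8.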
13
Tukey’s Two-Sample Quick Test
Plot two data samples {xi} and {yi} on the same graph, using a different symbol for each sample.
Count the number of points of {xi} that protrude past {yi} at one end, and the number of points of {yi} that protrude past {xi} at the opposite end.
The total is denoted the end-count.
If {xi} protrudes at both ends, or vice versa, then the end-count is 0.
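The counting rule can be sketched without a plot. This is an illustration only: the function name and the two small samples are hypothetical, and ties at the extremes are treated as "no protrusion", which is a simplifying assumption the slide does not address.

```python
def tukey_end_count(xs, ys):
    """End-count for Tukey's quick test; 0 if one sample spans the other."""
    x_high, x_low = max(xs) > max(ys), min(xs) < min(ys)
    y_high, y_low = max(ys) > max(xs), min(ys) < min(xs)
    if x_high and y_low:
        upper, lower = xs, ys      # xs protrudes at the top, ys at the bottom
    elif y_high and x_low:
        upper, lower = ys, xs      # ys protrudes at the top, xs at the bottom
    else:
        return 0                   # one sample protrudes at both ends (or extremes tie)
    top = sum(1 for v in upper if v > max(lower))
    bottom = sum(1 for v in lower if v < min(upper))
    return top + bottom

# Hypothetical samples for illustration.
xs = [1, 2, 3, 4, 5]
ys = [4.5, 6, 7, 8]
count = tukey_end_count(xs, ys)          # 3 points of ys above 5, 4 points of xs below 4.5
nested = tukey_end_count([1, 10], [2, 3])  # xs protrudes at both ends
```

Here `count` is 7 and `nested` is 0, matching the two cases described above.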
14
Tukey’s Two-Sample Quick Test (cont.)
Use the table below for the significance level α.
The confidence level, 1 – α, is the chance that a difference in the medians exists between {xi} and {yi} (or their means, if the PDF is symmetric).
Sample Size n    α = 0.05    α = 0.01    α = 0.001
4-8                 7           9           13
9-21                7          10           13
22-24               7          10           14
25+                 8          10           14
15
Circular Error Probability
Circular Error Probability, or CEP, is specified in many weapon systems’ requirements.
The CEP is the median radial miss distance.
The standard model for the radial miss distance is the Rayleigh distribution.
[Figure: Rayleigh CDF FR(R), with the CEP marked on the R axis, and Rayleigh PDF fR(R).]
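For the Rayleigh model the CEP has a closed form. The sketch below assumes the usual parameterization with scale parameter σ, a symbol the slide does not introduce, for which F(r) = 1 − exp(−r²/(2σ²)) and the median is σ√(2 ln 2).

```python
from math import exp, log, sqrt

def rayleigh_cdf(r, sigma):
    """Rayleigh CDF with scale parameter sigma (assumed parameterization)."""
    return 1 - exp(-r**2 / (2 * sigma**2))

def rayleigh_cep(sigma):
    """Median radial miss distance: solve F(r) = 1/2 for r."""
    return sigma * sqrt(2 * log(2))

sigma = 1.0                 # hypothetical scale parameter
cep = rayleigh_cep(sigma)   # about 1.1774 * sigma
```

Evaluating `rayleigh_cdf(cep, sigma)` returns one half, confirming that the CEP is the median of the model.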
16
Circular Error Probability (cont.)
The appropriateness of the Rayleigh radial miss distance model tends to decrease as the system complexity increases.
Point Estimator: the sample median.
Need the distribution of R to analyze!
Solution: Use non-parametric methods!
Confidence Interval: sort the sample radial miss data so that R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rn and find the value of k such that P(CEP ≤ Rk) = 1 – α.
Finding the appropriate value of k takes a little manipulation.
17
Circular Error Probability (cont.)
First, let m be the index of the largest radial miss that is less than or equal to the population median (= CEP):
1 ≤ m < n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn, or
m = 0: CEP ≤ R1 ≤ R2 ≤ R3 ≤ … ≤ Rn, or
m = n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rn ≤ CEP.
The PDF of m, fM[m], is binomial with p = ½, that is, $f_M[m] = \binom{n}{m}\left(\tfrac{1}{2}\right)^{n}$:
The radial miss distances are assumed to be statistically independent.
The probability that a particular radial miss distance is less than the median is a constant ½ (by definition).
m is the number of radial miss distances less than the median.
18
Circular Error Probability (cont.)
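The desired probability can be rewritten using the total probability rule:
$$P(\mathrm{CEP} \le R_k) = \sum_{m=0}^{n} P(\mathrm{CEP} \le R_k \mid m)\, f_M[m].$$
Evaluate P(CEP ≤ Rk|m):
If m < k: P(CEP ≤ Rk|m) = 1, since R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rk ≤ … ≤ Rn.
If m = k: P(CEP ≤ Rk|m) = 0 for continuous data, since Rk = Rm ≤ CEP and equality has probability zero: R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ CEP ≤ … ≤ Rn.
If m > k: P(CEP ≤ Rk|m) = 0, since R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn.
19
Circular Error Probability (cont.)
That is,
$$P(\mathrm{CEP} \le R_k \mid m) = \begin{cases} 1, & m < k \\ 0, & m \ge k \end{cases}$$
and thus
$$P(\mathrm{CEP} \le R_k) = \sum_{m=0}^{k-1} \binom{n}{m}\left(\tfrac{1}{2}\right)^{n} = 1 - \alpha_k,$$
where $1 - \alpha_k$ is the confidence level attained by $R_k$.
A similar result holds for a two-sided confidence interval:
$$P(R_j \le \mathrm{CEP} \le R_k) = \sum_{m=j}^{k-1} \binom{n}{m}\left(\tfrac{1}{2}\right)^{n}, \quad j < k.$$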
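The confidence level attached to each order statistic can be computed directly. The sketch below assumes continuous data with no ties, so that CEP ≤ Rk holds exactly when at most k − 1 of the n misses fall below the CEP; the function names are hypothetical. For n = 10 it reproduces the one-sided values 0.001, 0.01, 0.05, …, 0.999 for k = 1, …, 10.

```python
from math import comb

def coverage(k, n):
    """P(CEP <= R_k) = P(m <= k - 1) for m ~ Binomial(n, 1/2)."""
    return sum(comb(n, m) for m in range(k)) / 2**n

def two_sided_coverage(j, k, n):
    """P(R_j <= CEP <= R_k) = P(j <= m <= k - 1), for j < k."""
    return sum(comb(n, m) for m in range(j, k)) / 2**n

# One-sided confidence levels for n = 10, indexed by k = 1..10.
table = {k: coverage(k, 10) for k in range(1, 11)}
```

For example, `table[8]` is about 0.945, so with ten shots R8 is an upper bound on the CEP at roughly 95% confidence.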
20
Circular Error Probability (cont.)
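A one-sided test for n = 10 gives the values of 1 – αk = P(CEP ≤ Rk) tabulated below.
k         1      2     3     4     5     6     7     8     9     10
1 – αk    0.001  0.01  0.05  0.17  0.38  0.62  0.83  0.95  0.99  0.999
If the desired value of α is not on the table, linear interpolation can be used:
$$R \approx R_{k-1} + \frac{\alpha_{k-1} - \alpha}{\alpha_{k-1} - \alpha_k}\,(R_k - R_{k-1}),$$
where $\alpha_k \le \alpha \le \alpha_{k-1}$.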
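The interpolation step can be sketched as follows. Everything here is illustrative: the function names and the ten sorted miss distances are hypothetical, α = 0.10 is chosen because it falls between two table entries, and the sketch assumes 1 − αk = P(m ≤ k − 1) for continuous data. Linear interpolation between order statistics is an approximation, not an exact confidence bound.

```python
from math import comb

def coverage(k, n):
    """1 - alpha_k = P(CEP <= R_k) = P(m <= k - 1), m ~ Binomial(n, 1/2)."""
    return sum(comb(n, m) for m in range(k)) / 2**n

def cep_upper_bound(sorted_r, alpha):
    """Interpolate between R_(k-1) and R_k so the bound has level 1 - alpha."""
    n = len(sorted_r)
    target = 1 - alpha
    for k in range(1, n + 1):
        hi = coverage(k, n)
        if hi >= target:
            if k == 1:
                return sorted_r[0]   # no lower neighbour to interpolate with
            lo = coverage(k - 1, n)
            w = (target - lo) / (hi - lo)   # equals (a_{k-1} - a) / (a_{k-1} - a_k)
            return sorted_r[k - 2] + w * (sorted_r[k - 1] - sorted_r[k - 2])
    raise ValueError("confidence level not attainable with this sample size")

# Hypothetical sorted radial miss distances, n = 10.
sorted_r = [0.4, 0.7, 0.9, 1.1, 1.2, 1.4, 1.6, 2.0, 2.3, 3.1]
bound = cep_upper_bound(sorted_r, alpha=0.10)
```

For these numbers the target level 0.90 falls between k = 7 (0.83) and k = 8 (0.95), so the bound lies between R7 = 1.6 and R8 = 2.0.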
21
Circular Error Probability (cont.)
[Table: i, Raw Data, Sorted Data, αk; the data values are not recoverable.]
The data on the left is Rayleigh with a median of 2ln(2).
The sample median is …
Select α = 0.05; n = 16.
Looking at the table, k = 11.
Using the interpolation formula, CEP ≤ R ≈ R10 …
Final result: CEP ≤ 1.630 with 95% confidence.
22
Homework
Use the rank transform on the time data for the Hot Wheels launcher experiment and repeat the regression analysis for HW S2P4-1: modify your Excel spreadsheet to use the ranks instead of the raw data.