Published by Tamsin Shields. Modified over 9 years ago.
1
1 Statistical Analysis - Graphical Techniques
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering
EMIS 7370/5370 STAT 5340: PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS
Systems Engineering Program, Department of Engineering Management, Information and Systems
2
2 Time Series Graph or Run Chart; Box Plot; Histogram and Relative Frequency Histogram; Frequency Distribution; Probability Plotting
3
3 A plot of the data set x_1, x_2, …, x_n in the order in which the data were obtained. Used to detect trends or patterns in the data over time. Time Series Graph or Run Chart
4
4 A pictorial summary used to describe the most prominent statistical features of the data set, x_1, x_2, …, x_n, including its:
- Center or location
- Spread or variability
- Extent and nature of any deviation from symmetry
- Identification of ‘outliers’
Box Plot
5
5 Shows only certain statistics rather than all the data, namely:
- median
- quartiles
- smallest and greatest values in the sample
The immediate visuals of a box plot are the center, the spread, and the overall range of the data.
Box Plot
6
6 Given the following random sample of size 25:
38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16, 22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98
Arranged in order from least to greatest:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36, 37, 38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
Box Plot
7
7 First, find the median, the value exactly in the middle of an ordered set of numbers. With 25 values, this is the 13th ordered value, so the median is 37. Next, we consider only the values to the left of the median: 1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36. We now find the median of this set of numbers. The median for this group is (22 + 25)/2 = 23.5, which is the lower quartile. Box Plot
8
8 Now consider the values to the right of the median: 38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98. The median for this set is (60 + 86)/2 = 73, which is the upper quartile. We are now ready to find the interquartile range (IQR), which is the difference between the upper and lower quartiles: 73 - 23.5 = 49.5. Box Plot
9
9 The lower quartile is 23.5. The median is 37. The upper quartile is 73. The interquartile range is 49.5. The mean is 45.1. [Figure: box plot on a 0 to 100 scale marking the lower extreme, lower quartile, median, mean, upper quartile, and upper extreme] Box Plot
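The quartile calculation above can be checked with a short Python sketch. It assumes the median-of-halves convention used on these slides (the overall median is left out of both halves when the sample size is odd); the function name `five_number_summary` is illustrative only, and other quartile conventions may give slightly different values.

```python
from statistics import mean, median

# Random sample of size 25 from the box plot example.
sample = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
          22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98]


def five_number_summary(data):
    """Quartiles by the median-of-halves method.

    For an odd sample size the overall median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return {
        "min": xs[0],
        "q1": median(lower),
        "median": median(xs),
        "q3": median(upper),
        "max": xs[-1],
    }


summary = five_number_summary(sample)
iqr = summary["q3"] - summary["q1"]
print(summary)
print("IQR =", iqr, "mean =", mean(sample))
```

These five values, together with the mean, are what the box plot displays.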
10
10 A graph of the observed frequencies in the data set, x_1, x_2, …, x_n, versus data magnitude to visually indicate its statistical properties, including:
- shape
- location or central tendency
- scatter or variability
Histogram Guidelines for Constructing Histograms – Discrete Data
11
11 If the data x_1, x_2, …, x_n are from a discrete random variable with possible values y_1, y_2, …, y_k, count the number of occurrences of each value of y and associate the frequency f_i with y_i, for i = 1, …, k. Note that $\sum_{i=1}^{k} f_i = n$. Guidelines for Constructing Histograms – Discrete Data
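As a minimal sketch of the discrete-data guideline, the following Python snippet counts the frequency of each distinct value and confirms that the frequencies add up to the sample size. The observations are hypothetical and were made up for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical discrete observations (for example, defects per unit).
observations = [0, 2, 1, 0, 3, 1, 1, 0, 2, 1]

n = len(observations)
frequencies = Counter(observations)  # maps each value y_i to its frequency f_i

for y in sorted(frequencies):
    f = frequencies[y]
    print(f"y = {y}: f = {f}, relative frequency = {f / n:.2f}")

# The frequencies of all distinct values must sum to the sample size.
total = sum(frequencies.values())
print("sum of frequencies =", total, "n =", n)
```

Plotting each f_i against y_i as a bar then gives the histogram for the discrete case.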
12
12 If the data x_1, x_2, …, x_n are from a continuous random variable:
- select the number of intervals or cells, r, to be a number between 3 and 20; as an initial value use $r = \sqrt{n}$, where n is the number of observations
- establish r intervals of equal width, starting just below the smallest value of x
- count the number of values of x within each interval to obtain the frequency associated with each interval
- construct the graph by plotting (f_i, i) for i = 1, 2, …, r
Guidelines for Constructing Histograms – Continuous Data
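The continuous-data guidelines can be sketched in Python as follows. This is one possible reading, not a definitive procedure: the function name `equal_width_frequencies` and the `margin` parameter (how far below the minimum the first cell starts, and how far above the maximum the last cell ends) are assumptions introduced for illustration, and the data reused here are the 25 values from the box plot example rather than the battery data.

```python
import math


def equal_width_frequencies(data, r=None, margin=0.5):
    """Return (edges, counts) for r equal-width cells.

    r defaults to round(sqrt(n)), kept within the suggested range 3..20.
    margin is an assumed offset so that the first cell starts just below
    the smallest value and the last cell ends just above the largest.
    """
    n = len(data)
    if r is None:
        r = min(20, max(3, round(math.sqrt(n))))
    lo = min(data) - margin
    hi = max(data) + margin
    width = (hi - lo) / r
    edges = [lo + j * width for j in range(r + 1)]
    counts = [0] * r
    for x in data:
        j = min(int((x - lo) / width), r - 1)  # guard against rounding at the top edge
        counts[j] += 1
    return edges, counts


sample = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
          22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98]

edges, counts = equal_width_frequencies(sample)
relative = [c / len(sample) for c in counts]
for j, (c, rf) in enumerate(zip(counts, relative)):
    print(f"[{edges[j]:.1f}, {edges[j + 1]:.1f}): f = {c}, relative = {rf:.2f}")
```

Dividing each count by n gives the relative frequencies used in a relative frequency histogram.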
13
13 To illustrate the construction of a relative frequency distribution, consider the following data which represent the lives of 40 car batteries of a given type recorded to the nearest tenth of a year. The batteries were guaranteed to last 3 years. Histogram and Relative Frequency Example
14
14 For this example, using the guidelines for constructing a histogram, the number of classes selected is 7 with a class width of 0.5. The frequency and relative frequency distribution for the data are shown in the following table. Histogram and Relative Frequency Example
15
15 The following diagram is a relative frequency histogram of the battery lives with an approximate estimate of the probability density function superimposed. Histogram and Relative Frequency
16
16 Data are plotted on special graph paper designed for a particular distribution:
- Normal
- Weibull
- Lognormal
- Exponential
If the assumed model is adequate, the plotted points will tend to fall in a straight line. If the model is inadequate, the plot will not be linear, and the type and extent of the departures can be seen. Once a model appears to fit the data reasonably well, percentiles and parameters can be estimated from the plot.
Probability Plotting
17
17 To plot the data on the probability paper, we need an estimate of the cumulative probability F(t) corresponding to each of the sample values. These estimates are obtained from what are called median ranks. Median ranks represent the 50% confidence level (“best guess”) estimate for the true value of F(t), based on the total sample size and the order number (first, second, etc.) of the data. Probability Plotting General Procedure
18
18 There is an approximation that can be used to estimate median ranks, called Benard’s approximation. It has the form $\hat{F}(x_i) = \dfrac{i - 0.3}{n + 0.4}$, where n is the sample size and i is the sample order number. Tables of median ranks can be found in many statistics and reliability texts. Benard’s Approximation
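A small Python sketch of the approximation may help. It assumes the commonly cited form of Benard's approximation given in the docstring; the function name `benard_median_rank` is illustrative, and n = 6 is chosen to match the size of the worked example later in the slides.

```python
def benard_median_rank(i, n):
    """Approximate median rank (i - 0.3) / (n + 0.4).

    i is the 1-based order number of the observation, n the sample size.
    """
    if not 1 <= i <= n:
        raise ValueError("order number i must satisfy 1 <= i <= n")
    return (i - 0.3) / (n + 0.4)


n = 6
ranks = [benard_median_rank(i, n) for i in range(1, n + 1)]
for i, f in enumerate(ranks, start=1):
    print(f"i = {i}: estimated F = {100 * f:.2f}%")
```

The estimates increase with the order number and are symmetric about 50%, so the first and last ranks sum to 1.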
19
19 Step 1: Obtain special graph paper, known as probability paper, designed for the distribution under examination. Weibull, Lognormal and Normal paper are available at: http://www.weibull.com/GPaper/index.htm
Step 2: Rank the sample values from smallest to largest in magnitude, i.e., X_1 ≤ X_2 ≤ … ≤ X_n. Probability Plotting Procedure
20
20 Step 3: Plot the X_i’s on the paper versus $\hat{F}(x_i) = 100\,(i - 0.3)/(n + 0.4)$ or $\hat{F}(x_i) = (i - 0.3)/(n + 0.4)$, depending on whether the marked axis on the paper refers to the % or the proportion of observations. The axis of the graph paper on which the X_i’s are plotted is referred to as the observational scale, and the axis for $\hat{F}(x_i)$ as the cumulative scale. Probability Plotting General Procedure
21
21 Probability Plotting General Procedure Step 4: If a straight line appears to fit the data, draw a line on the graph, ‘by eye’. Step 5: Estimate the model parameters from the graph.
22
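22 If the cumulative probability distribution function is $F(T) = 1 - e^{-(T/\theta)^{\beta}}$, with shape parameter $\beta$ and scale parameter $\theta$, we now need to linearize this function into the form y = ax + b. Weibull Probability Plotting Paper
23
23 Then $1 - F(T) = e^{-(T/\theta)^{\beta}}$, so $\ln[1 - F(T)] = -(T/\theta)^{\beta}$ and $\ln\{-\ln[1 - F(T)]\} = \beta \ln T - \beta \ln \theta$, which is the equation of a straight line of the form y = ax + b. Weibull Probability Plotting Paper
24
24 where $y = \ln\{-\ln[1 - F(T)]\}$ and $x = \ln T$, so that $y = \beta x - \beta \ln \theta$, Weibull Probability Plotting Paper
25
25 which is a linear equation with a slope of $\beta$ and an intercept of $-\beta \ln \theta$. Now the x- and y-axes of the Weibull probability plotting paper can be constructed. The x-axis is simply logarithmic, since $x = \ln(T)$, and the y-axis is scaled as $y = \ln\{-\ln[1 - F(T)]\}$. Weibull Probability Plotting Paper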
26
26 [Figure: Weibull probability plotting paper; vertical axis: cumulative probability (in %), horizontal axis: x] Weibull Probability Plotting Paper
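To see why Weibull paper straightens a Weibull CDF, the sketch below evaluates a two-parameter Weibull CDF, applies the transformation x = ln(T), y = ln(-ln(1 - F)), and checks that consecutive points share the same slope. The symbols beta (shape) and theta (scale), the parameter values, and the function names are assumptions chosen for illustration.

```python
import math


def weibull_cdf(t, beta, theta):
    """Two-parameter Weibull CDF with shape beta and scale theta."""
    return 1.0 - math.exp(-((t / theta) ** beta))


def weibull_paper_coords(t, F):
    """Map (t, F) to the linearized coordinates used by Weibull paper."""
    return math.log(t), math.log(-math.log(1.0 - F))


beta, theta = 2.0, 50.0            # hypothetical parameter values
times = [10.0, 20.0, 40.0, 80.0]   # hypothetical evaluation points

points = [weibull_paper_coords(t, weibull_cdf(t, beta, theta)) for t in times]

# Slope between consecutive transformed points; each should equal beta.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]

# Intercept implied by each point; each should equal -beta * ln(theta).
intercepts = [y - beta * x for x, y in points]

print("slopes:", slopes)
print("intercepts:", intercepts, "expected:", -beta * math.log(theta))
```

Because the slope is the shape parameter and the intercept depends only on the shape and scale, both parameters can be read off a straight line fitted on the paper.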
27
27 To illustrate the process, let 10, 20, 30, 40, 50, and 80 be a random sample of size n = 6. Probability Plotting - Example
28
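28 Based on Benard’s approximation, $\hat{F}(x_i) = (i - 0.3)/(n + 0.4)$, we can now calculate $\hat{F}(x)$ for each observed value of X. For example, for $x_2 = 20$, $\hat{F}(x_2) = (2 - 0.3)/(6 + 0.4) \approx 0.2656$, or 26.56%. Probability Plotting - Example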
29
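29 In summary, applying $\hat{F}(x_i) = (i - 0.3)/(n + 0.4)$ with n = 6 to the ordered sample gives:
x_i: 10, 20, 30, 40, 50, 80
$\hat{F}(x_i)$ (in %): 10.94, 26.56, 42.19, 57.81, 73.44, 89.06
Probability Plotting - Example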
30
30 Now that we have y-coordinate values to go with the x-coordinate sample values, we can plot the points on Weibull probability paper. [Figure: sample points on Weibull probability paper; vertical axis: $\hat{F}(x)$ (in %), horizontal axis: x] Probability Plotting - Example
31
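31 The line represents the estimated relationship between x and F(x). [Figure: fitted line through the sample points on Weibull probability paper; vertical axis: $\hat{F}(x)$ (in %), horizontal axis: x] Probability Plotting - Example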
32
32 In this example, the points on Weibull probability paper fall in a fairly linear fashion, indicating that the Weibull distribution provides a good fit to the data. If the points did not seem to follow a straight line, we might want to consider using another probability distribution to analyze the data. Probability Plotting - Example
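The graphical procedure can be imitated numerically. The sketch below is not the slides' method: it replaces the line drawn "by eye" with an ordinary least-squares line through the transformed points x = ln(t), y = ln(-ln(1 - F)), using Benard's approximation (i - 0.3)/(n + 0.4) for F. The names `beta_hat` (shape) and `theta_hat` (scale) are assumptions for illustration, and the resulting estimates will generally differ somewhat from values read off a hand-drawn line.

```python
import math

# Sample from the probability plotting example.
times = sorted([10, 20, 30, 40, 50, 80])
n = len(times)

# Median-rank estimates of F via Benard's approximation.
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

# Coordinates on linearized Weibull paper.
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(1.0 - f)) for f in F]

# Ordinary least-squares line y = slope * x + intercept.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

beta_hat = sxy / sxx                              # slope estimates the shape
intercept = y_bar - beta_hat * x_bar
theta_hat = math.exp(-intercept / beta_hat)       # intercept = -beta * ln(theta)
r_squared = sxy ** 2 / (sxx * syy)                # linearity of the plotted points

print(f"shape estimate: {beta_hat:.3f}")
print(f"scale estimate: {theta_hat:.2f}")
print(f"R^2 of the straight-line fit: {r_squared:.4f}")
```

An R^2 close to 1 corresponds to the points falling "in a fairly linear fashion"; a clearly lower value would suggest trying another distribution.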
33
33 Probability Plotting - Example
34
34 Probability Plotting - Example
35
35 Probability Paper - Normal
36
36 Probability Paper - Lognormal
37
37 Probability Paper - Exponential
38
38 Given the following random sample of size n = 8, which probability distribution provides the best fit? Example - Probability Plotting
39
39 Forty specimens were cut from a plate for tensile tests. The tensile tests were made, resulting in tensile strength, x, as follows: Perform a statistical analysis of the tensile strength data. 40 Specimens
40
40 Time series plot: By visual inspection of the plot, there seems to be no trend. Therefore, the sample appears to be a random sample. 40 Specimens
41
41 40 Specimens Using the descriptive statistics function in Excel, the following were calculated:
42
42 40 Specimens From looking at the histogram and the normal probability plot, we see that the tensile strength can be estimated by a normal distribution. Using the histogram feature of Excel, the following data were calculated: and the graph:
43
43 40 Specimens Box Plot The lower quartile is 49.45. The median is 53.03. The mean is 52.6. The upper quartile is 55.3. The interquartile range is 5.86. [Figure: box plot on a 40 to 65 scale marking the lower extreme, lower quartile, median, mean, upper quartile, and upper extreme]
44
44 40 Specimens
45
45 40 Specimens
46
46 40 Specimens
47
47 The tensile strength distribution can be estimated by [estimated density $\hat{f}(x)$ and distribution function $\hat{F}(x)$] 40 Specimens
48
48 Solve the Example using Minitab http://www.minitab.com/en-US/default.aspx