South Dakota School of Mines & Technology Introduction to Probability & Statistics Industrial Engineering
Introduction to Probability & Statistics Data Analysis Industrial Engineering
Data Analysis Histograms Industrial Engineering
Experimental Data
• Suppose we wish to estimate the time to failure for a new power supply. 40 units are randomly selected and tested to failure. The failure times are recorded as follows:
Histogram
• Perhaps the most useful method; histograms give the analyst a feel for the distribution from which the data were obtained.
• Count the observations within a set of ranges (class intervals)
• Aim for an average of 5 observations per interval class
• Range for power supply data:
• Intervals:
Histogram [figure: building the histogram by counting the observations in each class interval; the counts shown are 15 and 11]
Histogram: Frequency Count [table: Class Intervals | Frequency; values not recovered]
Exponential Distribution
Density: $f(x) = \lambda e^{-\lambda x}$, $x > 0$
Cumulative: $F(x) = 1 - e^{-\lambda x}$, $x > 0$
Mean: $1/\lambda$
Variance: $1/\lambda^2$
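The exponential formulas on this slide can be evaluated directly. The following is a minimal sketch, not part of the original slides; the function names and the rate `lam = 0.05` are hypothetical choices made only for illustration.

```python
import math

def exp_pdf(x, lam):
    """Density f(x) = lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def exp_cdf(x, lam):
    """Cumulative F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def exp_mean(lam):
    """Mean of the exponential distribution, 1 / lam."""
    return 1.0 / lam

def exp_variance(lam):
    """Variance of the exponential distribution, 1 / lam**2."""
    return 1.0 / lam ** 2

lam = 0.05  # hypothetical rate, chosen only for illustration
print("mean     =", exp_mean(lam))
print("variance =", exp_variance(lam))
print("F(mean)  =", exp_cdf(exp_mean(lam), lam))  # equals 1 - e^(-1)
```

Whatever the rate, the cumulative probability at the mean is $1 - e^{-1}$, so more than half of the observations are expected to fall below the mean.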
Histogram; Change Interval [table and figure: Class Intervals | Frequency; values not recovered]
Histogram; Change Class Mark [table and figure: Class Intervals | Frequency; values not recovered]
Class Problem
• The following data represent independent observations on deviations from the desired diameter of ball bearings produced on a new high-speed machine.
Histogram
• Intervals and class marks can alter the histogram
  • too many intervals leave too many voids
  • too few intervals don’t give a good picture
• Rule of Thumb
  • # Intervals = n/5
• Sturges’ Rule
  • $k = [1 + \log_2 n] = [1 + 3.322 \log_{10} n]$
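The two rules above can be compared with a short sketch. The function names are hypothetical. The slide does not say whether the brackets in Sturges' rule round up or down; rounding up is assumed here, and the rule-of-thumb count is rounded to the nearest integer.

```python
import math

def intervals_rule_of_thumb(n):
    """Rule of thumb: about n / 5 intervals (rounded to the nearest integer)."""
    return max(1, round(n / 5))

def intervals_sturges(n):
    """Sturges' rule, k = 1 + log2(n); rounding up is an assumption."""
    return math.ceil(1 + math.log2(n))

# 40 power-supply failure times and 70 days of inventory demand
for n in (40, 70):
    print(n, intervals_rule_of_thumb(n), intervals_sturges(n))
```

The two rules can disagree noticeably for larger samples, so it is worth drawing the histogram with more than one interval count before hypothesizing a distribution.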
Class Problem
• The following data represent demand for a particular inventory item during a 70-day period. Construct a histogram and hypothesize a distribution.
Relative Histogram [table and figure: Class | Freq | Rel; values not recovered]
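A relative histogram divides each class frequency by the number of observations, so the relative frequencies sum to 1. The sketch below shows one way to build the Class / Freq / Rel table. The demand values are hypothetical (the original data were not recovered), and the function name, the equal-width intervals and the half-open convention [lower, upper) are all assumptions.

```python
def relative_histogram(data, lower, width, k):
    """Return (interval_low, interval_high, freq, rel_freq) for k equal-width
    intervals [low, high) starting at `lower`."""
    counts = [0] * k
    for x in data:
        idx = int((x - lower) // width)
        if not 0 <= idx < k:
            raise ValueError(f"observation {x} falls outside the intervals")
        counts[idx] += 1
    n = len(data)
    return [(lower + j * width, lower + (j + 1) * width, c, c / n)
            for j, c in enumerate(counts)]

# Hypothetical daily demand values, for illustration only
demand = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

for low, high, freq, rel in relative_histogram(demand, lower=0, width=2, k=3):
    print(f"[{low}, {high})  freq={freq}  rel={rel:.2f}")
```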
Histogram
• Class Excel Exercise
Data Analysis Empirical Distributions Industrial Engineering
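Empirical Cumulative
• Rank order the data from smallest to largest
• Estimate the cumulative probability at the i-th ranked observation as $F(x_i) = \frac{i - 0.5}{n}$
• Example: Suppose we collect GPAs on 10 students: 3.5, 2.8, 2.7, 3.3, 3.0, 3.9, 2.9, 3.0, 2.4, 3.1
Empirical Cumulative [table: Rank | Obs | $F(x_i)$; values not recovered]
Empirical Cumulative [figure: plot of $F(x_i)$ versus Obs]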
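The empirical cumulative table for the ten student GPAs can be rebuilt with a few lines. This is a sketch that assumes the plotting position $F(x_i) = (i - 0.5)/n$, where i is the rank of the observation; the function name is hypothetical.

```python
gpas = [3.5, 2.8, 2.7, 3.3, 3.0, 3.9, 2.9, 3.0, 2.4, 3.1]

def empirical_cumulative(data):
    """Rank the data and return (rank, observation, F) with F = (rank - 0.5) / n."""
    ranked = sorted(data)
    n = len(ranked)
    return [(i, x, (i - 0.5) / n) for i, x in enumerate(ranked, start=1)]

print("Rank  Obs   F(x_i)")
for rank, obs, f in empirical_cumulative(gpas):
    print(f"{rank:>4}  {obs:.1f}   {f:.2f}")
```

Plotting F against the ranked observations gives the empirical cumulative curve, which can then be compared with a hypothesized distribution.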
Time to Failure
Exponential Distribution
Density: $f(x) = \lambda e^{-\lambda x}$, $x > 0$
Cumulative: $F(x) = 1 - e^{-\lambda x}$, $x > 0$
Mean: $1/\lambda$
Variance: $1/\lambda^2$
Inventory Data
Diameter Errors
Normal Distribution
Scatter Plots (Paired Data)
• Shows the relationship between paired data
• Example: Suppose we wish to look at state per-student expenditures versus achievement results on the Stanford Achievement Test
Scatter Plots (Paired Data) [figure: scatter plot of the paired data]
Data Analysis Box Plots Industrial Engineering
Box Plots
• A problem with the empirical distribution is that we may simply not have enough data
• For small data sets, analysts often like to provide a rough graphical measure of how the data are dispersed
• Consider our student GPA data, ranked: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
• Min = 2.4, Max = 3.9
• Median = (3.0 + 3.0)/2 = 3.0
• Median Bottom = ( )/2 =
• Median Top = ( )/2 =
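The values a box plot needs (minimum, lower quartile, median, upper quartile, maximum) can be computed as sketched below. The quartile convention used on the slides was not recovered. This sketch assumes the quartiles are the medians of the lower and upper halves of the ranked data, which is only one of several common conventions and may give slightly different values from the slides. The function name is hypothetical.

```python
import statistics

gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]

def five_number_summary(data):
    """Min, lower quartile, median, upper quartile, max (needs at least 2 values).

    Assumed convention: quartiles are the medians of the lower and upper halves;
    for an odd number of observations the middle value belongs to neither half.
    """
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]
    return {
        "min": ranked[0],
        "lower_quartile": statistics.median(lower),
        "median": statistics.median(ranked),
        "upper_quartile": statistics.median(upper),
        "max": ranked[-1],
    }

for name, value in five_number_summary(gpas).items():
    print(f"{name:>15}: {value}")
```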
Fail Time Data
• Min = 0.5
• Max = 73.8
• Lower Quartile = 6.1
• Median = 14.3
• Upper Quartile =
Class Problem
• The following data represent sorted observations on deviations from desired diameters of ball bearings. Construct a box plot.
Data Analysis Statistical Measures Industrial Engineering
Aside: Mean, Variance
Mean: $\mu = \sum_x x\,p(x)$, discrete
Variance: $\sigma^2 = \sum_x (x - \mu)^2 p(x)$
Example
Consider the discrete uniform die example:
x: 1, 2, 3, 4, 5, 6
p(x): 1/6, 1/6, 1/6, 1/6, 1/6, 1/6
$\mu = E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5$
$\sigma^2 = E[(X-\mu)^2] = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + (3-3.5)^2(1/6) + (4-3.5)^2(1/6) + (5-3.5)^2(1/6) + (6-3.5)^2(1/6) = 2.92$
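The die calculation can be checked exactly with rational arithmetic. This is a small sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)   # die faces 1 through 6
p = Fraction(1, 6)    # each face is equally likely

# Mean: sum of x * p(x)
mean = sum(x * p for x in faces)

# Variance: sum of (x - mean)^2 * p(x)
variance = sum((x - mean) ** 2 * p for x in faces)

print("mean     =", mean, "=", float(mean))
print("variance =", variance, "=", round(float(variance), 2))
```

The exact variance is 35/12, which rounds to the 2.92 shown in the example.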
Binomial Mean
$\mu = \sum_x x\,p(x) = 1p(1) + 2p(2) + 3p(3) + \cdots + np(n) = \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$
Miracle 1 occurs: $\mu = np$
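The "Miracle 1" step can be filled in with a standard argument. The derivation below is one common route and is not necessarily the one intended on the slide; the substitution $y = x - 1$ is introduced here for convenience.

```latex
\begin{align*}
\mu &= \sum_{x=0}^{n} x \,\frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}
     = \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\, p^{x} (1-p)^{n-x} \\
    &= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\, p^{x-1} (1-p)^{n-x} \\
    &= np \sum_{y=0}^{n-1} \frac{(n-1)!}{y!\,(n-1-y)!}\, p^{y} (1-p)^{(n-1)-y}
     \qquad (y = x - 1) \\
    &= np \,\bigl(p + (1-p)\bigr)^{n-1} = np .
\end{align*}
```

The $x = 0$ term contributes nothing, and the last sum is the total probability of a binomial distribution with $n - 1$ trials, which equals 1.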
Binomial Measures
Mean: $\mu = \sum_x x\,p(x) = np$
Variance: $\sigma^2 = \sum_x (x - \mu)^2 p(x) = np(1-p)$
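The closed forms np and np(1-p) can be checked numerically by summing over the binomial probabilities. This is a sketch with hypothetical function names; the (n, p) pairs are used only as examples.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean_variance(n, p):
    """Mean and variance computed directly from the definition sums."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, variance

for n, p in [(5, 0.3), (8, 0.5), (20, 0.5)]:
    mean, variance = binomial_mean_variance(n, p)
    print(f"n={n}, p={p}: mean={mean:.4f} (np={n * p:.4f}), "
          f"variance={variance:.4f} (np(1-p)={n * p * (1 - p):.4f})")
```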
Binomial Distribution [figure: four plots of P(x) versus x, for n=5, p=.3; n=8, p=.5; n=4, p= ; n=20, p=.5]
Measures of Centrality
• Mean
• Median
• Mode
Measures of Centrality
• Mean: $\mu = \sum_x x\,p(x)$, discrete; $\mu = \int x f(x)\,dx$, continuous
• Sample Mean: $\bar{X} = \sum_{i=1}^{n} \frac{x_i}{n}$
Measures of Centrality
• Exercise: Compute the sample mean for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
$\bar{X} = \sum_{i=1}^{n} \frac{x_i}{n} = 3.06$
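The sample mean exercise can be reproduced in a few lines. This is a minimal sketch; the function name is illustrative.

```python
gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]

def sample_mean(data):
    """Sample mean: the sum of the observations divided by their count."""
    return sum(data) / len(data)

x_bar = sample_mean(gpas)
print(f"sample mean = {x_bar:.2f}")
```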
Measures of Centrality
• Failure Data: $\bar{X} = 19.1$
Measures of Centrality
• Median: Compute the median for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Measures of Centrality
• Mode: class mark of the most frequently occurring interval
• For the failure data, mode = class mark of the first interval: 0.5
Measures of Centrality [table: Measure | Student GPA | Failure Data, with rows Mean, Median, Mode; values not recovered]
• The sample mean $\bar{X}$ is a BLUE (best linear unbiased estimator) of the true mean $\mu$
• $E[\bar{X}] = \mu$ (u.b., i.e. unbiased)
Measures of Dispersion
• Range
• Sample Variance
Measures of Dispersion
• Range: Compute the range for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
• Min = 2.4, Max = 3.9
• Range = 3.9 - 2.4 = 1.5
Measures of Dispersion
• Variance: $\sigma^2 = \sum_x (x - \mu)^2 p(x)$, discrete; $\sigma^2 = \int (x - \mu)^2 f(x)\,dx$, continuous
• Sample variance: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
• Exercise: Compute the sample variance for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
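The sample variance exercise can be worked with the computational form $s^2 = (\sum x_i^2 - n\bar{x}^2)/(n - 1)$. The sketch below assumes that form, with the n - 1 divisor, and cross-checks it against Python's `statistics.variance`, which uses the same divisor. The function name is illustrative.

```python
import statistics

gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]

def sample_variance(data):
    """Computational form: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

s2 = sample_variance(gpas)
print(f"s^2 = {s2:.4f}")
print(f"library check = {statistics.variance(gpas):.4f}")
```

The computational form is convenient by hand, but it can lose precision when the mean is large relative to the spread; the definitional form $\sum (x_i - \bar{x})^2/(n - 1)$ is safer numerically.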
Measures of Dispersion
• Exercise: Compute the variance for the failure time data
$s^2 =$
An Aside
• For the failure time data, we now have three measures for the data. Exponential??
$\bar{X} = 19.1$
$s^2 =$
An Aside
• Recall that for the exponential distribution $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$
• If $E[\bar{X}] = \mu$ and $E[s^2] = \sigma^2$, then $1/\hat{\lambda} = 19.1$ or $1/\hat{\lambda}^2 = s^2$
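The matching argument in this aside can be sketched numerically: set the exponential mean equal to the sample mean to estimate the rate, then see what variance that rate implies. Only the sample mean 19.1 comes from the slides. The sample variance was not recovered, so the sketch reports the implied variance for comparison rather than assuming a value. The function name is hypothetical.

```python
def fit_exponential_from_mean(x_bar):
    """Match the exponential mean 1/lambda to the sample mean.

    Returns (lambda_hat, implied_variance), where the implied variance
    is 1 / lambda_hat**2.
    """
    lam_hat = 1.0 / x_bar
    implied_variance = 1.0 / lam_hat ** 2
    return lam_hat, implied_variance

x_bar = 19.1  # sample mean of the failure time data, from the slides
lam_hat, implied_variance = fit_exponential_from_mean(x_bar)
print(f"lambda_hat       = {lam_hat:.4f}")
print(f"implied variance = {implied_variance:.2f}")
```

If the computed sample variance is close to the implied variance, the two estimates of the rate agree, which supports the exponential hypothesis suggested by the histogram.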