1
Regression Analysis
2
Unscheduled Maintenance Issue:
- 36 flight squadrons
- Each experiences unscheduled maintenance actions (UMAs)
- Each UMA costs $1000 to repair, on average
3
You’ve got the Data… Now What? Unscheduled Maintenance Actions (UMAs)
4
What do you want to know?
- How many UMAs will there be next month?
- What is the average number of UMAs?
5
Sample Mean
6
Sample Standard Deviation
7
UMA Sample Statistics
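The sample mean and sample standard deviation named on the preceding slides can be sketched as follows. The counts in `uma_counts` are made up for illustration, since the transcript does not reproduce the slide's data, and the function names are only illustrative.

```python
import math

def sample_mean(values):
    """Arithmetic mean of the observations."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, using the n - 1 divisor."""
    mean = sample_mean(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

# Hypothetical UMA counts for five squadrons (not the slide's data).
uma_counts = [48, 54, 60, 66, 72]
print(sample_mean(uma_counts), sample_std(uma_counts))
```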
8
UMAs Next Month 95% Confidence Interval
9
Average UMAs 95% Confidence Interval
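The two intervals answer different questions: one covers a single new observation, the other covers the average. A rough sketch, using the figures that appear elsewhere in the deck (mean 60, standard deviation 12, 36 squadrons) and the approximate rule of two standard units for 95%. Treating a single squadron's count as roughly normal is an assumption here, and the function names are illustrative.

```python
import math

def interval_for_single_value(mean, std, z=2.0):
    """Approximate 95% range for one new observation: mean +/- z * std."""
    return (mean - z * std, mean + z * std)

def interval_for_mean(mean, std, n, z=2.0):
    """Approximate 95% interval for the average: mean +/- z * std / sqrt(n)."""
    standard_error = std / math.sqrt(n)
    return (mean - z * standard_error, mean + z * standard_error)

print(interval_for_single_value(60, 12))  # one squadron's UMAs next month
print(interval_for_mean(60, 12, 36))      # average UMAs across 36 squadrons
```

The interval for the average is much narrower because the standard error shrinks with the square root of the sample size.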
10
Model: Cost of UMAs for one squadron
If the cost per UMA = $1000, the expected cost for one squadron = 60 * $1000 = $60,000
11
Model: Total Cost of UMAs
Expected cost for all squadrons = 60 * $1000 * 36 = $2,160,000
12
Model: Total Cost of UMAs
Expected cost for all squadrons = 60 * $1000 * 36 = $2,160,000
How confident are we about this estimate?
13
~95%; mean = 60; standard error = 12/√36 = 2
14
~95% of the distribution lies between ~56 and ~64 (axis marks at ~56, ~58, 60, ~62, ~64; 1 standard unit = 2)
15
95% Confidence Interval on our estimate of UMAs and costs
- 60 ± 2(2) = [56, 64]
- low cost: 56 * $1000 * 36 = $2,016,000
- high cost: 64 * $1000 * 36 = $2,304,000
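The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the deck's own figures (mean 60, standard deviation 12, 36 squadrons, $1000 per UMA) and its rule of two standard errors for roughly 95%; the constant and function names are illustrative.

```python
import math

COST_PER_UMA = 1000  # dollars per UMA, from the slides
SQUADRONS = 36       # number of squadrons, from the slides

def total_cost_interval(mean_umas, std_umas, n, z=2.0):
    """Approximate 95% range for total cost, from the interval on mean UMAs."""
    standard_error = std_umas / math.sqrt(n)
    low_umas = mean_umas - z * standard_error
    high_umas = mean_umas + z * standard_error
    return (low_umas * COST_PER_UMA * SQUADRONS,
            high_umas * COST_PER_UMA * SQUADRONS)

low_cost, high_cost = total_cost_interval(60, 12, 36)
print(f"${low_cost:,.0f} to ${high_cost:,.0f}")
```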
16
What do you want to know?
- How many UMAs will there be next month?
- What is the average number of UMAs?
- Is there a relationship between UMAs and some other variable that may be used to predict UMAs?
- What is that relationship?
17
Relationships
- What might be related to UMAs?
  - Pilot experience?
  - Flight hours?
  - Sorties flown?
  - Mean time to failure (for specific parts)?
  - Number of landings / takeoffs?
18
Regression:
- To estimate the expected or mean value of UMAs for next month:
  - Look for a linear relationship between UMAs and a “predictive” variable
  - If a linear relationship exists, use regression analysis
19
Regression analysis: describes and evaluates the relationship between one variable (the dependent or explained variable) and one or more other variables (the independent or explanatory variables).
20
What is a good estimating variable for UMAs?
- Quantifiable
- Predictable
- Logical relationship with dependent variable
- Must be a linear relationship: Y = a + bX
21
Sorties
22
Pilot Experience
23
Sample Statistics
24
Describing the Relationship
- Is there a relationship?
  - Do the two variables (UMAs and sorties or experience) move together?
  - Do they move in the same direction or in opposite directions?
- How strong is the relationship?
  - How closely do they move together?
25
Positive Relationship
26
Strong Positive Relationship
27
Negative Relationship
28
Strong Negative Relationship
29
No Relationship
30
Relationship?
31
Correlation Coefficient
- Statistical measure of how closely two variables move together in a coordinated fashion
  - Measures strength and direction
- Value ranges from -1.0 to +1.0
  - +1.0 indicates “perfect” positive linear relation
  - -1.0 indicates “perfect” negative linear relation
  - 0 indicates no linear relation between the two variables
32
Correlation Coefficient
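A minimal sketch of how the correlation coefficient can be computed, using the standard Pearson formula (the sum of cross-deviations divided by the square root of the product of the summed squared deviations). The sortie and UMA figures below are hypothetical, not the slide's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
sorties = [10, 20, 30, 40, 50]
umas = [22, 41, 58, 83, 99]
print(round(pearson_r(sorties, umas), 4))
```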
33
Sorties vs. UMAs: r = 0.9788
34
Experience vs. UMAs: r = 0.1896
35
Correlation Matrix
36
A Word of Caution...
- Correlation does NOT imply causation
  - It simply measures the coordinated movement of two variables
- Variation in two variables may be due to a third common variable
- The observed relationship may be due to chance alone
37
What is the Relationship?
- In order to use the correlation information to help describe the relationship between two variables, we need a model
- The simplest one is a linear model: Y = a + bX
38
Fitting a Line to the Data
39
One Possibility
Sum of errors = 0
40
Another Possibility
Sum of errors = 0
41
Which is Better?
- Both have sum of errors = 0
- Compare the sum of absolute errors
42
Fitting a Line to the Data
43
One Possibility
Sum of absolute errors = 6
44
Another Possibility
Sum of absolute errors = 6
45
Which is Better?
- The sums of absolute errors are equal
- Compare the sum of squared errors
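The progression on these slides can be reproduced with a small example. The three data points and the two candidate lines below are hypothetical, chosen so that both lines have a sum of errors of 0 and a sum of absolute errors of 6, and only the sum of squared errors tells them apart.

```python
def errors(xs, ys, intercept, slope):
    """Vertical distances from each point to the line Y = intercept + slope * X."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def summarize(errs):
    """The three candidate criteria for judging a fitted line."""
    return {
        "sum": sum(errs),
        "sum_abs": sum(abs(e) for e in errs),
        "sum_sq": sum(e * e for e in errs),
    }

# Hypothetical data; both lines pass through the point of means (2, 5).
xs = [1, 2, 3]
ys = [3, 8, 4]
line_a = summarize(errors(xs, ys, intercept=5, slope=0))
line_b = summarize(errors(xs, ys, intercept=1, slope=2))
print(line_a)
print(line_b)
```

Line A has the smaller sum of squared errors, so the squared-error criterion prefers it even though the first two criteria cannot separate the lines.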
46
The Correct Relationship: Y = a + bX + U, where a + bX is the systematic part and U is the random part
[Figure: scatter plot of Y against X, with the X axis running from 50 to 130]
47
The Correct Relationship: Y = a + bX + U, where a + bX is the systematic part and U is the random part
[Figure: scatter plot of Y against X, with the X axis running from 50 to 130]
48
Least-Squares Method
- Penalizes large absolute errors
- Y-intercept: a = mean(Y) - b * mean(X)
- Slope: b = Σ(X - mean(X))(Y - mean(Y)) / Σ(X - mean(X))²
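For the model Y = a + bX, the least-squares intercept and slope have a standard closed form: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept makes the line pass through the point of means. A minimal sketch follows; the function name `fit_line` and the sample data are illustrative, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data for illustration.
a, b = fit_line([1, 2, 3], [3, 8, 4])
print(a, b)
```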
49
Assumptions
- Linear relationship: Y = a + bX + U
- Errors are random and normally distributed with mean = 0 and variance = σ²
  - Supported by Central Limit Theorem
50
Least Squares Regression for Sorties and UMAs
51
Regression Calculations
52
Sorties vs. UMAs
53
Regression Calculations: Confidence in the predictions
54
Confidence Interval for Estimate
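One common way to put an interval around the estimated mean of Y at a given X is sketched below. It uses the standard error of the estimate, sqrt(SSE / (n - 2)), and a multiplier of 2 for roughly 95%, in line with the two-standard-unit rule used earlier in the deck. For small samples a t-table value would replace the 2. The data and function names are hypothetical, and this may differ from the exact formula on the slide.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_response_interval(xs, ys, x0, multiplier=2.0):
    """Approximate interval for the mean of Y at X = x0."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
    half_width = multiplier * s_e * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    y_hat = a + b * x0
    return y_hat - half_width, y_hat + half_width

# Hypothetical data for illustration only.
sorties = [10, 20, 30, 40, 50]
umas = [22, 41, 58, 83, 99]
print(mean_response_interval(sorties, umas, 30))  # at the mean of X
print(mean_response_interval(sorties, umas, 50))  # at the edge of the data
```

The interval is narrowest at the mean of X and widens as x0 moves away from it.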
55
95% Confidence Interval for the model (b)
[Figure: Y plotted against X]
56
Testing Model Parameters
- How well does the model explain the variation in the dependent variable?
- Does the independent variable really seem to matter?
- Is the intercept constant statistically significant?
57
Variation
58
Coefficient of Determination
- Values between 0 and 1
- R² = 1 when all data lie on the line (r = 1)
- R² = 0 when there is no correlation (r = 0)
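The coefficient of determination can be computed as one minus the ratio of unexplained variation (squared errors around the fitted line) to total variation (squared deviations around the mean of Y). A minimal sketch with hypothetical data and illustrative names:

```python
def r_squared(xs, ys):
    """R^2 = 1 - SSE / SST for a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

print(r_squared([0, 1, 2], [1, 3, 5]))  # all points on the line
print(r_squared([1, 2, 3], [3, 8, 4]))  # weak linear relationship
```

With a single explanatory variable, R² equals the square of the correlation coefficient, so the r = 0.9788 reported for sorties corresponds to an R² of about 0.958.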
59
Regression Calculations: How well does the model explain the variation?
60
Does the Independent Variable Matter?
- If sorties do not help predict UMAs, we expect b = 0
- If b is not 0, is it statistically significant?
61
Regression Calculations: Does the Independent Variable Matter?
62
95% Confidence Interval for the slope (a)
[Figure: Y plotted against X, with the mean of X and the mean of Y marked]
63
Confidence Interval for Slope
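A sketch of the slope test, assuming the usual standard error of the slope (standard error of the estimate divided by the square root of the summed squared X deviations) and a multiplier of 2 for roughly 95%. A t-table value would be more exact for a sample this small. The data and names are hypothetical.

```python
import math

def slope_inference(xs, ys, multiplier=2.0):
    """Slope, its standard error, t-ratio, and an approximate 95% interval."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_slope = s_e / math.sqrt(sxx)  # standard error of the slope
    return {
        "slope": slope,
        "std_error": se_slope,
        "t_ratio": slope / se_slope,
        "interval": (slope - multiplier * se_slope, slope + multiplier * se_slope),
    }

# Hypothetical data for illustration only.
sorties = [10, 20, 30, 40, 50]
umas = [22, 41, 58, 83, 99]
result = slope_inference(sorties, umas)
print(result)
```

If the interval excludes 0, as it does for this made-up data, the independent variable appears to matter.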
64
Is the Intercept Statistically Significant?
65
Confidence Interval for Y-intercept
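The intercept can be checked the same way. The sketch below assumes the usual standard error of the intercept, s_e * sqrt(1/n + mean(X)² / Sxx), and a multiplier of 2 for roughly 95%. The data are hypothetical, so the conclusion applies only to this illustration.

```python
import math

def intercept_interval(xs, ys, multiplier=2.0):
    """Return (intercept, (low, high)) with an approximate 95% interval."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    se_intercept = s_e * math.sqrt(1 / n + mean_x ** 2 / sxx)
    half_width = multiplier * se_intercept
    return intercept, (intercept - half_width, intercept + half_width)

# Hypothetical data for illustration only.
sorties = [10, 20, 30, 40, 50]
umas = [22, 41, 58, 83, 99]
a, (low, high) = intercept_interval(sorties, umas)
print(a, low, high)
```

Here the interval straddles 0, so for this made-up data the intercept would not be judged statistically significant.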
66
Basic Steps of Regression Analysis
- Formulate the model
- Plot scatter diagram for visual inspection
- Compute correlation coefficient
- Fit the regression line
- Test the model
67
Factors affecting estimation accuracy
- Sample size (larger is better)
- Range of X values (wider is better)
- Standard deviation of U (smaller is better)
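All three factors show up in the standard error of the fitted slope, which equals the standard deviation of U divided by the square root of the summed squared X deviations. The sketch below uses that one measure of accuracy with hypothetical values to show each effect; the function name is illustrative.

```python
import math

def slope_standard_error(sigma_u, xs):
    """Standard error of the slope: sigma_U / sqrt(sum of squared X deviations)."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sigma_u / math.sqrt(sxx)

baseline = slope_standard_error(4.0, [9, 10, 11])
wider_range = slope_standard_error(4.0, [0, 10, 20])        # wider spread of X
larger_sample = slope_standard_error(4.0, [9, 10, 11] * 4)  # four times the data
less_noise = slope_standard_error(1.0, [9, 10, 11])         # smaller sigma_U

print(baseline, wider_range, larger_sample, less_noise)
```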
68
Uses and Limitations of Regression Analysis
- Identifying relationships
  - Not necessarily cause
  - May be due to chance only
- Forecasting future outcomes
  - Only valid over the range of the data
  - Past may not be a good predictor of the future
69
Common pitfalls in regression
- Failure to draw scatter diagrams
- Omitting important variables from the model
- The “two point” phenomenon
- Unfounded claims of model sophistication
- Insufficient attention to interval estimates and predictions
- Predicting too far outside of known range
70
Lines can be deceiving... R² = 0.6662
71
Nonlinear Relationship
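The R² value quoted on the previous slide happens to match what the second dataset of Anscombe's quartet yields, a classic example of a curved relationship that a straight line summarizes badly. Whether the slides actually used that dataset is an assumption. The sketch below fits a line to it and shows that the residuals are negative at both ends of the X range and positive in the middle, which is a typical sign of a nonlinear relationship.

```python
# Anscombe's quartet, dataset II (assumed here; the slides do not name their data).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

a, b = fit_line(xs, ys)
residuals = {x: y - (a + b * x) for x, y in zip(xs, ys)}

mean_y = sum(ys) / len(ys)
sse = sum(r ** 2 for r in residuals.values())
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(round(r_squared, 4))
for x in sorted(residuals):
    print(x, round(residuals[x], 2))
```

A moderately high R² does not by itself confirm that the linear model is appropriate; plotting the data or the residuals does.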
72
Best fit?
73
Misleading data
74
Summary
- Regression analysis is a useful tool
  - Helps quantify relationships
- But be careful
  - Does not imply cause and effect
  - Don’t go outside range of data
  - Check linearity assumptions
  - Use common sense!
75
Non-linear relationship between output and cost