

1 CH1. What is what; CH2. A simple SPF; CH3. EDA; CH4. Curve fitting; CH5. A first parametric SPF; CH6. Which fit is fitter; CH7. Choosing the objective function; CH8. Theoretical stuff; CH9. Adding variables; CH10. Choosing a model equation. 5. A first parametric SPF. In this session: 1. How to build a C-F spreadsheet; 2. How to estimate [symbol illegible]; 3. Can one get CMFs? 4. How accurate are the parameters?

2 The once-through approach to modeling: Assemble data; Postulate model like: [equation not preserved]; Estimate the β's and check significance; Remove (add) variables; Re-estimate parameters; Report. SPF workshop February 2014, UBCO

3 The snakes and ladders view of modeling

4 Gradualism, its merits: 1. Didactic; 2. Model equation emerges gradually; 3. Usable model at each stage. First variable, simple model equation: begin with X₁ ≡ Segment Length. Add X₂ ≡ AADT, X₃, ... later.

5 The first variable: Segment Length. Which function? EDA hinted at non-linearity. Why not linear? Possible reasons: longer segments have fewer driveways per mile, higher speeds, are farther from trauma centers, … But all this is unknown. We can choose only by: parameter parsimony; goodness-of-fit; quality of prediction.

6 If fitting is only by parameter parsimony, goodness-of-fit, and quality of prediction, can it be a source of CMFs? In parametric curve-fitting the function is chosen for its goodness-of-fit and parsimony of parameters. When the function is not chosen on the basis of some commonly accepted theory or pre-existing body of consistent empirical evidence, the fitted model does not amount to a ‘causal law’, ‘understanding’, or an ‘explanation’ [1] and may not be used to predict the consequences of interventions or design changes [2]. Why then, without batting an eye, do so many researchers present their parametric regression models as a …

7 Two perspectives on SPF. Recall: E{μ} and σ = f(Traits, parameters). Applications-centered perspective; cause-and-effect-centered perspective. The perspective determines how modeling is done. If you want CMFs you do it one way; if you want E{μ} and σ{μ} for use in applications you do it differently.

8 The cause-effect issue: Illustration 1. Do ‘A’ or ‘B’? Designer asks: How many crashes on the tangent of ‘A’ vs. the tangent of ‘B’; on curve ‘A’ vs. curve ‘B’; sum.

9 Suppose we have a model equation and estimates (as will happen soon). Tangent A: length 1 mile, 1.67 crashes. Tangent B: length 0.5 miles, 0.92 crashes. By choosing ‘B’ we save 1.67 − 0.92 = 0.75 crashes on the tangent. Trouble: why, by choosing ‘B’ and eliminating 0.5 miles, do we save 0.75 crashes when on the remaining (identical) 0.5 miles we expect 0.92 crashes?

10 The explanation … The reason is that segments found to be 1 mile long differ from segments found to be 0.5 miles long in many traits other than length. (Were it not so, β₁ would be 1.) The model equation is a fit to 5323 segments of varying length. For segments found to be 1 mile long, E{μ} = 1.67 crashes; for segments found to be 0.5 miles long, E{μ} = 0.92 crashes.

11 The question was: Why, by eliminating 0.5 miles, do we save 0.75 crashes if on the remaining (identical) 0.5 miles we expect 0.92 crashes? The answer: The designer’s 0.5-mile and 1-mile tangents are identical in traits, while the model predicts for segments that differ in traits. Conclusion: The model cannot be used by the designer. In general: If we had data about all safety-relevant traits, and if we knew the function by which they combine, then models might be trusted to predict the effect of design changes (manipulations). But, as it is, …
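
The arithmetic behind this conclusion can be checked with a short sketch. The model equation itself is not preserved in this transcript, so the sketch assumes a power function, E{μ} = β₀ × Length^β₁, with β₀ = 1.67 taken from the 1-mile prediction and β₁ = 0.866 taken from the estimate quoted later in the session. The function name is hypothetical.

```python
# Assumed form (not stated in the transcript): E{mu} = b0 * length**b1
B0 = 1.67    # crashes predicted for segments found to be 1 mile long
B1 = 0.866   # estimate quoted later in the session

def predicted_crashes(length_miles, b0=B0, b1=B1):
    """Prediction for segments *found to be* this long (hypothetical helper)."""
    return b0 * length_miles ** b1

tangent_a = predicted_crashes(1.0)   # designer's alternative 'A'
tangent_b = predicted_crashes(0.5)   # designer's alternative 'B'
apparent_saving = tangent_a - tangent_b

# If the two half-miles of tangent 'A' were identical in every trait,
# each half would carry half of A's crashes.
proportional_half = tangent_a / 2

print("Tangent A:", round(tangent_a, 2))
print("Tangent B:", round(tangent_b, 2))
print("Apparent saving:", round(apparent_saving, 2))
print("Half of A if length were the only difference:", round(proportional_half, 3))
```

Under these assumptions the model's prediction for 0.5 miles is larger than half of its prediction for 1 mile, which is the designer's puzzle in numbers: the fit describes how segments of different lengths differ, not what happens when a given tangent is shortened.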

12 The cause-effect issue: Illustration 2. Garber & Gadiraju, 1988. This relationship is ‘regular’ and a smooth function could be fitted to it. Does this mean that increasing the average speed on a road reduces the accident rate? No. Roads in population A differ from roads in population B in many traits, not only in average speed.

13 Would accounting for many variables help? After accounting for (1) traffic flow, (2) segment length, (3) percent trucks, (4) degree of curve, (5) lane width, (6) shoulder traits, (7) driveways. Source: Hauer, et al. (2004), “Safety Models for Urban Four-lane Undivided Road Segments.” Transportation Research Record 1897.

14 Why “has the enterprise not been successful”? Here is how to establish ‘Hooke’s Law’: Take a spring. Hang a weight and measure the elongation. Increase the weight and measure again, … Plot weight against elongation. If the plot is a straight line, regress to find the ‘spring constant’. Your data is experimental. You can predict the effect of weight on elongation.

15 In contrast, imagine a roomful of springs with different weights. You can measure the length of the springs and their weights, the way you found them. Now your data is observational… You could still find Hooke’s Law and predict the effect of weight on length if all springs were identical. But if they are not, the task is difficult*, perhaps impossible. * It would be particularly difficult if, say, heavy weights tended to go with stiff springs.
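
The footnote's point can be tried out in a small simulation. This is only a sketch on made-up numbers: the stiffness values, the rule tying stiffness to weight, and the helper name `ols_slope` are all assumptions introduced for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (hypothetical helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Experimental data: ONE spring, and we choose the weights ourselves.
STIFFNESS = 3.0                                  # made-up spring constant
weights = [float(w) for w in range(1, 11)]
elongations = [w / STIFFNESS for w in weights]   # Hooke's Law
experimental_slope = ols_slope(weights, elongations)

# Observational data: a roomful of DIFFERENT springs, measured as found.
# Assumption of this sketch: heavy weights tend to hang on stiff springs.
rng = random.Random(0)                           # fixed seed, repeatable run
found_weights, found_elongations = [], []
for _ in range(200):
    w = rng.uniform(1.0, 10.0)
    k = 0.5 * w * (1.0 + rng.uniform(-0.2, 0.2)) # stiffness grows with weight
    found_weights.append(w)
    found_elongations.append(w / k)              # each spring still obeys Hooke
observational_slope = ols_slope(found_weights, found_elongations)

print("Slope from the experiment:      ", experimental_slope)
print("Slope from the roomful of springs:", observational_slope)
```

Every spring in the room obeys Hooke's Law, yet under these assumptions the regression on the data as found shows almost no relation between weight and elongation. The fitted slope describes the population of springs, not the effect of adding weight to any one of them.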

16 Opinions differ. My opinion: In road safety there are no identical springs. A model equation does not allow one to say: “If I change predictor variable X by ΔX then E{μ} will change by Δμ.” CMFs obtained from SPFs based on cross-section data cannot be trusted for use in practice.

17 The first variable: begin with X₁ ≡ Segment Length.

18 The first C-F spreadsheet. Open: Spreadsheet #6 ‘OLS without constraint’ on ‘Data’. Data (AADT used later).

19 Proceed to the ‘Add fitted Values’ worksheet. 1. Add ‘initial guesses’; 2. Add formula and copy down.

20 Use this initial guess. Check the correspondence graph. OK? Better initial guess.

21 Complete the C-F spreadsheet: 1. Add ‘squared difference’ formula and copy down; 2. Sum of SDs; 3. Sum of ‘Observed’ and of ‘Fitted’.

22 The four parts of a C-F spreadsheet: 1. Data; 2. Parameters; 3. Fitted values; 4. Objective function.

23 Click on the ‘Data’ tab and then on ‘Solver’.

24 These make the sum of squared differences smallest.
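
For readers without the workshop spreadsheet, the four parts of a C-F spreadsheet and the Solver step can be sketched in a few lines of Python. Everything specific here is an assumption: the data are made up (they are not the workshop data), the model form b0 × length^b1 is inferred rather than quoted, and the simple shrinking-step search only stands in for the spreadsheet's Solver.

```python
# 1. Data (made-up illustration values, NOT the workshop data)
lengths = [0.1, 0.2, 0.4, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0]   # miles
observed = [0, 1, 0, 1, 2, 1, 3, 2, 5]                     # crash counts

# 2. Parameters: the 'initial guesses' for (b0, b1)
initial = (1.0, 1.0)

# 3. Fitted values, one per data row (assumed form: b0 * length**b1)
def fitted_values(b0, b1):
    return [b0 * x ** b1 for x in lengths]

# 4. Objective function: the sum of squared differences
def ssd(b0, b1):
    return sum((y - f) ** 2 for y, f in zip(observed, fitted_values(b0, b1)))

def minimise(objective, start, step=0.5, tol=1e-6):
    """Stand-in for Solver: try +/- step on each parameter, keep any
    improvement, and halve the step when nothing improves."""
    best = list(start)
    best_val = objective(*best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = objective(*trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return tuple(best), best_val

solution, best_ssd = minimise(ssd, initial)

print("SSD at initial guesses:", ssd(*initial))
print("Estimates (b0, b1):", solution)
print("Smallest SSD found:", best_ssd)
print("Sum of observed:", sum(observed))
print("Sum of fitted:  ", sum(fitted_values(*solution)))
```

The last two lines mirror the spreadsheet's ‘Sum of Observed and of Fitted’ cells; with a sum-of-squares objective the two sums need not be equal.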

25 How reliable are parameter estimates? Does this mean that E{μ} increases with segment length less than linearly? Scale vs. shape.

26 Parameter values are uncertain for several reasons. Statistical inaccuracy: 1. The estimate of β₁ (= 0.866) would be different if the accident counts (or, later, the AADT) were a bit different. Non-statistical inaccuracy: 2. The estimate of β₁ will change as new variables are added to the model equation*. 3. The estimate of β₁ will depend on the function chosen to represent the new variables and on what objective function is minimized or maximized. * The ‘omitted variable bias’.

27 Non-statistical inaccuracy (item 2): New variable. β₁ will change as other (correlated) variables are added. When AADT is added, β₁ will change to 1.078. When Terrain is added, it will change to 0.986. In every model there are ‘omitted variables’. Were these in the model, parameter estimates would be different. By how much? We do not know! Conclusion: Parameter estimates depend on which variables the modeller puts into the model equation.
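
How a correlated omitted variable moves an estimate can be illustrated on synthetic data. The sketch below is not the workshop's procedure: it uses ordinary least squares on logarithms instead of the C-F spreadsheet, the data are generated from a made-up relation, and the assumption that longer segments carry less traffic is introduced only to create the correlation.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable

# Made-up segments. Assumption: longer segments tend to carry less traffic.
log_len, log_aadt, log_mu = [], [], []
for _ in range(500):
    ll = rng.uniform(math.log(0.1), math.log(3.0))
    la = 9.0 - 0.3 * ll + rng.gauss(0.0, 0.3)
    # Relation used to GENERATE the data (an assumption of this sketch):
    # mu = 0.0005 * Length**1.0 * AADT**0.8, with no noise in mu
    lm = math.log(0.0005) + 1.0 * ll + 0.8 * la
    log_len.append(ll)
    log_aadt.append(la)
    log_mu.append(lm)

def centred(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z, y = centred(log_len), centred(log_aadt), centred(log_mu)

# Model with Segment Length only: AADT is an omitted variable.
b1_length_only = dot(x, y) / dot(x, x)

# Model with Segment Length and AADT: solve the 2x2 normal equations.
sxx, szz, sxz = dot(x, x), dot(z, z), dot(x, z)
sxy, szy = dot(x, y), dot(z, y)
det = sxx * szz - sxz ** 2
b1_with_aadt = (sxy * szz - szy * sxz) / det
b2_with_aadt = (szy * sxx - sxy * sxz) / det

print("Length exponent, AADT omitted: ", b1_length_only)
print("Length exponent, AADT included:", b1_with_aadt)
print("AADT exponent:                 ", b2_with_aadt)
```

In this constructed case the length exponent is pulled away from the value used to generate the data when AADT is left out, and it returns once AADT is added. With real data the generating relation is unknown, so there is no way to tell which omitted variables are still doing the same thing.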

28 Non-statistical inaccuracy (item 3): Which objective function? When may one estimate parameters by OLS? When the variances of observed values are equal (and the distribution is symmetrical). The problem of unequal variances is easy to correct by WLS. With weighting, β₁ changes from 0.866 to 0.860. Estimates of β₁ by method. Conventional: OLS 0.866; Poisson Likelihood 0.860; Negative Binomial Likelihood 0.871. Unconventional: Absolute Differences 0.911; [method name not preserved] 0.737; Total Absolute Bias 0.882. Conclusion: Parameter estimates depend on what the modeller chooses to be the objective function.
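
The effect of the objective function can be tried on a toy problem. The sketch below fits the same made-up data three ways: by sum of squared differences, by Poisson likelihood, and by sum of absolute differences. The data, the assumed form b0 × length^b1, the helper names, and the crude shrinking-step search used in place of Solver are all illustrative assumptions. The negative binomial and the other unconventional objectives on the slide are left out.

```python
import math

# Made-up illustration data, NOT the workshop data
lengths = [0.1, 0.2, 0.4, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0]   # miles
observed = [0, 1, 0, 1, 2, 1, 3, 2, 5]                     # crash counts

def fitted_values(b0, b1):
    """Assumed model form: b0 * length**b1."""
    return [b0 * x ** b1 for x in lengths]

def sum_squared_differences(b0, b1):
    return sum((y - f) ** 2 for y, f in zip(observed, fitted_values(b0, b1)))

def poisson_neg_log_likelihood(b0, b1):
    # Constant terms (log y!) are dropped; they do not affect the minimum.
    if b0 <= 0:
        return float("inf")          # fitted values must be positive
    return sum(f - y * math.log(f) for y, f in zip(observed, fitted_values(b0, b1)))

def sum_absolute_differences(b0, b1):
    return sum(abs(y - f) for y, f in zip(observed, fitted_values(b0, b1)))

def minimise(objective, start, step=0.5, tol=1e-6):
    """Crude stand-in for Solver: +/- step on each parameter, halve the
    step when no move improves the objective."""
    best = list(start)
    best_val = objective(*best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = objective(*trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return tuple(best), best_val

initial = (1.0, 1.0)
objectives = {
    "OLS": sum_squared_differences,
    "Poisson likelihood": poisson_neg_log_likelihood,
    "Absolute differences": sum_absolute_differences,
}
results = {name: minimise(func, initial) for name, func in objectives.items()}

for name, ((b0, b1), value) in results.items():
    print(name, "-> b0 =", round(b0, 3), ", b1 =", round(b1, 3))
```

Because the sum of absolute differences is not smooth, this simple search may stop short of its true minimum; it is meant only to show that each objective pulls the parameters toward a different place.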

29 [Figure labels: Reported!; +AADT; Absolute differences; AADT & Terrain]

30 Modelers report only on the ‘statistical accuracy’, as if: a. there were no omitted variables; b. the functional form was the right one; c. the objective function used was the only choice. All are untrue to some extent. Conclusion: The estimated parameters are always less accurate than what is reported; by how much less accurate cannot be said.

31 Implications: for parameter-based CMFs; for accuracy of model prediction when predicated on accuracy of model parameters. One can begin to trust them after they stop changing.

32 Estimating σ. SPF ... many populations ... estimates of E{μ} and σ. A general method: N-W non-parametric. [Figure labels: N-W slope ≅ 0.5; OLS slope ≅ 0.7; ?]

33 Summary for section 5 (A first parametric SPF): 1. The merits of gradualism; 2. There are no grounds for choosing a model equation other than parsimony and goodness of fit; 3. Detour: can SPFs be used to determine the effect of change? 4. The four parts of a C-F spreadsheet; 5. How the C-F spreadsheet is used to estimate model equation parameters; 6. How accurate are the parameters? What is reported is an exaggeration; 7. How to estimate σ.

