Estimating the Causal Effect of Contributing Factors on Crashes

1 Estimating the Causal Effect of Contributing Factors on Crashes
Dr. Zhi Chen (UW-Milwaukee) Dr. Xiao Qin (UW-Milwaukee) Dr. Phoenix Do (UW-Milwaukee)

2 Outline
Background
Problem Statement
Research Objectives
Methodology
Case Study
Conclusions

3 Background Advanced transportation information systems (ATIS) have made real-time traffic data readily available. The changes in traffic flow observed from inductive loop detectors (ILD) present important cues to crash detection. Real-time crash prediction models (RTCPM) have been developed to identify traffic conditions pertinent to crashes. Advanced transportation information systems (ATIS) have been widely implemented to monitor the traffic in a real-time fashion. Inductive loop detectors (ILDs) are used in most systems as the data collection device. ILD can record the changes in traffic flow which can be used to detect crash-prone conditions. With real-time traffic data collected from inductive loop detectors (ILD) near the prospective crashes, real-time crash prediction models (RTCPM) have been developed to identify traffic conditions that could lead to the crash occurrence. However, there are some issues in previous real-time crash studies.

4 Problem Statement
Previous studies are predominantly predictive analyses; The causal effect of one traffic factor may be biased by other confounding traffic factors. Almost all previous studies are predictive analyses focused on achieving better performance in predicting crashes. Little attention has been paid to estimating the causal effect of traffic factors in a rigorous manner. When the causal effect of one traffic factor is estimated from a predictive analysis, the estimate may be biased by other confounding traffic factors.

5 Research Objectives Propose an approach to estimating the causal effect of traffic factors with higher accuracy with real-time traffic data collected from ILD stations The objective of this study is to propose a rigorous approach to estimating the causal effect of traffic factors on the crash occurrence using real-time traffic data from ILD stations. The proposed approach should yield the causal effect with higher accuracy.

6 Causal Effect
Here comes the question: what is a causal effect?
Imagine a patient with a disease. He takes the medicine, and the disease is cured. Imagine that in a parallel universe, the same patient does not take the medicine, and the disease is not cured. Then we can say the medicine causes the cure. However, we cannot observe both cases in reality, because the same patient either takes the medicine or does not. So what should we do in a single universe?

7 Causal Effect: Randomized Experiments
In the single universe where we are, the gold standard for estimating the causal effect is the randomized experiment. A randomized experiment works by randomly assigning subjects to either the treated group or the control group and then comparing the outcomes. Since the assignment is random, the two groups are equivalent, and the difference in outcome between the two groups can be attributed to the treatment, as the treatment is the only difference. In crash analysis, we cannot randomly assign subjects to different groups and then observe the crash outcome. Instead, we can only collect retrospective data on crashes and non-crashes with known treatment assignments. Cases in different treatment groups may not have similar characteristics because the assignment is not random. To estimate the causal effect of the treatment, the treated group and the control group should be as similar as possible; otherwise, the estimate would be biased. (Source: Randomized Experiments

8 Causal Effect: Adjusted Control Group
[Figure: Treated Group, Control Group, Adjusted Control Group]
One straightforward idea is to adjust the samples of the two groups to make them similar. For example, in the treated group, there are 2 males and 2 females, while in the control group, there are 2 males but 4 females. We can adjust the control group by keeping the two males and randomly selecting 2 out of the 4 females. Here we only need to consider one factor, gender. When there are many factors, it becomes very complicated to take all of them into account. In that case, we can apply the propensity score to achieve similar characteristics across the two groups and mimic randomized experiments. So what is the propensity score?
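The gender example can be sketched in a few lines of Python. This is only an illustration, assuming a treated group of 2 males and 2 females (which is what the subsampling step implies); the subject labels and the `adjust_control` helper are hypothetical and not part of the study.

```python
import random
from collections import Counter

def adjust_control(treated, control, key, seed=0):
    """Subsample `control` so its count per `key` value matches `treated`."""
    rng = random.Random(seed)
    target = Counter(key(s) for s in treated)
    adjusted = []
    for value, n_needed in target.items():
        pool = [s for s in control if key(s) == value]
        if len(pool) < n_needed:
            raise ValueError(f"not enough control subjects with {value!r}")
        # Random selection within each stratum, e.g. 2 out of 4 females.
        adjusted.extend(rng.sample(pool, n_needed))
    return adjusted

# Hypothetical subjects: (label, gender).
treated = [("t1", "M"), ("t2", "M"), ("t3", "F"), ("t4", "F")]
control = [("c1", "M"), ("c2", "M"), ("c3", "F"),
           ("c4", "F"), ("c5", "F"), ("c6", "F")]

adjusted = adjust_control(treated, control, key=lambda s: s[1])
print(Counter(gender for _, gender in adjusted))
```

With a single factor this exact matching is easy; with a dozen continuous traffic variables the strata become too sparse, which is the motivation for the propensity score.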

9 Propensity Score
The propensity score denotes the probability that a subject is assigned to the treated group given its characteristics (Rosenbaum & Rubin, 1983); The covariate distribution may differ between the treated group and the control group. However, conditional on the propensity score, the covariate distribution should be similar between the two groups (Rosenbaum & Rubin, 1983). It can mimic randomized experiments and has been applied to estimate the causal effects of various safety treatments (Durbin, Elliott, & Winston, 2009; Sasidharan & Donnell, 2013; Li & Graham, 2016). The propensity score (PS) has one attractive property: conditional on the propensity score, the covariate distribution is similar between the two groups. It has been applied to mimic randomized experiments and estimate the causal effects of various safety treatments, including intersection lighting and speed-limit changes. Next, I'll demonstrate the PS-based approach in a case study.

10 Data A 4.15-mile corridor on I-94 in Wisconsin;
103 crashes and 2,060 non-crash cases (1:20 ratio) in ; Real-time ILD traffic data from upstream and downstream stations; First, the study site is a 4.15-mile corridor on I-94 in Wisconsin. The crash dataset includes 103 crashes that happened on this corridor in , and 2,060 non-crash cases were collected to achieve a 1:20 crash to non-crash ratio. Crash cases are traffic conditions before the crash occurrence, while non-crash cases are normal conditions that are not associated with crashes. Real-time traffic data was collected from seven ILD stations along this corridor.

11 Variables
Traffic variables (2×3×2=12):
Average and standard deviation of flow, speed, and density in the prior 5-10-min period at upstream and downstream stations;
Non-traffic variables:
Presence of horizontal curve, on-ramp, off-ramp
Weather factors: Snow, Rain
Speed variation: the greatest contributor to crash occurrence (Roshandel, Zheng, & Washington, 2015); StdSpd_U (standard deviation of upstream speed) and StdSpd_D (standard deviation of downstream speed)
Variables used in this study include traffic variables and non-traffic variables. Speed variation is the focus of this study. It is considered the greatest contributor to crash occurrence. There are two speed variation variables of interest: the standard deviation of upstream speed and the standard deviation of downstream speed.

12 Variable Conversion
High upstream speed variation (HUSV) = 1 (treated) if StdSpd_U > 3 mph; HUSV = 0 (control), otherwise;
High downstream speed variation (HDSV) = 1 (treated) if StdSpd_D > 3 mph; HDSV = 0 (control), otherwise.

Crash Outcome    HUSV Treated   HUSV Control   HDSV Treated   HDSV Control
1 (Crash)        52             51             57             46
0 (Non-Crash)    455            1,605          404            1,656
Ratio            1:8.8          1:31.5         1:7.1          1:36.0

First, continuous variables need to be converted into binary treatments. There are two treatments: HUSV and HDSV. If the StdSpd_U of a case is larger than 3 mph, the case is assigned to the HUSV treated group; otherwise, it is assigned to the control group. HDSV is defined in the same way from StdSpd_D. The table shows the distribution of crash outcome by treatment group. For both treatments, the crash to non-crash ratio in the treated group is much higher than that in the control group. Intuitively, this suggests that both treatments increase the crash likelihood.
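As a quick check of the conversion rule and the ratios in the table, here is a minimal Python sketch. The counts are the ones reported on this slide; the function names `to_treatment` and `noncrash_per_crash` are hypothetical helpers introduced only for illustration.

```python
def to_treatment(std_speed_mph, cutoff_mph=3.0):
    """Return 1 (treated) if speed variation exceeds the cutoff, else 0 (control)."""
    return 1 if std_speed_mph > cutoff_mph else 0

def noncrash_per_crash(crashes, non_crashes):
    """Number of non-crash cases per crash case, i.e. the x in a 1:x ratio."""
    return non_crashes / crashes

# (crashes, non-crashes) by treatment and group, as reported in the table.
counts = {
    "HUSV": {"treated": (52, 455), "control": (51, 1605)},
    "HDSV": {"treated": (57, 404), "control": (46, 1656)},
}

ratios = {}
for treatment, groups in counts.items():
    for group, (crashes, non_crashes) in groups.items():
        ratios[(treatment, group)] = noncrash_per_crash(crashes, non_crashes)
        print(f"{treatment} {group}: 1:{ratios[(treatment, group)]:.1f}")
```

The group sizes implied by these counts (507 treated and 1,656 control for HUSV; 461 treated and 1,702 control for HDSV) each sum to the 2,163 cases in the dataset.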

13 Balance Check
Each row lists, in order: Unadjusted Sample (Control, Treated, SMD), then Adjusted Sample (Control, Treated, SMD). Cells are mean (standard deviation).

HUSV
N  1656  507  356.43  373.51
StdSpd_D  3.13 (3.01)  6.40 (6.79)  0.623  4.56 (5.47)  4.54 (4.82)  0.004
AvgVol_U  (1383.8)  (1728.2)  0.205  (1319.7)  (1622.3)  0.024
StdVol_U  (214.97)  (284.91)  0.153  (222.04)  (273.24)  0.013
AvgDen_U  37.68 (29.63)  51.21 (61.03)  0.282  35.40 (42.85)  36.29 (46.26)  0.02
StdDen_U  7.37 (5.36)  13.80 (17.01)  0.51  7.88 (8.88)  8.15 (8.36)  0.031
AvgSpd_U  66.01 (9.35)  56.58 (17.40)  0.675  61.95 (13.84)  61.70 (14.27)  0.018
AvgVol_D  (1395.8)  (1835.8)  0.103  (1497.9)  (1752.6)
StdVol_D  (214.69)  (278.88)  0.16  (215.56)  (267.27)
AvgDen_D  39.82 (29.99)  56.69 (60.46)  0.353  42.11 (45.37)  42.43 (46.58)  0.007
StdDen_D  8.12 (6.57)  12.83 (14.54)  0.418  9.45 (10.71)  9.34 (10.32)  0.011
AvgSpd_D  64.15 (9.44)  52.47 (15.15)  0.925  57.63 (12.93)  57.71 (11.72)
Curve=1  0.29 (0.45)  0.23 (0.42)  0.129  0.22 (0.41)  0.22 (0.42)  0.015

HDSV
N  1702  461  321.5  333.55
StdSpd_U  63.44 (9.08)  53.92 (17.66)  0.679  5.22 (5.83)  5.03 (5.15)  0.033
AvgVol_U  2265.8 (1375.0)  (1808.3)  0.01  1967.2 (1522.3)  1993.9 (1691.5)  0.017
StdVol_U  431.6 (220.6)  412.7 (276.6)  0.076  377.5 (227.6)  (253.73)  0.009
AvgDen_U  37.43 (31.38)  53.47 (59.72)  0.336  39.83 (47.67)  40.07 (46.79)  0.005
StdDen_U  7.80 (7.07)  12.87 (15.86)  0.413  8.94 (11.55)  9.01 (11.13)  0.006
AvgSpd_U  65.51 (10.30)  57.48 (16.72)  0.578  61.71 (14.68)  61.83 (14.26)  0.008
AvgVol_D  3.34 (3.03)  7.04 (7.13)  0.676  ( )  ( )  0.026
StdVol_D  2402 (1417)  2144. (1803.9)  0.16  365.2 (207.5)  370.8 (243.6)  0.025
AvgDen_D  (221.79)  (261.39)  0.232  39.37 (47.07)  40.67 (50.03)  0.027
StdDen_D  40.42 (30.90)  56.18 (61.40)  0.324  8.12 (9.18)  8.52 (8.43)  0.045
AvgSpd_D  7.87 (5.40)  14.23 (16.33)  0.523  60.19 (13.92)  59.77 (14.95)  0.03
Curve=1  0.26 (0.44)  0.34 (0.48)  0.186  0.35 (0.48)  0.001

Then the propensity score based approach is applied to adjust the two groups to make them similar to each other.
If a covariate has a similar distribution in both groups, we say it is balanced. If all variables are balanced, the estimate of the causal effect should be more accurate, as the data approximate a random assignment setting. SMD, short for standardized mean difference, is a balance check measure. If the SMD of a variable is lower than 0.1, the variable is considered balanced across the two groups. In this table, SMDs below 0.1 are in bold. We can observe that the unadjusted sample, i.e. the raw data, shows very poor balance, while the adjusted sample achieves satisfactory balance.
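The slides do not spell out the SMD formula. A common definition, assumed here, divides the absolute difference in group means by the square root of the average of the two group variances. The sketch below applies that assumed formula to summary statistics taken from the HUSV table (read as mean and standard deviation) and lands close to the reported SMDs; `smd_from_summary` and `is_balanced` are hypothetical helper names.

```python
import math

def smd_from_summary(mean_control, sd_control, mean_treated, sd_treated):
    """Standardized mean difference from group means and standard deviations.

    Assumed formula: |mean_t - mean_c| / sqrt((sd_t^2 + sd_c^2) / 2).
    """
    pooled_sd = math.sqrt((sd_control ** 2 + sd_treated ** 2) / 2.0)
    return abs(mean_treated - mean_control) / pooled_sd

def is_balanced(smd, threshold=0.1):
    """Balance rule used in the talk: SMD below 0.1."""
    return smd < threshold

# StdSpd_D and AvgSpd_U rows of the HUSV table, unadjusted sample.
smd_stdspd_d = smd_from_summary(3.13, 3.01, 6.40, 6.79)
smd_avgspd_u = smd_from_summary(66.01, 9.35, 56.58, 17.40)
# StdSpd_D row of the HUSV table, adjusted sample.
smd_stdspd_d_adj = smd_from_summary(4.56, 5.47, 4.54, 4.82)

for name, value in [("StdSpd_D unadjusted", smd_stdspd_d),
                    ("AvgSpd_U unadjusted", smd_avgspd_u),
                    ("StdSpd_D adjusted", smd_stdspd_d_adj)]:
    print(f"{name}: SMD = {value:.3f}, balanced = {is_balanced(value)}")
```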

14 Causal Effect Estimate

Treatment   Unadjusted Effect   PS-Adjusted Effect
HUSV        0.455 (0.264)       0.346 (0.268)
HDSV        0.780 (0.272)       0.435 (0.301)

Conclusion: It demonstrates that the proposed approach is able to obtain the causal effect of a contributing factor with higher accuracy, while the predictive analysis may yield biased effects.
This table shows the estimated causal effect based on the two samples. The unadjusted effect is based on the unadjusted sample, and the PS-adjusted effect is based on the adjusted sample. The standard errors are presented in parentheses. Effects in red are not significant, while the effect in green is significant. We can observe that for the HUSV treatment, neither effect is significant. For the HDSV treatment, the unadjusted effect is significant, while the PS-adjusted effect is not. In conclusion, this demonstrates that the proposed approach can obtain the causal effect of one traffic factor with higher reliability, while the unadjusted effect based on the predictive analysis may be biased.
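The significance pattern described here can be reproduced from the estimates and standard errors alone. The sketch below assumes a two-sided Wald test at the 5% level (normal critical value 1.96); the slides do not state which test was used, so treat this as an illustration rather than the authors' procedure. `wald_summary` is a hypothetical helper.

```python
def wald_summary(estimate, std_error, z_crit=1.96):
    """Return the z statistic, the 95% confidence interval, and a significance flag."""
    z = estimate / std_error
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    # Significant when the interval excludes zero.
    significant = lower > 0 or upper < 0
    return z, (lower, upper), significant

# (estimate, standard error) as reported in the table.
effects = {
    ("HUSV", "unadjusted"): (0.455, 0.264),
    ("HUSV", "PS-adjusted"): (0.346, 0.268),
    ("HDSV", "unadjusted"): (0.780, 0.272),
    ("HDSV", "PS-adjusted"): (0.435, 0.301),
}

results = {}
for key, (estimate, std_error) in effects.items():
    z, (lower, upper), significant = wald_summary(estimate, std_error)
    results[key] = significant
    print(f"{key[0]} {key[1]}: z = {z:.2f}, "
          f"95% CI = ({lower:.3f}, {upper:.3f}), significant = {significant}")
```

Under this assumption only the unadjusted HDSV effect has an interval that excludes zero, matching the narration.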

15 Sensitivity Analysis (HUSV)
[Figure panels: Unadjusted Effect; PS-Adjusted Effect]
Since the treatment assignment depends on the cutoff value of speed variation, a sensitivity analysis is presented here to show whether the conclusion holds with different cutoff values. The cutoff value ranges from 3 to 10. The blue solid line represents the estimate, the green dashed line is the upper bound of the 95% confidence interval, and the red dashed line is the lower bound. The black line marks an estimate of 0. If the black line lies between the green and red dashed lines, the estimated effect is not significant. For the HUSV treatment, we can observe that both the unadjusted effects and the PS-adjusted effects are consistently insignificant.

16 Sensitivity Analysis (HDSV)
[Figure panels: Unadjusted Effect; PS-Adjusted Effect]
For the HDSV treatment, we can observe that the unadjusted effects are consistently significant, while the PS-adjusted effects are consistently insignificant.

17 Conclusions
High speed variation does not have a significant causal effect on crash occurrence after controlling for other factors;
The causal effect of one contributing factor may be biased if other variables are not appropriately controlled for;
The proposed approach is able to estimate the causal effect of contributing factors with higher accuracy.

18 Thank you!

19 Data Sources Inductive Loop Detector (ILD) Traffic Data Crash Data
Regarding data sources, a sample of ILD traffic data and crash data are shown here. This is 1-min ILD data. It records the volume, speed and occupancy of vehicles passing over the loop detector in 1-min interval. This is crash data. The two most important pieces of information are the crash time and location. These two fields show the date and time of the crash, and these two fields show the latitude and longitude of the crash location.

20 ILD Data for Crash Modeling
[Figure: Upstream Station, Downstream Station]
Based on the crash time and location, the related ILD data can be collected for crash modeling. First, the nearest upstream and downstream ILD stations can be determined. Downstream means the direction of the traffic flow, and upstream means the opposite direction. Then the traffic data from those stations before the crash occurrence can be used to develop the crash prediction model.
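The station lookup and the pre-crash time window can be sketched as follows. Everything concrete here is an assumption made for illustration: the station mileposts, the crash milepost and time, and the convention that traffic travels toward increasing mileposts. Only the "prior 5-10-min period" comes from the Variables slide.

```python
from datetime import datetime, timedelta

def nearest_stations(crash_mp, station_mps, increasing=True):
    """Return (upstream, downstream) station mileposts nearest to the crash.

    `increasing=True` assumes traffic flows toward larger mileposts, so
    upstream stations have smaller mileposts than the crash location.
    """
    if increasing:
        upstream = [m for m in station_mps if m < crash_mp]
        downstream = [m for m in station_mps if m >= crash_mp]
        return max(upstream, default=None), min(downstream, default=None)
    upstream = [m for m in station_mps if m > crash_mp]
    downstream = [m for m in station_mps if m <= crash_mp]
    return min(upstream, default=None), max(downstream, default=None)

def pre_crash_window(crash_time, start_min=10, end_min=5):
    """Time window from 10 to 5 minutes before the crash."""
    return (crash_time - timedelta(minutes=start_min),
            crash_time - timedelta(minutes=end_min))

# Hypothetical mileposts for seven stations along a 4.15-mile corridor.
stations = [0.0, 0.7, 1.4, 2.1, 2.8, 3.5, 4.15]
up, down = nearest_stations(1.0, stations)
start, end = pre_crash_window(datetime(2020, 1, 1, 8, 30))  # hypothetical crash time
print(up, down, start.time(), end.time())
```

The 1-min ILD records from the two selected stations that fall inside the window would then be aggregated into the averages and standard deviations used as traffic variables.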

21 Variable Conversion Continuous speed variation needs to be converted into binary factor

22 Propensity Score
The propensity score, P(Z=1|X), denotes the probability that a subject is assigned to the treated group (Z=1) given its characteristics (Rosenbaum & Rubin, 1983); The covariate distribution of X may differ between the treated group and the control group. However, conditional on the propensity score, the covariate distribution should be similar between the two groups (Rosenbaum & Rubin, 1983). It has been applied to mimic randomized experiments and estimate the causal effects of various safety treatments (Durbin, Elliott, & Winston, 2009; Sasidharan & Donnell, 2013; Li & Graham, 2016). When random assignment is not possible, we can instead use the propensity score to mimic randomized experiments. The propensity score (PS) has one attractive property: conditional on the propensity score, the covariate distribution is similar between the two groups. It has been applied to estimate the causal effects of various safety treatments, including intersection lighting and speed-limit zones.

23 Methodology
Logit model for propensity score estimation:
$p = P(Z=1 \mid \boldsymbol{X}) = \dfrac{\exp(\boldsymbol{\beta}\boldsymbol{X})}{1+\exp(\boldsymbol{\beta}\boldsymbol{X})}$
where $\boldsymbol{X}$ is the vector of variables and $\boldsymbol{\beta}$ is the vector of regression coefficients.
Matching weighting (Li and Greene, 2013):
$w_i(Z, X) = \dfrac{\min(1-p,\ p)}{Z\,p + (1-Z)(1-p)}$
where $w_i(Z, X)$ is the weight assigned to subject $i$ to achieve similar covariate distributions between groups.
A logit model was applied for propensity score estimation. Based on a subject's covariates, its propensity score can be estimated. Then, based on the estimated propensity score, the weight assigned to the subject is calculated with the matching weighting formula, which builds on the IPTW method. Ideally, the weighted sample should have similar covariate distributions between groups. This is checked later in the balance analysis.
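The two formulas on this slide can be tried out on synthetic data. The sketch below is not the study's code: it assumes NumPy is available, fits the logit model with a plain Newton-Raphson loop, and uses a single made-up covariate whose distribution differs between groups. The helper names (`fit_logit`, `matching_weights`, `weighted_smd`) and the data are hypothetical.

```python
import numpy as np

def fit_logit(X, z, n_iter=25):
    """Estimate beta in P(Z=1|X) = exp(X beta) / (1 + exp(X beta)).

    Newton-Raphson on the log-likelihood; X must include an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (z - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, gradient)
    return beta

def matching_weights(p, z):
    """w = min(1 - p, p) / (Z * p + (1 - Z) * (1 - p))."""
    return np.minimum(p, 1.0 - p) / (z * p + (1 - z) * (1.0 - p))

def weighted_smd(x, z, w):
    """Standardized mean difference of x between groups under weights w."""
    stats = []
    for group in (1, 0):
        mean = np.average(x[z == group], weights=w[z == group])
        var = np.average((x[z == group] - mean) ** 2, weights=w[z == group])
        stats.append((mean, var))
    (m1, v1), (m0, v0) = stats
    return abs(m1 - m0) / np.sqrt((v1 + v0) / 2.0)

# Synthetic data: the chance of being treated rises with the covariate x.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 1.0)))).astype(int)

X = np.column_stack([np.ones(n), x])
beta = fit_logit(X, z)
p = 1.0 / (1.0 + np.exp(-X @ beta))
w = matching_weights(p, z)

smd_before = weighted_smd(x, z, np.ones(n))
smd_after = weighted_smd(x, z, w)
print(f"SMD before weighting: {smd_before:.3f}, after: {smd_after:.3f}")
```

Each weight lies between 0 and 1, so the weighted group sizes are smaller than the raw ones, which is consistent with the non-integer N values in the adjusted sample of the balance table.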

24 References
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1).
Durbin, D. R., Elliott, M. R., & Winston, F. K. (2009). A propensity score approach to estimating child restraint effectiveness in preventing mortality. Statistics and Its Interface, 2(4).
Sasidharan, L., & Donnell, E. T. (2013). Application of propensity scores and potential outcomes to estimate effectiveness of traffic safety countermeasures: Exploratory analysis using intersection lighting data. Accident Analysis & Prevention, 50.
Li, H., & Graham, D. J. (2016). Quantifying the causal effects of 20 mph zones on road casualties in London via doubly robust estimation. Accident Analysis & Prevention, 93.
Roshandel, S., Zheng, Z., & Washington, S. (2015). Impact of real-time traffic characteristics on freeway crash occurrence: Systematic review and meta-analysis. Accident Analysis & Prevention, 79. doi: /j.aap
Li, L., & Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2).

